One-line Summary: RISC-V JIT on LLVM 14 can be made to work well with some additional patches – use this branch if you want to do so: https://github.com/gmarkall/llvm-project/commits/riscv-llvm-14
Whilst working on ORCJITv2 support for llvmlite and testing on RISC-V, I noticed that JIT works on RISC-V in LLVM 15 out of the box, but it didn’t seem to work on LLVM 14 at all. For example, even the HowToUseLLJIT example fails:
$ ./bin/HowToUseLLJIT
Unsupported CPU type!
UNREACHABLE executed at llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:1069!
PLEASE submit a bug report to <url>
and include the crash backtrace.
Stack dump:
0. Program arguments: ./bin/HowToUseLLJIT
#0 0x0000002ace21364e PrintStackTraceSignalHandler(void*)
Signals.cpp:0:0
Aborted
I assumed that perhaps some of the pieces necessary for JIT on RISC-V made it into LLVM 14, but not enough for complete operation. Whilst researching ORCJITv2 I came across a Mesa llvmpipe MR that adds support for ORCJITv2 and RISC-V. In amongst the discussion are some comments suggesting that it was being tested and appeared to work on RISC-V with LLVM 14 (the latest version at the time). How could this be?
One possible option I came across in the LLVM Phabricator was a diff adding minimal RISC-V support to RuntimeDyld – this seems to make sense because the issue in the trace above comes from RuntimeDyld. However, the diff wasn’t merged.
There are two JIT linkers in LLVM – RuntimeDyld (used as a component with MCJIT) and JITLink (used with ORC’s ObjectLinkingLayer). It even looked like support for RISC-V had been added to JITLink in this commit from 2021. So in the HowToUseLLJIT example above, why are we seeing errors from RuntimeDyld?
It turned out that LLJIT was not using JITLink by default on RISC-V – this later commit changed LLJIT so that it would use JITLink on RISC-V. So if we apply that change on top of an LLVM 14 release and rebuild, what happens when we run HowToUseLLJIT?
$ ./bin/HowToUseLLJIT
add1(42) = 43
Success!
Well, it looks like success. Using JITLink on RISC-V in LLVM 14 covers this simple “Hello World”-type use case, but falls apart for more complex operations – I tried running the llvmlite test suite and got:
$ python -m unittest \
llvmlite.tests.test_binding.TestOrcLLJIT
......JIT session error: Unsupported riscv relocation:52
Fatal Python error: Aborted
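The number in that error message is an ELF relocation type. As a rough aid for reading such errors, here is a small Python sketch that maps the number back to a name. The table only covers the handful of relocations that come up in this post (the values are taken from the RISC-V ELF psABI), and the helper name `describe_jitlink_error` is made up for illustration.

```python
import re

# A small subset of the RISC-V ELF relocation type numbers (from the
# RISC-V psABI). Only the relocations discussed in this post are listed.
RISCV_RELOCATIONS = {
    16: "R_RISCV_BRANCH",
    17: "R_RISCV_JAL",
    43: "R_RISCV_ALIGN",
    51: "R_RISCV_RELAX",
    52: "R_RISCV_SUB6",
}


def describe_jitlink_error(line):
    """Return the relocation name for a JITLink error line, or None.

    Hypothetical helper: it only understands the
    "Unsupported riscv relocation:N" message shown above.
    """
    match = re.search(r"Unsupported riscv relocation:\s*(\d+)", line)
    if match is None:
        return None
    number = int(match.group(1))
    return RISCV_RELOCATIONS.get(number, f"unknown relocation {number}")


# prints R_RISCV_SUB6
print(describe_jitlink_error(
    "......JIT session error: Unsupported riscv relocation:52"))
```

So the relocation that trips up the test suite here is R_RISCV_SUB6.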
So it “works” but doesn’t support all of the necessary relocations. Between LLVM 14 and 15 there were a few patches towards correcting and increasing support for RISC-V relocations:
- [JITLink] Fix the incorrect relocation behavior for R_RISCV_BRANCH
- [JITLink][RISCV] fix the extractBits behavior and add R_RISCV_JAL relocation.
- [JITLink] Add R_RISCV_SUB6 relocation
- Some changes that aren’t direct fixes for anything, but make it easier to apply later patches and debug issues:
- [JITLink][RISCV] Ignore R_RISCV_RELAX and check R_RISCV_ALIGN
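To give a feel for the kind of bit manipulation a relocation like R_RISCV_BRANCH involves, here is a Python sketch of patching a branch offset into a B-type instruction. It follows the immediate layout from the RISC-V ISA manual. The function names are my own, and this is an illustration of the encoding, not a copy of the JITLink code or of what the patches above changed.

```python
def extract_bits(value, low, size):
    """Return `size` bits of `value`, starting at bit `low`."""
    return (value >> low) & ((1 << size) - 1)


def encode_branch_offset(raw_instr, offset):
    """Patch a signed, even byte offset into a B-type instruction word."""
    if offset % 2 != 0 or not -4096 <= offset < 4096:
        raise ValueError(f"offset {offset} does not fit a B-type immediate")
    value = offset & 0xFFFFFFFF            # two's complement view
    imm12 = extract_bits(value, 12, 1) << 31
    imm10_5 = extract_bits(value, 5, 6) << 25
    imm4_1 = extract_bits(value, 1, 4) << 8
    imm11 = extract_bits(value, 11, 1) << 7
    # Keep opcode, funct3, rs1 and rs2; replace only the immediate bits.
    return (raw_instr & 0x01FFF07F) | imm12 | imm10_5 | imm4_1 | imm11


def decode_branch_offset(instr):
    """Recover the signed byte offset from a B-type instruction word."""
    imm = (extract_bits(instr, 31, 1) << 12
           | extract_bits(instr, 7, 1) << 11
           | extract_bits(instr, 25, 6) << 5
           | extract_bits(instr, 8, 4) << 1)
    # Sign-extend from 13 bits.
    return imm - 0x2000 if imm & 0x1000 else imm


# 0x00000063 is `beq x0, x0, 0`; patch in a branch of +8 bytes.
patched = encode_branch_offset(0x00000063, 8)
print(hex(patched), decode_branch_offset(patched))
```

The scattered immediate fields are why a small mistake in a helper like `extract_bits` can produce branches that silently go to the wrong place.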
If we further apply these on our LLVM 14 branch and rebuild, then when re-testing llvmlite we now get:
$ python -m unittest \
llvmlite.tests.test_binding.TestOrcLLJIT
........JIT session error: Symbols not found:
[ Py_GetVersion ]
...EE.....
=====================================================
ERROR: test_object_cache_getbuffer
(llvmlite.tests.test_binding.TestOrcLLJIT)
------------------------------------------------------
Traceback (most recent call last):
File "llvmlite/tests/test_binding.py",
line 1265, in test_object_cache_getbuffer
lljit.set_object_cache(notify, getbuffer)
File "llvmlite/binding/orcjit.py",
line 86, in set_object_cache
ffi.lib.LLVMPY_SetObjectCache(self, self._object_cache)
File "llvmlite/binding/ffi.py", line 153,
in __call__
return self._cfn(*args, **kwargs)
ctypes.ArgumentError: argument 1: <class 'TypeError'>:
expected LP_LLVMExecutionEngine instance instead of
LP_LLVMOrcLLJITRef
======================================================
ERROR: test_object_cache_notify
(llvmlite.tests.test_binding.TestOrcLLJIT)
------------------------------------------------------
Traceback (most recent call last):
File "llvmlite/tests/test_binding.py",
line 1234, in test_object_cache_notify
lljit.set_object_cache(notify)
File "llvmlite/binding/orcjit.py",
line 86, in set_object_cache
ffi.lib.LLVMPY_SetObjectCache(self, self._object_cache)
File "llvmlite/binding/ffi.py",
line 153, in __call__
return self._cfn(*args, **kwargs)
ctypes.ArgumentError: argument 1: <class 'TypeError'>:
expected LP_LLVMExecutionEngine instance instead of
LP_LLVMOrcLLJITRef
------------------------------------------------------
Ran 18 tests in 1.160s
FAILED (errors=2)
Everything bar a couple of items is working – the error messages we see are due to llvmlite:
- Support for the object cache is not yet implemented – it can be seen from the traceback that I’m passing LLJIT objects into functions that are expecting MCJIT ExecutionEngines.
- The JIT Session Error mentioning a missing Py_GetVersion symbol is emitted during a test of an error condition – it’s expected that linking to Py_GetVersion should fail in that test, but I haven’t understood how to suppress the emission of that error message.
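The object cache errors are ordinary ctypes argument checking: the binding declares a pointer to one opaque type, and a pointer to a different one is passed in. A minimal sketch that reproduces the same kind of failure with the standard library only is below. The two structure classes and the `set_object_cache` callback are stand-ins named to mirror the traceback; they are not llvmlite’s real definitions.

```python
import ctypes


class LLVMExecutionEngine(ctypes.Structure):
    """Stand-in opaque type for an MCJIT execution engine."""


class LLVMOrcLLJITRef(ctypes.Structure):
    """Stand-in opaque type for an ORC LLJIT instance."""


ExecutionEnginePtr = ctypes.POINTER(LLVMExecutionEngine)
LLJITPtr = ctypes.POINTER(LLVMOrcLLJITRef)

# A function prototype that, like the binding in the traceback,
# only accepts a pointer to an execution engine.
SetObjectCacheProto = ctypes.CFUNCTYPE(None, ExecutionEnginePtr)


def _set_object_cache(engine):
    # Hypothetical body: a real binding would call into LLVM here.
    pass


set_object_cache = SetObjectCacheProto(_set_object_cache)


def try_call(handle):
    """Call the function and report whether ctypes accepted the argument."""
    try:
        set_object_cache(handle)
    except ctypes.ArgumentError as exc:
        return str(exc)
    return "ok"


print(try_call(ExecutionEnginePtr()))  # matching pointer type is accepted
print(try_call(LLJITPtr()))            # mismatched pointer type is rejected
```

The second call is rejected before the function body ever runs, with a message of the same form as the one in the test output above.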
If you want to use JIT on RISC-V in LLVM 14, you can grab my branch with all the fixes above applied:
https://github.com/gmarkall/llvm-project/commits/riscv-llvm-14