Using RISC-V JIT on LLVM 14

One-line Summary: RISC-V JIT on LLVM 14 can be made to work well with some additional patches – use this branch if you want to do so: https://github.com/gmarkall/llvm-project/commits/riscv-llvm-14

Whilst working on ORCJITv2 support for llvmlite and testing on RISC-V, I noticed that JIT works on RISC-V in LLVM 15 out of the box, but it didn’t seem to work on LLVM 14 at all. For example, even the HowToUseLLJIT example fails:

$ ./bin/HowToUseLLJIT
Unsupported CPU type!
UNREACHABLE executed at llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:1069!
PLEASE submit a bug report to <url>
and include the crash backtrace.

Stack dump:
0.      Program arguments: ./bin/HowToUseLLJIT
#0 0x0000002ace21364e PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
Aborted

I assumed that perhaps some of the pieces necessary for JIT on RISC-V made it into LLVM 14, but not enough for complete operation. Whilst researching ORCJITv2 I came across a Mesa llvmpipe MR that adds support for ORCJITv2 and RISC-V. In amongst the discussion are some comments suggesting that it was being tested and appeared to work on RISC-V with LLVM 14 (the latest version at the time). How could this be?

One possible option I came across in the LLVM Phabricator was a diff adding minimal RISC-V support to RuntimeDyld – this seems to make sense because the issue in the trace above comes from RuntimeDyld. However, the diff wasn’t merged.

There are two JIT linkers in LLVM – RuntimeDyld (used as a component with MCJIT) and JITLink (used with ORC’s ObjectLinkingLayer). It even looked like support for RISC-V had been added to JITLink in this commit from 2021. So in the HowToUseLLJIT example above, why are we seeing errors from RuntimeDyld?

It turned out that LLJIT was not using JITLink by default on RISC-V – this later commit changed LLJIT so that it would use JITLink on RISC-V.
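As an aside: client code can also request JITLink explicitly, rather than relying on LLJIT’s per-target default, by supplying an object linking layer creator. A minimal sketch, assuming the LLVM 14 ORC and JITLink headers – the helper function name is hypothetical:

#include "llvm/ADT/Triple.h"
#include "llvm/ExecutionEngine/JITLink/JITLinkMemoryManager.h"
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/ExecutionEngine/Orc/ObjectLinkingLayer.h"

using namespace llvm;
using namespace llvm::orc;

// Build an LLJIT instance that links objects with JITLink (via
// ObjectLinkingLayer) instead of RuntimeDyld, regardless of the
// target's default.
Expected<std::unique_ptr<LLJIT>> createJITLinkLLJIT() {
  return LLJITBuilder()
      .setObjectLinkingLayerCreator(
          [](ExecutionSession &ES, const Triple &TT)
              -> Expected<std::unique_ptr<ObjectLayer>> {
            // JITLink brings its own memory manager for in-process
            // linking.
            auto MemMgr = jitlink::InProcessMemoryManager::Create();
            if (!MemMgr)
              return MemMgr.takeError();
            return std::make_unique<ObjectLinkingLayer>(
                ES, std::move(*MemMgr));
          })
      .create();
}

So if we apply that change on top of an LLVM 14 release and rebuild, what happens when we run HowToUseLLJIT?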

$ ./bin/HowToUseLLJIT
add1(42) = 43

Success!

Well, it looks like success. Using JITLink on RISC-V in LLVM 14 covers this simple “Hello World”-type use case, but falls apart for more complex operations – I tried running the llvmlite test suite and got:

$ python -m unittest \
    llvmlite.tests.test_binding.TestOrcLLJIT
......JIT session error: Unsupported riscv relocation:52
Fatal Python error: Aborted

So it “works” but doesn’t support all of the necessary relocations. Between LLVM 14 and 15 there were a few patches correcting and extending support for RISC-V relocations.

If we further apply these on our LLVM 14 branch and rebuild, then re-testing llvmlite gives:

$ python -m unittest \
    llvmlite.tests.test_binding.TestOrcLLJIT
........JIT session error: Symbols not found:
                           [ Py_GetVersion ]
...EE.....
=====================================================
ERROR: test_object_cache_getbuffer
       (llvmlite.tests.test_binding.TestOrcLLJIT)
------------------------------------------------------
Traceback (most recent call last):
  File "llvmlite/tests/test_binding.py", 
        line 1265, in test_object_cache_getbuffer
    lljit.set_object_cache(notify, getbuffer)
  File "llvmlite/binding/orcjit.py",
        line 86, in set_object_cache
    ffi.lib.LLVMPY_SetObjectCache(self, self._object_cache)
  File "llvmlite/binding/ffi.py", line 153,
        in __call__
    return self._cfn(*args, **kwargs)
ctypes.ArgumentError: argument 1: <class 'TypeError'>: 
    expected LP_LLVMExecutionEngine instance instead of
    LP_LLVMOrcLLJITRef

======================================================
ERROR: test_object_cache_notify
       (llvmlite.tests.test_binding.TestOrcLLJIT)
------------------------------------------------------
Traceback (most recent call last):
  File "llvmlite/tests/test_binding.py",
        line 1234, in test_object_cache_notify
    lljit.set_object_cache(notify)
  File "llvmlite/binding/orcjit.py",
        line 86, in set_object_cache
    ffi.lib.LLVMPY_SetObjectCache(self, self._object_cache)
  File "llvmlite/binding/ffi.py",
        line 153, in __call__
    return self._cfn(*args, **kwargs)
ctypes.ArgumentError: argument 1: <class 'TypeError'>:
    expected LP_LLVMExecutionEngine instance instead of
    LP_LLVMOrcLLJITRef

------------------------------------------------------
Ran 18 tests in 1.160s

FAILED (errors=2)

Everything bar a couple of items is working – the error messages we see are due to llvmlite:

  • Support for the object cache is not yet implemented – it can be seen from the traceback that I’m passing LLJIT objects into functions that are expecting MCJIT ExecutionEngines.
  • The JIT Session Error mentioning a missing Py_GetVersion symbol is emitted during a test of an error condition – it’s expected that linking to Py_GetVersion should fail in that test, but I haven’t understood how to suppress the emission of that error message.
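One avenue that looks plausible for suppressing it (a sketch, not something verified here) is replacing the ExecutionSession’s error reporter, which is where ORC sends JIT session errors for logging by default:

#include "llvm/ExecutionEngine/Orc/Core.h"
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/Support/Error.h"

// Hypothetical helper: stop ORC printing "JIT session error: ..." to
// stderr by installing a custom error reporter that swallows errors.
void quietenSessionErrors(llvm::orc::LLJIT &J) {
  J.getExecutionSession().setErrorReporter([](llvm::Error Err) {
    // Deliberately consume the error rather than logging it.
    llvm::consumeError(std::move(Err));
  });
}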

If you want to use JIT on RISC-V in LLVM 14, you can grab my branch with all the fixes above applied:

https://github.com/gmarkall/llvm-project/commits/riscv-llvm-14

GCC and LLVM builds on the VisionFive 2 Debian Image

I’ve been experimenting with the performance and stability of the VisionFive 2 by building GCC and LLVM on the Debian Image provided by StarFive (Google Drive link). I found that:

  • It is quite fast (relative to my previous toolchain work done under qemu) – I could build LLVM in under 4 hours.
  • It seems rock-solid – I have kept all 4 cores at 100% for hours on end with no apparent problems, and running the GCC testsuite identified no issues.

Overall I’m really impressed with the board and I think it’s a great platform for RISC-V toolchain development.

This post outlines:

  • What I did to build the toolchains, and
  • Some measurements and observations.

LLVM

I built LLVM first because I’m actually going to need to use it to work on Numba. I used the LLVM 15.0.6 sources (the latest release version at the time of writing). First I installed dependencies:

sudo apt install \
  cmake ninja-build chrpath texinfo sharutils libelf-dev \
  libffi-dev lsb-release patchutils diffstat xz-utils \
  python3-dev libedit-dev libncurses5-dev swig \
  python3-six python3-sphinx binutils-dev libxml2-dev \
  libjsoncpp-dev pkg-config lcov procps help2man \
  zlib1g-dev libjs-mathjax python3-recommonmark \
  doxygen gfortran libpfm4-dev python3-setuptools \
  libz3-dev libcurl4-openssl-dev libgrpc++-dev \
  protobuf-compiler-grpc libprotobuf-dev \
  protobuf-compiler 

I would usually just use:

sudo apt build-dep llvm-15

(or something like that) but I couldn’t easily work out how to add a deb-src line to the APT sources that actually worked, so I derived that list of packages by looking at the source package instead. Following dependency installation, I configured a release build with assertions, targeting RISC-V only (because I am mainly interested in working on JIT compilation) – I wasn’t sure if I had enough disk space for a debug build, and release with assertions is a reasonable compromise:

mkdir build 
cd build 
cmake ../llvm -G Ninja \
              -DCMAKE_INSTALL_PREFIX=/data/opt/llvm/15.0.6 \
              -DLLVM_TARGETS_TO_BUILD=RISCV \
              -DCMAKE_BUILD_TYPE=Release \
              -DLLVM_ENABLE_ASSERTIONS=ON 
ninja -j 3
ninja install 

I had to limit the build to 3 jobs because 8GB of RAM is not enough to avoid running out of memory when linking with 4. I was using ld from binutils, and I wonder if an alternative linker would be more efficient – I need to check whether gold, mold, or lld support RISC-V first.

The build was pretty fast:

real	230m55.985s
user	671m8.705s
sys	20m51.997s

The build completed in just under 4 hours, which is pretty good for an SBC – quick enough that building toolchains isn’t a complete pain. Compared to the emulated system I was using before, where I had to wait all day even while emulating 8 cores on a Xeon Gold 6128, it’s blazing fast!

Since I built LLVM I’ve been using it for llvmlite development without issues. llvmlite is primarily for compiling and JITting code, and the LLVM JIT works “out of the box” on RISC-V in LLVM 15 (HowToUseLLJIT is an example program included with the LLVM source distribution that demonstrates how to use the ORCv2 LLJIT class):

$ ninja HowToUseLLJIT
[1/1] Linking CXX executable bin/HowToUseLLJIT
$ ./bin/HowToUseLLJIT
add1(42) = 43
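In condensed form, the core of what HowToUseLLJIT does looks something like the following sketch – written against the LLVM 15 ORC API, with error handling reduced to cantFail, and with an illustrative IR string rather than the example’s exact code:

#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/ExecutionEngine/Orc/ThreadSafeModule.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/TargetSelect.h"
#include <cstdio>

using namespace llvm;
using namespace llvm::orc;

// IR for the add1() function that gets JIT-compiled.
const char *Add1IR = R"(
define i32 @add1(i32 %x) {
entry:
  %r = add i32 %x, 1
  ret i32 %r
}
)";

int main() {
  InitializeNativeTarget();
  InitializeNativeTargetAsmPrinter();

  // Parse the IR and hand the module to a freshly-built LLJIT instance.
  auto Ctx = std::make_unique<LLVMContext>();
  SMDiagnostic Err;
  std::unique_ptr<Module> M =
      parseIR(MemoryBufferRef(Add1IR, "add1"), Err, *Ctx);

  auto J = cantFail(LLJITBuilder().create());
  cantFail(J->addIRModule(ThreadSafeModule(std::move(M), std::move(Ctx))));

  // Look up the JITted function and call it. (In LLVM 15 lookup returns
  // an ExecutorAddr; in LLVM 14 it returns a JITEvaluatedSymbol, whose
  // getAddress() result would be cast instead.)
  auto Add1Addr = cantFail(J->lookup("add1"));
  auto *Add1 = Add1Addr.toPtr<int (*)(int)>();
  printf("add1(42) = %d\n", Add1(42));
  return 0;
}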

GCC

Next I tried building GCC 12.2 – this was mainly a curiosity for me because I don’t have a real use for it right now (other than the generally good “principle” of using up-to-date software), but I’ve spent a lot of time working on GCC-based toolchains, particularly for RISC-V, so it seemed interesting. I installed a few dependencies – possibly not all of these are required, but I was pretty certain this would cover everything needed:

sudo apt install \
  libc6-dev m4 libtool gawk lzma xz-utils patchutils \
  gettext texinfo locales-all sharutils procps \
  dejagnu coreutils chrpath lsb-release time pkg-config \
  libgc-dev libmpfr-dev libgmp-dev libmpc-dev \
  flex yacc bison

Then to configure and build GCC:

mkdir build-gcc-12.2.0
cd build-gcc-12.2.0
../gcc/configure \
  --enable-languages=c,c++,fortran \
  --prefix=/data/gmarkall/opt/gcc/12.2.0 \
  --disable-multilib \
  --with-arch=rv64gc \
  --with-abi=lp64d
make -j4

There’s one problem building GCC with the Debian image on the VisionFive 2, which is that the GCC RISC-V target seems not to be correctly set up for Debian multiarch – building will eventually fail with a message like:

/usr/include/stdio.h:27:10: fatal error:
  bits/libc-header-start.h: No such file or directory
  27 | #include <bits/libc-header-start.h>
     |          ^~~~~~~~~~~~~~~~~~~~~~~~~~

as it fails to find headers because it’s looking in the wrong location – /usr/include instead of the architecture-specific /usr/include/riscv64-linux-gnu.

The stage 1 xgcc doesn’t seem to know about multiarch:

$ ./gcc/xgcc -print-multiarch
# (no output)

A small patch works around this:

diff --git a/gcc/config/riscv/t-linux b/gcc/config/riscv/t-linux
index 216d2776a18..f714026b3cc 100644
--- a/gcc/config/riscv/t-linux
+++ b/gcc/config/riscv/t-linux
@@ -1,3 +1,4 @@
 # Only XLEN and ABI affect Linux multilib dir names, e.g. /lib32/ilp32d/
 MULTILIB_DIRNAMES := $(patsubst rv32%,lib32,$(patsubst rv64%,lib64,$(MULTILIB_DIRNAMES)))
 MULTILIB_OSDIRNAMES := $(patsubst lib%,../lib%,$(MULTILIB_DIRNAMES))
+MULTIARCH_DIRNAME = $(call if_multiarch,riscv64-linux-gnu)

Running the build again after making this change succeeds. Also, xgcc now knows about multiarch:

$ ./gcc/xgcc -print-multiarch
riscv64-linux-gnu

I think I need to report this and maybe submit a proper patch upstream. The above patch is good enough to work around the issue for now, but it isn’t a general solution – for example, it’s guaranteed not to work on a riscv32 system!

The build takes a bit longer than LLVM:

real 331m22.192s
user 1192m59.381s
sys 24m57.932s

At first I was surprised that it took longer than LLVM, but GCC has its multi-stage bootstrap to go through (the compiler is built, then rebuilt with itself), which goes some way towards explaining the longer build time.

I also ran the GCC test suite:

make check -j 4

This takes a little while, but is worth the wait:

real    245m35.888s 
user    773m59.513s
sys     133m17.827s

The results are not 100% passing, but look pretty good:

		=== gcc Summary ===

# of expected passes		138016
# of unexpected failures	461
# of unexpected successes	4
# of expected failures		1035
# of unresolved testcases	48
# of unsupported tests		2888

		=== g++ Summary ===

# of expected passes		221581
# of unexpected failures	544
# of unexpected successes	4
# of expected failures		1929
# of unresolved testcases	40
# of unsupported tests		10418

        === gfortran Summary ===

# of expected passes		65099
# of unexpected failures	13
# of expected failures		262
# of unsupported tests		165

None of the failures look like real issues – many asan tests failed, which might be due to missing support (or asan not being enabled correctly by default), and many of the other failures are in checks that scan the output for particular patterns (which can be fragile), or in tests for excess error messages / warnings. I’d have to look in depth to be certain, but I have high confidence that all is well here.

Conclusion

Building GCC and LLVM on the VisionFive 2 was relatively pain-free, especially for a very new SBC with a relatively new software ecosystem. If you’re into RISC-V toolchain development and want to work on a native system (as opposed to e.g. using qemu under linux, or targeting very small embedded systems that need cross compilation anyway), it’s a great choice!

I’m expecting to have a productive time working on Numba and llvmlite for RISC-V with this board.

VisionFive 2 hardware setup and costings

StarFive’s VisionFive 2 is a quad-core riscv64 board with up to 8GB of RAM. I bought one to work on RISC-V support in Numba. There are some choices to make when buying the board and accessories – this post describes my setup and the rationale behind my decisions. I’ve also provided costings and links to purchase at the end, to aid anyone wanting a similar setup (especially in the UK).

Board: Super Early Bird (Rev 1.2A) with 8GB of RAM and WiFi

The VisionFive 2 comes in 2GB, 4GB, and 8GB variants. I picked 8GB because compiling and linking LLVM, which is used by Numba, is quite memory-intensive. Mine is a Super Early Bird version, which has one 10/100Mbit ethernet port and one gigabit ethernet port. Later versions will have two gigabit ports, but I don’t need two, and I didn’t want to wait longer to start working with the board.

There is also an option to include a supported WiFi adapter (based on the ESWIN 6600U) – I took this as it only adds a small amount to the price and it could come in handy in future. I’m not using it right now because it seems to not work out of the box with the StarFive Debian build, and it’s not essential for me.

I ordered from the WayPonDEV store on Amazon on December 16th, and it arrived on the 29th!

Boxed VisionFive 2

NVMe SSD: Kioxia Exceria 500GB

Nothing special about this – it was a cheap NVMe SSD with decent capacity. It works fine in the VF2. I also tried a Samsung SSD 980 temporarily (to make sure the M.2 slot worked before I ordered the Kioxia) that I borrowed from my Jetson AGX Xavier.

I had previously tried an old Intel SSD with a SATA interface, which did not work – as another user mentioned on the forum, the B/M keyed SATA SSD drives are not compatible.

I ordered the drive from Scan UK.

Heatsink and fan: ODroid XU4 heatsink / fan

It doesn’t seem to be easy to locate a compatible heatsink and fan, but I noticed from a mechanical drawing of the ODroid XU4 that the VF2 should take a heatsink with the same dimensions and hole spacing. I ordered from the ODroid UK store. This ended up being quite pricey, but had the advantage of actually being obtainable in the UK within a couple of days.

One difference between the XU4 and the VF2 is the size of the fan connector – it is much larger on the VF2.

ODroid XU4 fan alongside connector compatible with the VisionFive 2

I had a spare connector of the correct size handy, so I was able to replace it. After unscrewing the fan and peeling back the sticker, the solder points for the cable are visible:

Fan with solder points exposed

Then the heatsink can be fitted to the board. I applied thermal paste and pushed the pins through. I did this with the SSD removed, because one of the through-holes is beneath the SSD, and the pin suddenly pushing through could hit it a bit hard.

Heatsink fitted to VisionFive 2

Finally I added the fan back and plugged it in:

Power supply: Lenovo 65W USB-C

I didn’t have a PSU ready when I got the board so I borrowed the one from my laptop. It’s been rock-solid even under heavy load with this PSU, so rather than taking a risk with some other power supply or Pi supply, I decided to just order another of the same model from the Lenovo UK store.

I note that the VF2 Data Sheet (Section 4.1) states that at least 9V / 2A is required, which further made me nervous about using a regular Pi supply. However, the Quick Start Guide (Section 1.2) says it will take “5V up to 30W (minimum 3A)” – I’m not quite sure how to interpret that though!

Costings / purchase links

None of the links are affiliate links – they are simply provided for convenience.

Total: £204.31 – Not bad for a RISC-V SBC with this much power and potential, in my opinion.

Continuing…

I’ve been using this setup for building GCC and LLVM toolchains. I hope to write about that in a future post, but in the meantime I’m posting progress updates and other notes on my Mastodon account: https://mastodon.social/@gmarkall

How do we describe the nature of Open Source Software (OSS) contributions?

Background thinking around this question: There are instances of OSS users levelling expectations at OSS maintainers that they provide a certain level of service, whilst OSS maintainers generally have no obligations to users. Jacob Tomlinson provides a good summary and discussion of this issue in “Don’t be that open-source user, don’t be me”.

Existing terms: The term voluntary is often used to describe OSS contributions, and contributors referred to as volunteers. In particular, these terms are sometimes used to remind users that maintainers aren’t generally obligated to do anything in particular. However, I find that these terms don’t always accurately convey the nature of contributions, particularly when contributors are employed to work on them – the idea of a volunteer can carry the connotation that any contributions are made in free time, or that no reward is being received for them.

An alternative: Discretionary contributions are those that are left to the maintainer’s choice or judgment – there may be remuneration for their work, but in terms of the relationship between the user and the maintainer, there is no particular obligation. In these cases, is discretionary contribution a more appropriate term than voluntary contribution?

Acorn: a world in pixels

I saw Acorn: a world in pixels and had to buy it immediately:

Purchase made, approximately 2 hours after availability announced

A few days later it turned up, packaged for pristine condition:

A shiny sleeve encapsulates:

Inside the covers:

Monochrome title page
A universe of possibilities

Hundreds of pages of interviews, images, descriptions, and data:

Every page is full of detail and crafted with care.

It truly is a labour of love.

Empower the User

This popped up on Twitter:

There are a lot of funny replies, but my serious attempt to answer it would be: Empower the User – because that is who the software is for. Empowering the user should be the primary concern of software engineering, and often it isn’t.

I’ve been thinking about my long-term career goal a lot lately. I’ve concluded it is:

Make powerful software tools accessible to as many people as possible

Empowering the user is central to this goal, which I settled on because it’s what I enjoy doing. When I see that something I’ve built, managed, designed, or otherwise had input in helps someone get something done more easily, efficiently, or in a novel way, I get some intrinsic pleasure.

A friend who works in ERP told me the mantra in manufacturing is “Genchi Genbutsu”, which he characterised as “be where your users are”. There are other interpretations of the mantra (see the Wikipedia article), but I like the concept of understanding users by being with them. Understanding their needs is crucial to building empowering tools.

The Kitchen Craft Le’Xpress Coffee Grinder

If you want to cut down your coffee consumption, this is the grinder for you.

It works perfectly well – the grind is fine and consistent, and the screw that locks the handle stays put throughout the grinding process.

As long as you can hold on to its tiny handle and work it round for ten minutes, you can get enough grounds to make a whole cup of coffee. Your hands will be sore from the handle digging into them, and your arms will be tense from many tiny, stiff cycles. You might be short of breath. Perhaps you knocked a few things onto the floor when you inevitably slipped. But success is possible, and you will know that you earned each brew.

The mechanism doesn’t disassemble completely, so it is hard to clean and dry. I usually leave it in the airing cupboard after cleaning it.

If you like coffee and you like a struggle, then you can pick up one of these arduous grinders for £7 from John Lewis.

A sharp-angled handle ready to blister your fingers
The small fruits of hard labour sit in the chamber

Linker scripting to compute addresses in shellcode

When working on Lockbox, a demonstration application for compiler-based Stack Erase (a security measure – talk / slides), I wanted to be able to demonstrate in an obvious way that code execution on the device has been obtained. One way to do this is to write a shellcode that prints a message to the LCD display. This can be done with just a few instructions, because we can reuse functions in the program being exploited. All the shellcode needs to do is to call the LCD printing functions with suitable parameters – a pointer to the LiquidCrystal instance, and arguments to the method being called. Some RISC-V assembly code that does this is:

shellcode:
  # Call lcd.setCursor(int row, int column)
  li a2, 0           # Set column = 0
  li a1, 0           # Set row = 0
  li a0, 0x80000444  # LiquidCrystal instance (the "this" pointer)
  li a5, 0x20401E08  # LiquidCrystal.setCursor() code address
  jalr ra, a5        # Call function with args in registers

  # Call lcd.print(const char* str)
  li a0, 0x80000444  # LiquidCrystal instance
  lla a1, pwstr      # Load address of the string to print
  li a5, 0x20402c62  # LiquidCrystal.print() code address
  jalr ra, a5        # Call function with args in registers
pwstr:
  .ascii "pwned\n"   # A string to be printed

We can assemble this code:

riscv64-unknown-elf-as -march=rv32imac shellcode.s -o shellcode.o

but the assembler output cannot be used as-is – if we dump the assembled code using objdump (with -d for disassemble, and -r for printing relocations):

riscv64-unknown-elf-objdump -dr shellcode.o

Then we get the following disassembly (edited slightly for clarity):

00000000 <shellcode>:
   0:	4601                	li	a2,0
   2:	4581                	li	a1,0
   4:	80000537          	lui	a0,0x80000
   8:	44450513          	addi	a0,a0,1092 # 80000444
   c:	204027b7          	lui	a5,0x20402
  10:	e0878793          	addi	a5,a5,-504 # 20401e08
  14:	000780e7          	jalr	a5
  18:	80000537          	lui	a0,0x80000
  1c:	44450513          	addi	a0,a0,1092 # 80000444
  20:	00000597          	auipc	a1,0x0
			20: R_RISCV_PCREL_HI20	pwstr
			20: R_RISCV_RELAX	*ABS*
  24:	00058593          	mv	a1,a1
			24: R_RISCV_PCREL_LO12_I	.L0 
			24: R_RISCV_RELAX	*ABS*
  28:	204037b7          	lui	a5,0x20403
  2c:	c6278793          	addi	a5,a5,-926 # 20402c62
  30:	000780e7          	jalr	a5

00000034 <pwstr>:
  34:	7770656e0a64            "pwned\n"

The absolute addresses, like 0x80000444 (for the LiquidCrystal instance), appear in the disassembly and are encoded directly in the instructions. However, the address for pwstr appears to be 0x0 in the disassembly – why?

Relocations

The address of pwstr will depend on where the code will eventually be loaded, and the assembler does not know where this will be – we don’t have this problem for the LiquidCrystal instance and code, because their addresses are absolute in the already-linked Lockbox program.

In a normal build process, the assembler generates machine code with placeholders where the linker needs to insert addresses once it has determined them (a.k.a. relocations). During linking, the linker places all symbols in the output object, assigning them addresses in memory. It can then fix up the relocations using those actual addresses.

In the above dump, R_RISCV_PCREL_HI20 and R_RISCV_PCREL_LO12_I are the relocation types that are used for the address of pwstr – on RISC-V, a 32-bit address is loaded as a combination of two instructions, first loading the upper 20 bits into a register then adding the lower 12 bits. The different relocation types specify different methods for calculating the bits that represent the target address to insert into the instruction encoding. For RISC-V, all relocations are defined in the RISC-V ELF psABI document.

I could have manually worked out what bits needed placing, and hexedited them into the object code. I’d have needed to calculate the distance of the pwstr constant from the PC value when the lla a1, pwstr instruction executes, split out the top 20 and bottom 12 bits, and insert them into the encoded instruction. I wasn’t very keen on doing this, because:

  • It would be a little bit fiddly,
  • I’d have to manually re-do it every time I changed the shellcode (I didn’t get the above code right first time – it took a few attempts and tweaks), and
  • I’d probably make a mistake 50% of the time I did it by hand anyway.

Automation with the linker

So, rather than struggle with this, I used the GNU linker to automate the task. The linker is controlled by a script that tells it how to place symbols – the purpose of linker scripts is to let the linker emulate a wide range of different linkers, but we can also write a script to automate locating shellcode.

I knew that the exploit would always place the shellcode at 0x80003fb0, so the trick was to write a linker script that would accept the shellcode and place it at this location. Then the output object would have the correct value to load pwstr, and would be ready to write into the target device’s memory without further modification. A linker script that accomplishes this is:

OUTPUT_ARCH( "riscv" )

ENTRY( _start )

MEMORY
{
  flash (rxai!w) : ORIGIN = 0x80003fb0, LENGTH = 512M
}

PHDRS
{
  flash PT_LOAD;
  ram_init PT_LOAD;
  ram PT_NULL;
}

SECTIONS
{
  .text           :
  {
    *(.text .text.*)
  } >flash AT>flash :flash

}

This is a fairly minimal example of a linker script – all it does is define a block of memory (flash) beginning at 0x80003fb0 (ORIGIN = 0x80003fb0) and place all data from the .text section into this memory (via the directives inside the SECTIONS command). It doesn’t matter that the memory actually begins elsewhere (e.g. 0x80000000) with other code and data before where we start linking, because we only need to compute relocations within the shellcode.

When the object code is linked with this script and we disassemble the linked object:

riscv64-unknown-elf-ld -melf32lriscv -Tshellcode.ld shellcode.o \
                       -o shellcode.exe
riscv64-unknown-elf-objdump -dr shellcode.exe

then we see the instructions for loading the address of pwstr (at 0x80003fd0 and 0x80003fd4) have correct values:

... earlier output omitted ...
80003fd0:  00000597  auipc   a1,0x0
80003fd4:  01458593  addi    a1,a1,20 # 80003fe4 <pwstr>
80003fd8:  204037b7  lui     a5,0x20403
80003fdc:  c6278793  addi    a5,a5,-926 # 20402c62
80003fe0:  000780e7  jalr    a5

80003fe4 <pwstr>:

Explanation: the address of pwstr is 20 bytes on from the PC when the auipc instruction executes (0x80003fe4 – 0x80003fd4) so 0 needs adding to the upper 20 bits of the PC, and 20 needs adding to the lower 12 bits.
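That arithmetic can be checked mechanically – a small sketch of the HI20/LO12 split as the psABI defines it, using the addresses from the dump above:

#include <cstdint>
#include <cstdio>

int main() {
  uint32_t pc = 0x80003fd0;     // address of the auipc instruction
  uint32_t target = 0x80003fe4; // address of pwstr
  int32_t delta = (int32_t)(target - pc);

  // Per the psABI, the +0x800 rounds the high part so that the
  // sign-extended low 12 bits recombine to exactly the right offset.
  int32_t hi20 = (delta + 0x800) >> 12;
  int32_t lo12 = delta - (hi20 << 12);

  printf("hi20 = %d, lo12 = %d\n", hi20, lo12); // hi20 = 0, lo12 = 20
  return 0;
}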

Note also how, in the linked binary, the addresses of all instructions are known and displayed in the dump. In the original disassembly above, all addresses are relative to the beginning of the object.

Success!

After extracting the .text section from the linked binary and delivering it to the device with the exploit, the result is visible on the display:

Unfortunately I forgot to zero-terminate the string, so there is an “artifact” in the printing – the symbol that looks like two fishes.

Thoughts

The method described above is one way to automate the location of shellcode in a binary. There may be better (perhaps less obtuse, easier to implement) ways to do this on some platforms – I’m not familiar with such methods, as I have limited expertise in shellcoding, and I’d be interested to hear if there are better ways of doing things!

That said, this technique can be used on any platform supported by the GNU linker, of which there are quite a few – a list of them can be seen in the directory containing the emulation templates in the source repository. These include (to choose a random few, amongst others) AArch64, Motorola 68000, MIPS, PowerPC, Xtensa, and x86. So perhaps if you’re working with a platform with little other tooling (especially for reverse engineering / exploitation), the linker scripting technique may be useful.