Alphanumeric shellcode is common on x86, but it was initially believed to be impossible on ARM. Later it turned out to be possible after all.
Similarly, I wondered whether it was possible to write alphanumeric shellcode for RISC-V.
(Basic shellcode in RISC-V Linux provides a good introduction to shellcode for RISC-V, including how to avoid NUL bytes.)
First, I used a little Rust program to enumerate all the instructions that can be formed from these characters and to generate some statistics.
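The byte-level filter behind such an enumeration can be sketched as follows. This is a minimal Python illustration, not the Rust program mentioned above; the names `ALNUM`, `is_alnum_encoding` and `alnum_halfwords` are made up for this sketch. It only selects candidate encodings. Deciding which candidates are valid instructions still needs a decoder, which is not shown here.

```python
import string

# Bytes allowed in the shellcode: 'a'..'z', 'A'..'Z', '0'..'9'.
ALNUM = frozenset((string.ascii_letters + string.digits).encode("ascii"))


def is_alnum_encoding(word: int, size: int) -> bool:
    """True if every byte of the little-endian encoding of `word` is alphanumeric.

    `size` is the instruction length in bytes: 2 (compressed) or 4.
    """
    return all(b in ALNUM for b in word.to_bytes(size, "little"))


def alnum_halfwords():
    """Every 16-bit value whose two bytes are both alphanumeric."""
    return [w for w in range(1 << 16) if is_alnum_encoding(w, 2)]


if __name__ == "__main__":
    candidates = alnum_halfwords()
    print("alphanumeric 16-bit values:", len(candidates))
    # A halfword is a compressed instruction only if its two lowest bits
    # are not 0b11; otherwise it starts a longer instruction.
    compressed = [w for w in candidates if w & 0b11 != 0b11]
    print("of which in the compressed space:", len(compressed))
```

The same predicate with `size=4` applies to 32-bit instructions, although enumerating all 62^4 byte combinations in pure Python is slow; working pattern by pattern is more practical.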
These are all the valid instructions in the RV32 ISA whose encoding consists only of alphanumeric bytes ('a'..'z', 'A'..'Z', '0'..'9'), summarized as bit patterns (most significant bit first, 'x' marks a bit that is not fixed).
These compressed (16-bit) instructions require the 'RV32C' extension:
addi 01xxxxxx0xxxxx01
fld 0011xxxx0xxxxxx0
flw 011xxxxx0xxxxxx0
jal 0011xxxx0xxxxx01
lui 011xxxxx0xxxxx01
lw 010xxxxx0xxxxxx0
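These 32-bit instructions are covered by 'RV32G' (a shorthand for 'RV32IMAFD'), except for the '.q' variants, which additionally need the quad-precision 'Q' extension: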
bgt 0xxxxxxx0xxxxxxx0100xxxx01100011
bgtu 0xxxxxxx0xxxxxxx0110xxxx01100011
bgtz 0xxxxxxx0xx100000100xxxx01100011
ble 0xxxxxxx0xxxxxxx0101xxxx01100011
bleu 0xxxxxxx0xxxxxxx0111xxxx01100011
blez 0xxxxxxx0xx100000101xxxx01100011
csrrc 0xxxxxxx0xxxxxxx0011xxxx01110011
csrrci 0xxxxxxx0xxxxxxx0111xxxx01110011
csrrsi 0xxxxxxx0xxxxxxx0110xxxx01110011
csrrwi 0xxxxxxx0xxxxxxx0101xxxx01110011
fcvt.d.q 010000100011xxxx0xxxxxxx01010011
fmadd.d 0xxxx01x0xxxxxxx0xxxxxxx01000011
fmadd.q 0xxxx11x0xxxxxxx0xxxxxxx01000011
fmadd.s 0xxxx00x0xxxxxxx0xxxxxxx01000011
fmsub.d 0xxxx01x0xxxxxxx0xxxxxxx01000111
fmsub.q 0xxxx11x0xxxxxxx0xxxxxxx01000111
fmsub.s 0xxxx00x0xxxxxxx0xxxxxxx01000111
fnmadd.d 0xxxx01x0xxxxxxx0xxxxxxx01001111
fnmadd.q 0xxxx11x0xxxxxxx0xxxxxxx01001111
fnmadd.s 0xxxx00x0xxxxxxx0xxxxxxx01001111
fnmsub.d 0xxxx01x0xxxxxxx0xxxxxxx01001011
fnmsub.q 0xxxx11x0xxxxxxx0xxxxxxx01001011
fnmsub.s 0xxxx00x0xxxxxxx0xxxxxxx01001011
j 0xxxxxxx0xxxxxxx0xx1000001101111
jal 0xxxxxxx0xxxxxxx0xxxxxxx01101111
lui 0xxxxxxx0xxxxxxx0xxxxxxx00110111
sra 010000010xxxxxxx0101xxxx00110011
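To make one row of this table concrete, here is a minimal Python sketch that decodes a four-character string as a `lui`. The helper name `decode_lui` and the sample string `"7AAA"` are chosen for illustration only; the field positions (opcode in bits 6:0, `rd` in bits 11:7, immediate in bits 31:12) are those of the standard U-type format.

```python
def decode_lui(text: str):
    """Decode a 4-character ASCII string as a RISC-V `lui`.

    Returns (rd, imm20), or None if the opcode is not LUI.
    """
    raw = text.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a 32-bit instruction is exactly 4 bytes")
    word = int.from_bytes(raw, "little")  # instructions are stored little-endian
    if word & 0x7F != 0b0110111:          # major opcode of LUI
        return None
    rd = (word >> 7) & 0x1F               # destination register number
    imm20 = word >> 12                    # value placed in bits 31:12 of rd
    return rd, imm20


if __name__ == "__main__":
    rd, imm20 = decode_lui("7AAA")
    print(f"lui x{rd}, {imm20:#x}")
```

The first character has to be `'7'` (0x37), because the lowest byte carries the opcode. Its top bit is also the lowest bit of `rd`, and that bit is 0 in any ASCII byte, so only even-numbered destination registers are reachable this way.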
The reachable instructions on 64-bit RISC-V are the same, except for a few patterns in the compressed instruction space that have a different meaning:
addi 01xxxxxx0xxxxx01
+addiw 0011xxxx0xxxxx01
fld 0011xxxx0xxxxxx0
-flw 011xxxxx0xxxxxx0
-jal 0011xxxx0xxxxx01
+ld 011xxxxx0xxxxxx0
lui 011xxxxx0xxxxx01
lw 010xxxxx0xxxxxx0
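The difference comes from how the compressed space is shared between RV32 and RV64. An alphanumeric high byte always starts with the bits 001 (digits), 010 (uppercase) or 011 (lowercase), so only those three `funct3` values are reachable. The following Python sketch names the compressed instruction for exactly those cases; `classify_rvc` is a hypothetical helper, it uses the `c.*` mnemonics rather than the expanded names in the listing, and it ignores reserved and hint encodings (for example `rd = 0`).

```python
def classify_rvc(halfword: int, xlen: int) -> str:
    """Name a compressed instruction, for funct3 in {001, 010, 011} only.

    `xlen` is 32 or 64. Reserved and hint encodings are not detected.
    """
    quadrant = halfword & 0b11          # bits 1:0
    funct3 = (halfword >> 13) & 0b111   # bits 15:13
    rd = (halfword >> 7) & 0x1F         # bits 11:7, meaningful in quadrant 1
    if quadrant == 0b11:
        return "not compressed"
    if quadrant == 0b01:
        if funct3 == 0b001:
            return "c.jal" if xlen == 32 else "c.addiw"
        if funct3 == 0b010:
            return "c.li"
        if funct3 == 0b011:
            return "c.addi16sp" if rd == 2 else "c.lui"
    else:
        # Quadrant 0 holds register-relative loads, quadrant 2 the sp-relative forms.
        suffix = "sp" if quadrant == 0b10 else ""
        if funct3 == 0b001:
            return "c.fld" + suffix
        if funct3 == 0b010:
            return "c.lw" + suffix
        if funct3 == 0b011:
            return ("c.flw" if xlen == 32 else "c.ld") + suffix
    return "outside the alphanumeric range"


if __name__ == "__main__":
    for text in ("A1", "Da", "Aa", "Ab", "BA"):
        halfword = int.from_bytes(text.encode("ascii"), "little")
        print(text, classify_rvc(halfword, 32), classify_rvc(halfword, 64))
```

`c.li` and `c.addi16sp` both expand to `addi`, which is presumably why the listing shows a single `addi` pattern covering the 010 and 011 prefixes.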
My first response was "oh no! no store instructions". Although there are some instructions to get values into registers, and some to make jumps, that leaves no way
- to do a system call directly: 'ecall' encodes as the bytes 0x73 0x00 0x00 0x00, and 0x00 is not alphanumeric
- to write an 'ecall' instruction to the stack and jump to it
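Both obstacles can be checked at the byte level. The Python sketch below is only an illustration with made-up helper names. It relies on three encoding facts: `ecall` is the single word 0x00000073, 32-bit integer stores share the major opcode 0100011, and the compressed stores use `funct3` values 101, 110 and 111.

```python
import string

ALNUM = frozenset((string.ascii_letters + string.digits).encode("ascii"))

ECALL = 0x00000073                # the only encoding of `ecall`
STORE_OPCODE = 0b0100011          # major opcode of sb/sh/sw (and sd on RV64)
C_STORE_FUNCT3 = (0b101, 0b110, 0b111)  # c.fsd, c.sw, c.fsw or c.sd (and sp forms)


def non_alnum_bytes(word: int, size: int = 4):
    """Bytes of the little-endian encoding that are not alphanumeric."""
    return [b for b in word.to_bytes(size, "little") if b not in ALNUM]


def store_low_bytes():
    """Possible lowest bytes of a 32-bit store: bit 7 is free, bits 6:0 are the opcode."""
    return [STORE_OPCODE, 0x80 | STORE_OPCODE]


def compressed_store_high_bytes():
    """Possible high bytes of a compressed store: funct3 sits in the top three bits."""
    return [(f3 << 5) | low5 for f3 in C_STORE_FUNCT3 for low5 in range(32)]


if __name__ == "__main__":
    print("ecall, offending bytes:", [hex(b) for b in non_alnum_bytes(ECALL)])
    print("32-bit store low bytes:", [(hex(b), b in ALNUM) for b in store_low_bytes()])
    print("any alphanumeric compressed store:",
          any(b in ALNUM for b in compressed_store_high_bytes()))
```

The lowest byte of `ecall` is 0x73, which is the letter 's'; it is the three NUL bytes after it that rule the instruction out.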
Approaches that might work are:
- if the offset of an existing 'ecall' instruction is known, the shellcode can jump there
- similarly, if it knows the offset of a memory write followed by a 'ret', it could use that
This goes into Return Oriented Programming (ROP) territory, which means that the resulting shellcode can never be self-contained but depends on the host application.
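Finding such a target in a host binary could look roughly like the following Python sketch. Everything here is hypothetical: `find_ecall_offsets` is an illustrative helper, the sample bytes stand in for a real code section, and the scan assumes that section starts at an even address. A byte match is only a candidate, since it is not guaranteed to sit on an instruction boundary, and the sketch does not check whether the address is reachable with the alphanumeric jump encodings.

```python
ECALL_BYTES = (0x00000073).to_bytes(4, "little")  # b"\x73\x00\x00\x00"


def find_ecall_offsets(code: bytes, base: int = 0):
    """Addresses of every 2-byte-aligned occurrence of the `ecall` encoding.

    `code` is the raw contents of an executable section and `base` the
    (assumed even) address it is loaded at.
    """
    offsets = []
    pos = code.find(ECALL_BYTES)
    while pos != -1:
        if pos % 2 == 0:  # instructions are at least 2-byte aligned
            offsets.append(base + pos)
        pos = code.find(ECALL_BYTES, pos + 1)
    return offsets


if __name__ == "__main__":
    # Stand-in for host code: `li a7, 93` followed by `ecall`.
    sample = bytes.fromhex("9308d005" "73000000")
    print([hex(a) for a in find_ecall_offsets(sample, base=0x10000)])
```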
That's disappointing! Maybe someone has a better idea, some trick up their sleeve, but I don't see it.