Hello all, I’m Jiaxiang Zhou from China. I was lucky to be selected as a participant in the Radare2 project this year. My main task was to integrate SLEIGH into Radare2 as a disassembly backend. My main working repository was r2ghidra-dec, which aims to deliver Ghidra’s decompiler to Radare2. It could be renamed to r2ghidra, since after this project it is no longer just a decompiler but a complete bridge between Radare2 and Ghidra.
Here are the slides I made for r2con2020.
The SLEIGH disassembler is deeply embedded in the decompiler’s C++ codebase, so the solution is clear:
To get full access to `Sleigh`, the low-level spec files, and their interfaces, I implemented a class (`SleighAsm`) that works like a lite version of `Architecture`. This class exports the P-code and the register information parsed from the spec file, which enables us to disassemble all valid instructions on demand:
SLEIGH emits P-code as an IR that describes what an instruction does, so I had to analyze the P-code to extract control flow and type information. ESIL was trickier, because the P-code model and the ESIL model are different. What’s more, P-code supports floating-point operations, which ESIL doesn’t.
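To give a rough idea of what extracting control flow from P-code can look like, here is a minimal sketch. It is not the r2ghidra code: `Op`, `PcodeOp`, `Flow`, `FlowInfo` and `classifyFlow` are hypothetical names, the opcode list is a simplified stand-in for Ghidra’s flow-related P-code opcodes, and it assumes the first flow-changing op decides the result, which ignores branches internal to one instruction’s P-code.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Simplified stand-ins for P-code opcodes; only the flow-related ones matter.
enum class Op { Copy, IntAdd, Branch, CBranch, BranchInd, Call, CallInd, Return };

// Hypothetical flattened P-code op: an opcode plus an optional constant target.
struct PcodeOp {
    Op op;
    std::optional<std::uint64_t> target;  // set for direct branches and calls
};

enum class Flow { Fallthrough, Jump, CondJump, IndirectJump, Call, IndirectCall, Return };

struct FlowInfo {
    Flow kind = Flow::Fallthrough;
    std::optional<std::uint64_t> target;
};

// Scan the ops of one instruction and report the first flow-changing op.
FlowInfo classifyFlow(const std::vector<PcodeOp>& ops) {
    for (const PcodeOp& p : ops) {
        switch (p.op) {
            case Op::Branch:    return {Flow::Jump, p.target};
            case Op::CBranch:   return {Flow::CondJump, p.target};
            case Op::BranchInd: return {Flow::IndirectJump, std::nullopt};
            case Op::Call:      return {Flow::Call, p.target};
            case Op::CallInd:   return {Flow::IndirectCall, std::nullopt};
            case Op::Return:    return {Flow::Return, std::nullopt};
            default:            break;  // data-only op, keep scanning
        }
    }
    return {};  // no flow op found: the instruction falls through
}
```

An instruction whose P-code contains only data operations is reported as `Flow::Fallthrough`, which is the case an analysis plugin would treat as an ordinary, non-branching instruction.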
Ghidra’s C++ codebase concentrates on the decompiler, so it focuses on function-level analysis. There are classes like `Funcdata` for analyzing intra-function flow, but the instruction-level analysis tools exist only in the Java codebase. So I had to port `SleighInstructionPrototype` from Java to C++. This enables control flow extraction at the level of `Constructor`s (lower than P-code). The hard part of this port was minimizing the changes to the original Ghidra codebase. I eventually managed to port the whole of `SleighInstructionPrototype` with only two private fields exported!
diff --git a/Ghidra/Features/Decompiler/src/decompile/cpp/context.hh b/Ghidra/Features/Decompiler/src/decompile/cpp/context.hh
And this is the control flow extracted from P-code results alone:
Ghidra’s emulation system, which is based on P-code, is quite different from ESIL. First, it is not stack-based, and it can randomly access any intermediate variable.
Here’s my design:
- `PICK`, to take a value from within the stack and put it on the stack top. I intentionally leave intermediate variables on the stack and retrieve them when they are needed.
- `GET`, to retrieve a register’s value onto the stack. A register’s name is sometimes just one character, which can confuse ESIL when it is compared with a number. Most importantly, I need `GET` to retrieve float numbers onto the stack without changing ESIL’s original code in Radare2.
- `=`, to store a value from the stack back to a register. This pairs with `GET` for float number handling. You will notice that every element except the register (used as the destination) is an immediate value.
- An operation to read a float number from memory. When a float number is written into a register or memory location, I record its register name or memory address and keep tracking it until it is overwritten by anything other than a float number.
- `=`, to store a float number to memory.
- A series of float operations.
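The float tracking described in the list can be pictured with a small sketch. This is only an illustration under my own assumptions, not the plugin’s code: `FloatTracker`, `onWrite` and `holdsFloat` are hypothetical names, and a location is simply a register name or a memory address rendered as text.

```cpp
#include <set>
#include <string>

// Hypothetical tracker: remembers which locations currently hold a float.
// A location is a register name ("xmm0") or a memory address as text.
class FloatTracker {
public:
    // Record a write; only a float write leaves the location marked.
    void onWrite(const std::string& location, bool isFloat) {
        if (isFloat) {
            floats_.insert(location);
        } else {
            floats_.erase(location);  // overwritten by a non-float value
        }
    }

    // True when the most recent write to this location was a float.
    bool holdsFloat(const std::string& location) const {
        return floats_.count(location) != 0;
    }

private:
    std::set<std::string> floats_;
};
```

A translator could consult `holdsFloat` when it reads a location to decide whether a float-aware operation is needed or a plain integer one will do.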
During the actual translation, the plugin uses a stack of its own to emulate the intermediate variables left on the ESIL stack. This helps calculate the offset of the intermediate variables (unique varnodedata) used as arguments of the current P-code operation.
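A minimal sketch of that bookkeeping follows. The names `ShadowStack`, `push`, `depthOf` and `pick` are hypothetical, and two things are assumed rather than taken from the plugin: that `PICK` counts from 1 at the stack top (so `1,PICK` copies the top element), and that nothing is popped between the push and the pick, which real translation would also have to model.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical shadow of the ESIL stack: each entry is the offset of the
// unique varnode whose value the translator left at that position.
class ShadowStack {
public:
    void push(std::uint64_t uniqueOffset) { slots_.push_back(uniqueOffset); }

    // 1-based distance from the stack top, or nothing if the value is absent.
    std::optional<std::size_t> depthOf(std::uint64_t uniqueOffset) const {
        for (std::size_t i = slots_.size(); i > 0; --i) {
            if (slots_[i - 1] == uniqueOffset) {
                return slots_.size() - i + 1;
            }
        }
        return std::nullopt;
    }

    // Build "<depth>,PICK" and model the copy that lands on the stack top.
    std::optional<std::string> pick(std::uint64_t uniqueOffset) {
        const std::optional<std::size_t> depth = depthOf(uniqueOffset);
        if (!depth) {
            return std::nullopt;  // value was never left on the stack
        }
        slots_.push_back(uniqueOffset);
        return std::to_string(*depth) + ",PICK";
    }

private:
    std::vector<std::uint64_t> slots_;
};
```

Searching from the top means that when the same unique varnode appears more than once, the most recent copy is picked, which keeps the emitted depth as small as possible.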
SLEIGH only provides P-code, and P-code doesn’t tell you what kind of instruction it came from. SLEIGH’s multi-architecture support makes things even more complex.
I went through all the `R_ANAL_OP_TYPE_*` values and summarized their patterns, hoping to pattern-match on the P-code to work out the type of an instruction. It sounds crazy, but I not only typed instructions successfully, I also managed to recover the arguments of the associated instructions!
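As a toy illustration of typing an instruction by matching its P-code, consider the sketch below. It is heavily simplified and is not the plugin’s matcher: `POp`, `InsnType`, `Pattern` and `classify` are hypothetical names, the patterns are invented for the example, and real instructions usually emit extra operations (flag updates, for instance) that an exact sequence match like this one would not tolerate.

```cpp
#include <vector>

// Simplified stand-ins for P-code opcodes; the real set is much larger.
enum class POp { Copy, Load, Store, IntAdd, IntSub };

// Hypothetical result labels, loosely modelled on R_ANAL_OP_TYPE_* names.
enum class InsnType { Unknown, Mov, Load, Store, Add, Sub };

struct Pattern {
    std::vector<POp> ops;  // exact opcode sequence to match
    InsnType type;         // type reported on a match
};

// Compare the opcode sequence against a small table; first match wins.
InsnType classify(const std::vector<POp>& ops) {
    static const std::vector<Pattern> patterns = {
        {{POp::Load, POp::Copy}, InsnType::Load},
        {{POp::Store}, InsnType::Store},
        {{POp::IntAdd}, InsnType::Add},
        {{POp::IntSub}, InsnType::Sub},
        {{POp::Copy}, InsnType::Mov},
    };
    for (const Pattern& p : patterns) {
        if (p.ops == ops) {
            return p.type;
        }
    }
    return InsnType::Unknown;  // nothing matched
}
```

Keeping the patterns in a table rather than in nested conditionals makes it easier to add one entry per instruction type, and sequences that match nothing fall back to `InsnType::Unknown` instead of being mislabelled.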
The RAsm and RAnal plugins are both working and ready to provide the information recovered from Ghidra’s SLEIGH disassembler.