# Dna
`Dna` is a static binary analysis framework built on top of LLVM. Notably, it is written almost entirely in C#, including managed bindings for LLVM, Remill, and Souper.
# Functionality
`Dna` implements an iterative control flow graph reconstruction algorithm heavily inspired by the [SATURN](https://arxiv.org/pdf/1909.01752) paper. It repeatedly applies recursive descent, lifting (using Remill), and path solving until the complete control flow graph is recovered. For jump tables, it uses a recursive algorithm based on `Souper` and Z3 to solve for the set of possible jump table targets. You can find the iterative exploration algorithm [here](https://github.com/Colton1skees/Dna/blob/e70b48b1da4c9b3666cc2a218138c050ab6f9d8b/Dna.BinaryTranslator/Unsafe/IterativeFunctionTranslator.cs#L48), and the jump table solving algorithm [here](https://github.com/Colton1skees/Dna/blob/master/Dna.BinaryTranslator/JmpTables/Precise/SouperJumpTableSolver.cs#L41).
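The exploration loop can be pictured as a fixpoint over two phases: recursive descent over statically known edges, then solving indirect jumps against the graph discovered so far, repeated until a round adds no new edges. The following is a minimal Python sketch under heavy simplifying assumptions, not `Dna`'s C# implementation: the toy instruction encoding, `explore`, and the `resolve_indirect` callback (a stand-in for the Remill lifting and Souper/Z3 solving steps) are all hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical toy model (not Dna's API): every instruction occupies one address.
#   ("op",)     straight-line instruction, falls through to addr + 1
#   ("jmp", t)  direct jump to t
#   ("jcc", t)  conditional jump: t or fall through to addr + 1
#   ("ijmp",)   indirect jump, targets unknown until solved
#   ("ret",)    return, no successors
Program = Dict[int, Tuple]
Edges = Set[Tuple[int, int]]


def direct_successors(program: Program, addr: int) -> Set[int]:
    """Successors that can be read directly off the instruction."""
    insn = program[addr]
    if insn[0] == "op":
        return {addr + 1}
    if insn[0] == "jmp":
        return {insn[1]}
    if insn[0] == "jcc":
        return {insn[1], addr + 1}
    return set()  # "ret" and "ijmp" have no statically known successors


def explore(program: Program, entry: int,
            resolve_indirect: Callable[[int, Edges], Set[int]]) -> Edges:
    """Alternate recursive descent and indirect-jump solving until no new edges appear."""
    edges: Edges = set()
    visited: Set[int] = set()
    worklist = [entry]
    while True:
        # Phase 1: recursive descent over every statically known edge.
        while worklist:
            addr = worklist.pop()
            if addr in visited:
                continue
            visited.add(addr)
            for succ in direct_successors(program, addr):
                edges.add((addr, succ))
                worklist.append(succ)
        # Phase 2: solve each reachable indirect jump against the graph found so far.
        changed = False
        for addr in sorted(visited):
            if program[addr][0] != "ijmp":
                continue
            for target in resolve_indirect(addr, edges):
                if (addr, target) not in edges:
                    edges.add((addr, target))
                    worklist.append(target)
                    changed = True
        if not changed:
            return edges  # fixpoint: the last round added nothing


# Usage: the indirect jump at address 2 is "solved" to two case bodies.
program = {
    0: ("op",),
    1: ("jcc", 5),
    2: ("ijmp",),
    5: ("ret",),
    10: ("ret",),
    20: ("jmp", 5),
}
edges = explore(program, 0, lambda addr, _edges: {10, 20} if addr == 2 else set())
```

Solving is repeated after every descent pass because, in a real solver, newly discovered paths can expose additional indirect targets; `resolve_indirect` receives the current edge set for that reason, even though the toy resolver above ignores it.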
Once a control flow graph has been fully explored, it can be recompiled to x86 and reinserted into the binary using the algorithms [here](https://github.com/Colton1skees/Dna/blob/master/Dna.BinaryTranslator/Safe/SafeFunctionTranslator.cs#L46) and [here](https://github.com/Colton1skees/Dna/blob/master/Dna.BinaryTranslator/Safe/FunctionGroupCompiler.cs#L27). The compiled code is not pretty by *any* means, but it should run as long as the recovered control flow graph is correct. That said, it is still a research prototype, so bugs and unhandled edge cases are expected. Control flow graph exploration may fail, e.g. on unbounded jump tables or unliftable instructions.
Some other notable features:
- Supports *most* jump tables, including MSVC's nested or so-called compressed jump tables.
- Supports lifting code with SEH to LLVM IR. When SEH is present, `try`/`catch` statements and `filter` intrinsics are inserted into the control flow graph. However, the recompiler does not (yet) support SEH: the SEH entries are not fixed up, so exceptions will cause crashes.
- Includes a rich API for writing LLVM passes natively in C#, with bindings for e.g. `MemorySSA`, `LoopInfo`, dominator trees, and pass pipeline management.
- Graph visualization for LLVM IR and binary control flow graphs using Graphviz, or alternatively via a script generator for Binary Ninja.
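MSVC's compressed jump tables typically add a level of indirection on x64: a byte table maps each case value to a slot in a table of 32-bit RVAs, and the selected RVA is added to the image base before the indirect `jmp`. The sketch below decodes that layout directly from image bytes once an index bound is known. It is an illustration under stated assumptions, not how `Dna` does it (`Dna` solves the targets symbolically with Souper and Z3): `resolve_compressed_table`, the flat RVA-indexed `image` buffer, and the example offsets are all hypothetical.

```python
import struct
from typing import Optional, Set


def resolve_compressed_table(image: bytes, image_base: int,
                             byte_table_rva: int, dword_table_rva: int,
                             max_index: Optional[int]) -> Set[int]:
    """Enumerate targets of a two-level (byte index -> 32-bit RVA) jump table.

    `image` is assumed to be the loaded image, indexable by RVA, and `max_index`
    the inclusive bound recovered from a guard such as `cmp idx, N; ja default`.
    """
    if max_index is None:
        # No dominating bounds check: the index set is unbounded, so give up.
        raise ValueError("unbounded jump table: no index bound recovered")
    targets: Set[int] = set()
    for index in range(max_index + 1):
        # Level 1: one byte per case value selects a slot in the RVA table.
        slot = image[byte_table_rva + index]
        # Level 2: a little-endian 32-bit RVA of the case body.
        (target_rva,) = struct.unpack_from("<I", image, dword_table_rva + 4 * slot)
        targets.add(image_base + target_rva)
    return targets


# Usage: cases 0..3 share three distinct bodies via the byte table.
image = bytearray(0x40)
image[0x10:0x14] = bytes([0, 1, 0, 2])
struct.pack_into("<3I", image, 0x20, 0x1000, 0x2000, 0x3000)
targets = resolve_compressed_table(bytes(image), 0x140000000, 0x10, 0x20, max_index=3)
```

The `ValueError` branch mirrors the failure mode mentioned earlier: without a recoverable bound on the index, the set of targets cannot be enumerated.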
Some caveats:
- Only x86_64 is supported
- Recompiled code is not CET compliant
# Dependencies
- LLVM/LLVMSharp
- Remill
- Souper
- AsmResolver
- Rivers
Note that `Dna` is currently based on LLVM 17.
# VMProtect
`Dna` contains a VMProtect devirtualization plugin located in `Dna.BinaryTranslator/VMProtect`. See [this PR](https://github.com/Colton1skees/Dna/pull/8) for more info.
# Building
Dna currently targets LLVM 17 and is expected to be built on Windows x64 with Visual Studio 2022.
Build `Dna.LLVMInterop` in **Release** mode; the native dependency tree is Release-built and Debug interop builds are not supported.
## Prerequisites
- Visual Studio 2022 with C++/MSBuild tools
- CMake
- Ninja
- clang-cl / LLVM tools available from the VS toolchain
- Rust/Cargo, for the EqSat simplifier DLL
- .NET SDK 8+
Run the commands below from a VS x64 developer shell, or another shell with the VS C++ tools on `PATH`.
## 1. Build native dependencies
The dependency superbuild installs LLVM 17, Remill, Z3, XED, gflags/glog, and related native libraries into `Dna.LLVMInterop/dependencies/install`.
```powershell
cmake -S Dna.LLVMInterop/dependencies `
  -B Dna.LLVMInterop/dependencies/build `
  -G Ninja `
  -DCMAKE_BUILD_TYPE=Release `
  -DCMAKE_C_COMPILER=clang-cl `
  -DCMAKE_CXX_COMPILER=clang-cl
cmake --build Dna.LLVMInterop/dependencies/build
```
If you change the compiler, build type, or CRT settings, delete both `Dna.LLVMInterop/dependencies/build` and `Dna.LLVMInterop/dependencies/install` before reconfiguring.
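If you prefer to script the clean step, a minimal Python helper could look like the one below. It is a hypothetical convenience (`clean_native_deps` is not part of the repository) that assumes it is given the repository root; it simply deletes the two directories named above.

```python
import shutil
from pathlib import Path
from typing import List


def clean_native_deps(repo_root: Path) -> List[Path]:
    """Delete the superbuild's build and install trees; return what was removed."""
    deps = repo_root / "Dna.LLVMInterop" / "dependencies"
    removed: List[Path] = []
    for name in ("build", "install"):
        path = deps / name
        if path.exists():
            shutil.rmtree(path)  # removes the directory and everything under it
            removed.append(path)
    return removed
```

For example, call `clean_native_deps(Path("."))` from the repository root, then rerun the configure command.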
## 2. Build the Rust simplifier DLL
`Dna.Example` and the simplifier projects copy `eq_sat.dll` from the Cargo release output.
```powershell
cargo build --manifest-path Simplifier/EqSat/Cargo.toml --release
```
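Before building the solution, it can help to confirm that steps 1 and 2 produced what it expects. The check below is a hypothetical helper, not part of the repository; the `eq_sat.dll` location assumes Cargo's default `target/release` directory next to `Simplifier/EqSat/Cargo.toml`, with no `CARGO_TARGET_DIR` override.

```python
from pathlib import Path
from typing import List


def missing_build_outputs(repo_root: Path) -> List[Path]:
    """Return outputs of steps 1 and 2 that are not present under repo_root."""
    expected = [
        # Step 1: the dependency superbuild's install prefix.
        Path("Dna.LLVMInterop") / "dependencies" / "install",
        # Step 2: Cargo's default release directory (assumes no CARGO_TARGET_DIR
        # override and that EqSat is not a member of a parent Cargo workspace).
        Path("Simplifier") / "EqSat" / "target" / "release" / "eq_sat.dll",
    ]
    return [rel for rel in expected if not (repo_root / rel).exists()]
```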
## 3. Build the solution