# BINocular - Common Binary Analysis Framework





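BINocular is a Python package for statically analyzing compiled binaries
through a common API layer. It is an abstraction layer between different
disassemblers and provides:
- Disassembler-agnostic representations of common binary analysis primitives and concepts
  * Assembly instructions
  * Intermediate representation (e.g. pcode)
  * Functions
    - Compiled
    - Source code
  * Control flow graphs
- A CLI and API for installing the supported disassemblers
- Serialization/deserialization of concepts (e.g. functions, basic blocks, instructions)
- Persistent storage of objects in a SQL database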
## Supported Disassembler Backends
### [Ghidra](https://www.ghidra-sre.org/)
### [Rizin](https://rizin.re/)
## Installation
`pip install BINocular`
## Example CLI Usage
**List the Ghidra versions available to install**
```
$ binocular install ghidra -l
11.1.1
11.1
11.0.3
11.0.2
11.0.1
11.0
10.4
10.3.3
10.3.2
10.3.1
10.3
```
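When scripting an install, it can help to pick the newest release out of a listing like the one above. A minimal sketch, assuming the listing has been captured as plain text with one dotted numeric version per line; `version_key` and `newest_version` are hypothetical helpers, not part of BINocular:

```python
def version_key(version: str) -> tuple[int, ...]:
    """Turn '11.0.3' into (11, 0, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def newest_version(listing: str) -> str:
    """Return the highest version found in a one-version-per-line listing."""
    versions = [line.strip() for line in listing.splitlines() if line.strip()]
    return max(versions, key=version_key)


# Abbreviated copy of the listing shown above.
listing = """\
11.1.1
11.1
11.0.3
10.4
10.3.3
"""
print(newest_version(listing))  # 11.1.1
```

Comparing integer tuples rather than raw strings keeps a version such as `10.4` from sorting above `10.10`.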
**Install Ghidra from the command line**
```
$ binocular install ghidra -v 11.1 -p ~/Documents/ghidra_install_location
2024-06-15 13:41:04 binocular.ghidra[472653] INFO Installing Ghidra 11.1 to /home/brandon/Documents/ghidra_install_location
2024-06-15 13:41:27 binocular.ghidra[472653] INFO Extracting Ghidra
2024-06-15 13:41:31 pyhidra.javac[472653] INFO WARNING
2024-06-15 13:41:32 pyhidra.launcher[472653] INFO Installed plugin: pyhidra 1.1.0
```
**Parse a binary and load it into a SQLite database**
```
$ binocular parse ./test/example rizin --uri sqlite:///$(pwd)/example.db
2024-06-15 13:46:23 binocular.disassembler[473064] INFO [Rizin] Analyzing test/example
2024-06-15 13:46:23 binocular.disassembler[473064] INFO [Rizin] Analysis Complete: 0.03s
2024-06-15 13:46:23 binocular.disassembler[473064] INFO [Rizin] Binary Data Loaded: 0.00s
2024-06-15 13:46:25 binocular.disassembler[473064] INFO [Rizin] 49 Basic Blocks Loaded
2024-06-15 13:46:25 binocular.disassembler[473064] INFO [Rizin] 18 Functions Loaded
2024-06-15 13:46:25 binocular.disassembler[473064] INFO [Rizin] Function Data Loaded: 2.26s
2024-06-15 13:46:25 binocular.disassembler[473064] INFO [Rizin] Ave Function Load Time: 0.13s
2024-06-15 13:46:25 binocular.disassembler[473064] INFO [Rizin] Parsing Complete: 2.26s
Binary:
Name: example
Arch: x86
Bits: 64
Endian: Endian.LITTLE
SHA256: a7f9141c1781c20d13b8442f24fcddba4b75b4b73ae04e734a92a79fcf0869c3
Size: 18088
Num Functions: 18
Inserting to DB
```
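The file produced by the `--uri sqlite:///...` option is an ordinary SQLite database, so it can be inspected with Python's standard library alone. The sketch below lists whatever tables exist rather than assuming any table names; `list_tables` is a hypothetical helper, and the demo runs against a throwaway in-memory database with a made-up table so that it stays self-contained:

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all user tables in a SQLite database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Demo on an in-memory database; 'demo' is a made-up table, not BINocular's schema.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY)")
print(list_tables(demo))
demo.close()
```

For the database created by the command above, a connection opened with `sqlite3.connect("example.db")` would be passed instead.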
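## Example Python Usage
### Install Ghidra at commit `dee48e9`
This assumes you already have all the build dependencies needed to build Ghidra (the same applies to the other disassemblers).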
```
from binocular import Ghidra
install_dir = "./test_install"
if not Ghidra.is_installed(install_dir=install_dir):
    # Install Ghidra @ commit dee48e9 if Ghidra isn't installed already
    # This may take a while since it builds Ghidra from scratch
    Ghidra.install(version='dee48e9', install_dir=install_dir, build=True)
```
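### Serializing Objects
All of the basic primitives (such as `Instruction`, `BasicBlock`, and `NativeFunction`) are built on [Pydantic](https://docs.pydantic.dev/latest/) using Python type hints. This gives us all the benefits of Pydantic, such as type validation and JSON serialization.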
```
from binocular import Ghidra
with Ghidra() as g:
    g.load("./test/example")
    b = g.binary
    f = g.function_sym("fib")
    bb = list(f.basic_blocks)[0]
    print(bb.model_dump_json())
```
**Output (after piping through jq)**
```
{
  "endianness": 0,
  "architecture": "x86",
  "bitness": 64,
  "address": 1053275,
  "pie": 3,
  "instructions": [
    {
      "endianness": 0,
      "architecture": "x86",
      "bitness": 64,
      "address": 1053275,
      "data": "837dec01",
      "asm": "CMP",
      "comment": "",
      "ir": {
        "lang_name": 2,
        "data": "(unique, 0x4400, 8) INT_ADD (register, 0x28, 8) , (const, 0xffffffffffffffec, 8);(unique, 0xdb00, 4) LOAD (const, 0x1b1, 4) , (unique, 0x4400, 8);(unique, 0x27600, 4) COPY (unique, 0xdb00, 4);(register, 0x200, 1) INT_LESS (unique, 0x27600, 4) , (const, 0x1, 4);(register, 0x20b, 1) INT_SBORROW (unique, 0x27600, 4) , (const, 0x1, 4);(unique, 0x27700, 4) INT_SUB (unique, 0x27600, 4) , (const, 0x1, 4);(register, 0x207, 1) INT_SLESS (unique, 0x27700, 4) , (const, 0x0, 4);(register, 0x206, 1) INT_EQUAL (unique, 0x27700, 4) , (const, 0x0, 4);(unique, 0x15080, 4) INT_AND (unique, 0x27700, 4) , (const, 0xff, 4);(unique, 0x15100, 1) POPCOUNT (unique, 0x15080, 4);(unique, 0x15180, 1) INT_AND (unique, 0x15100, 1) , (const, 0x1, 1);(register, 0x202, 1) INT_EQUAL (unique, 0x15180, 1) , (const, 0x0, 1)"
      }
    },
    {
      "endianness": 0,
      "architecture": "x86",
      "bitness": 64,
      "address": 1053279,
      "data": "7507",
      "asm": "JNZ",
      "comment": "",
      "ir": {
        "lang_name": 2,
        "data": "(unique, 0xe480, 1) BOOL_NEGATE (register, 0x206, 1); --- CBRANCH (ram, 0x101268, 8) , (unique, 0xe480, 1)"
      }
    }
  ],
  "branches": [
    {
      "btype": 1,
      "target": 1053288
    },
    {
      "btype": 1,
      "target": 1053281
    }
  ],
  "is_prologue": false,
  "is_epilogue": false,
  "xrefs": [
    {
      "from_": 1053279,
      "to": 1053288,
      "type": 1
    }
  ]
}
```
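Because the output is plain JSON, it can also be consumed without BINocular installed. The following sketch uses only the standard `json` module on a trimmed copy of the output above (the `ir` payloads are shortened, so the pcode counts reflect the trimmed data). It assumes, based on that sample, that `address` is a decimal integer, that `data` holds the instruction bytes as a hex string, and that pcode operations are separated by `;`. `summarize_block` is a hypothetical helper:

```python
import json

# Trimmed copy of the basic block JSON shown above; the "ir" payloads are shortened.
BLOCK_JSON = """
{
  "address": 1053275,
  "instructions": [
    {"address": 1053275, "data": "837dec01", "asm": "CMP",
     "ir": {"lang_name": 2, "data": "(register, 0x206, 1) INT_EQUAL (unique, 0x27700, 4) , (const, 0x0, 4)"}},
    {"address": 1053279, "data": "7507", "asm": "JNZ",
     "ir": {"lang_name": 2, "data": "(unique, 0xe480, 1) BOOL_NEGATE (register, 0x206, 1); --- CBRANCH (ram, 0x101268, 8) , (unique, 0xe480, 1)"}}
  ],
  "branches": [{"btype": 1, "target": 1053288}, {"btype": 1, "target": 1053281}]
}
"""


def summarize_block(raw: str) -> list[str]:
    """Render one line per instruction: hex address, byte size, mnemonic, pcode op count."""
    block = json.loads(raw)
    lines = []
    for insn in block["instructions"]:
        size = len(bytes.fromhex(insn["data"]))  # assumes "data" is hex-encoded bytes
        ops = [op for op in insn["ir"]["data"].split(";") if op.strip()]
        lines.append(f"{insn['address']:#x} {size}B {insn['asm']} ({len(ops)} pcode ops)")
    return lines


for line in summarize_block(BLOCK_JSON):
    print(line)
# 0x10125b 4B CMP (1 pcode ops)
# 0x10125f 2B JNZ (2 pcode ops)
```

Printing addresses in hex makes them easier to match against the pcode, where the `CBRANCH` target `0x101268` is the same value as the decimal branch target `1053288`.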
### Loading a Binary and Uploading It to a Database
Every primitive has a corresponding [SQLAlchemy](https://www.sqlalchemy.org/) ORM class whose name carries the suffix "ORM" (e.g. `NativeFunctionORM`, `BinaryORM`).
```
from sqlalchemy.orm import Session
from binocular import Ghidra, Backend, FunctionSource
Backend.set_engine('sqlite:////home/brandon/Documents/BINocular/example.db')
# If the install_dir argument is not specified, the built-in default path
# (inside the Python package) is used
with Ghidra() as g:
    g.load("./test/example")
    b = g.binary

    for f in b.functions:
        name = f.names[0]
        # Auto parse the source code and associate the functions within
        # the source to the parsed functions that Ghidra has found
        src = FunctionSource.from_file(name, './test/example.c')
        if src is not None:
            f.sources.add(src)

    # Load the entire binary into the database configured with
    # Backend.set_engine above
    with Session(Backend.engine) as s:
        b.db_add(s)
        s.commit()
```
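The engine URL in the example above hardcodes an absolute path. SQLAlchemy's SQLite URLs take the form `sqlite:///` followed by the file path, which is why an absolute POSIX path ends up with four slashes. A small sketch of building that URL from any path; `sqlite_url` is a hypothetical helper, not part of BINocular, and POSIX-style paths are assumed:

```python
import os


def sqlite_url(db_path: str) -> str:
    """Build a SQLAlchemy-style SQLite URL from an absolute or relative file path."""
    # abspath anchors relative paths at the current working directory.
    return "sqlite:///" + os.path.abspath(db_path)


print(sqlite_url("/home/brandon/Documents/BINocular/example.db"))
# sqlite:////home/brandon/Documents/BINocular/example.db
```

The resulting string could then be passed to `Backend.set_engine(...)` in place of the literal URL.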
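### Querying Data from the Database
This is an example of querying a binary by name. It is plain SQL/SQLAlchemy, so you can run any query you like.
Use the `.from_orm()` function to convert an ORM object back into a Pydantic BaseModel object.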
```
from sqlalchemy import select
from sqlalchemy.orm import Session
from binocular import Backend, Binary
from binocular.db import BinaryORM, NameORM
Backend.set_engine('sqlite:////home/brandon/Documents/BINocular/example.db')
with Session(Backend.engine) as session:
    # Select a binary whose file name is "example"
    binary = session.execute(
        select(BinaryORM).join(NameORM, BinaryORM.names).where(NameORM.name == 'example')
    ).all()
    # Each result row is a one-element tuple; take the first match
    binary = [b[0] for b in binary][0]

    # Convert the BinaryORM object to a Binary object
    # and get all its functions
    funcs = Binary.from_orm(binary).functions
    print(f"example has {len(funcs)} functions")
```
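Since the data lives in an ordinary SQL database, the same kind of name lookup can also be written as a raw SQL join. The sketch below is illustrative only: the `binaries`, `names`, and `binary_names` tables and their columns are made up for the demo and are not BINocular's actual schema, which should be read from the ORM classes in `binocular.db` or from the database itself. `find_binaries_by_name` is likewise a hypothetical helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: binaries and names linked through an association table.
conn.executescript("""
CREATE TABLE binaries (id INTEGER PRIMARY KEY, sha256 TEXT, size INTEGER);
CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE binary_names (binary_id INTEGER, name_id INTEGER);
INSERT INTO binaries VALUES
    (1, 'a7f9141c1781c20d13b8442f24fcddba4b75b4b73ae04e734a92a79fcf0869c3', 18088);
INSERT INTO names VALUES (1, 'example');
INSERT INTO binary_names VALUES (1, 1);
""")


def find_binaries_by_name(conn: sqlite3.Connection, file_name: str) -> list[tuple]:
    """Return (id, size) rows for every binary that has the given name."""
    return conn.execute(
        "SELECT b.id, b.size FROM binaries AS b "
        "JOIN binary_names AS bn ON bn.binary_id = b.id "
        "JOIN names AS n ON n.id = bn.name_id "
        "WHERE n.name = ?",
        (file_name,),  # bound parameter, so the name is never spliced into the SQL
    ).fetchall()


print(find_binaries_by_name(conn, "example"))  # [(1, 18088)]
```

The SQLAlchemy query above expresses the same idea through the `BinaryORM.names` relationship, which avoids having to know the table layout.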