reverseame/BinTopsy

GitHub: reverseame/BinTopsy

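A lightweight Python toolkit for static malware analysis, covering entropy visualization, disassembly, YARA scanning, VirusTotal lookups, radare2 structural dissection, behavior analysis, and ATT&CK mapping, with one-command generation of a full analysis report.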

Stars: 3 | Forks: 0

# BinTopsy


**BinTopsy** (*Binary Autopsy*) is a collection of lightweight Python scripts designed to assist in the early stages of static malware analysis and reverse engineering. The toolkit lets you visualize file entropy, disassemble code snippets on the fly, scan binaries with YARA rules (with paging for large dumps), automate threat-intelligence lookups through VirusTotal, and dissect binaries with radare2: extracting structural data, generating control-flow and call graphs, tracing API usage, and clustering functions by fuzzy hash.

[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)

## Tools at a glance

| # | Script | Purpose | Key dependencies |
|----|------------------------------|---------------------------------------------------------------|-----------------------|
| 1 | `entropy-viz.py` | 2D Shannon entropy heatmap of a binary file | numpy, matplotlib |
| 2 | `disasm.py` | CLI disassembler for files and hex strings (x86/x64/ARM/MIPS) | capstone |
| 3 | `yara-chunk-scanner.py` | Chunked, recursive YARA scanner; supports a parallel directory mode | yara-python |
| 4 | `vt-folder-scan.py` | Hash-only VirusTotal lookup for a folder, with a local cache | requests, dotenv |
| 5 | `r2-dissector.py` | Radare2 dissector → JSON + CFG / call-graph PDFs + HTML index | r2pipe, graphviz |
| 6 | `sda-hashes.py` | Enriches the dissector JSON output with TLSH / ssdeep fuzzy hashes | python-tlsh, ssdeep |
| 7 | `r2-call-tracer.py` | Brute-force call tracer (also resolves indirect IAT calls) | r2pipe |
| 8 | `json-behavior-analyzer.py` | Behavior matrix + MITRE ATT&CK Navigator layer | stdlib |
| 9 | `r2-call-graph.py` | Targeted call graph rooted at a specific function (BFS depth) | r2pipe, graphviz |
| 10 | `r2-xref-grapher.py` | CFG of every function that calls a given API | r2pipe, graphviz |
| 11 | `bintopsy-cluster.py` | Clusters similar functions across enriched JSONs | python-tlsh / ssdeep |
| 12 | `bintopsy-diff.py` | Structural diff between two dissector JSONs (with rename detection) | python-tlsh (optional) |
| 13 | `bintopsy-report.py` | Single-command pipeline → standalone HTML report | All of the above |

## Installation

### 1. System dependencies

Some Python packages wrap native libraries, which must be installed **before** running `pip install`:

| Tool / library | Required by | Debian / Ubuntu | macOS (Homebrew) |
|----------------|----------------------------------------------|------------------------------|--------------------------|
| Radare2 | `r2-dissector.py`, `r2-call-*.py`, `r2-xref-grapher.py` | `apt install radare2` | `brew install radare2` |
| Graphviz | `r2-dissector.py`, `r2-call-graph.py`, `r2-xref-grapher.py` | `apt install graphviz` | `brew install graphviz` |
| libfuzzy | `sda-hashes.py` (ssdeep) | `apt install libfuzzy-dev` | `brew install ssdeep` |

### 2. Clone and install Python dependencies

```
git clone https://github.com/reverseame/BinTopsy.git
cd BinTopsy
pip install -r requirements.txt
```

### 3. Configure VirusTotal (optional)

This step is only needed if you intend to use `vt-folder-scan.py`. Create a file named `secrets.env` in the repository root (it is already ignored by Git):

```
VT_API_KEY=your_64_character_api_key_here
```

## Usage examples

### 1. Entropy visualizer (`entropy-viz.py`)

**A. Quick overview of a packed executable**: default 4 KB window, default 64-column grid:

```
python entropy-viz.py samples/malware.exe -o overview.pdf
```

**B. High-resolution analysis (shellcode / steganography)**: 256-byte window plus a wider grid for a denser map:

```
python entropy-viz.py suspicious_image.png -w 256 -W 128 -o high_res_map.pdf
```

### 2. Capstone disassembler (`disasm.py`)

**A. Shellcode from a hex dump**: disassemble inline, no file needed:

```
python disasm.py -s "55 48 89 e5 48 83 ec 20" -a x64
```

**B. IoT firmware (MIPS, big-endian)**: set the base address and endianness:

```
python disasm.py -f firmware_bootloader.bin -a mips --big-endian --base 0x80001000
```

**C. Limit the output**: print only the first 30 instructions:

```
python disasm.py -f sample.bin -a x86 -n 30
```

### 3. Chunk-based YARA scanner (`yara-chunk-scanner.py`)

**A. Memory dump**: scan an 8 GB raw dump in 4 MB chunks:

```
python yara-chunk-scanner.py memory_dump.raw ./rules/malware.yar -p 4194304
```

**B. Bulk directory scan**: recursively match the files in one directory against the rules in another, running in parallel across 8 workers:

```
python yara-chunk-scanner.py ./extracted_files/ ./rules_repo/ -p 4096 -j 8
```

**C. Disable cross-chunk overlap**: improves performance when the rules contain only short strings:

```
python yara-chunk-scanner.py ./big_blob.bin ./rules.yar -p 65536 -O 0
```

### 4. VirusTotal folder scanner (`vt-folder-scan.py`)

**A. Check the `Downloads` folder**, ignoring common media files:

```
python vt-folder-scan.py ~/Downloads --avoid .jpg .jpeg .png .mp4 .log .txt
```

**B. Premium API key + JSON report**:

```
python vt-folder-scan.py ./incident_response_data --premium -o triage.json
```

**C. Limit by file size** to save quota:

```
python vt-folder-scan.py ./suspicious --max-size 10485760 # skip > 10 MB
```

**D. Caching**: results are persisted to `.vt_cache.json` (default TTL of 7 days), so re-running on the same folder costs no quota. Adjust the TTL with `--cache-ttl 30`, or disable the cache with `--no-cache`.

### 5. Radare2 dissector (`r2-dissector.py`)

**A. Full JSON export**: metadata, entropy, and the basic blocks of every function:

```
python r2-dissector.py samples/malware.exe --blocks > structural_analysis.json
```

**B. Visualization (call graph)**: a global "who calls whom" map:

```
python r2-dissector.py samples/malware.exe --call-graph -od ./graphs
```

**C. CFG of a single function**:

```
python r2-dissector.py samples/malware.exe -g sym.main -od ./graphs
```

### 6. Fuzzy-hash enricher (`sda-hashes.py`)

Adds TLSH and ssdeep hashes to the JSON produced by `r2-dissector.py`:

```
python sda-hashes.py structural_analysis.json
# → writes structural_analysis_enriched.json
```

To use only one algorithm:

```
python sda-hashes.py structural_analysis.json --tlsh
```

### 7. Brute-force call tracer (`r2-call-tracer.py`)

Walks every function and records both direct and indirect (IAT) calls:

```
python r2-call-tracer.py samples/obfuscated.bin > call_trace.json
```

Compact JSON (smaller, no indentation):

```
python r2-call-tracer.py samples/obfuscated.bin --compact > call_trace.json
```

### 8. Behavior matrix (`json-behavior-analyzer.py`)

**A. Detection triage**: show only the functions with suspicious capabilities:

```
python json-behavior-analyzer.py call_trace.json
```

Output highlighting (colors):

- Yellow → file + network (downloader / dropper)
- Red → file + crypto (ransomware-like)
- Magenta → process injection (`VirtualAllocEx`, `WriteProcessMemory`, etc.)
- Cyan → heavy anti-analysis behavior (≥ 3 such APIs)

**B. Detailed audit**: show every function, including those that are GUI-only or generic:

```
python json-behavior-analyzer.py call_trace.json --verbose
```

**C. ATT&CK Navigator layer + machine-readable findings**:

```
python json-behavior-analyzer.py call_trace.json \
    --json findings.json --attack-layer attack_layer.json
```

Drag `attack_layer.json` into the MITRE ATT&CK Navigator to visualize the techniques observed in the sample.

### 9. Targeted call graph (`r2-call-graph.py`)

Generates a targeted call graph rooted at a specific function. It complements `r2-dissector.py --call-graph`, which produces the *global* graph.

**A. Start from `main`, depth 4**:

```
python r2-call-graph.py samples/malware.exe main_subgraph.pdf -s main -d 4
```

**B. Start from an address**:

```
python r2-call-graph.py samples/malware.exe entry.pdf -s 0x401000 -d 5
```

Nodes are colored by cyclomatic complexity (CC): green = root node, blue ≤ 10, yellow > 10, orange > 20, red > 50.

### 10. API cross-reference grapher (`r2-xref-grapher.py`)

**A. List every import** with its cross-reference count and the maximum CC among its callers (for a quick "where is the danger?" triage):

```
python r2-xref-grapher.py samples/malware.exe -l
```

**B. Generate a CFG for every caller of a specific API**:

```
python r2-xref-grapher.py samples/malware.exe VirtualAllocEx -o ./xref_graphs
```

**C. Substring matching** (for example, to catch both `RegOpenKey` and `RegOpenKeyEx`):

```
python r2-xref-grapher.py samples/malware.exe RegOpenKey --partial
```

### 11. Function similarity clustering (`bintopsy-cluster.py`)

Groups near-duplicate functions across one or more enriched JSONs (the output of `sda-hashes.py`). This helps spot code shared between malware variants and detect statically linked libraries.

**A. Single-binary clustering**: find duplicated functions within one sample:

```
python bintopsy-cluster.py samples/malware_enriched.json -t 60
```

**B. Cross-sample family detection**: supply several enriched JSONs:

```
python bintopsy-cluster.py family/*_enriched.json -t 70 \
    --csv shared_code.csv --dendrogram tree.pdf
```

Clusters that span samples are flagged `(cross-sample)` and listed first, so the most interesting matches appear at the top. The `--dendrogram` flag is optional and requires SciPy + Matplotlib.

### 12. Structural diff between binaries (`bintopsy-diff.py`)

Compares two dissector JSONs side by side: added / removed / modified functions. If both inputs are enriched with TLSH, the script also flags likely **renames** (a removed/added pair whose TLSH distance is small enough for the two to be considered the same code under different names).

```
python bintopsy-diff.py samples/v1_enriched.json samples/v2_enriched.json
```

Save the diff as JSON for downstream tooling:

```
python bintopsy-diff.py v1.json v2.json --json diff.json
```

### 13. End-to-end pipeline (`bintopsy-report.py`)

A single command runs the full BinTopsy stack on a binary and produces a standalone HTML report that indexes every artifact:

```
python bintopsy-report.py samples/malware.exe -o ./report
open ./report/index.html
```

The pipeline runs: dissector → fuzzy hashes → call tracer → behavior matrix + ATT&CK layer → entropy heatmap → CFG gallery → global call graph. Useful flags:

- `--no-graph-all` skips the bulk per-function CFG generation (slow on large binaries).
- `--no-call-graph` skips the global call graph.
- `--yara rules.yar` adds a YARA scan to the pipeline.

## Running the test suite

```
pip install pytest
pytest tests/ -v
```

The suite has two parts:

- `tests/test_smoke.py`: every script must run `--help` successfully.
- `tests/test_functional.py`: compiles a small C program once per session (skipped if no compiler is on `PATH`) and runs every tool against it, verifying that the output is well-formed. Tests that depend on `radare2`, `dot`, or `yara` skip cleanly when those binaries are missing.

## Recommended pipeline

The radare2 toolchain is designed to be composed. A typical analysis flow looks like this:

```
                    ┌─────────────────────┐
binary ──────────►  │   r2-dissector.py   │ ──► structural_analysis.json
                    └─────────────────────┘
                               │
                               ▼
                    ┌─────────────────────┐
                    │    sda-hashes.py    │ ──► *_enriched.json
                    └─────────────────────┘     (TLSH / ssdeep per function)

                    ┌─────────────────────┐
binary ──────────►  │  r2-call-tracer.py  │ ──► call_trace.json
                    └─────────────────────┘
                               │
                               ▼
                ┌────────────────────────────┐
                │ json-behavior-analyzer.py  │ ──► capability matrix
                └────────────────────────────┘

                    ┌─────────────────────┐
binary ──────────►  │ r2-xref-grapher.py  │ ──► per-API caller CFGs (PDF)
                    └─────────────────────┘
                    ┌─────────────────────┐
                    │  r2-call-graph.py   │ ──► targeted call graph (PDF)
                    └─────────────────────┘
```

For a one-shot analysis of a single binary, `bintopsy-report.py` orchestrates the whole tree and produces a standalone HTML report:

```
binary ──► bintopsy-report.py ──► report/index.html
                                   ├── dissect.json
                                   ├── dissect_enriched.json
                                   ├── calls.json
                                   ├── behaviour.json
                                   ├── attack_layer.json   ← MITRE Navigator
                                   ├── entropy.pdf
                                   └── graphs/             ← CFGs + call graph
                                       └── index.html
```

To compare or cluster multiple samples:

```
*_enriched.json (N files) ──► bintopsy-cluster.py ──► clusters / dendrogram
v1.json + v2.json         ──► bintopsy-diff.py    ──► added/removed/renamed
```

## Methodology and reference

The methodology behind **BinTopsy** is detailed in:

#### BibTeX

```
@InProceedings{Rodriguez-ICDF2C-25,
  author    = {Ricardo J. Rodríguez},
  booktitle = {Proceedings of the 16th EAI International Conference on Digital Forensics & Cyber Crime},
  title     = {{Toward Structured Memory Forensics: A MITRE ATT\&CK-Aligned Workflow for Malware Investigation}},
  year      = {2025},
  note      = {Accepted for publication. To appear.},
  number    = {PP},
  pages     = {PP},
  publisher = {Springer},
  volume    = {PP},
  abstract  = {Memory forensics is emerging as an essential technique for detecting malware-related volatile indicators of compromise (IoCs) that traditional disk analysis may miss. However, the lack of standardized best practices for analyzing memory-resident malware evidence continues to limit the effectiveness and reproducibility of forensic investigations. In this work, we propose a structured five-phase workflow that formalizes best practices for the extraction and analysis of malware-related IoCs, from initial evidence preservation to binary program investigation. Our methodology is explicitly aligned with the MITRE ATT\&CK framework, allowing analysts to correlate volatile memory artifacts with known adversarial tactics and techniques. Additionally, we examine technical challenges (such as paging, on-demand paging, memory inconsistencies, and runtime binary transformations) that threaten the integrity and reliability of memory evidence. We further propose practical recommendations and outline future research directions for addressing these challenges, with the goal of improving the reliability, consistency, and forensic robustness of memory-based malware analysis.},
  keywords  = {digital forensics, memory forensics, methodology, indicators of compromise, malware analysis},
  url       = {https://webdiis.unizar.es/~ricardo/files/papers/Rodriguez-ICDF2C-25.pdf},
}
```

## AI transparency and acknowledgements

This repository is an example of **human-AI collaboration**.

* **Code:** The Python scripts were drafted by **Google Gemini** and then rigorously reviewed, refactored, and tested by **R. J. Rodríguez** to ensure accuracy and security.
* **Visual assets:** The **BinTopsy** logo was designed and generated with the image-generation capabilities of **Google Gemini**.
* **Architecture and methodology:** Tool selection, scanning logic, and research methodology were defined by **R. J. Rodríguez**.

*Disclaimer: although AI was used to accelerate development, every line of code has been manually audited by the author. The resulting tools are a validated implementation of the concepts described in the methodology.*

## Funding

This research was supported in part by the Spanish National Cybersecurity Institute (INCIBE) under *Proyecto Estratégico de Ciberseguridad — CIBERSEGURIDAD EINA UNIZAR*, and by the Recovery, Transformation and Resilience Plan funds, financed by the European Union (Next Generation).

![INCIBE_logos](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/c345e26fec144905.jpg)
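
As a supplement to the entropy visualizer (`entropy-viz.py`) covered in the usage examples, the following is a minimal, standard-library-only sketch of what a windowed Shannon entropy map computes. It assumes fixed, non-overlapping windows laid out row by row; the function names `shannon_entropy` and `entropy_grid` are hypothetical and are not taken from the script, whose actual implementation (built on numpy and matplotlib) may differ.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Return the Shannon entropy of `chunk` in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    # H = sum(p * log2(1 / p)) over the byte values that occur in the chunk.
    return sum((n / total) * math.log2(total / n) for n in Counter(chunk).values())


def entropy_grid(data: bytes, window: int = 4096, columns: int = 64) -> list[list[float]]:
    """Split `data` into fixed-size windows and lay their entropies out row by row.

    The defaults mirror the 4 KB window and 64-column grid mentioned in the
    usage examples; the layout itself is an assumption made for illustration.
    """
    values = [shannon_entropy(data[i:i + window]) for i in range(0, len(data), window)]
    return [values[i:i + columns] for i in range(0, len(values), columns)]


# Synthetic input: a constant region followed by a region using every byte value.
sample = b"\x00" * 512 + bytes(range(256)) * 2
grid = entropy_grid(sample, window=256, columns=2)
for row in grid:
    print(" ".join(f"{value:.2f}" for value in row))
```

On this synthetic input, the constant region yields windows with entropy 0 and the region containing every byte value equally often yields windows with entropy 8, which is the contrast a heatmap makes visible: packed or encrypted sections tend toward the high end, while padding and sparse data tend toward the low end.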

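Tags: DNS enumeration, Python security tools, radare2, VirusTotal automation, YARA scanning, binary analysis, binary dissection, cloud security operations, cloud asset inventory, function clustering, disassembly tools, threat intelligence, security visualization, security toolkit, developer tools, malware analysis tools, control-flow analysis, control-flow graph generation, file entropy visualization, fuzzy hashing, cybersecurity tools, reversing tools, reverse engineering, static malware analysis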