wrhalpin/SandGNAT

An automated malware runtime analysis environment that captures behavior in isolated virtual machines and emits STIX 2.1 to PostgreSQL.

# SandGNAT

An automated malware runtime analysis environment: it detonates suspicious binaries in isolated Windows virtual machines, captures behavioral artifacts (registry diffs, file I/O, network traffic, process trees), and emits STIX 2.1 objects to PostgreSQL.

**Full documentation:** [`docs/`](docs/), organized according to the [Diátaxis](https://diataxis.fr/) framework (tutorials, how-to guides, reference, explanation). Rendered at [wrhalpin.github.io/SandGNAT](https://wrhalpin.github.io/SandGNAT/).

**Canonical design:** [`docs/MALWARE_ANALYSIS_SYSTEM_DESIGN.md`](docs/MALWARE_ANALYSIS_SYSTEM_DESIGN.md)

**Quick start:**

- New here? → [tutorials/01-your-first-sample.md](docs/tutorials/01-your-first-sample.md)
- Setting up a dev stack? → [tutorials/02-local-dev-stack.md](docs/tutorials/02-local-dev-stack.md)
- Architecture tour? → [explanation/architecture.md](docs/explanation/architecture.md)
- API reference? → [reference/http-api.md](docs/reference/http-api.md)

## Repository layout

```
.
├── assets/logo/            Brand assets (favicon, social card, etc.)
├── docs/                   Design docs + Diátaxis documentation site
├── migrations/             Postgres schema (versioned SQL)
├── orchestrator/           Python job orchestrator (Celery)
│   ├── config.py           Environment-backed settings
│   ├── db.py               psycopg connection pool
│   ├── models.py           Dataclasses for job/artifact rows
│   ├── schema.py           Shared host <-> guest wire schema (stdlib only)
│   ├── stix_builder.py     STIX 2.1 object factories + bundle export
│   ├── proxmox_client.py   Proxmox API wrapper (VM lifecycle)
│   ├── vm_pool.py          DB-backed VM pool manager (lease + reap)
│   ├── guest_driver.py     Stages samples, publishes jobs, waits for results
│   ├── analyzer.py         Turns guest artifacts into STIX + normalised rows
│   ├── static_analysis.py  Parses Linux static-guest envelope into a bundle
│   ├── trigrams.py         Byte/opcode trigrams + MinHash + LSH bands (stdlib)
│   ├── similarity.py       LSH-banded similarity lookup + short-circuit
│   ├── persistence.py      Writes STIX + metadata + signatures to Postgres
│   ├── intake.py           Sample intake pipeline (validate/hash/VT/YARA/enqueue)
│   ├── intake_api.py       Flask HTTP front-end for submissions
│   ├── intake_server.py    CLI entry point for the intake service
│   ├── vt_client.py        VirusTotal v3 hash-lookup client (no upload)
│   ├── yara_scanner.py     Optional YARA pre-classification
│   ├── export_api.py       Read-only Flask blueprint for GNAT connector
│   ├── tasks.py            Celery tasks (analyze_malware_sample)
│   ├── tasks_static.py     Celery static_analyze_sample (Linux pre-stage)
│   └── parsers/            Artifact parsers (ProcMon, RegShot, PCAP)
├── linux_guest_agent/      Linux static-analysis guest (stdlib + opt-in deps)
│   ├── watcher.py          Polls staging share for static_analysis jobs
│   ├── runner.py           Per-job static toolchain orchestration
│   └── tools/              PE/ELF, fuzzy, strings/entropy, YARA, CAPA, trigrams
├── guest_agent/            Windows-side collector (stdlib only, PyInstaller-friendly)
│   ├── config.py           Env-backed guest settings
│   ├── watcher.py          Polls staging/pending, claims jobs atomically
│   ├── runner.py           Per-job capture + detonate + package pipeline
│   ├── executor.py         Runs samples under a hard timeout
│   └── capture/            Wrappers for ProcMon, tshark, RegShot, drop detection
├── infra/
│   ├── opnsense/           Firewall rule exports / templates
│   └── guest/              Windows guest prep + capture scripts
├── tests/                  Unit tests (schema, parsers, analyzer, guest_driver)
└── pyproject.toml          Python project metadata
```

## Quick start (orchestrator development)

```
python -m venv .venv && source .venv/bin/activate
pip install -e '.[dev]'

# Apply the schema to a local Postgres (run the migrations in order)
psql "$DATABASE_URL" -f migrations/001_initial_schema.sql
psql "$DATABASE_URL" -f migrations/002_intake_and_vm_pool.sql
psql "$DATABASE_URL" -f migrations/003_static_analysis.sql

# Run the unit tests
pytest
```

## Submitting a sample

```
# Intake API (requires the INTAKE_API_KEY environment variable to be set on the server):
curl -sS -H "X-API-Key: $INTAKE_API_KEY" \
  -F "file=@/path/to/sample.exe" \
  -F "priority=3" \
  http://localhost:8080/submit

# Response:
# {"decision": "queued", "analysis_id": "...", "sha256": "...", "priority": 3, ...}

# Poll status
curl -sS -H "X-API-Key: $INTAKE_API_KEY" \
  http://localhost:8080/jobs/
```

## Querying results (GNAT connector interface)

Read-only endpoints served by the same service and authenticated with the same `X-API-Key`. This is the contract consumed by the `gnat.connectors.sandgnat` connector:

```
GET /analyses            list + filters (sha256, status, since), paginated
GET /analyses/           one job row
GET /analyses//bundle    full STIX 2.1 bundle (409 if not completed)
GET /analyses//static    static-analysis findings + fuzzy hashes
GET /analyses//similar   LSH + lineage neighbours (threshold, flavour, limit)
```

Example: pull every analysis completed within the last hour:

```
curl -sS -H "X-API-Key: $INTAKE_API_KEY" \
  "http://localhost:8080/analyses?status=completed&since=$(date -u -d '1 hour ago' +%FT%TZ)"
```

Intake environment variables:

| Variable                      | Purpose                                                                      |
|-------------------------------|------------------------------------------------------------------------------|
| `INTAKE_API_KEY`              | Shared secret for the `X-API-Key` header (required)                          |
| `INTAKE_BIND_HOST/PORT`       | HTTP bind address                                                            |
| `INTAKE_MAX_SAMPLE_BYTES`     | Hard cap on upload size (default 128 MiB)                                    |
| `INTAKE_YARA_RULES_DIR`       | Directory of `.yar` files used to scan uploads                               |
| `VIRUSTOTAL_API_KEY`          | If set, runs a hash-only VT pre-check (file contents are never uploaded)     |
| `VM_POOL_VMID_MIN/MAX`        | Proxmox vmid range for analysis clones (default 9100-9199)                   |
| `VM_POOL_STALE_LEASE_SECONDS` | Heartbeat age after which a lease is reaped                                  |

## Runtime dependencies

- PostgreSQL 15+ (JSONB GIN indexes, `tsvector`)
- Redis 7+ (Celery message broker)
- Proxmox VE 8+ (API token authentication)
- Python 3.11+

## Status

Phases 1–5 are complete:

1. Scaffolding, PostgreSQL schema, STIX factories.
2. Host↔guest detonation protocol + an analyzer that turns artifacts into STIX.
3. Intake service (HTTP API + VT hash pre-check + YARA) and the DB-backed VM pool manager.
4. Linux static-analysis pre-stage with byte/opcode trigram MinHash and LSH-banded similarity.
5. **Read-only export API** for external consumers. The GNAT TIP connector (in the separate `wrhalpin/GNAT` repository) pulls completed analyses over HTTP (STIX bundles, per-job static findings, and similarity neighbours) without direct PostgreSQL access.

Next: end-to-end orchestration testing on real Proxmox + PostgreSQL, and push-on-completion if bulk pulls turn out not to cover ingestion needs.

## License

Licensed under the Apache License, Version 2.0. See [`LICENSE`](LICENSE) for the full text.

Every source file carries an SPDX header:

```
SPDX-License-Identifier: Apache-2.0
Copyright 2026 Bill Halpin
```

Files added to the repository should include this header at the top (as `#` comments in Python/PowerShell/Shell files, `--` in SQL files).
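
## Appendix: building an export query from Python

The `curl` example for `GET /analyses` shown earlier relies on GNU `date -d`, which is not available on every platform. The sketch below shows one way a consumer might build the same request with only the Python standard library. It is an illustration, not code from this repository: the function name `build_analyses_request` and its parameters are hypothetical, and the base URL is the `http://localhost:8080` used in the `curl` examples. Only the `status` and `since` filters and the `X-API-Key` header come from the README; the response format is not assumed.

```python
import os
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_analyses_request(base_url, api_key, status="completed", hours=1, now=None):
    """Build (but do not send) a GET /analyses request for recent jobs.

    Hypothetical helper: mirrors the README's curl example, nothing more.
    """
    now = now or datetime.now(timezone.utc)
    # Same timestamp shape as `date -u +%FT%TZ`, e.g. 2024-01-01T11:00:00Z.
    since = (now - timedelta(hours=hours)).strftime("%Y-%m-%dT%H:%M:%SZ")
    # urlencode percent-encodes the colons in the timestamp.
    query = urlencode({"status": status, "since": since})
    url = f"{base_url.rstrip('/')}/analyses?{query}"
    # urllib normalises the stored header name to "X-api-key"; HTTP header
    # names are case-insensitive, so the server sees the same header.
    return Request(url, headers={"X-API-Key": api_key}, method="GET")


if __name__ == "__main__":
    # Assumes a locally running intake/export service and INTAKE_API_KEY
    # exported in the environment, as in the curl examples.
    request = build_analyses_request(
        "http://localhost:8080", os.environ["INTAKE_API_KEY"]
    )
    with urlopen(request, timeout=10) as response:
        # Print the raw body; this sketch does not assume its structure.
        print(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the query string easy to check in a unit test without a running service. Passing a fixed `now` gives a deterministic `since` value.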