# SandGNAT
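An automated malware runtime-analysis environment: it detonates suspicious binaries in isolated Windows VMs, captures behavioural artifacts (registry diffs, file I/O, network traffic, process trees), and emits STIX 2.1 objects to PostgreSQL.
**Full documentation:** [`docs/`](docs/), organised by the [Diátaxis](https://diataxis.fr/) framework (tutorials, how-to guides, reference, explanation). Rendered at
[wrhalpin.github.io/SandGNAT](https://wrhalpin.github.io/SandGNAT/).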
**Canonical design:** [`docs/MALWARE_ANALYSIS_SYSTEM_DESIGN.md`](docs/MALWARE_ANALYSIS_SYSTEM_DESIGN.md)
**Quick entry points:**
- New here? → [tutorials/01-your-first-sample.md](docs/tutorials/01-your-first-sample.md)
- Setting up a dev stack? → [tutorials/02-local-dev-stack.md](docs/tutorials/02-local-dev-stack.md)
- Architecture tour? → [explanation/architecture.md](docs/explanation/architecture.md)
- API reference? → [reference/http-api.md](docs/reference/http-api.md)
## Repository layout
```
.
├── assets/logo/ Brand assets (favicon, social card, etc.)
├── docs/ Design docs + Diátaxis documentation site
├── migrations/ Postgres schema (versioned SQL)
├── orchestrator/ Python job orchestrator (Celery)
│ ├── config.py Environment-backed settings
│ ├── db.py psycopg connection pool
│ ├── models.py Dataclasses for job/artifact rows
│ ├── schema.py Shared host <-> guest wire schema (stdlib only)
│ ├── stix_builder.py STIX 2.1 object factories + bundle export
│ ├── proxmox_client.py Proxmox API wrapper (VM lifecycle)
│ ├── vm_pool.py DB-backed VM pool manager (lease + reap)
│ ├── guest_driver.py Stages samples, publishes jobs, waits for results
│ ├── analyzer.py Turns guest artifacts into STIX + normalised rows
│ ├── static_analysis.py Parses Linux static-guest envelope into a bundle
│ ├── trigrams.py Byte/opcode trigrams + MinHash + LSH bands (stdlib)
│ ├── similarity.py LSH-banded similarity lookup + short-circuit
│ ├── persistence.py Writes STIX + metadata + signatures to Postgres
│ ├── intake.py Sample intake pipeline (validate/hash/VT/YARA/enqueue)
│ ├── intake_api.py Flask HTTP front-end for submissions
│ ├── intake_server.py CLI entry point for the intake service
│ ├── vt_client.py VirusTotal v3 hash-lookup client (no upload)
│ ├── yara_scanner.py Optional YARA pre-classification
│ ├── export_api.py Read-only Flask blueprint for GNAT connector
│ ├── tasks.py Celery tasks (analyze_malware_sample)
│ ├── tasks_static.py Celery static_analyze_sample (Linux pre-stage)
│ └── parsers/ Artifact parsers (ProcMon, RegShot, PCAP)
├── linux_guest_agent/ Linux static-analysis guest (stdlib + opt-in deps)
│ ├── watcher.py Polls staging share for static_analysis jobs
│ ├── runner.py Per-job static toolchain orchestration
│ └── tools/ PE/ELF, fuzzy, strings/entropy, YARA, CAPA, trigrams
├── guest_agent/ Windows-side collector (stdlib only, PyInstaller-friendly)
│ ├── config.py Env-backed guest settings
│ ├── watcher.py Polls staging/pending, claims jobs atomically
│ ├── runner.py Per-job capture + detonate + package pipeline
│ ├── executor.py Runs samples under a hard timeout
│ └── capture/ Wrappers for ProcMon, tshark, RegShot, drop detection
├── infra/
│ ├── opnsense/ Firewall rule exports / templates
│ └── guest/ Windows guest prep + capture scripts
├── tests/ Unit tests (schema, parsers, analyzer, guest_driver)
└── pyproject.toml Python project metadata
```
## Quick start (orchestrator development)
```
python -m venv .venv && source .venv/bin/activate
pip install -e '.[dev]'
# Apply the schema to a local Postgres (run the migrations in order)
psql "$DATABASE_URL" -f migrations/001_initial_schema.sql
psql "$DATABASE_URL" -f migrations/002_intake_and_vm_pool.sql
psql "$DATABASE_URL" -f migrations/003_static_analysis.sql
# Run the unit tests
pytest
```
## Submitting a sample
```
# Intake API (the server needs the INTAKE_API_KEY environment variable set):
curl -sS -H "X-API-Key: $INTAKE_API_KEY" \
-F "file=@/path/to/sample.exe" \
-F "priority=3" \
http://localhost:8080/submit
# Response:
# {"decision": "queued", "analysis_id": "...", "sha256": "...", "priority": 3, ...}
# Poll status
curl -sS -H "X-API-Key: $INTAKE_API_KEY" \
http://localhost:8080/jobs/
```
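For callers who prefer Python to curl, the same submission can be sketched with the standard library alone. This is an illustration, not code from the repository: `build_submit_request` and its parameters are hypothetical names, while the `/submit` path, the `file` and `priority` form fields, and the `X-API-Key` header are taken from the curl example above.

```python
import urllib.request
import uuid


def build_submit_request(base_url, api_key, filename, payload, priority=3):
    """Build (but do not send) a multipart/form-data POST to /submit.

    Hypothetical helper; the field names mirror the curl example.
    """
    boundary = uuid.uuid4().hex
    # Text part of the body: the priority field, then the file part headers.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="priority"\r\n\r\n'
        f"{priority}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    # Closing delimiter that ends the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/submit",
        data=head + payload + tail,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends it; the JSON response should then carry the `decision`, `analysis_id`, `sha256`, and `priority` fields shown above. The sketch does not escape quotes in `filename`, so sanitise sample names before using it.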
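## Querying results (GNAT connector interface)
Read-only endpoints, served by the same service with the same `X-API-Key`. This is the contract consumed by the `gnat.connectors.sandgnat` connector: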
```
GET /analyses list + filters (sha256, status, since), paginated
GET /analyses/ one job row
GET /analyses//bundle full STIX 2.1 bundle (409 if not completed)
GET /analyses//static static-analysis findings + fuzzy hashes
GET /analyses//similar LSH + lineage neighbours (threshold, flavour, limit)
```
Example, pulling every analysis completed in the last hour:
```
curl -sS -H "X-API-Key: $INTAKE_API_KEY" \
"http://localhost:8080/analyses?status=completed&since=$(date -u -d '1 hour ago' +%FT%TZ)"
```
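The `date -d '1 hour ago'` form in the example is GNU `date` syntax. Where that is unavailable, the same query URL can be built portably in Python. This is a minimal sketch: `analyses_url` is a hypothetical helper, and the `status` and `since` parameters are the filters listed above.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def analyses_url(base_url, status="completed", lookback=timedelta(hours=1), now=None):
    """Return the /analyses URL filtered by status and a UTC `since` timestamp."""
    now = now or datetime.now(timezone.utc)
    # Same shape as `date -u +%FT%TZ`, e.g. 2026-01-01T11:00:00Z.
    since = (now - lookback).strftime("%Y-%m-%dT%H:%M:%SZ")
    # urlencode percent-encodes the colons; the server sees the same value.
    query = urlencode({"status": status, "since": since})
    return f"{base_url.rstrip('/')}/analyses?{query}"
```

The `now` parameter exists only to make the function deterministic in tests; normal callers can omit it.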
Intake environment variables:
| Variable | Purpose |
|---------------------------|--------------------------------------------------------|
| `INTAKE_API_KEY` | Shared secret for the `X-API-Key` header (required) |
| `INTAKE_BIND_HOST/PORT` | HTTP bind address |
| `INTAKE_MAX_SAMPLE_BYTES` | Hard cap on upload size (default 128 MiB) |
| `INTAKE_YARA_RULES_DIR` | Directory of `.yar` files used to scan uploads |
| `VIRUSTOTAL_API_KEY` | If set, enables a hash-only VT pre-check (file contents are never uploaded) |
| `VM_POOL_VMID_MIN/MAX` | Proxmox vmid range for analysis clones (default 9100-9199) |
| `VM_POOL_STALE_LEASE_SECONDS` | Heartbeat age after which a lease is reaped |
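To show how these variables might be consumed, here is a minimal sketch of environment-backed settings. It is not the repository's `orchestrator/config.py`: `IntakeSettings` and `load_intake_settings` are hypothetical names, and reading `VM_POOL_VMID_MIN/MAX` as two separate variables is an assumption. Only defaults stated in the table are used; the bind host/port and stale-lease variables are omitted because their defaults are not documented here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class IntakeSettings:
    api_key: str
    max_sample_bytes: int
    yara_rules_dir: Optional[str]
    virustotal_api_key: Optional[str]
    vmid_min: int
    vmid_max: int


def load_intake_settings(env: Mapping[str, str] = os.environ) -> IntakeSettings:
    """Read intake settings from an environment-like mapping (hypothetical helper)."""
    api_key = env.get("INTAKE_API_KEY", "")
    if not api_key:
        # The table marks this variable as required.
        raise RuntimeError("INTAKE_API_KEY is required")
    settings = IntakeSettings(
        api_key=api_key,
        max_sample_bytes=int(env.get("INTAKE_MAX_SAMPLE_BYTES", 128 * 1024 * 1024)),
        yara_rules_dir=env.get("INTAKE_YARA_RULES_DIR") or None,
        virustotal_api_key=env.get("VIRUSTOTAL_API_KEY") or None,
        vmid_min=int(env.get("VM_POOL_VMID_MIN", 9100)),  # assumed variable name
        vmid_max=int(env.get("VM_POOL_VMID_MAX", 9199)),  # assumed variable name
    )
    if settings.vmid_min > settings.vmid_max:
        raise ValueError("VM_POOL_VMID_MIN must not exceed VM_POOL_VMID_MAX")
    return settings
```

Taking the mapping as a parameter keeps the loader testable without touching the real process environment.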
## Runtime dependencies
- PostgreSQL 15+ (JSONB GIN indexes, `tsvector`)
- Redis 7+ (Celery broker)
- Proxmox VE 8+ (API token authentication)
- Python 3.11+
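## Status
Phases 1–5 are complete:
1. Scaffolding, PostgreSQL schema, STIX factories.
2. Host↔guest detonation protocol + analyzer that turns artifacts into STIX.
3. Intake service (HTTP API + VT hash pre-check + YARA) and DB-backed VM pool manager.
4. Linux static-analysis pre-stage with byte/opcode trigram MinHash and LSH-banded similarity.
5. **Read-only export API** for external consumers. The GNAT TIP connector (in the separate `wrhalpin/GNAT` repository) pulls completed analyses over HTTP (STIX bundles, per-job static findings, and similarity neighbours) without direct PostgreSQL access.
Next: end-to-end orchestration tests against real Proxmox + PostgreSQL, and push-on-completion if batch pulls cannot cover intake needs.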
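For readers unfamiliar with the phase 4 technique, the general idea of trigram MinHash with LSH banding can be sketched as follows. This is not the code in `orchestrator/trigrams.py`: the function names, the choice of salted BLAKE2b as the hash family, and the 64-hash / 16-band parameters are all assumptions made for illustration, and only byte trigrams (not opcode trigrams) are covered.

```python
import hashlib
import struct

NUM_HASHES = 64  # assumed signature length
BANDS = 16       # assumed band count, giving 4 rows per band


def byte_trigrams(data: bytes) -> set:
    """Return the set of all overlapping 3-byte windows in `data`."""
    return {data[i:i + 3] for i in range(len(data) - 2)}


def minhash(trigrams, num_hashes=NUM_HASHES):
    """One minimum per salted hash function; equal slots estimate Jaccard overlap."""
    if not trigrams:
        raise ValueError("need at least one trigram")
    signature = []
    for seed in range(num_hashes):
        salt = struct.pack("<I", seed)  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(hashlib.blake2b(t, digest_size=8, salt=salt).digest(), "big")
            for t in trigrams
        ))
    return signature


def lsh_bands(signature, bands=BANDS):
    """Hash each band of the signature to a short bucket key."""
    rows = len(signature) // bands
    return [
        hashlib.sha256(repr((b, signature[b * rows:(b + 1) * rows])).encode()).hexdigest()[:16]
        for b in range(bands)
    ]


def is_candidate(bands_a, bands_b):
    """Two samples are candidate neighbours if any band lands in the same bucket."""
    return any(x == y for x, y in zip(bands_a, bands_b))


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Banding trades precision for lookup speed: only samples sharing at least one bucket key need a full signature comparison, which is what allows a similarity lookup to short-circuit before any exhaustive scan.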
## License
Licensed under the Apache License, Version 2.0. See [`LICENSE`](LICENSE) for the full text.
Every source file carries an SPDX header:
```
SPDX-License-Identifier: Apache-2.0
Copyright 2026 Bill Halpin
```
Files added to the repository should carry this header at the top (as `#` comments in Python/PowerShell/Shell files, `--` in SQL files).
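The header convention is easy to check mechanically. The sketch below is a hypothetical helper, not a tool shipped with the repository; the extension-to-prefix mapping (`.py`, `.ps1`, `.sh`, `.sql`) and the five-line scan window are assumptions derived from the rule above.

```python
from pathlib import Path

SPDX_LINES = ("SPDX-License-Identifier: Apache-2.0", "Copyright 2026 Bill Halpin")
# Assumed mapping from file extension to comment prefix.
COMMENT_PREFIX = {".py": "#", ".ps1": "#", ".sh": "#", ".sql": "--"}


def spdx_header(filename):
    """Return the commented SPDX header expected for `filename`."""
    prefix = COMMENT_PREFIX.get(Path(filename).suffix.lower())
    if prefix is None:
        raise ValueError(f"no comment style known for {filename}")
    return "".join(f"{prefix} {line}\n" for line in SPDX_LINES)


def has_spdx_header(text, filename, scan_lines=5):
    """True if both header lines appear within the first few lines of `text`."""
    head = text.splitlines()[:scan_lines]  # leaves room for a shebang line
    return all(line in head for line in spdx_header(filename).splitlines())
```

A check like this could run in CI or a pre-commit hook over newly added files.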