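# Malware Analysis Pipeline
A malware analysis pipeline that combines Cuckoo Sandbox dynamic analysis, Google Magika file type detection, Gemini-assisted threat assessment, and YARA rule generation.
## Portfolio Showcase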

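- **Architecture deep dive:** [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md)
- **Demo guide:** [`docs/DEMO.md`](docs/DEMO.md)
- **Review focus:** staged triage, sandbox boundaries, Gemini/YARA fallback behavior, and reviewable degraded-mode metadata.
## Architecture Overview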
```mermaid
flowchart TB
classDef input fill:#ecfeff,stroke:#0891b2,stroke-width:2px,color:#164e63
classDef core fill:#eef2ff,stroke:#4f46e5,stroke-width:2px,color:#312e81
classDef external fill:#fff7ed,stroke:#ea580c,stroke-width:2px,color:#7c2d12
classDef metadata fill:#f0fdf4,stroke:#16a34a,stroke-width:2px,color:#14532d
classDef review fill:#fef2f2,stroke:#dc2626,stroke-width:2px,color:#7f1d1d
Sample[/Unknown sample/]:::input
Operator[/Analyst CLI or JSON run/]:::input
subgraph Triage["Triage Layer"]
Hash[Hash and file metadata]:::core
Magika[Magika file identification]:::core
end
subgraph Dynamic["Dynamic Analysis Boundary"]
Sandbox[Cuckoo sandbox client]:::core
MockSandbox[Mock sandbox report]:::metadata
Cuckoo[(Cuckoo Sandbox)]:::external
end
subgraph Intelligence["Assessment and Detection"]
Threat[Threat assessment model]:::core
Yara[YARA candidate generator]:::core
Gemini{{Gemini API}}:::external
end
subgraph Evidence["Reviewable Output"]
StageWarnings[Stage errors and warnings]:::metadata
Report[JSON or terminal report]:::review
Rules[YARA rule candidates]:::review
end
Operator -->|selects mode and sample| Sample
Sample --> Hash --> Magika
Magika -->|file context| Sandbox
Sandbox -->|dynamic behavior| Threat
Sandbox -. degraded or skipped .-> MockSandbox
MockSandbox --> StageWarnings
Sandbox <-->|detonation task| Cuckoo
Threat <-->|optional enrichment| Gemini
Threat -. fallback heuristic .-> StageWarnings
Threat -->|bounded assessment| Yara
Yara <-->|optional rule context| Gemini
Yara -. fallback rule metadata .-> StageWarnings
Yara --> Rules
Threat --> Report
StageWarnings --> Report
```
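The dotted "degraded or skipped" and "fallback" edges in the diagram can be read as: each stage either produces its primary result, or falls back and leaves a warning that ends up in the report. A minimal sketch of that pattern follows. `PipelineReport`, `run_stage`, and the stage callables are hypothetical names used only for illustration and are not taken from the project's orchestrator.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class PipelineReport:
    """Hypothetical container: per-stage results plus reviewable warnings."""
    results: dict[str, Any] = field(default_factory=dict)
    warnings: list[str] = field(default_factory=list)


def run_stage(
    report: PipelineReport,
    name: str,
    primary: Callable[[], Any],
    fallback: Callable[[], Any],
) -> None:
    """Run the primary step; on failure, record a warning and use the fallback."""
    try:
        report.results[name] = primary()
    except Exception as exc:  # broad on purpose: any stage failure degrades, never aborts
        report.warnings.append(f"{name}: degraded ({exc})")
        report.results[name] = fallback()


def unreachable_sandbox() -> dict:
    # Stand-in for a sandbox client whose backend cannot be reached.
    raise ConnectionError("sandbox unreachable")


report = PipelineReport()
run_stage(report, "triage", lambda: {"label": "pebin"}, lambda: {"label": "unknown"})
run_stage(report, "sandbox", unreachable_sandbox, lambda: {"source": "mock"})

print(report.results)
print(report.warnings)
```

Keeping warnings next to the results, rather than only in logs, is what lets a reviewer see from the report itself which stages ran in degraded mode.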
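## Features
- **File type detection**: Uses Google's Magika to identify file types and report confidence scores
- **Dynamic analysis**: Integrates Cuckoo Sandbox for behavioral analysis
- **Threat assessment**: Uses Gemini when available and flags heuristic fallbacks as degraded mode
- **YARA generation**: Creates detection rule candidates from observed behavior and records fallback metadata
- **MITRE ATT&CK mapping**: Includes technique mappings when the assessment data supports them
- **CLI**: Terminal output with progress indicators and JSON output support
## Installation
### Prerequisites
- Python 3.11+
- Cuckoo Sandbox instance (optional; mock mode is available)
- Gemini API key
### Quick Start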
```
# Clone repository
git clone https://github.com/lord-dubious/malware-analysis-pipeline.git
cd malware-analysis-pipeline
# Create virtual environment
uv venv
source .venv/bin/activate
# Install dependencies
uv pip install -e ".[dev]"
# Set up environment variables
cp .env.example .env
# Edit .env with your API keys
```
## Usage
### CLI Commands
#### Full Analysis
```
# Analyze a file through the configured pipeline
malware-analyzer analyze suspicious.exe
# Skip sandbox analysis (triage only with AI assessment)
malware-analyzer analyze suspicious.exe --skip-sandbox
# Output as JSON
malware-analyzer analyze suspicious.exe --json
# Use mock mode for testing
malware-analyzer analyze suspicious.exe --mock
```
#### Quick Triage
```
# Fast file type detection with Magika
malware-analyzer triage unknown_file.bin
# JSON output
malware-analyzer triage unknown_file.bin --json
```
#### YARA Rule Generation
```
# Generate YARA rules from analysis
malware-analyzer generate-yara malware.exe
# Save to file
malware-analyzer generate-yara malware.exe -o malware.yar
```
#### Report Generation
```
# Generate text report
malware-analyzer report malware.exe
# Generate markdown report
malware-analyzer report malware.exe -f markdown -o report.md
# Generate JSON report
malware-analyzer report malware.exe -f json
```
#### Batch Analysis
```
# Analyze all files in a directory
malware-analyzer batch /path/to/samples --pattern "*.exe"
# Save reports to output directory
malware-analyzer batch /path/to/samples -o /path/to/reports
```
### Python API
```
import asyncio

from malware_analyzer import create_agent, create_config

# Create configuration
config = create_config(
    gemini_api_key="your-api-key",
    cuckoo_url="http://localhost:8090",
)

# Create agent
agent = create_agent(config)


# Analyze a file
async def analyze():
    result = await agent.analyze("suspicious.exe")
    print(f"Threat Level: {result.threat_assessment.threat_level}")
    print(f"Classification: {result.threat_assessment.classification}")
    print(f"MITRE Techniques: {result.threat_assessment.mitre_techniques}")
    # Get generated YARA rules
    for rule in result.yara_rules:
        print(rule.to_yara())


asyncio.run(analyze())
```
## Configuration
### Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `MALWARE_GEMINI_API_KEY` | Gemini API key | Required |
| `MALWARE_CUCKOO_URL` | Cuckoo Sandbox URL | `http://localhost:8090` |
| `MALWARE_CUCKOO_API_TOKEN` | Cuckoo API token | Optional |
| `MALWARE_MOCK_MODE` | Enable mock mode | `false` |
| `MALWARE_ANALYSIS_TIMEOUT` | Analysis timeout (seconds) | `300` |
| `MALWARE_MAX_FILE_SIZE` | Maximum file size (bytes) | `104857600` |
| `MALWARE_LOG_LEVEL` | Log level | `INFO` |
## Testing
```
# Run all tests
pytest
# Run with coverage
pytest --cov=src/malware_analyzer
# Run specific test file
pytest tests/test_agent.py
# Run in verbose mode
pytest -v
```
## Project Structure
```
malware-analysis-pipeline/
├── src/malware_analyzer/
│ ├── __init__.py # Package exports
│ ├── models.py # Pydantic models
│ ├── file_analyzer.py # Magika integration
│ ├── sandbox.py # Cuckoo Sandbox API
│ ├── report_analyzer.py # Gemini analysis
│ ├── yara_generator.py # YARA rule generation
│ ├── agent.py # Main orchestrator
│ └── cli.py # Typer CLI
├── tests/
│ ├── conftest.py # Test fixtures
│ ├── test_models.py
│ ├── test_file_analyzer.py
│ ├── test_sandbox.py
│ ├── test_report_analyzer.py
│ ├── test_yara_generator.py
│ ├── test_agent.py
│ └── test_cli.py
├── pyproject.toml
├── Dockerfile
├── docker-compose.yml
└── README.md
```
## Components
### FileAnalyzer
Uses Google's Magika for file type detection and records the returned label and score.
```
from malware_analyzer import create_file_analyzer
analyzer = create_file_analyzer()
file_info = analyzer.analyze("sample.exe")
print(f"Type: {file_info.magika_label}")
print(f"Confidence: {file_info.magika_score:.2%}")
print(f"Risk: {analyzer.get_risk_category(file_info)}")
```
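The architecture diagram places a "hash and file metadata" step next to Magika in the triage layer. A minimal standard-library sketch of such a step is shown below. `file_metadata` and its return shape are hypothetical and chosen for illustration, and the size limit simply reuses the documented default of `MALWARE_MAX_FILE_SIZE`; the project's own implementation may differ.

```python
import hashlib
from pathlib import Path

MAX_FILE_SIZE = 104857600  # documented default of MALWARE_MAX_FILE_SIZE, in bytes


def file_metadata(path, max_size: int = MAX_FILE_SIZE, chunk_size: int = 1024 * 1024) -> dict:
    """Return name, size, and SHA-256 of a file, refusing files above max_size."""
    path = Path(path)
    size = path.stat().st_size
    if size > max_size:
        raise ValueError(f"{path.name} is {size} bytes, above the {max_size} byte limit")

    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large samples are never loaded into memory at once.
        while chunk := handle.read(chunk_size):
            digest.update(chunk)

    return {"name": path.name, "size": size, "sha256": digest.hexdigest()}
```

Checking the size before hashing keeps oversized samples from consuming time in later stages.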
### CuckooSandbox
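Client for the Cuckoo Sandbox API, used for dynamic malware analysis.
```
from malware_analyzer import create_sandbox

async def run_sandbox(file_path, file_info):
    # file_info is the result returned by FileAnalyzer.analyze(file_path)
    sandbox = create_sandbox()
    task = await sandbox.submit_file(file_path, file_info)
    return await sandbox.get_report(task.task_id, file_info)
# Run with: asyncio.run(run_sandbox(file_path, file_info))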
```
### ReportAnalyzer
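Uses Gemini to assess sandbox reports. If Gemini is unavailable or returns invalid output, the result is marked as a degraded fallback assessment and includes the source and error text.
```
from malware_analyzer import create_report_analyzer

async def assess(sandbox_report):
    # sandbox_report is the report returned by the sandbox client
    analyzer = create_report_analyzer()
    assessment = await analyzer.analyze(sandbox_report)
    print(f"Threat Level: {assessment.threat_level}")
    print(f"Classification: {assessment.classification}")
    return assessment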
```
### YaraGenerator
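Generates YARA rules from analysis results.
```
from malware_analyzer import create_yara_generator

async def generate_rules(report, assessment):
    # report and assessment come from the sandbox and ReportAnalyzer steps
    generator = create_yara_generator()
    rules = await generator.generate(report, assessment)
    for rule in rules:
        print(rule.to_yara())
    return rules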
```
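For readers unfamiliar with the output format, the sketch below shows how a rule candidate with fallback metadata might be rendered as YARA text. `RuleCandidate` and `render` are hypothetical and deliberately not named after the project's `to_yara()`; the `meta` keys used to carry degraded-mode information are an assumption.

```python
from dataclasses import dataclass, field


def quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes for YARA."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


@dataclass
class RuleCandidate:
    name: str
    strings: dict[str, str]                       # identifier -> literal text
    meta: dict[str, str] = field(default_factory=dict)
    condition: str = "any of them"

    def render(self) -> str:
        """Render the candidate as a YARA rule with meta, strings, and condition sections."""
        lines = [f"rule {self.name}", "{"]
        if self.meta:
            lines.append("    meta:")
            lines += [f"        {key} = {quote(val)}" for key, val in self.meta.items()]
        lines.append("    strings:")
        lines += [f"        ${ident} = {quote(text)}" for ident, text in self.strings.items()]
        lines.append("    condition:")
        lines.append(f"        {self.condition}")
        lines.append("}")
        return "\n".join(lines)


rule = RuleCandidate(
    name="Suspicious_Mutex",
    strings={"mutex": "Global\\ExampleMutex"},
    meta={"source": "heuristic", "degraded": "true"},
)
print(rule.render())
```

Generated rules are candidates: they should be reviewed and tested against known-clean files before being deployed.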
## License
MIT License. See [LICENSE](LICENSE) for details.
## Author
Created by [lord-dubious](https://github.com/lord-dubious)