lord-dubious/malware-analysis-pipeline

GitHub: lord-dubious/malware-analysis-pipeline

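An automated malware analysis pipeline that combines the Cuckoo dynamic sandbox, Magika file identification, and LLM-assisted assessment, aiming to cover the workflow from sample triage to detection rule generation in one place.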

Stars: 0 | Forks: 0

# Malware Analysis Pipeline

A malware analysis pipeline that combines Cuckoo Sandbox dynamic analysis, Google Magika file type detection, Gemini-assisted threat assessment, and YARA rule generation.

## Portfolio Showcase

![Malware analysis pipeline CLI showcase](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/df03fb0b3a204849.png)

- **Architecture deep dive:** [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md)
- **Demo guide:** [`docs/DEMO.md`](docs/DEMO.md)
- **Review focus:** staged triage, the sandbox boundary, the Gemini/YARA fallback mechanisms, and reviewable degraded-mode metadata.

## Architecture Overview

```mermaid
flowchart TB
    classDef input fill:#ecfeff,stroke:#0891b2,stroke-width:2px,color:#164e63
    classDef core fill:#eef2ff,stroke:#4f46e5,stroke-width:2px,color:#312e81
    classDef external fill:#fff7ed,stroke:#ea580c,stroke-width:2px,color:#7c2d12
    classDef metadata fill:#f0fdf4,stroke:#16a34a,stroke-width:2px,color:#14532d
    classDef review fill:#fef2f2,stroke:#dc2626,stroke-width:2px,color:#7f1d1d

    Sample[/Unknown sample/]:::input
    Operator[/Analyst CLI or JSON run/]:::input

    subgraph Triage["Triage Layer"]
        Hash[Hash and file metadata]:::core
        Magika[Magika file identification]:::core
    end

    subgraph Dynamic["Dynamic Analysis Boundary"]
        Sandbox[Cuckoo sandbox client]:::core
        MockSandbox[Mock sandbox report]:::metadata
        Cuckoo[(Cuckoo Sandbox)]:::external
    end

    subgraph Intelligence["Assessment and Detection"]
        Threat[Threat assessment model]:::core
        Yara[YARA candidate generator]:::core
        Gemini{{Gemini API}}:::external
    end

    subgraph Evidence["Reviewable Output"]
        StageWarnings[Stage errors and warnings]:::metadata
        Report[JSON or terminal report]:::review
        Rules[YARA rule candidates]:::review
    end

    Operator -->|selects mode and sample| Sample
    Sample --> Hash --> Magika
    Magika -->|file context| Sandbox
    Sandbox -->|dynamic behavior| Threat
    Sandbox -. degraded or skipped .-> MockSandbox
    MockSandbox --> StageWarnings
    Sandbox <-->|detonation task| Cuckoo
    Threat <-->|optional enrichment| Gemini
    Threat -. fallback heuristic .-> StageWarnings
    Threat -->|bounded assessment| Yara
    Yara <-->|optional rule context| Gemini
    Yara -. fallback rule metadata .-> StageWarnings
    Yara --> Rules
    Threat --> Report
    StageWarnings --> Report
```

## Features

- **File type detection**: identifies file types with Google's Magika and shows the confidence score
- **Dynamic analysis**: integrates with Cuckoo Sandbox for behavioral analysis
- **Threat assessment**: uses Gemini when it is available and marks heuristic fallbacks as degraded mode
- **YARA generation**: creates detection rule candidates from observed behavior and records fallback metadata
- **MITRE ATT&CK mapping**: includes technique mappings when the assessment data supports them
- **CLI interface**: terminal output with progress indicators and JSON output support

## Installation

### Prerequisites

- Python 3.11+
- A Cuckoo Sandbox instance (optional; mock mode can be used instead)
- A Gemini API key

### Quick Start

```bash
# Clone repository
git clone https://github.com/lord-dubious/malware-analysis-pipeline.git
cd malware-analysis-pipeline

# Create virtual environment
uv venv
source .venv/bin/activate

# Install dependencies
uv pip install -e ".[dev]"

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys
```

## Usage

### CLI Commands

#### Full Analysis

```bash
# Analyze a file through the configured pipeline
malware-analyzer analyze suspicious.exe

# Skip sandbox analysis (triage only with AI assessment)
malware-analyzer analyze suspicious.exe --skip-sandbox

# Output as JSON
malware-analyzer analyze suspicious.exe --json

# Use mock mode for testing
malware-analyzer analyze suspicious.exe --mock
```

#### Quick Triage

```bash
# Fast file type detection with Magika
malware-analyzer triage unknown_file.bin

# JSON output
malware-analyzer triage unknown_file.bin --json
```

#### YARA Rule Generation

```bash
# Generate YARA rules from analysis
malware-analyzer generate-yara malware.exe

# Save to file
malware-analyzer generate-yara malware.exe -o malware.yar
```

#### Report Generation

```bash
# Generate text report
malware-analyzer report malware.exe

# Generate markdown report
malware-analyzer report malware.exe -f markdown -o report.md

# Generate JSON report
malware-analyzer report malware.exe -f json
```

#### Batch Analysis

```bash
# Analyze all files in a directory
malware-analyzer batch /path/to/samples --pattern "*.exe"

# Save reports to output directory
malware-analyzer batch /path/to/samples -o /path/to/reports
```

### Python API

```python
import asyncio

from malware_analyzer import create_agent, create_config

# Create configuration
config = create_config(
    gemini_api_key="your-api-key",
    cuckoo_url="http://localhost:8090",
)

# Create agent
agent = create_agent(config)


# Analyze a file
async def analyze():
    result = await agent.analyze("suspicious.exe")
    print(f"Threat Level: {result.threat_assessment.threat_level}")
    print(f"Classification: {result.threat_assessment.classification}")
    print(f"MITRE Techniques: {result.threat_assessment.mitre_techniques}")

    # Get generated YARA rules
    for rule in result.yara_rules:
        print(rule.to_yara())


asyncio.run(analyze())
```

## Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `MALWARE_GEMINI_API_KEY` | Gemini API key | Required |
| `MALWARE_CUCKOO_URL` | Cuckoo Sandbox URL | `http://localhost:8090` |
| `MALWARE_CUCKOO_API_TOKEN` | Cuckoo API token | Optional |
| `MALWARE_MOCK_MODE` | Enable mock mode | `false` |
| `MALWARE_ANALYSIS_TIMEOUT` | Analysis timeout (seconds) | `300` |
| `MALWARE_MAX_FILE_SIZE` | Maximum file size (bytes) | `104857600` |
| `MALWARE_LOG_LEVEL` | Log level | `INFO` |

## Testing

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src/malware_analyzer

# Run specific test file
pytest tests/test_agent.py

# Run in verbose mode
pytest -v
```

## Project Structure

```text
malware-analysis-pipeline/
├── src/malware_analyzer/
│   ├── __init__.py          # Package exports
│   ├── models.py            # Pydantic models
│   ├── file_analyzer.py     # Magika integration
│   ├── sandbox.py           # Cuckoo Sandbox API
│   ├── report_analyzer.py   # Gemini analysis
│   ├── yara_generator.py    # YARA rule generation
│   ├── agent.py             # Main orchestrator
│   └── cli.py               # Typer CLI
├── tests/
│   ├── conftest.py          # Test fixtures
│   ├── test_models.py
│   ├── test_file_analyzer.py
│   ├── test_sandbox.py
│   ├── test_report_analyzer.py
│   ├── test_yara_generator.py
│   ├── test_agent.py
│   └── test_cli.py
├── pyproject.toml
├── Dockerfile
├── docker-compose.yml
└── README.md
```

## Components

### FileAnalyzer

Detects file types with Google's Magika and records the returned label and score.

```python
from malware_analyzer import create_file_analyzer

analyzer = create_file_analyzer()
file_info = analyzer.analyze("sample.exe")

print(f"Type: {file_info.magika_label}")
print(f"Confidence: {file_info.magika_score:.2%}")
print(f"Risk: {analyzer.get_risk_category(file_info)}")
```

### CuckooSandbox

A client for the Cuckoo Sandbox API, used for dynamic malware analysis.

```python
from malware_analyzer import create_sandbox

# Run inside a coroutine. `file_path` is the sample path and `file_info`
# is its file metadata (for example, the FileAnalyzer result above).
sandbox = create_sandbox()
task = await sandbox.submit_file(file_path, file_info)
report = await sandbox.get_report(task.task_id, file_info)
```

### ReportAnalyzer

Uses Gemini to assess sandbox reports. If Gemini is unavailable or returns invalid output, the result is marked as a degraded fallback assessment and includes the source and the error text.

```python
from malware_analyzer import create_report_analyzer

# Run inside a coroutine. `sandbox_report` is a report from the sandbox client.
analyzer = create_report_analyzer()
assessment = await analyzer.analyze(sandbox_report)

print(f"Threat Level: {assessment.threat_level}")
print(f"Classification: {assessment.classification}")
```

### YaraGenerator

Generates YARA rules from the analysis results.

```python
from malware_analyzer import create_yara_generator

# Run inside a coroutine. `report` and `assessment` come from the steps above.
generator = create_yara_generator()
rules = await generator.generate(report, assessment)

for rule in rules:
    print(rule.to_yara())
```

## License

MIT License. See [LICENSE](LICENSE) for details.

## Author

Created by [lord-dubious](https://github.com/lord-dubious)
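
The degraded-mode behavior described under ReportAnalyzer (fall back to a heuristic, then record the source and the error text) can be illustrated with a minimal sketch. Every name here (`Assessment`, `assess`, `heuristic_level`, the `signatures` key, and the three threat levels) is hypothetical and is not taken from the project's code; the project's real models are defined in `models.py` and may look quite different.

```python
from dataclasses import dataclass
from typing import Callable, Optional

VALID_LEVELS = {"low", "medium", "high"}  # assumed levels, for illustration only


@dataclass
class Assessment:
    """Hypothetical result shape, not the project's actual model."""
    threat_level: str
    source: str                  # "gemini" or "heuristic"
    degraded: bool = False
    error: Optional[str] = None  # why the fallback was used, if it was


def heuristic_level(report: dict) -> str:
    """Toy heuristic: grade by the number of matched signatures."""
    hits = len(report.get("signatures", []))
    if hits >= 3:
        return "high"
    return "medium" if hits else "low"


def assess(report: dict, model_call: Callable[[dict], str]) -> Assessment:
    """Try the model first; on failure or invalid output, fall back and say so."""
    try:
        level = model_call(report)
        if level not in VALID_LEVELS:
            raise ValueError(f"invalid model output: {level!r}")
        return Assessment(threat_level=level, source="gemini")
    except Exception as exc:  # broad on purpose: any failure triggers the fallback
        return Assessment(
            threat_level=heuristic_level(report),
            source="heuristic",
            degraded=True,
            error=str(exc),
        )


def unavailable(_report: dict) -> str:
    """Stand-in for a model call that cannot reach the API."""
    raise ConnectionError("model unavailable")


report = {"signatures": ["persistence", "injection", "c2_beacon"]}
ok = assess(report, lambda r: "high")
fallback = assess(report, unavailable)
print(ok.source, ok.degraded)
print(fallback.source, fallback.degraded, fallback.error)
```

Keeping `source`, `degraded`, and `error` on the result itself means a reviewer reading the JSON report can tell which verdicts came from the model and which came from the fallback, without consulting logs.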
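Tags: AI-assisted security, Cloudflare, Cuckoo, DAST, reverse DNS lookup, FTP vulnerability scanning, Gemini, bulk IP address processing, Magika, MITRE ATT&CK, Python security tools, YARA, YARA rule generation, cloud security monitoring, cloud asset visualization, dynamic sandbox, threat intelligence, threat assessment, security analysis platform, security reports, security orchestration, developer tools, malicious sample analysis, malware analysis, search queries (dorks), file hashing, file type detection, network information gathering, network security, automation pipeline, misconfiguration prevention, reverse engineering tools, configuration auditing, degraded mode, privacy protection, static analysis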