aditya777-dev/soc-workflow-automation

GitHub: aditya777-dev/soc-workflow-automation

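A Python script for SOC workflow automation that enriches SIEM alert indicators through the VirusTotal and AbuseIPDB APIs and generates risk-scored incident reports in multiple formats.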


# SOC Workflow Automation: Alert Enrichment

A Python automation tool that ingests simulated SIEM alerts, enriches each indicator of compromise (IP address, domain, file hash) through the free **VirusTotal** and **AbuseIPDB** APIs, computes a risk score, and generates structured incident reports in three formats.

Built as a portfolio project for a SOC analyst role, it demonstrates threat-intelligence automation, API integration, and incident reporting.

## What It Does

```
[SIEM Alert JSON]
       │
       ▼
[Extract IOCs] ───────────────────────────────────────────────┐
       │                                                      │
       ├── IP Addresses ──► VirusTotal (malicious votes,      │
       │                    AbuseIPDB   country, ASN, ISP)    │
       │                                                      │
       ├── Domains ───────► VirusTotal (malicious votes,      │
       │                                registrar, category)  │
       │                                                      │
       └── File Hashes ───► VirusTotal (AV detections,        │
                                        file type, SHA-256)   │
                                                              │
                    [Risk Scoring: 0–100] ◄───────────────────┘
                              │
                       ┌──────┴──────┐
                       ▼             ▼
       reports/*.json      reports/*.txt      reports/*.html
     (machine-readable)  (analyst review)  (interactive web)
```

Private and internal IPs (RFC 1918, loopback, link-local) and syntactically invalid IP strings are skipped automatically, so no API quota is wasted on them.

## Project Structure

```
SOC workflow automation script/
│
├── src/
│   └── soc_enrichment.py      ← Main script (run this)
│
├── data/
│   └── sample_alerts.json     ← Simulated SIEM alerts (input)
│
├── reports/
│   ├── example_report.txt     ← Pre-generated plain-text sample
│   └── example_report.html    ← Pre-generated HTML sample
│
├── .env                       ← Your API keys (gitignored — never committed)
├── .env.example               ← API key template (safe to commit)
├── .gitignore
├── requirements.txt
└── README.md
```

## Quick Start

### 1. Clone the repository

```
git clone https://github.com/aditya777-dev/soc-workflow-automation.git
cd soc-workflow-automation
```

### 2. Install dependencies

```
pip install -r requirements.txt
```

This installs `requests` (HTTP API calls) and `python-dotenv` (loading API keys).

### 3. Configure API keys

```
cp .env.example .env
```

Edit `.env`:

```
VIRUSTOTAL_API_KEY=your_key_here
ABUSEIPDB_API_KEY=your_key_here
```

**Getting free API keys:**

| Service | URL | Free tier |
|---------|-----|-----------|
| VirusTotal | https://www.virustotal.com/gui/sign-in | 500 lookups per day, 4 per minute |
| AbuseIPDB | https://www.abuseipdb.com/register | 1,000 checks per day |

### 4. Run

```
python src/soc_enrichment.py
```

Reports are saved to the `reports/` directory with a timestamp in each file name.

**Custom paths:**

```
python src/soc_enrichment.py data/my_alerts.json --output-dir /tmp/reports
```

## Sample Alert Format

The script takes a JSON array of alerts. Every IOC field is optional; only the fields that are present are enriched.

```
[
  {
    "alert_id": "ALT-2026-001",
    "timestamp": "2026-05-30T08:15:00Z",
    "severity": "CRITICAL",
    "alert_type": "Malware C2 Communication",
    "source_host": "WORKSTATION-042",
    "source_ip": "192.168.1.42",
    "destination_ip": "185.220.101.45",
    "destination_domain": "update.microsoft-cdn.net",
    "file_hash_md5": "44d88612fea8a8f36de82e1278abb02f",
    "file_name": "system_update.exe",
    "process": "svchost.exe",
    "description": "Suspicious outbound connection to known Tor exit node",
    "rule_triggered": "TOR_EXIT_NODE_COMMUNICATION"
  }
]
```

**Supported IOC fields:**

| Field | Enriched via |
|-------|-------------|
| `source_ip` | VirusTotal + AbuseIPDB |
| `destination_ip` | VirusTotal + AbuseIPDB |
| `destination_domain` | VirusTotal |
| `file_hash_md5` | VirusTotal |

## Output Reports

Each run produces three files named `incident_report_YYYYMMDD_HHMMSS.*`:

### `*.json`: Machine-readable

Full structured output containing the raw API fields and verdict scores. Suitable for SIEM ingestion, import into a ticketing system, or further scripting.

### `*.txt`: Plain-text analyst report

A box-drawn report with one section per alert, IOC enrichment blocks, and risk factors. Easy to print and to attach to tickets.

### `*.html`: Interactive web report

A dark-themed HTML report with color-coded severity and verdict badges, risk-score progress bars, and collapsible IOC detail panels. It opens in any browser and needs no internet connection (fully self-contained).

See [`reports/example_report.html`](reports/example_report.html) and [`reports/example_report.txt`](reports/example_report.txt) for real examples generated against the live APIs.

## Risk Scoring

Scores are additive and capped at 100:

| Signal | Points |
|--------|--------|
| IP: ≥1 VirusTotal malicious vote | +40 |
| File hash: ≥1 VirusTotal malicious vote | +40 |
| IP: AbuseIPDB confidence ≥ 75% | +30 |
| Domain: ≥1 VirusTotal malicious vote | +25 |
| IP: AbuseIPDB confidence 25–74% | +15 |

| Final score | Verdict |
|-------------|---------|
| 0–9 | CLEAN |
| 10–39 | POTENTIALLY_SUSPICIOUS |
| 40–69 | SUSPICIOUS |
| 70–100 | MALICIOUS |

## Terminal Output

```
20:03:55 [INFO ] Loaded 3 alert(s) from data\sample_alerts.json
20:03:55 [INFO ] Note: VirusTotal free tier = 4 req/min — expect ~16 s between lookups.
20:03:55 [INFO ] ── Processing alert 1/3: ALT-2026-001 [CRITICAL] ──
20:03:55 [INFO ] Skipping non-public IP '192.168.1.42' (source_ip)
20:03:55 [INFO ] [VT] Enriching IP: 185.220.101.45
20:03:56 [INFO ] [AbuseIPDB] Checking IP: 185.220.101.45
...
==============================================================
  ENRICHMENT COMPLETE
==============================================================
  Alerts processed : 3
  JSON report      : reports\incident_report_20260530_200522.json
  Text report      : reports\incident_report_20260530_200522.txt
  HTML report      : reports\incident_report_20260530_200522.html
==============================================================

  Alert ID             Verdict                    Risk Score
  ------------------   ------------------------   ----------
  ALT-2026-001         MALICIOUS                  100/100
  ALT-2026-002         SUSPICIOUS                 40/100
  ALT-2026-003         MALICIOUS                  70/100
```

## Rate Limits and Timing

| Service | Free limit | How this script handles it |
|---------|------------|---------------------------|
| VirusTotal | 4 requests per minute, 500 per day | Enforces a 16-second delay between consecutive VT calls (just over the 15 s that 4 requests per minute allows) |
| AbuseIPDB | 1,000 per day | No per-minute limit, so calls are made immediately |

An IOC value that appears in several alerts is looked up only **once** and cached for the rest of the run.

**Expected runtime for the 3 sample alerts:** roughly 1.5 to 2 minutes (6 VT calls × 16 s ≈ 96 s).

## Bugs Fixed During Development

| # | Bug | Fix |
|---|-----|-----|
| 1 | Invalid IP strings (`not.an.ip.addr`) bypassed the private-IP check and were sent to the APIs, causing HTTP 400/422 errors | `is_valid_public_ip()` now validates with `ipaddress` and requires `is_global` |
| 2 | The 429 rate-limit retry was recursive, risking a stack overflow under sustained throttling | Replaced with an iterative retry loop (`VT_MAX_RETRIES = 3`) |
| 3 | Log lines appeared out of order (logging → stderr, print → stdout) | `logging.basicConfig(stream=sys.stdout)` + `sys.stdout.reconfigure(encoding='utf-8')` |
| 4 | Unicode arrow and box-drawing characters (`→`, `──`) in log messages crashed on Windows cp1252 consoles | `sys.stdout.reconfigure(encoding='utf-8', errors='replace')` runs before logging is set up |

## Technologies Used

| Tool | Purpose |
|------|---------|
| Python 3.8+ | Core scripting |
| `requests` | HTTP calls to the threat-intelligence APIs |
| `python-dotenv` | Secure loading of API keys |
| VirusTotal API v3 | Multi-vendor malware scanning (IPs, domains, hashes) |
| AbuseIPDB API v2 | Crowd-sourced IP abuse reputation lookups |

## Author

Built as a portfolio project for SOC analyst roles.

**Skills demonstrated:**

- REST API integration (authentication, rate limiting, error handling, retries)
- IOC extraction and enrichment automation
- Risk scoring and verdict classification
- Multi-format report generation (JSON, TXT, HTML)
- Python best practices: logging, type hints, modular OOP design, caching
- Security-aware input validation (filtering invalid and private IPs)
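
The additive risk-scoring rules and verdict thresholds described above can be sketched in a few lines. This is an illustrative reconstruction from the two scoring tables, not the code in `soc_enrichment.py`: the `IocResult` container, the `risk_score` and `verdict` function names, and the choice to add points once per enriched IOC are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class IocResult:
    """Hypothetical container for one enriched indicator (not from the repo)."""
    kind: str                               # "ip", "domain", or "hash"
    vt_malicious: int = 0                   # VirusTotal malicious votes
    abuse_confidence: Optional[int] = None  # AbuseIPDB confidence %, IPs only


# Points awarded when VirusTotal reports at least one malicious vote,
# keyed by IOC kind (values taken from the scoring table).
VT_POINTS = {"ip": 40, "hash": 40, "domain": 25}


def risk_score(results: Iterable[IocResult]) -> int:
    """Sum the points for every triggered signal, capped at 100."""
    score = 0
    for r in results:
        if r.vt_malicious >= 1:
            score += VT_POINTS.get(r.kind, 0)
        if r.kind == "ip" and r.abuse_confidence is not None:
            if r.abuse_confidence >= 75:
                score += 30
            elif r.abuse_confidence >= 25:
                score += 15
    return min(score, 100)


def verdict(score: int) -> str:
    """Map a 0-100 score onto the four verdict bands."""
    if score >= 70:
        return "MALICIOUS"
    if score >= 40:
        return "SUSPICIOUS"
    if score >= 10:
        return "POTENTIALLY_SUSPICIOUS"
    return "CLEAN"


if __name__ == "__main__":
    enriched = [
        IocResult("ip", vt_malicious=5, abuse_confidence=100),
        IocResult("domain", vt_malicious=2),
    ]
    total = risk_score(enriched)
    print(total, verdict(total))
```

Under these assumptions, an alert whose only signal is an AbuseIPDB confidence of 30% scores 15 and lands in the POTENTIALLY_SUSPICIOUS band, while a single VirusTotal hit on a file hash is already enough to reach SUSPICIOUS.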

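Fixes 1 and 2 from the bug table can be illustrated as follows. The names `is_valid_public_ip` and `VT_MAX_RETRIES` come from the table, but the function bodies are a minimal sketch rather than the repository's code, and `fetch_with_retry` with its `fetch` and `sleep` parameters is a hypothetical helper that stands in for the real VirusTotal request so the example runs without network access.

```python
import ipaddress
import time
from typing import Callable, Optional, Tuple

VT_MAX_RETRIES = 3  # constant name taken from the bug table


def is_valid_public_ip(value: str) -> bool:
    """Return True only for well-formed, globally routable IP addresses."""
    try:
        addr = ipaddress.ip_address(value.strip())
    except ValueError:
        # Malformed strings such as "not.an.ip.addr" end up here.
        return False
    # is_global is False for RFC 1918, loopback, and link-local ranges.
    return addr.is_global


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, dict]],
    retry_delay: float = 16.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch() up to VT_MAX_RETRIES times, retrying only on HTTP 429.

    fetch is a hypothetical zero-argument callable that returns
    (status_code, parsed_body); it stands in for the real API request.
    """
    for attempt in range(1, VT_MAX_RETRIES + 1):
        status, body = fetch()
        if status == 200:
            return body
        if status != 429:
            return None  # other errors are not retried in this sketch
        if attempt < VT_MAX_RETRIES:
            sleep(retry_delay)  # wait out the rate limit before retrying
    return None  # still throttled after the final attempt


if __name__ == "__main__":
    for candidate in ("185.220.101.45", "192.168.1.42", "not.an.ip.addr"):
        print(candidate, is_valid_public_ip(candidate))
```

Because the retry is a bounded `for` loop instead of a recursive call, sustained throttling costs at most `VT_MAX_RETRIES` requests and cannot grow the call stack. Passing `sleep` as a parameter is a design choice made here so the loop can be exercised in tests without actually waiting.
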
Tags: reverse-engineering tools