Aegispub/thir-ha

GitHub: Aegispub/thir-ha

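A real-time honeypot threat intelligence platform built on a two-VM high-availability architecture on Oracle Cloud. A GitHub Actions pipeline automates attack capture, intelligence enrichment, malware analysis, and SOC report generation.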

Stars: 0 | Forks: 0

# THIR.HA: Threat Hunting Intelligence Range HAProxy

A real-time honeypot threat intelligence platform. Two Oracle Cloud Always Free VMs, one acting as the sensor and one as the core processing node, continuously capture real-world attacks. A GitHub Actions pipeline runs every two hours: it parses sessions, enriches attacker IPs, clusters attack campaigns, analyzes malware, and publishes to a live dashboard with automated SOC reports.

**Live demo:** [thirha.aegispub.com](https://thirha.aegispub.com)

## Architecture

```
                        Internet
                           │
                 Cloudflare (free tier)
          DNS health checks · DDoS protection
                           │
           ┌───────────────┴───────────────┐
           ▼                               ▼
   VM1 - Sensor Node               VM2 - Brain Node
   ─────────────────               ────────────────
   Cowrie SSH    :2222             HAProxy TCP LB  :2222
   Cowrie Telnet :2223             HAProxy Telnet  :2223
   cloudflared tunnel              pipeline tools
   Public IP:  129.80.119.236      rsync collector
   Private IP: 10.0.0.53           cloudflared tunnel
                                   Public IP:  150.230.174.199
                                   Private IP: 10.0.0.73
           │                               │
           └──────── Oracle VCN ───────────┘
               10.0.0.0/24 (internal · 10Gbps)
                           │
                    GitHub Actions
       (SSHes to VM2 only - VM1 never touched)
                           │
           thirha.aegispub.com (GitHub Pages)
```

### Data Flow

```
VM1 Cowrie writes → /home/cowrie/cowrie/var/log/cowrie/cowrie.json
        │
        │  rsync over Oracle VCN private IP (10.0.0.x)
        │  VM2 cron pulls at :55, before GitHub Actions at :00
        ▼
VM2 local copy → /opt/thir/logs/cowrie.json
        │
        │  GitHub Actions SSHes to VM2 public IP (port 22222)
        │  reads /opt/thir/logs/cowrie.json (watermark-incremental)
        ▼
┌────────────────────────────────────────────────────────────────────────────┐
│  GitHub Actions Pipeline                                                   │
│                                                                            │
│  ── Every 2 hours ───────────────────────────────────────────────────────  │
│  Tool 05  → Honeypot liveness (HAProxy:2222) → data/posture.json           │
│                                              → data/assets.json            │
│  [rsync]  → Incremental log fetch            → /tmp/cowrie.json            │
│  Tool 26  → Parse Cowrie sessions            → data/ir_cases.json          │
│  Tool 34  → Credential extraction            → data/credentials.json       │
│  Tool 35  → SSH fingerprint aggregation      → data/ssh_fingerprints.json  │
│  Tool 36  → Command clustering               → data/command_clusters.json  │
│  Tool 27  → Enrich attacker IPs              → data/threat_ips.json        │
│  Tool 29  → FP filter                        → data/fp_filter.json         │
│  Tool 30  → Aggregate metrics                → data/stats.json             │
│  Tool 30b → ASN clustering                   → data/asn_clusters.json      │
│  [cond]   → Fetch downloads (if any)         → /tmp/cowrie-downloads/      │
│  Tool 31  → Malware analysis (cond.)         → data/malware_report.json    │
│  Tool 33  → YARA classifier (cond.)          → data/yara_matches.json      │
│  Tool 28  → SOC handover report              → data/soc_handover.md        │
│  Tool 37  → Alert engine                     → data/alert_history.json     │
│  Tool 32  → Save daily + peak stats          → reports/daily/              │
│  Tool 07  → Data integrity check             → (exit code)                 │
│                                                                            │
│  ── Monday 00:05 UTC ────────────────────────────────────────────────────  │
│  Tool 32 --rollup weekly                     → reports/weekly/             │
│                                                                            │
│  ── 1st of month 00:10 UTC ──────────────────────────────────────────────  │
│  Tool 32 --rollup monthly                    → reports/monthly/            │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘
        │
        │  git push data/ + reports/
        ▼
GitHub Pages → thirha.aegispub.com
```

### HA Stack

| Layer | Technology | Scope | Failover time |
|---|---|---|---|
| DNS | Cloudflare health checks | Full VM failover | 60–120 s |
| TCP | HAProxy on VM2 | Service level (Cowrie crash) | 30–60 s |
| Tunnel | cloudflared on both VMs | Recovery from network blips | 10–30 s |
| Data | Rsync over VCN 10.0.0.x | Log continuity | No real-time gap |

## Pipeline Tools

### Core tools (carried over from thir-live, adapted for Oracle)

| # | Tool | Language | Responsibility |
|---|---|---|---|
| 05 | `05_network_monitor_live.go` | Go | TCP liveness check against VM2 HAProxy:2222; writes `posture.json` + `assets.json` |
| 07 | `07_file_integrity_live.go` | Go | SHA-256 baseline verification of `data/` files |
| 26 | `26_incident_timeline_live.py` | Python | Parses Cowrie NDJSON → IR cases with MITRE ATT&CK TTPs |
| 27 | `27_threat_intel_feeder_live.go` | Go | Concurrent IP enrichment via AbuseIPDB + OTX |
| 28 | `28_soc_handover_live.py` | Python | Generates a structured SOC handover report on every run |
| 29 | `29_false_positive_live.py` | Python | 3-signal false-positive (FP) filter (score, ISP, behavior) |
| 30 | `30_metric_exporter_live.go` | Go | Aggregates all pipeline outputs → dashboard statistics |
| 30b | `30b_asn_clustering_live.go` | Go | Groups attacker IPs by ASN; flags Tor/VPN/proxy infrastructure |
| 31 | `31_malware_analyzer_live.py` | Python | Magic bytes, hashes, suspicious strings, optional VirusTotal |
| 32 | `32_report_lifecycle.py` | Python | Daily saves, weekly/monthly rollups, peak stats, 6-month data retention |
| 33 | `33_yara_classifier_live.py` | Python | YARA rule matching on downloaded malware; heuristic fallback |
| 34 | `34_credential_extractor_live.py` | Python | Extracts attacker username/password pairs; top-credential analysis |
| 35 | `35_ssh_fingerprint_live.py` | Python | HASSH fingerprints, client family mapping, botnet KEX detection |
| 36 | `36_command_clustering_live.py` | Python | Groups sessions by Jaccard similarity; detects attack campaigns |
| 37 | `37_alerts_live.py` | Python | Alert engine: sends HIGH/CRITICAL findings via Slack/email/dry-run |

### HA tools (planned, thir-ha only)

| # | Tool | Language | Responsibility |
|---|---|---|---|
| 00 | `00_historical_processor.py` | Python | Batch reprocessing of the full 59-day AWS log corpus → `historical_data/` |
| 38 | `38_rsync_collector.py` | Python | Structured log pull from VM1 to VM2 over the private VCN; replaces the shell cron script |
| 39 | `39_node_healthcheck.go` | Go | Direct health check of VM1 (10.0.0.53); writes `data/node_health.json` |
| 40 | `40_failover_notifier.py` | Python | Alerts when HAProxy switches traffic between backends |

### HTTP honeypot tools (planned, pending Tool 41 deployment)

| # | Tool | Language | Responsibility |
|---|---|---|---|
| 41 | `41_http_honeypot.py` | Python/Flask | HTTP attack surface on port 8080; NDJSON output |
| 42 | `42_http_parser_live.py` | Python | Parses HTTP honeypot logs → `ir_cases.json` format |

## Incremental Log Fetch

The pipeline uses watermark-based incremental fetching. After each successful run, the total line count of `cowrie.json` on VM2 is saved to `data/cowrie_watermark.json`. On the next run, only the lines added since the last watermark are fetched via `tail -n +N`. Each run processes only the delta, typically 50–200 lines per 2-hour window.

**Fallback:** If the watermark file is missing, or VM2's line count is lower than the stored watermark (log rotation occurred on VM1), the pipeline automatically falls back to a full fetch.

## Report Lifecycle (Tool 32)

| Level | Trigger | Output | Retention |
|---|---|---|---|
| Daily | Every pipeline run | `reports/daily/soc_YYYY-MM-DD.md` | 5–7 days |
| Weekly | Monday 00:05 UTC | `reports/weekly/soc_week_YYYY-WNN.md` | 3–4 weeks |
| Monthly | 1st of month 00:10 UTC | `reports/monthly/soc_YYYY-MM.md` | 6 months |

Peak statistics (peak sessions, unique IPs, confirmed threats) are recorded in `data/stats.json` as high-water marks. They are updated only when the current run beats the existing peak and are never reset by a quieter run.

## Malware Analysis (Tool 31)

Runs only when download activity is detected in `ir_cases.json`. It runs before Tool 28 so that the SOC handover report always includes the current run's malware findings.

- File type detection via magic-byte signatures (ELF, PE, shell scripts, archives)
- Hashing: MD5, SHA1, SHA256 for every sample
- ELF architecture detection: x86, x86-64, ARM, AArch64, MIPS, RISC-V
- Suspicious string scanning: 30+ patterns covering persistence, C2, cryptominers, destructive commands, and more
- VirusTotal lookup (optional, free tier): hash-based, reports the detection rate
- Threat scoring: a 0–100 score mapped to LOW / MEDIUM / HIGH severity

Output: `data/malware_report.json`

## Alert Engine (Tool 37)

Alert conditions: HIGH/CRITICAL malware samples, new IPs with successful authentication, new ASN clusters, TCP tunnel attempts, and active campaigns from Tool 36 clustering.

Channels are controlled by the `ALERT_CHANNEL` secret: `slack`, `email`, `both`, or `dry-run` (the default when unset, which is safe for a first deployment). Dedup state in `data/alert_history.json` prevents repeated alerts for the same finding.

## Quick Start

See **[SETUP.md](SETUP.md)** for the complete step-by-step Oracle HA deployment guide.

### Required GitHub Secrets

| Secret | Purpose |
|---|---|
| `ORACLE_VPS_SSH_KEY` | SSH private key for Oracle VM2 (ubuntu user, port 22222) |
| `ORACLE_VPS_IP` | Oracle VM2 public IP, the pipeline's core processing node |
| `ABUSEIPDB_API_KEY` | Free key from [abuseipdb.com](https://www.abuseipdb.com) |
| `OTX_API_KEY` | Free key from [otx.alienvault.com](https://otx.alienvault.com) |

### Optional GitHub Secrets

| Secret | Purpose |
|---|---|
| `VIRUSTOTAL_API_KEY` | VirusTotal free key; enables Tool 31 hash lookups |
| `ALERT_CHANNEL` | `slack` \| `email` \| `both` \| `dry-run` (default: `dry-run`) |
| `SLACK_WEBHOOK_URL` | Required when `ALERT_CHANNEL` includes `slack` |
| `SMTP_HOST` / `SMTP_USER` / `SMTP_PASS` | Required when `ALERT_CHANNEL` includes `email` |
| `ALERT_EMAIL_FROM` / `ALERT_EMAIL_TO` | Sender/recipient for email alerts |

## Repository Structure

```
thir-ha/
├── .github/workflows/
│   └── pipeline.yml             ← 3 schedules: every 2h + weekly + monthly
├── tools/
│   ├── core/                    ← Tools 05, 07, 26–37 (Oracle-adapted)
│   ├── ha/                      ← Tools 00, 38, 39, 40 (HA-specific)
│   └── http_honeypot/           ← Tools 41, 42 (planned)
├── config/
│   ├── haproxy.cfg              ← HAProxy reference config (VM2)
│   ├── vcn_rules.md             ← Oracle VCN ingress rules
│   └── cloudflare.md            ← DNS failover + tunnel setup
├── data/                        ← Written by pipeline every 2 hours
│   ├── ir_cases.json            ← IR cases from Cowrie sessions (Tool 26)
│   ├── threat_ips.json          ← Enriched attacker IPs (Tool 27)
│   ├── fp_filter.json           ← False positive decisions (Tool 29)
│   ├── stats.json               ← Aggregated metrics + peak stats (Tool 30)
│   ├── node_health.json         ← VM1 direct health checks (Tool 39)
│   ├── posture.json             ← HAProxy liveness + CIS controls (Tool 05)
│   ├── assets.json              ← Live asset inventory (Tool 05)
│   ├── soc_handover.md          ← Current SOC shift report (Tool 28)
│   ├── malware_report.json      ← Malware analysis results (Tool 31)
│   ├── yara_matches.json        ← YARA classification results (Tool 33)
│   ├── credentials.json         ← Attacker credential pairs (Tool 34)
│   ├── ssh_fingerprints.json    ← HASSH fingerprints (Tool 35)
│   ├── command_clusters.json    ← Session clusters + campaigns (Tool 36)
│   ├── asn_clusters.json        ← ASN groupings (Tool 30b)
│   ├── alert_history.json       ← Alert dedup state (Tool 37)
│   ├── cowrie_watermark.json    ← Incremental fetch watermark
│   └── integrity_baseline.json  ← SHA-256 baseline (Tool 07)
├── historical_data/             ← Tool 00 output: 59-day AWS corpus baseline
│   │                              Source: thir-raw-archive (AWS R2 bucket)
│   ├── historical_ir_cases.json
│   ├── historical_stats.json
│   └── historical_credentials.json
├── reports/                     ← SOC report archive (Tool 32)
│   ├── daily/
│   ├── weekly/
│   └── monthly/
├── docs/
│   └── THIR_HA_Runbooks_v2.docx ← 6 recovery runbooks (RB-01 to RB-06)
├── css/thir.css                 ← Dashboard stylesheet
├── js/                          ← Dashboard modules
│   ├── data.js
│   ├── pipeline.js
│   ├── render.js
│   ├── map.js
│   └── main.js
├── index.html                   ← Live dashboard
├── CNAME                        ← thirha.aegispub.com
├── README.md
├── SETUP.md                     ← Oracle HA deployment guide
├── ARCHITECTURE.md              ← Two-node design reference
├── MIGRATION.md                 ← What changed from thir-live and why
├── CONTRIBUTING.md
├── SECURITY.md
├── DISCLAIMER.md
└── LICENSE
```

## Planned Roadmap

| Priority | Item | Location |
|---|---|---|
| High | Tool 38: structured rsync collector replacing the shell cron | `tools/ha/` |
| High | Tool 39: direct node health checker for VM1 | `tools/ha/` |
| Medium | Tool 40: HAProxy failover notifier | `tools/ha/` |
| Medium | Tool 00: 59-day historical processor (before AWS is decommissioned) | `tools/ha/` |
| Medium | Tool 41: HTTP honeypot Flask app on VM1:8080 | `tools/http_honeypot/` |
| Medium | Tool 42: HTTP log parser → ir_cases format | `tools/http_honeypot/` |
| Low | Deploy cloudflared tunnel on both VMs | Infrastructure |

## License

MIT. See [LICENSE](LICENSE).

## Disclaimer

For defensive security research only. See [DISCLAIMER.md](DISCLAIMER.md).
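
As an illustration of the watermark-based incremental fetch and its fallback described earlier in this README, here is a minimal Python sketch. It is not the pipeline's actual code. The JSON field name `line_count`, the helper names, and the use of `cat` for a full fetch are all assumptions made for the example. It also assumes the watermark stores the total line count, so the first new line is `watermark + 1`.

```python
import json
from pathlib import Path
from typing import Optional

# VM2 log path as shown in the data-flow diagram.
LOG_PATH = "/opt/thir/logs/cowrie.json"


def load_watermark(path: Path) -> Optional[int]:
    """Return the stored line count, or None if the file is missing or unusable."""
    try:
        value = json.loads(path.read_text())["line_count"]  # hypothetical field name
    except (OSError, ValueError, KeyError, TypeError):
        return None
    return value if isinstance(value, int) and value >= 0 else None


def build_fetch_command(watermark: Optional[int], remote_lines: int) -> str:
    """Choose between a full fetch and an incremental fetch.

    Full fetch when there is no watermark, or when the remote file is shorter
    than the watermark (the log was rotated). Otherwise fetch only new lines.
    """
    if watermark is None or remote_lines < watermark:
        return f"cat {LOG_PATH}"
    # `tail -n +N` starts output at line N, so skip the lines already seen.
    return f"tail -n +{watermark + 1} {LOG_PATH}"


def save_watermark(path: Path, remote_lines: int) -> None:
    """Persist the line count after a successful run."""
    path.write_text(json.dumps({"line_count": remote_lines}))
```

With a stored watermark of 1200 and a remote count of 1350, `build_fetch_command` yields a `tail -n +1201` command covering the 150 new lines. A remote count of 300 triggers the full-fetch branch, since a shrinking file implies rotation.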

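The first three Tool 31 checks listed in the malware analysis section (magic-byte type detection, hashing, and ELF architecture detection) can be sketched as below. This is a simplified illustration, not the code in `31_malware_analyzer_live.py`: the function names, the type labels, and the particular set of signatures are assumptions. The `e_machine` values come from the ELF specification.

```python
import hashlib
import struct
from typing import Optional

# (signature, label) pairs checked against the start of the file.
# The labels are illustrative; Tool 31 may name its categories differently.
MAGIC_SIGNATURES = [
    (b"\x7fELF", "elf"),
    (b"MZ", "pe"),
    (b"#!", "shell-script"),
    (b"\x1f\x8b", "gzip-archive"),
    (b"PK\x03\x04", "zip-archive"),
]

# e_machine values from the ELF specification.
ELF_MACHINES = {3: "x86", 8: "MIPS", 40: "ARM", 62: "x86-64", 183: "AArch64", 243: "RISC-V"}


def detect_type(data: bytes) -> str:
    """Classify a sample by its leading magic bytes."""
    for signature, label in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return label
    return "unknown"


def elf_architecture(data: bytes) -> Optional[str]:
    """Read e_machine (2 bytes at offset 18) using the byte order in EI_DATA."""
    if not data.startswith(b"\x7fELF") or len(data) < 20:
        return None
    byte_order = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little-endian, 2 = big-endian
    (machine,) = struct.unpack_from(byte_order + "H", data, 18)
    return ELF_MACHINES.get(machine, f"unknown({machine})")


def triage(data: bytes) -> dict:
    """Combine type detection, architecture, and the three hashes for one sample."""
    return {
        "type": detect_type(data),
        "arch": elf_architecture(data),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
```

Reading the byte order from the header matters for honeypot samples in particular, because the same dropper is often served as both little-endian and big-endian builds (for example ARM and MIPS).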
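Tags: HAProxy, threat intelligence, security operations center, developer tools, log auditing, network mapping, network debugging, automation, honeypot, certificate exploitation, reverse-engineering tools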