rvong65/mitre-attack-chain-visualizer

GitHub: rvong65/mitre-attack-chain-visualizer

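Groups scattered Sysmon/EDR events from real Atomic Red Team telemetry into confidence-scored attack chains, maps them to MITRE ATT&CK techniques, and visualizes them on an interactive timeline, helping SOC analysts quickly pick out attack storylines from a flood of process events.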

Stars: 1 | Forks: 0

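# MITRE ATT&CK Chain Visualizer

Real attack behavior hides in process trees and event chains, buried under a flood of EDR/Sysmon telemetry. This project groups related events into **scored, explainable attack chains**, maps them to MITRE ATT&CK techniques and tactics, and shows them on an interactive timeline. It is inspired by the behavioral graphs of [SentinelOne Storyline](https://www.sentinelone.com/blog/rapid-threat-hunting-with-deep-visibility-feature-spotlight/) and [CrowdStrike Falcon](https://www.crowdstrike.com/platform/endpoint-security/falcon-insight-xdr/).

[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://mitre-attack-chain-visualizer.streamlit.app/)

## Table of Contents

**Getting Started**
- [Live Demo](#live-demo)
- [Features](#features)
- [Quick Start](#quick-start)

**Overview**
- [Problem & Motivation](#problem-motivation)
- [Tech Stack](#tech-stack)
- [Data Sources & Attribution](#data-sources)

**Technical**
- [Architecture & Design Choices](#architecture-design-choices)
- [Development Journey](#development-journey)
- [Safety Considerations](#safety-considerations)
- [CI/CD](#cicd)
- [Project Status & Build Log](#project-status)
- [Repository Layout](#repository-layout)

**Legal & Contact**
- [License](#license)
- [Contact / What's Next](#contact)

## 🚀 Live Demo

**[▶ Open the live app on Streamlit Cloud](https://mitre-attack-chain-visualizer.streamlit.app/)**

**Before you open the app:**
- **Cold start:** The app runs on Streamlit Community Cloud and may go to sleep after a period of inactivity. If you see the **"Zzzz"** screen saying **"This app has gone to sleep due to inactivity"**, click **"Yes, get this app back up!"** to wake it. Anyone can do this; you do not need to contact the maintainer. Startup can take about a minute after the click.

**Note:** The live app loads prebuilt polished chains from `data/processed/`. The raw Splunk Attack Data logs are not included; download them separately to rebuild from scratch. You can also upload your own Sysmon/EDR CSV through the sidebar (keeping files under ~50 MB is recommended on the Streamlit Cloud free tier).

**Screenshot:**

![Main screen](https://static.pigsec.cn/wp-content/uploads/repos/2026/06/59acfd42cf024331.png)

## ✨ Features

- **Prebuilt polished chains in the repo**: explore chains right after cloning; optionally rebuild them from Splunk Attack Data
- **Upload your own Sysmon/EDR CSV**: includes validation, size checks, and friendly error handling
- **Filter by confidence, chain length, and tactic**: focus on high-signal activity
- **Interactive timeline**: Plotly scatter plot with hover explanations, cmdline snippets, and tactic coloring
- **Chain summary table**: cells color-coded by confidence (green / yellow / red tiers)
- **Export filtered chains to CSV**

**Expected upload columns** (minimum): `timestamp`, `process_path`, `cmdline`, `parent_process`.

## Quick Start

### 1. Clone and install

```bash
git clone https://github.com/rvong65/mitre-attack-chain-visualizer.git
cd mitre-attack-chain-visualizer
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

### 2. Run the dashboard

The repo ships the **polished chain outputs** (`data/processed/events_with_chains_polished.csv` and `chains_summary_polished.csv`), so the app runs immediately after cloning, with no raw logs required. Activate `.venv` if it is not already active, then run:

```bash
streamlit run app.py
```

Open **http://localhost:8501**.

### 3. Rebuild from raw logs (optional)

The raw Atomic Red Team logs are **not** committed (for license and size reasons). To regenerate all pipeline outputs:

1. Download the logs for T1059.001, T1003.001, T1003.003, and T1547.001 from the `atomic_red_team` subfolders of [Splunk Attack Data](https://github.com/splunk/attack_data).
2. Place the files under `data/raw//` (for example `data/raw/T1059.001/windows-sysmon.txt`).
3. With `.venv` active, run the pipeline:

```bash
python -c "from src.pipeline import run_pipeline; run_pipeline()"
python -m src.features.pipeline
python -m src.chain_detection
python -m src.chain_refine
python -m src.chain_polish
streamlit run app.py
```

Outputs are written to `data/processed/` (everything except the polished files is gitignored).

### Alternative: upload a CSV

Use the sidebar uploader to upload a Sysmon/EDR event CSV (`timestamp`, `process_path`, `cmdline`, `parent_process`). No local pipeline run is needed.

### Development (optional)

With `.venv` active:

```bash
pip install -r requirements-dev.txt
pytest tests/ -q
```

Loader and feature tests are skipped automatically when `data/raw/` or the intermediate processed files are missing. The chain-detection smoke tests run without raw logs.

## 🎯 Problem & Motivation

Signature-based detection and static indicators struggle with novel techniques and living-off-the-land binaries. SOC analysts still face thousands of disconnected process-creation events, while the meaningful signal is the **sequence**: execution → credential access → persistence.

This project closes that gap by:

- Grouping events into process chains using parent-child relationships and temporal proximity
- Mapping chains to MITRE ATT&CK with a **confidence score** and a human-readable explanation
- Prioritizing **multi-event chains**, the attack storylines with the strongest signal
- Providing an analyst-friendly timeline with filters, hover details, and CSV export

Explainability and triage are built in. Confidence-threshold filtering cuts noise, and every highlighted chain carries context an analyst can act on, which makes the tool a practical fit for security operations workflows.

## 🛠️ Tech Stack

![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54) ![Pandas](https://img.shields.io/badge/pandas-%23150458.svg?style=for-the-badge&logo=pandas&logoColor=white) ![NumPy](https://img.shields.io/badge/numpy-%23013243.svg?style=for-the-badge&logo=numpy&logoColor=white)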
![Plotly](https://img.shields.io/badge/plotly-%233F4F75.svg?style=for-the-badge&logo=plotly&logoColor=white) ![Streamlit](https://img.shields.io/badge/Streamlit-%23FF4B4B.svg?style=for-the-badge&logo=streamlit&logoColor=white) ![GitHub Actions](https://img.shields.io/badge/CI-GitHub_Actions-2088FF?style=for-the-badge&logo=githubactions&logoColor=white)

| Layer | Tools |
|-------|-------|
| **Data processing** | Python, Pandas, NumPy |
| **Visualization** | Plotly, Streamlit |
| **Detection** | Rule-based chain building + weighted confidence scoring |
| **Deployment** | [Streamlit Cloud](https://streamlit.io/cloud) |
| **CI** | GitHub Actions (pytest) |

## 📊 Data Sources & Attribution

This project uses selected attack-simulation logs from the [Splunk Attack Data repository](https://github.com/splunk/attack_data) (Apache License 2.0).

**Datasets used** (from the `atomic_red_team` subfolders):

| Technique | Name | Raw log sources |
|-----------|------|-----------------|
| **T1059.001** | Command and Scripting Interpreter: PowerShell | `windows-sysmon`, `windows-powershell` |
| **T1003.001** | OS Credential Dumping (LSASS) | `windows-sysmon`, `crowdstrike_falcon` |
| **T1003.003** | OS Credential Dumping (NTDS) | `windows-sysmon`, `crowdstrike_falcon` |
| **T1547.001** | Boot or Logon Autostart: Registry Run Keys | `windows-sysmon` |

**License compliance**
- © Splunk Inc. (Apache 2.0). Not affiliated with Splunk, CrowdStrike, SentinelOne, or MITRE.
- The data is used for educational and research purposes only.
- Full license: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).

**Raw logs are not committed.** When rebuilding the pipeline from scratch, download them locally into `data/raw//` (for example `data/raw/T1059.001/windows-sysmon.txt`).

## 🏗️ Architecture & Design Choices

```mermaid
flowchart TB
    subgraph Client["Client Layer"]
        U[Analyst / Demo user]
        UI[Streamlit UI<br/>app.py]
    end
    subgraph Ingest["Ingestion Layer - pipeline.py + loaders/"]
        RAW[Sysmon · PowerShell · Falcon logs]
        LOAD[Per-source parsers<br/>sysmon · powershell · falcon]
        SCHEMA[Unified schema<br/>schema.py]
    end
    subgraph Features["Feature Layer - features/ (optional)"]
        FEAT[Cmdline & parent-child signals<br/>technique-specific flags]
    end
    subgraph Chains["Chain Layer - chain_detection.py · chain_refine.py"]
        LINK[Union-find linking<br/>parent-child · GUID · time windows]
        RULES[MITRE technique rules<br/>confidence 0–100]
        BENIGN[Benign-root filtering]
        EXP[Human-readable explanations]
    end
    subgraph Polish["Polish Layer - chain_polish.py"]
        GATE[Confidence gating<br/>≥40% · drop benign-root]
        TACTIC[Tactic mapping · summary tables]
    end
    subgraph Artifacts["Artifact Layer - data/processed/"]
        CSV[Staged CSV artifacts<br/>chains · events · polished demo]
    end
    subgraph Guard["Safety Layer - app.py"]
        VAL[Upload validation<br/>size · schema · parse errors]
        TRIAGE[Filters · timeline · CSV export]
    end
    U --> UI
    UI --> VAL
    VAL -->|sample / valid upload| TRIAGE
    VAL -->|invalid / empty| UI
    RAW --> LOAD --> SCHEMA --> FEAT
    FEAT --> LINK --> RULES
    RULES --> BENIGN --> EXP
    EXP --> GATE --> TACTIC --> CSV
    CSV --> UI
    TRIAGE --> UI
```

**Pipeline summary:** Raw logs (Sysmon, PowerShell, CrowdStrike Falcon) are loaded and normalized ([`src/pipeline.py`](src/pipeline.py), [`src/loaders/`](src/loaders/)) into a unified schema (`timestamp`, `process_path`, `parent_process`, `cmdline`, `technique_id`, and so on). Optional feature engineering ([`src/features/`](src/features/)) enriches events with cmdline and parent-child signals. **Chain detection** ([`src/chain_detection.py`](src/chain_detection.py)) groups events using union-find and temporal proximity. **Refinement** ([`src/chain_refine.py`](src/chain_refine.py)) applies GUID-first linking, technique rules, confidence scoring, benign-root filtering, and explanations. **Polish** ([`src/chain_polish.py`](src/chain_polish.py)) gates chains for the UI and builds the summary tables. The **Streamlit** dashboard ([`app.py`](app.py)) loads the polished chains or an uploaded CSV, filters by confidence, length, and tactic, and renders the interactive timeline.

**Key design decisions**

| Decision | Rationale |
|----------|-----------|
| Chain-level analysis | Techniques and tactics are inferred from multi-event sequences rather than isolated rows, which matches how EDR storylines present attacks |
| Rule-based confidence | Explainable scoring (base score + sequence bonus) with plain-language explanations; avoids black-box ML on overlapping simulation data |
| Confidence gating | The default filter (≥40% confidence, multi-event only) surfaces storylines an analyst can act on and reduces Splunk/Windows noise |
| Benign-root filtering | Chains rooted in benign processes (such as svchost or splunkd) with no suspicious indicators are excluded from the polished view |
| Reproducibility | The pipeline writes staged CSVs to `data/processed/`; only the polished demo outputs are committed to GitHub |

**Pipeline outputs** (`data/processed/`, only the polished files are committed):

| Stage | Events | Summary |
|-------|--------|---------|
| Chain detection | `events_with_chains.csv` | `chains_summary.csv` |
| Refined | `events_with_chains_refined.csv` | `chains_summary_refined.csv` |
| Polished (UI-ready, **committed**) | `events_with_chains_polished.csv` | `chains_summary_polished.csv` |

### Development Journey

The project first explored per-event rule-based classification and ML classification (RandomForest, ensemble methods, SMOTE). These captured some signal, but accuracy plateaued because of heavy event overlap and the dominance of PowerShell in the Atomic Red Team data. The work then pivoted to process-chain detection and timeline visualization, a more practical and industry-aligned approach that better shows meaningful multi-stage attack sequences (execution → credential access → persistence) and borrows ideas from tools such as SentinelOne Storyline and CrowdStrike Threat Graph. The early classification experiments are kept in [`archived/`](archived/).

```mermaid
flowchart LR
    A[Multi-source log ingest<br/>Sysmon · PowerShell · Falcon] --> B[Per-event ML experiments<br/>rules · RF · SMOTE]
    B --> C[Pivot to chain detection<br/>union-find · time windows]
    C --> D[Refinement<br/>confidence · benign filter · explanations]
    D --> E[Polish & tactic mapping<br/>UI-ready summaries]
    E --> F[Streamlit UI<br/>dark theme · filters · timeline]
    F --> G[Upload validation<br/>polished demo CSVs]
    G --> H[Streamlit Cloud deploy<br/>MVP live]
    H --> I[GitHub Actions tests<br/>offline pytest]
```

## 🛡️ Safety Considerations

| Principle | Implementation |
|-----------|----------------|
| **Simulation only, no live execution** | Uses static Splunk Attack Data or uploaded CSVs; no agents, network callbacks, or payload execution |
| **Assists analyst judgment, does not replace EDR** | Confidence scores and explanations are triage aids; the tool does not block, alert, or enforce policy in production |
| **Validate untrusted uploads** | File-size limit (~50 MB recommended), parse-error handling, schema checks, and fallback to the bundled demo data on failure |
| **Protect sensitive telemetry** | Do not upload production EDR/SIEM exports to the public Streamlit deployment without authorization; raw logs are never committed to the repo |

## 🔄 CI/CD

**GitHub Actions** runs on every push to `main` / `master` and on every pull request:

| Step | Action |
|------|--------|
| **Trigger** | Push to `main` / `master`, or a PR |
| **Environment** | `ubuntu-latest`, Python 3.11 |
| **Install** | `pip install -r requirements-dev.txt` |
| **Test** | `pytest tests/ -q` |

Workflow file: [`.github/workflows/tests.yml`](.github/workflows/tests.yml)

Loader and feature tests are skipped when `data/raw/` or the intermediate processed files do not exist; the chain-detection smoke tests run offline without raw logs.

**Streamlit Cloud** deploys independently from the `main` branch (`app.py` + `requirements.txt`) once connected to this repository.

## 📈 Project Status & Build Log

| Step | Focus | Status |
|------|-------|--------|
| **1. Data** | Load and unify Sysmon, PowerShell, and Falcon logs; unified schema | ✅ |
| **2. Features** | Cmdline patterns, parent-child links, technique-specific signals | ✅ |
| **3. Pivot** | Archive per-event ML; move to chain-level detection | ✅ |
| **4. Chains** | Parent-child + temporal-proximity grouping; technique rules | ✅ |
| **5. Refinement** | Confidence scoring, benign filtering, explanations | ✅ |
| **6. Polish** | Tactic colors, summary tables, multi-event gating | ✅ |
| **7. UI** | Streamlit dashboard: dark theme, filters, timeline, export | ✅ |
| **8. Deployment** | Upload validation, readability fixes, Streamlit Cloud | ✅ |
| **9. CI** | GitHub Actions pytest workflow | ✅ |

**Current status:** ✅ MVP complete. The app is live on Streamlit Cloud, the polished demo data is committed, CSV upload and export are supported, and CI is enabled.

## 📁 Repository Layout

```
mitre-attack-chain-visualizer/
├── app.py                       # Streamlit dashboard (Streamlit Cloud entry point)
├── requirements.txt             # Runtime dependencies
├── requirements-dev.txt         # Dev deps (pytest); local & CI only
├── LICENSE                      # MIT License
├── README.md                    # Project overview
├── .gitignore                   # Raw logs, .venv, intermediate processed outputs
├── .github/
│   └── workflows/tests.yml      # GitHub Actions pytest on push/PR to main
├── docs/
│   └── screenshots/             # README demo images
├── data/
│   ├── raw/                     # Local only (gitignored): Splunk Attack Data logs
│   └── processed/               # Polished demo CSVs committed; rest gitignored
├── src/
│   ├── pipeline.py              # Load & normalize raw logs
│   ├── schema.py                # Unified event schema
│   ├── config.py                # Paths, technique IDs, defaults
│   ├── chain_detection.py       # Union-find chain building
│   ├── chain_refine.py          # MITRE rules, confidence, explanations
│   ├── chain_polish.py          # UI-ready polish & gating
│   ├── timeline_viz_helpers.py  # Plotly timeline helpers (used by app.py)
│   ├── features/                # Optional feature enrichment
│   └── loaders/                 # Sysmon, Falcon, PowerShell parsers
├── archived/                    # Per-event ML experiments (source only)
└── tests/                       # pytest suite (chain smoke tests run offline)
```

## 📄 License

**MIT License**: see [LICENSE](LICENSE).

Dataset attribution and licensing (Splunk Attack Data, Apache 2.0) are described under [Data Sources & Attribution](#data-sources). The data is used for **educational and research purposes** only. Not affiliated with Splunk, CrowdStrike, SentinelOne, or MITRE.

## 🤝 Contact / What's Next

Feedback, suggestions, and goal-aligned collaboration are welcome.

**Potential future directions** *(no timeline promised)*:

- STIX/TAXII export for SIEM integration
- A graph-database backend for large-scale chain queries
- LLM-assisted chain summaries (with guardrails)
- More Atomic Red Team techniques and data sources
- A Docker image for reproducible local and cloud deployment
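For readers who want a concrete picture of the approach described under Architecture & Design Choices (union-find linking on parent-child relationships and time windows, then base score plus sequence bonus, then the ≥40% multi-event gate), the following is a minimal sketch under stated assumptions. It is not the code in `src/chain_detection.py` or `src/chain_refine.py`: the names `Event`, `build_chains`, `confidence`, and `gate`, the 60-second window, and the 30/20 weights are all hypothetical, and GUID-first linking and benign-root filtering are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    # Field names follow the unified schema columns listed in the README.
    timestamp: datetime
    process_path: str
    parent_process: str
    cmdline: str
    technique_id: str  # empty string when no technique is mapped


class UnionFind:
    """Disjoint-set structure over event indices."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)


def build_chains(events, window=timedelta(seconds=60)):
    """Link an event to any earlier event that looks like its parent.

    Assumption: two events belong together when the later one names the
    earlier one's process as its parent and they are at most `window` apart.
    This is O(n^2) and only meant for illustration.
    """
    order = sorted(range(len(events)), key=lambda i: events[i].timestamp)
    uf = UnionFind(len(events))
    for pos, i in enumerate(order):
        for j in order[:pos]:
            close = events[i].timestamp - events[j].timestamp <= window
            same = events[i].parent_process.lower() == events[j].process_path.lower()
            if close and same:
                uf.union(i, j)
    chains = {}
    for i in order:
        chains.setdefault(uf.find(i), []).append(events[i])
    return list(chains.values())


def confidence(chain, base=30, bonus=20):
    """Hypothetical score: base for one technique, bonus per extra technique."""
    techniques = {e.technique_id for e in chain if e.technique_id}
    if not techniques:
        return 0
    return min(100, base + bonus * (len(techniques) - 1))


def gate(chains, min_conf=40):
    """Keep multi-event chains at or above the confidence threshold."""
    return [c for c in chains if len(c) >= 2 and confidence(c) >= min_conf]
```

Union-find suits this step because links are discovered pairwise and in no particular order, yet every event must end up in exactly one chain. Keeping the score as a plain sum of named terms is what makes a per-chain explanation easy to produce.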

_Built with real Splunk Atomic Red Team telemetry · MITRE ATT&CK® is a registered trademark of The MITRE Corporation_

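Tags: ATT&CK visualization, EDR data analysis, Kubernetes, Streamlit, security operations, scanning framework, attack chain reconstruction, access control, reverse-engineering tools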