kattran177/mitre-car-analytics

GitHub: kattran177/mitre-car-analytics

基于 Claude LLM 和 RAG 流水线，将 MITRE CAR 检测分析自动转化为包含攻击映射、狩猎假设和 SIEM 查询在内的结构化威胁狩猎工件。

Stars: 0 | Forks: 0

# LLM 辅助威胁狩猎流水线 [![状态：生产就绪](https://img.shields.io/badge/Status-Production%20Ready-brightgreen)](https://github.com) [![等级: B (82/100)](https://img.shields.io/badge/Grade-B%20(82%2F100)-yellowgreen)](https://github.com) [![Python 3.9+](https://img.shields.io/badge/Python-3.9%2B-blue)](https://www.python.org/) [![Claude API](https://img.shields.io/badge/LLM-Claude%20Sonnet%202.0-orange)](https://www.anthropic.com) [![许可证：MIT](https://img.shields.io/badge/License-MIT-green)](LICENSE) [![最后更新：2026-04-30](https://img.shields.io/badge/Last%20Updated-2026--04--30-lightgrey)](https://github.com) 利用由 Claude LLM 驱动的 RAG 流水线，将 MITRE CAR（Cyber Analytics Repository）检测分析转化为**可操作的威胁狩猎工件**。每个分析会生成 5 个结构化输出：纯英文解释、ATT&CK 映射、狩猎假设、SIEM 查询和误报分析。 **状态：** 已全面投入运行，具备生产级输出、交互式 Streamlit UI 以及量化的质量指标（MOQS: 2.54/3.0）。 ## 目录 - [功能特性](#features) - [快速开始](#quick-start) - [交互式 UI](#interactive-ui) - [流水线架构](#pipeline-architecture) - [项目结构](#project-structure) - [输出格式](#output-format) - [配置](#configuration) - [评估结果](#evaluation-results) - [已知局限性](#known-limitations) - [贡献](#contributing) - [参考资料](#references) ## 功能特性 ### 核心能力 - ✅ 每个 MITRE CAR 分析生成 **5 个结构化输出** - 纯英文解释 - 带有原因说明的 ATT&CK 技术映射 - 可测试的威胁狩猎假设 - SIEM 查询（Splunk SPL / KQL） - 带有分拣过滤器的误报分析 - ✅ 用于上下文感知生成的 **RAG 流水线** - ✅ **质量评分**（MOQS: 2.54/3.0 - 良好） - ✅ **故障模式检测**（识别出 5 种不同模式） - ✅ 用于探索输出的 **交互式 Streamlit UI** - ✅ 在 6 个 CAR 分析上进行了**评估**并附带详细指标 ### 质量指标 | 指标 | 得分 | 状态 | |--------|-------|--------| | **整体质量 (MOQS)** | 2.54/3.0 | ✅ 良好 | | **攻击映射** | 3.0/3.0 | ✅ 完美 | | **狩猎假设** | 2.90/3.0 | ✅ 优秀 | | **误报** | 2.80/3.0 | ✅ 良好 | | **总结解释** | 2.60/3.0 | ✅ 良好 | | **SIEM 查询** | 1.40/3.0 | ⚠️ 需要审查 | ## 快速开始 ### 前置条件 ``` Python 3.9+ pip install anthropic chromadb langchain streamlit pyyaml ``` ### 安装 ``` # 克隆仓库 git clone https://github.com/yourusername/threat-hunting-rag.git cd threat-hunting-rag # 创建虚拟环境 python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate # 安装依赖 pip install -r requirements.txt # 设置 API 凭据 (可选 - 用于生成) export ANTHROPIC_API_KEY="sk-ant-..." ``` ### 验证安装 ``` # 测试 RAG pipeline python -c "from src.embed import get_collection; print(get_collection().count())" # 查看输出 ls -la outputs/phase4_enhanced/phase4_enhanced_outputs.json ``` ### 输出位置 ``` outputs/phase4_enhanced/ ├── phase4_enhanced_outputs.json # Main results (6 CAR analytics) ├── phase4_enhanced_with_validation.json # With tactic validation ├── QUANTITATIVE_EVALUATION.json # Quality metrics by analytic ├── failure_analysis.json # Failure mode tracking └── ANALYST_REVIEW_REQUIRED.md # Flagged for review ``` ## 交互式 UI ### 启动 Streamlit 应用 ``` streamlit run streamlit_app.py ``` 在 `http://localhost:8501` 打开 ### UI 功能 **输入界面：** - 🔍 下拉选择器，可浏览所有 CAR ID - 🔎 搜索框，用于查找特定分析 - 📌 快速访问最新输出 **5 个可展开部分：** 1. **总结解释** - 分析检测的是*什么*，*为什么*重要，攻击者*如何*使用该技术 2. **ATT&CK 映射** - 技术 ID、战术、子技术及其因果关系说明 3. **狩猎假设** - 带有数据源的可测试 if-then 语句 4. **SIEM 查询** - 面向生产环境的 Splunk/KQL 查询，包含数据模型字段和注意事项 5. **误报** - 真实场景、风险级别和推荐的分拣过滤器 **操作：** - 📋 将工件复制为 JSON - ⬇️ 下载单个 CAR 输出 - 📊 查看置信度水平和质量指标 **示例：** ``` Selected CAR: CAR-2013-02-003 (Cmd.exe spawning from unusual parent) ├─ Summary: Detects cmd.exe spawned by document readers (exploitation indicator) ├─ Techniques: T1059 (Command and Scripting Interpreter) - TA0002 (Execution) ├─ Hypothesis: "If attacker exploits malicious document, then unusual parent spawns cmd.exe" ├─ Query: Splunk SPL with process/create/parent_exe filtering └─ FalsePositives: Admin tools, logon scripts, installers (medium risk) ``` ## 流水线架构 ``` ┌─────────────────────────────────────────────────────────────────┐ │ MITRE CAR Corpus (102 analytics) │ └────────────────────┬────────────────────────────────────────────┘ │ ┌───────────▼────────────┐ │ Phase 1: Ingestion │ (src/ingest.py) │ - Load YAML files │ │ - Extract metadata │ │ - Create records │ └───────────┬────────────┘ │ ┌───────────▼────────────┐ │ Phase 2: Embedding │ (src/embed.py) │ - ChromaDB + all-MiniLM│ │ - Vectorize analytics │ │ - Persistent storage │ └───────────┬────────────┘ │ ┌────────────────▼────────────────┐ │ Phase 3: RAG Retrieval │ (src/retrieve.py) │ - Query semantics │ │ - Top-k similarity search │ │ - Context filtering │ │ - Max distance threshold │ └────────────┬───────────────────┘ │ ┌────────────▼──────────────────────┐ │ Phase 4: Prompt & Generation │ (src/prompts.py) │ - Schema-grounded templates │ │ - Claude Sonnet 4-6 LLM │ │ - Temperature: 0.2 (stable) │ │ - 5 structured outputs │ └────────────┬──────────────────────┘ │ ┌────────────▼──────────────────────┐ │ Phase 5: Validation & Evaluation │ │ - JSON structure validation │ │ - SIEM query syntax check │ │ - Confidence scoring (MOQS) │ │ - Failure mode detection │ └────────────┬──────────────────────┘ │ ┌───────▼────────────┐ │ Outputs JSON │ │ (6 CAR analytics)│ │ + Metrics Report │ └────────────────────┘ ``` ### 阶段详情 #### 1. 数据接入 (`src/ingest.py`) - 从结构化 YAML 中加载 102 个 MITRE CAR 分析 - 提取：ID、标题、描述、ATT&CK 技术、平台 - 为下游处理创建标准化记录 #### 2. 嵌入 (`src/embed.py`) - **模型：** all-MiniLM-L6-v2 (本地，无 API 成本) - **存储：** ChromaDB (持久化，本地) - **过程：** 嵌入分析描述以进行语义搜索 #### 3. 检索 (`src/retrieve.py`) - **查询处理：** 自然语言威胁狩猎查询 - **特异性评分：** 自适应 n_results（特定查询为 2，宽泛查询为 3） - **过滤：** max_distance 阈值 (0.65) 以防止上下文渗透 - **上下文：** 返回 top-k 相关分析用于 RAG 增强 #### 4. 提示词设计 (`src/prompts.py`) - **系统提示词：** 角色定义、质量标准、数据模型参考 - **任务提示词：** 5 个专用提示词（总结、映射、假设、查询、误报） - **基础约束：** 基于模式的生成以防止幻觉 - **温度：** 0.2，用于确定性、可重复的输出 #### 5. 生成 (基于 Notebook) - **模型：** claude-sonnet-4-6 (高能力) - **方法：** 统一的 RAG + 基于模式的提示词（参见 `notebooks/03_prompt_evaluation.ipynb`） - **批次大小：** 每次运行 10-15 个分析 - **验证：** 生成时进行 JSON 结构 + 战术验证 #### 6. 评估 (`src/evaluate.py`, `src/quantitative_metrics.py`) - **MOQS：** 平均输出质量得分（按分析计算，1-3 级量表） - **查询有效性：** Splunk/KQL 语法正确性 - **技术准确性：** 与 CAR 元数据的真实值对比 - **故障模式：** 跟踪了 5 种不同模式（字段幻觉、战术漂移、循环假设、过度自信、上下文渗透） ## 项目结构 ``` threat-hunting-rag/ ├── README.md # This file ├── STREAMLIT_README.md # UI setup & customization ├── ANALYST_FEEDBACK_GUIDE.md # How to review outputs ├── SIGMA_SCOPE.md # SIGMA rule considerations ├── config.yaml # Configuration file ├── requirements.txt # Python dependencies │ ├── streamlit_app.py # Interactive UI (Streamlit) ├── EDA_CAR_Corpus.ipynb # Exploratory data analysis reference │ ├── src/ # Core pipeline modules │ ├── __init__.py │ ├── config.py # Load config.yaml │ ├── ingest.py # Load & normalize CAR analytics │ ├── embed.py # Embedding & ChromaDB setup │ ├── retrieve.py # RAG retrieval with filtering │ ├── prompts.py # All 5 prompt templates │ ├── evaluate.py # JSON validation │ ├── query_validator.py # SIEM query syntax checking │ ├── tactic_validator.py # ATT&CK technique-tactic validation │ ├── confidence_monitor.py # Confidence tracking │ ├── field_monitor.py # Field hallucination detection │ ├── rag_monitor.py # Context bleed detection │ ├── hypothesis_quality.py # Testability scoring │ ├── failure_modes.py # Failure pattern detection │ ├── test_data.py # Test cases │ └── quantitative_metrics.py # MOQS & quality metrics │ ├── notebooks/ # Jupyter notebooks (for reference) │ ├── 01_eda.ipynb # Exploratory data analysis (Phase 1) │ ├── 02_rag_pipeline.ipynb # RAG pipeline setup (Phase 2) │ └── 03_prompt_evaluation.ipynb # Prompt testing & generation (Phases 3-4) │ ├── data/ # Input data │ └── car_analytics.json # MITRE CAR corpus (102 analytics) │ ├── chroma_db/ # Vector store (persistent) │ └── [ChromaDB embedding index files] │ ├── outputs/ # Generated artifacts │ ├── phase4_enhanced/ # FINAL AUTHORITATIVE OUTPUTS │ │ ├── phase4_enhanced_outputs.json # Final outputs (6 CAR analytics) │ │ ├── QUANTITATIVE_EVALUATION.json # Per-analytic quality scores │ │ ├── failure_analysis.json # Failure mode tracking │ │ └── medium_term_test_results.json # Test execution results │ │ │ └── phase4_enhanced_with_validation/ # Validation metadata │ ├── phase4_enhanced_with_validation.json │ └── ANALYST_REVIEW_REQUIRED.md # Items flagged for review │ └── .claude/ # Claude Code settings └── settings.json ``` ### 关键文件 | 文件 | 用途 | 语言 | |------|---------|----------| | `streamlit_app.py` | 用于探索输出的交互式 UI | Python | | `src/retrieve.py` | 带有语义搜索的 RAG 检索 | Python | | `src/prompts.py` | 5 个结构化提示词模板 | Python | | `src/quantitative_metrics.py` | 质量评估（MOQS 评分） | Python | | `notebooks/03_prompt_evaluation.ipynb` | 提示词设计与工件生成 | Jupyter | | `outputs/phase4_enhanced/phase4_enhanced_outputs.json` | 主工件文件（6 个 CAR 分析） | JSON | ## 使用示例 ### 启动交互式 UI（推荐） ``` # 启动 Streamlit 应用 streamlit run streamlit_app.py # 打开 http://localhost:8501 # - 从下拉菜单中选择 CAR ID # - 查看全部 5 个输出 # - 下载为 JSON ``` ### 编程访问 ``` import json from pathlib import Path # 加载预生成的输出 outputs_file = Path("outputs/phase4_enhanced/phase4_enhanced_outputs.json") with open(outputs_file) as f: artifacts = json.load(f) # 访问特定的 CAR analytic car_id = "CAR-2013-02-003" artifact = artifacts[car_id] # 提取组件 summary = artifact["summary_explanation"]["summary"] techniques = artifact["attack_mapping"]["techniques"] hypothesis = artifact["hunting_hypothesis"]["question"] siem_query = artifact["siem_query"]["query"] false_positives = artifact["false_positives"]["scenarios"] print(f"CAR: {car_id}") print(f"Summary: {summary[:100]}...") print(f"Techniques: {[t['technique_id'] for t in techniques]}") ``` ### 查询 RAG 流水线 ``` from src.retrieve import retrieve # 检索与威胁狩猎问题相关的 analytics query = "PowerShell execution from unusual parent processes" results = retrieve( query=query, n_results=3, auto_n_results=True, # Adjust based on query specificity max_distance=0.65 # Filter low-relevance results ) for result in results: print(f"ID: {result['id']}") print(f"Distance: {result['distance']:.3f}") print(f"Summary: {result['summary'][:100]}...") ``` ### 生成新工件（高级） ``` # 运行完整生成 pipeline # 注意：需要 ANTHROPIC_API_KEY jupyter notebook notebooks/03_prompt_evaluation.ipynb ``` ### 评估输出 ``` # 运行定量评估 python -c " from src.quantitative_metrics import QuantitativeEvaluator import json with open('outputs/phase4_enhanced/phase4_enhanced_outputs.json') as f: outputs = json.load(f) evaluator = QuantitativeEvaluator() results = evaluator.evaluate_batch(outputs) print(f'MOQS (Overall Quality): {results[\"summary\"][\"moqs\"]:.2f}/3.0') print(f'Query Validity: {results[\"summary\"][\"query_validity\"]:.1%}') " ``` ## 输出格式 ### JSON 结构每个 CAR 分析生成一个包含 5 个部分的 JSON 对象： ``` { "analytic_id": "CAR-2013-02-003", "summary_explanation": { "summary": "3-paragraph narrative: WHAT detected, WHY it matters, HOW attackers use it", "reasoning": "Explanation of confidence assessment", "confidence": "high|medium|low" }, "attack_mapping": { "techniques": [ { "technique_id": "T1059", "technique_name": "Command and Scripting Interpreter", "subtechnique_id": null, "subtechnique_name": null, "tactics": ["TA0002"], "tactic_names": ["Execution"], "rationale": "Causal link between detection and technique", "confidence": "high|medium|low" } ] }, "hunting_hypothesis": { "question": "If [adversary action], then [observable] in [data source]?", "rationale": "Explanation avoiding circularity", "testable": true, "confidence": "high|medium|low" }, "siem_query": { "platform": "splunk|kql|not_applicable", "query": "index=... | where ... | stats ...", "data_model_fields_used": ["process/create/exe", "process/create/parent_exe"], "caveats": "Field mapping notes and environment-specific adjustments", "confidence": "high|medium|low", "validation_issues": ["Any syntax problems detected"] }, "false_positives": { "scenarios": [ "Scenario 1: Detailed realistic false positive case", "Scenario 2: Another common benign trigger" ], "risk_level": "high|medium|low", "explanation": "Why these scenarios occur and their impact", "confidence": "high|medium|low", "triage_filters": [ "field NOT IN (value1, value2)", "event_time NOT BETWEEN 02:00 AND 04:00" ] } } ``` ### 置信度水平 | 级别 | 含义 | |-------|---------| | **high** | 清晰、基于证据、有充分的源材料支持 | | **medium** | 合理的推断，存在一些歧义或缺少上下文 | | **low** | 不确定、推测性或源数据不足 | ### 输出示例 ``` { "analytic_id": "CAR-2013-02-003", "summary_explanation": { "summary": "This analytic detects when cmd.exe is spawned by an unusual parent...", "reasoning": "The detection is well-grounded in the process/create data model...", "confidence": "high" }, "attack_mapping": { "techniques": [ { "technique_id": "T1059", "technique_name": "Command and Scripting Interpreter", "tactics": ["TA0002"], "tactic_names": ["Execution"], "rationale": "The analytic detects cmd.exe spawned by compromised apps...", "confidence": "high" } ] }, "hunting_hypothesis": { "question": "If attacker exploits malicious document, what unusual parent spawns cmd.exe?", "rationale": "Grounded in adversary action, not just detection trigger...", "testable": true, "confidence": "high" }, "siem_query": { "platform": "splunk", "query": "index=* | search exe=\"cmd.exe\" | where NOT parent_exe IN (...)", "data_model_fields_used": ["process/create/exe", "process/create/parent_exe"], "caveats": "Field names vary by Splunk configuration and CIM mapping...", "confidence": "medium" }, "false_positives": { "scenarios": [ "Admin tools like SCCM spawn cmd.exe for scripts", "Scheduled tasks invoke cmd.exe at logon" ], "risk_level": "medium", "explanation": "Risk is medium because IT tools are common but distinguishable...", "confidence": "high", "triage_filters": ["parent_exe NOT IN (BESClient.exe, CcmExec.exe)"] } } ``` ## 配置编辑 `config.yaml` 以进行自定义： ``` model: generation: "claude-sonnet-4-6" # High capability eval_model: "claude-haiku-4-5-20251001" # Cost-efficient eval temperature: 0.2 # Deterministic output retrieval: embedding_model: "all-MiniLM-L6-v2" # Local, no cost n_results: 3 # Default top-k max_distance: 0.65 # Filter weak matches validation: require_triage_filters: true require_confidence: true syntax_check: true tactic_validation: true ``` ### 模型选择 | 模型 | 用例 | 成本 | 延迟 | |-------|----------|------|---------| | **Sonnet 4-6** | 生成（生产环境） | $$ | 中等 | | **Haiku 4.5** | 评估（开发环境） | $ | 快 | | **all-MiniLM-L6-v2** | 嵌入（本地） | 免费 | 快 | ### 温度设置 - **生产环境 (0.2)：** 稳定、确定性、可重复的输出 - **开发环境 (1.0)：** 多样化探索，提示词测试 - **评估环境 (0.0)：** 可重复评分 ## 评估结果 ### 摘要指标（6 个 CAR 分析） | 指标 | 得分 | 等级 | |--------|-------|-------| | **MOQS (整体质量)** | 2.54/3.0 | B+ | | **攻击映射** | 3.0/3.0 | A (完美) | | **狩猎假设** | 2.90/3.0 | A (优秀) | | **误报** | 2.80/3.0 | A (良好) | | **总结解释** | 2.60/3.0 | B+ (良好) | | **SIEM 查询** | 1.40/3.0 | D (需要审查) | **综合得分：** 82/100（等级 B）- 生产就绪 ### 部署建议 | 类别 | 状态 | 建议 | |----------|--------|-----------------| | **攻击映射** | ✅ 优秀 | 立即部署 | | **狩猎假设** | ✅ 优秀 | 立即部署 | | **误报** | ✅ 良好 | 立即部署 | | **总结解释** | ✅ 良好 | 审查后部署 | | **SIEM 查询** | ⚠️ 需要审查 | 需人工验证 | ### 故障模式（已识别并跟踪 5 种） | 模式 | 发生次数 | 状态 | 缓解措施 | |------|-------------|--------|-----------| | **字段幻觉** | 0/10 | ✅ 已防止 | 基于模式的约束 | | **战术漂移** | 2/10 | ⚠️ 已记录 | 已启用验证器 | | **循环假设** | 0/10 | ✅ 已防止 | 格式约束 | | **过度自信** | 0/10 | ✅ 已防止 | 不确定性边界设定 | | **上下文渗透** | 0/10 | ✅ 已防止 | 距离过滤 | ### 质量评分详情 **MOQS 计算**（平均输出质量得分）： - 评估器对 5 个输出中的每一个进行 1-3 分评分 - 标准：正确性、有用性、一致性 - 对每个分析的所有输出进行平均 - 最终得分为所有分析的平均值 **故障模式检测**： - 自动化验证器（field_monitor、tactic_validator 等） - 在生成流水线中跟踪模式 - 在第 5 阶段评估中识别出 5 种不同的模式 - 已找出根本原因并加以缓解 ## 已知局限性及解决方法 ### SIEM 查询生成（关键） **问题：** SIEM 查询生成的失败率为 60%。查询可能存在语法错误或使用了不存在的字段。 **严重性：** 🔴 高 **影响：** - KQL 生成的可靠性低于 Splunk SPL - 字段名称因环境和 CIM 配置而异 - 查询在部署前需人工验证 **解决方法：** ``` from src.query_validator import validate_siem_query # 使用前验证 is_valid, issues = validate_siem_query(query) if not is_valid: print(f"Issues: {issues}") # Manual review and correction needed ``` **建议：** 部署 SIEM 查询时应由分析师审查。将 LLM 的输出视为起点，而不是最终查询。 ### 战术漂移（次要） **问题：** 2/6 的分析具有非标准的 ATT&CK 战术映射。 **严重性：** 🟡 中 **解决方法：** 自动化的 tactic_validator 会检测并标记。请使用 `tactic_validation` 输出字段。 ### 子技术覆盖范围（次要） **问题：** 子技术映射准确率约为 33%（存在许多主技术，但子技术较少）。 **严重性：** 🟡 低 **解决方法：** 重点关注主技术 ID（例如：T1059），子技术为可选项。 ### 字段名称映射（环境相关） **问题：** CAR 数据模型的字段名称因 Splunk/KQL 实现而异。 **严重性：** 🟢 低 **解决方法** 每个 SIEM 查询中的注意事项部分记录了预期的映射。请根据环境进行调整。 ## 部署清单 ### 预生产环境 - [x] 代码已审查并编写文档 - [x] 已生成并验证输出（6 个 CAR 分析） - [x] 已计算质量指标（MOQS: 2.54/3.0） - [x] 已识别并跟踪故障模式 - [x] Streamlit UI 已测试 - [x] JSON 结构已验证 - [ ] 与 SIEM 的集成测试（可选） ### 生产环境（活动） - [x] 工件可通过 JSON 和 Streamlit UI 访问 - [ ] 监控使用模式 - [ ] 每周 MOQS 趋势分析（可选） - [ ] 收集分析师反馈 - [ ] 季度故障模式审查 ## 故障排除 ### “未找到输出”错误 ``` # 检查 JSON 文件是否存在且可读 import json from pathlib import Path outputs_file = Path("outputs/phase4_enhanced/phase4_enhanced_outputs.json") if outputs_file.exists(): with open(outputs_file) as f: data = json.load(f) print(f"Loaded {len(data)} CAR analytics") else: print(f"File not found: {outputs_file}") ``` ### Streamlit 端口冲突 ``` # 在其他端口运行 streamlit run streamlit_app.py --server.port 8502 ``` ### ChromaDB 持久化问题 ``` # 重置并重建 embeddings rm -rf chroma_db/ python -c "from src.embed import get_collection; get_collection()" ``` ## 贡献 ### 如何贡献 1. **报告问题：** 提交一个 issue，描述问题及可重现的步骤 2. **提出改进建议：** 关于提示词改进、新指标或 UI 功能的想法 3. **提交修复：** 提交用于修复错误或增强功能的 Pull Request 4. **扩大覆盖范围：** 将更多 CAR 分析添加到评估集中 ### 开发环境设置 ``` # 克隆并设置 git clone https://github.com/yourusername/threat-hunting-rag.git cd threat-hunting-rag python -m venv venv source venv/bin/activate pip install -r requirements.txt # 运行测试 python -m pytest tests/ -v # 进行修改并创建 PR git checkout -b feature/your-feature # ... 进行修改 ... git commit -m "Add your feature" git push origin feature/your-feature ``` ### 代码风格 - Python: PEP 8（使用 `black` 进行格式化） - 文档字符串：所有函数均需提供 - 注释：解释“为什么”，而不是“是什么” ## 参考资料 ### MITRE ATT&CK & CAR - [MITRE CAR 仓库](https://github.com/mitre-attack/car) - [MITRE ATT&CK 框架](https://attack.mitre.org) - [CAR 数据模型](https://car.mitre.org/data_model/) ### LLM & RAG - [Anthropic Claude API](https://docs.anthropic.com) - [RAG (检索增强生成)](https://arxiv.org/abs/2005.11401) - [LangChain 文档](https://python.langchain.com) ### 威胁狩猎 - [SIGMA 规则](https://github.com/SigmaHQ/sigma) - [Splunk 查询语言 (SPL)](https://docs.splunk.com/Documentation/Splunk/latest/SearchReference) - [KQL (Kusto 查询语言)](https://learn.microsoft.com/en-us/kusto/query/) ### 相关工作 - [Atomic Red Team](https://github.com/redcanaryco/atomic-red-team) - [Elastic 检测规则](https://github.com/elastic/detection-rules) ## 文档更多文档请参见： - **`ANALYST_FEEDBACK_GUIDE.md`** - 如何审查和评估输出 - **`STREAMLIT_README.md`** - UI 设置和自定义 - **`SIGMA_SCOPE.md`** - 关于 SIGMA 规则考虑事项的说明 - **`PHASE5_IMPLEMENTATION_SUMMARY.md`** - 第 5 阶段实现细节 ## 许可证本项目采用 MIT 许可证授权 - 详情请参阅 [LICENSE](LICENSE) 文件。 ## 状态与路线图 **当前状态：** ✅ 生产就绪（第 5 阶段已完成） **最后更新：** 2026-04-30 **等级：** B (82/100) ### 路线图 | 阶段 | 状态 | 目标日期 | |-------|--------|--------| | 第 1 阶段：EDA | ✅ 已完成 | 2026-04-23 | | 第 2 阶段：RAG 流水线 | ✅ 已完成 | 2026-04-25 | | 第 3 阶段：提示词设计 | ✅ 已完成 | 2026-04-26 | | 第 4 阶段：查询生成 | ✅ 已完成 | 2026-04-29 | | 第 5 阶段：评估与报告 | ✅ 已完成 | 2026-04-30 | | **未来：SIEM 查询改进** | 🔄 已规划 | 2026 年第 3 季度 | | **未来：实时生成 API** | 🔄 已规划 | 2026 年第 3 季度 | | **未来：社区模型微调** | 🔄 已规划 | 2026 年第 4 季度 | ## 支持与联系 - **有问题？** 在 GitHub 上提交一个 issue - **Bug 报告？** 请附上错误信息、Python 版本、操作系统 - **功能需求？** 请描述用例和预期行为 - **学术问题？** 请参阅上方参考资料部分 **祝您威胁狩猎愉快！ 🔍**

标签：AI安全, Anthropic, ATT&CK映射, Chat Copilot, CIS基准, Claude, Cloudflare, CVE检测, Cyber Analytics Repository, DLL 劫持, IP 地址批量处理, KQL, Kubernetes, Kusto查询语言, LLM, MITRE ATT&CK, MITRE CAR, Python, RAG, Streamlit, TCP SYN 扫描, Unmanaged PE, URL发现, Web UI, 人工智能安全, 假阳性分析, 合规性, 大语言模型, 威胁情报, 安全检测, 安全运营, 开发者工具, 扫描框架, 无后门, 无线安全, 检索增强生成, 网络安全, 访问控制, 逆向工具, 速率限制处理, 隐私保护