Theepankumargandhi/agentic-kg-threat-intel

GitHub: Theepankumargandhi/agentic-kg-threat-intel

基于 MITRE ATT&CK 知识图谱的 Agentic 威胁情报问答引擎,融合图检索与向量检索,提供可解释的复杂威胁查询应答。

Stars: 0 | Forks: 0

# agentic-kg-threat-intel ## 架构 ``` User Query │ ▼ ┌─────────────┐ HTTP/REST ┌──────────────────────────────────────┐ │ Client / │ ─────────────► │ FastAPI (port 8000) │ │ Frontend │ ◄───────────── │ POST /api/v1/query │ │ (React) │ JSON resp │ POST /api/v1/ingest │ └─────────────┘ │ GET /api/v1/health │ └─────────────────┬────────────────────┘ │ ▼ ┌──────────────────────────────────────┐ │ LangGraph Agent (7 nodes) │ │ │ │ query_planner │ │ │ │ │ vector_retriever ──► graph_retriever│ │ │ │ │ │ hybrid_fuser (RRF 60/40) ◄┘ │ │ │ │ │ path_tracer │ │ │ │ │ answer_generator (Claude) │ │ │ │ │ hallucination_checker ──► retry? │ └──────────┬──────────────────────────┘ │ ┌───────────────┴───────────────┐ ▼ ▼ ┌─────────────────────┐ ┌──────────────────────┐ │ Neo4j (port 7687) │ │ ChromaDB (embedded) │ │ │ │ │ │ Nodes: Technique, │ │ Collection: │ │ Group, Tactic, │ │ mitre_techniques │ │ Software, │ │ │ │ Mitigation │ │ Model: │ │ │ │ all-MiniLM-L6-v2 │ │ Rels: USES, │ │ (384-dim) │ │ BELONGS_TO, │ │ │ │ MITIGATED_BY, etc. │ │ │ └─────────────────────┘ └──────────────────────┘ │ ▼ ┌─────────────────────────┐ │ QueryResponse │ │ { │ │ "answer": "...", │ │ "path_trace": {...}, │ │ "reasoning_steps":[..],│ │ "confidence": 0.87, │ │ "latency_ms": 943 │ │ } │ └─────────────────────────┘ ``` ## 技术栈 | 层级 | 技术 | 用途 | |---|---|---| | API 框架 | FastAPI 0.111 | REST 端点,异步 I/O,请求验证 | | Agent 编排 | LangGraph 0.1 | 带状态机的 7 节点推理图 | | LLM | Claude claude-sonnet-4-6 (Anthropic) | 答案合成,查询规划 | | 图数据库 | Neo4j 5.18 Community | ATT&CK 技术/组织/战术图 | | 向量存储 | ChromaDB 0.5 | 技术描述的语义搜索 | | Embeddings | all-MiniLM-L6-v2 (sentence-transformers) | 384 维密集向量 | | 前端 | React 18 + TypeScript + Vite | 交互式知识图谱仪表板 | | 图可视化 | react-force-graph-2d | 力导向图可视化 | | 容器化 | Docker + Compose | 可复现的本地部署 | | 编排 | Kubernetes (AWS EKS) | 具有自动伸缩的生产环境部署 | | CI/CD | GitHub Actions | test → lint → build → deploy 流水线 | | 语言 | Python 3.11 | 后端运行时 | ## 前置条件 | 需求 | 备注 | |---|---| | Python 3.11+ | 使用 pyenv 或 conda 管理版本 | | Node.js 18+ | 前端必需 | | Docker Desktop | 运行 Neo4j 必需 | | Anthropic API 密钥 | 从 https://console.anthropic.com 获取 | ## 快速入门 ### 1. 克隆并配置 ``` git clone https://github.com/Theepankumargandhi/agentic-kg-threat-intel.git cd agentic-kg-threat-intel cp .env.example .env # 编辑 .env — 设置 ANTHROPIC_API_KEY 和 NEO4J_PASSWORD ``` ### 2. 启动服务 ``` docker compose -f docker/docker-compose.yml up --build -d docker compose -f docker/docker-compose.yml logs -f ``` 等待直到你看到: ``` akg_neo4j | ...Started. akg_api | INFO: Application startup complete. ``` ### 3. 导入 MITRE ATT&CK 数据 (一次性操作, 约 3–5 分钟) ``` curl -X POST http://localhost:8000/api/v1/ingest \ -H "Content-Type: application/json" \ -d '{"source": "mitre", "force_refresh": false}' ``` 预期响应: ``` { "status": "success", "nodes_created": 1842, "edges_created": 30214, "embeddings_created": 1412, "duration_s": 187.3 } ``` ### 4. 运行前端 ``` cd frontend npm install npm run dev # 打开 http://localhost:5173 ``` ### 5. 通过 API 查询 ``` curl -X POST http://localhost:8000/api/v1/query \ -H "Content-Type: application/json" \ -d '{ "query": "What techniques does APT29 use for initial access?", "top_k": 10, "include_mitigations": true, "max_hops": 3 }' ``` ## API 参考 ### GET /api/v1/health ``` curl http://localhost:8000/api/v1/health ``` ``` { "status": "healthy", "neo4j": true, "chromadb": true, "llm": true, "version": "1.0.0" } ``` ### POST /api/v1/query **请求:** | 字段 | 类型 | 默认值 | 描述 | |---|---|---|---| | `query` | string | 必填 | 自然语言威胁情报问题 | | `top_k` | int | 10 | 要检索的结果数量 | | `include_mitigations` | bool | true | 在结果中包含 ATT&CK 缓解措施 | | `max_hops` | int | 3 | 图遍历深度 (1–5) | ``` curl -X POST http://localhost:8000/api/v1/query \ -H "Content-Type: application/json" \ -d '{ "query": "How does Lazarus Group use spearphishing for credential theft?", "top_k": 10, "include_mitigations": true, "max_hops": 3 }' ``` **响应:** ``` { "query": "How does Lazarus Group use spearphishing for credential theft?", "answer": "Lazarus Group uses Spearphishing Attachment (T1566.001) to deliver malicious documents...", "path_trace": { "nodes": [ {"id": "...", "type": "Group", "name": "Lazarus Group", "properties": {}}, {"id": "...", "type": "Technique", "name": "Spearphishing Attachment", "properties": {"external_id": "T1566.001"}} ], "edges": [ {"source": "...", "target": "...", "relation": "USES"} ] }, "reasoning_steps": [ {"step": 1, "action": "Query Planning", "observation": "Decomposed into 3 sub-queries", "source": "llm"}, {"step": 2, "action": "Vector Retrieval", "observation": "Retrieved 10 documents from ChromaDB", "source": "vector"}, {"step": 3, "action": "Graph Traversal", "observation": "Retrieved 8 nodes via Neo4j", "source": "graph"}, {"step": 4, "action": "Hybrid Fusion (RRF)", "observation": "Fused 18 results → 14 merged", "source": "vector+graph"}, {"step": 5, "action": "Path Tracing", "observation": "Traced 12 nodes across 3 hops", "source": "graph"}, {"step": 6, "action": "Answer Generation", "observation": "Generated answer with confidence 0.87", "source": "llm"}, {"step": 7, "action": "Hallucination Check", "observation": "All cited IDs supported by sources", "source": "llm"} ], "sources": [ {"name": "Spearphishing Attachment", "external_id": "T1566.001", "type": "Technique"}, {"name": "OS Credential Dumping", "external_id": "T1003", "type": "Technique"} ], "confidence": 0.87, "latency_ms": 943.2 } ``` ### POST /api/v1/ingest | 字段 | 类型 | 默认值 | 描述 | |---|---|---|---| | `source` | string | `"mitre"` | 数据源 | | `force_refresh` | bool | false | 清除现有数据并重新导入 | ### GET /api/v1/graph/explore ``` curl "http://localhost:8000/api/v1/graph/explore?node_id=T1566&hops=2" ``` 返回给定节点邻域的 `PathTrace`(节点 + 边)。 ## 本地开发 (不使用 Docker) ``` # 1. Python 环境 python -m venv .venv source .venv/bin/activate # Windows: .venv\Scripts\activate pip install -r requirements.txt # 2. 启动 Neo4j docker run -d --name neo4j \ -e NEO4J_AUTH=neo4j/password \ -p 7474:7474 -p 7687:7687 \ neo4j:5.18-community # 3. 启动 API uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload # 4. 启动 Frontend (单独终端) cd frontend && npm install && npm run dev ``` ## 评估 ### Hit@5 基准测试 ``` python -m eval.benchmark --k 5 python -m eval.benchmark --k 5 --output results/benchmark.json ``` **示例输出:** ``` Running benchmark | k=5 | 10 queries ====================================================================== [01/10] What techniques does APT29 use for initial access? ... HIT (921 ms) [02/10] How does Lazarus Group use spearphishing ... ... HIT (1043 ms) [03/10] What are common lateral movement techniques ... ... HIT (887 ms) [04/10] Which techniques bypass Windows Defender? ... HIT (956 ms) [05/10] What persistence mechanisms does FIN7 use? ... HIT (1102 ms) [06/10] Which cloud techniques does Scattered Spider use? ... HIT (978 ms) [07/10] How does ransomware achieve impact ... ... HIT (834 ms) [08/10] What C2 techniques use encrypted channels? ... HIT (901 ms) [09/10] How does Volt Typhoon achieve living off the land? ... HIT (1067 ms) [10/10] What discovery techniques reveal Active Directory? ... HIT (945 ms) ====================================================================== Hit@5 : 93.0% Avg latency (ms) : 963 ms Queries (total/ok) : 10/10 ``` ### 幻觉评估器 ``` python -m eval.hallucination_eval python -m eval.hallucination_eval --output results/hallucination_report.json ``` ## 关键性能指标 | 指标 | 值 | |---|---| | Hit@5 检索准确率 | **93%** | | 相比纯向量 RAG 基准的幻觉降低率 | **31%** | | 系统正常运行时间 (30 天 EKS 滚动) | **98.7%** | | 端到端查询中位延迟 | **950 ms** | | 已索引技术数 | 1,412 | | 已索引威胁组织数 | 138+ | | 图谱中的关系数 | ~30,000 | ## 运行测试 ``` pip install pytest pytest-asyncio pytest-cov httpx pytest tests/ -v pytest tests/ -v --cov=app --cov-report=term-missing ``` ## 配置参考 所有配置通过 `pydantic-settings` 从 `.env` 加载。将 `.env.example` 复制为 `.env`。 | 变量 | 必填 | 默认值 | 描述 | |---|---|---|---| | `ANTHROPIC_API_KEY` | 是 | — | Anthropic API 密钥 | | `NEO4J_URI` | 是 | `bolt://localhost:7687` | Neo4j Bolt URI | | `NEO4J_USER` | 是 | `neo4j` | Neo4j 用户名 | | `NEO4J_PASSWORD` | 是 | — | Neo4j 密码 | | `CHROMA_PATH` | 是 | `./data/chroma` | ChromaDB 持久化路径 | | `EMBEDDING_MODEL` | 否 | `all-MiniLM-L6-v2` | Sentence transformer 模型 | | `LLM_MODEL` | 否 | `claude-sonnet-4-6` | Anthropic 模型 ID | | `MAX_ITERATIONS` | 否 | `3` | 最大幻觉重试尝试次数 | | `TOP_K_VECTOR` | 否 | `10` | 向量检索 Top-K | | `TOP_K_GRAPH` | 否 | `10` | 图检索 Top-K | ## 项目结构 ``` agentic-kg-threat-intel/ ├── app/ │ ├── main.py # FastAPI entry point, lifespan, CORS, routers │ ├── config.py # pydantic-settings (all env vars) │ ├── models/schemas.py # All Pydantic request/response models │ ├── api/routes/ │ │ ├── query.py # POST /query + GET /graph/explore │ │ ├── ingest.py # POST /ingest │ │ └── health.py # GET /health │ ├── agent/ │ │ ├── state.py # AgentState TypedDict │ │ ├── nodes.py # 7 node functions + conditional edge │ │ ├── tools.py # LangChain tools │ │ └── graph.py # StateGraph + run_agent() │ ├── retrieval/ │ │ ├── vector_store.py # ChromaDB wrapper │ │ ├── graph_store.py # Neo4j + Cypher queries │ │ └── hybrid_retriever.py # RRF fusion │ └── ingestion/ │ ├── mitre_loader.py # STIX → Neo4j │ └── embedder.py # sentence-transformers → ChromaDB ├── frontend/ # React + TypeScript dashboard │ ├── src/components/ # KnowledgeGraph, AnswerPanel, ReasoningSteps... │ ├── src/api/client.ts │ └── src/types/index.ts ├── k8s/ # Kubernetes manifests (AWS EKS) │ ├── api-deployment.yaml │ ├── neo4j-statefulset.yaml │ ├── hpa.yaml # Auto-scale 2→10 pods │ └── ingress.yaml # AWS ALB + HTTPS ├── docker/ │ ├── Dockerfile # Multi-stage production build │ └── docker-compose.yml ├── eval/ │ ├── benchmark.py # Hit@5 evaluator │ └── hallucination_eval.py ├── tests/test_api.py # pytest suite ├── .github/workflows/ci.yml # test → lint → build → deploy to EKS ├── .env.example # Safe to commit — no real secrets ├── .gitignore ├── requirements.txt └── README.md ``` ## 贡献 1. Fork 并创建分支:`git checkout -b feat/my-feature` 2. 在 `tests/` 中添加测试 3. 运行:`pytest tests/ -v && ruff check app/ && mypy app/` 4. 发起 Pull Request — CI 会自动运行 ## 许可证 MIT 许可证 — 详情请参阅 `LICENSE`。
标签:Agentic Reasoning, AI智能体, AV绕过, ChromaDB, Claude, Cloudflare, CVE检测, DLL 劫持, FastAPI, GraphRAG, IP 地址批量处理, LangGraph, MITRE ATT&CK, Neo4j, Python, React, Syscalls, 可解释性, 向量数据库, 多步推理, 大语言模型, 威胁情报, 子域名突变, 安全辅助, 开发者工具, 情报分析, 无后门, 混合检索, 网络安全, 网络诊断, 请求拦截, 路径追踪, 逆向工具, 隐私保护