Theepankumargandhi/agentic-kg-threat-intel
GitHub: Theepankumargandhi/agentic-kg-threat-intel
基于 MITRE ATT&CK 知识图谱的 Agentic 威胁情报问答引擎,融合图检索与向量检索,提供可解释的复杂威胁查询应答。
Stars: 0 | Forks: 0
# agentic-kg-threat-intel
## 架构
```
User Query
│
▼
┌─────────────┐ HTTP/REST ┌──────────────────────────────────────┐
│ Client / │ ─────────────► │ FastAPI (port 8000) │
│ Frontend │ ◄───────────── │ POST /api/v1/query │
│ (React) │ JSON resp │ POST /api/v1/ingest │
└─────────────┘ │ GET /api/v1/health │
└─────────────────┬────────────────────┘
│
▼
┌──────────────────────────────────────┐
│ LangGraph Agent (7 nodes) │
│ │
│ query_planner │
│ │ │
│ vector_retriever ──► graph_retriever│
│ │ │ │
│ hybrid_fuser (RRF 60/40) ◄┘ │
│ │ │
│ path_tracer │
│ │ │
│ answer_generator (Claude) │
│ │ │
│ hallucination_checker ──► retry? │
└──────────┬──────────────────────────┘
│
┌───────────────┴───────────────┐
▼ ▼
┌─────────────────────┐ ┌──────────────────────┐
│ Neo4j (port 7687) │ │ ChromaDB (embedded) │
│ │ │ │
│ Nodes: Technique, │ │ Collection: │
│ Group, Tactic, │ │ mitre_techniques │
│ Software, │ │ │
│ Mitigation │ │ Model: │
│ │ │ all-MiniLM-L6-v2 │
│ Rels: USES, │ │ (384-dim) │
│ BELONGS_TO, │ │ │
│ MITIGATED_BY, etc. │ │ │
└─────────────────────┘ └──────────────────────┘
│
▼
┌─────────────────────────┐
│ QueryResponse │
│ { │
│ "answer": "...", │
│ "path_trace": {...}, │
│ "reasoning_steps":[..],│
│ "confidence": 0.87, │
│ "latency_ms": 943 │
│ } │
└─────────────────────────┘
```
## 技术栈
| 层级 | 技术 | 用途 |
|---|---|---|
| API 框架 | FastAPI 0.111 | REST 端点,异步 I/O,请求验证 |
| Agent 编排 | LangGraph 0.1 | 带状态机的 7 节点推理图 |
| LLM | Claude claude-sonnet-4-6 (Anthropic) | 答案合成,查询规划 |
| 图数据库 | Neo4j 5.18 Community | ATT&CK 技术/组织/战术图 |
| 向量存储 | ChromaDB 0.5 | 技术描述的语义搜索 |
| Embeddings | all-MiniLM-L6-v2 (sentence-transformers) | 384 维密集向量 |
| 前端 | React 18 + TypeScript + Vite | 交互式知识图谱仪表板 |
| 图可视化 | react-force-graph-2d | 力导向图可视化 |
| 容器化 | Docker + Compose | 可复现的本地部署 |
| 编排 | Kubernetes (AWS EKS) | 具有自动伸缩的生产环境部署 |
| CI/CD | GitHub Actions | test → lint → build → deploy 流水线 |
| 语言 | Python 3.11 | 后端运行时 |
## 前置条件
| 需求 | 备注 |
|---|---|
| Python 3.11+ | 使用 pyenv 或 conda 管理版本 |
| Node.js 18+ | 前端必需 |
| Docker Desktop | 运行 Neo4j 必需 |
| Anthropic API 密钥 | 从 https://console.anthropic.com 获取 |
## 快速入门
### 1. 克隆并配置
```
git clone https://github.com/Theepankumargandhi/agentic-kg-threat-intel.git
cd agentic-kg-threat-intel
cp .env.example .env
# 编辑 .env — 设置 ANTHROPIC_API_KEY 和 NEO4J_PASSWORD
```
### 2. 启动服务
```
docker compose -f docker/docker-compose.yml up --build -d
docker compose -f docker/docker-compose.yml logs -f
```
等待直到你看到:
```
akg_neo4j | ...Started.
akg_api | INFO: Application startup complete.
```
### 3. 导入 MITRE ATT&CK 数据 (一次性操作, 约 3–5 分钟)
```
curl -X POST http://localhost:8000/api/v1/ingest \
-H "Content-Type: application/json" \
-d '{"source": "mitre", "force_refresh": false}'
```
预期响应:
```
{
"status": "success",
"nodes_created": 1842,
"edges_created": 30214,
"embeddings_created": 1412,
"duration_s": 187.3
}
```
### 4. 运行前端
```
cd frontend
npm install
npm run dev
# 打开 http://localhost:5173
```
### 5. 通过 API 查询
```
curl -X POST http://localhost:8000/api/v1/query \
-H "Content-Type: application/json" \
-d '{
"query": "What techniques does APT29 use for initial access?",
"top_k": 10,
"include_mitigations": true,
"max_hops": 3
}'
```
## API 参考
### GET /api/v1/health
```
curl http://localhost:8000/api/v1/health
```
```
{
"status": "healthy",
"neo4j": true,
"chromadb": true,
"llm": true,
"version": "1.0.0"
}
```
### POST /api/v1/query
**请求:**
| 字段 | 类型 | 默认值 | 描述 |
|---|---|---|---|
| `query` | string | 必填 | 自然语言威胁情报问题 |
| `top_k` | int | 10 | 要检索的结果数量 |
| `include_mitigations` | bool | true | 在结果中包含 ATT&CK 缓解措施 |
| `max_hops` | int | 3 | 图遍历深度 (1–5) |
```
curl -X POST http://localhost:8000/api/v1/query \
-H "Content-Type: application/json" \
-d '{
"query": "How does Lazarus Group use spearphishing for credential theft?",
"top_k": 10,
"include_mitigations": true,
"max_hops": 3
}'
```
**响应:**
```
{
"query": "How does Lazarus Group use spearphishing for credential theft?",
"answer": "Lazarus Group uses Spearphishing Attachment (T1566.001) to deliver malicious documents...",
"path_trace": {
"nodes": [
{"id": "...", "type": "Group", "name": "Lazarus Group", "properties": {}},
{"id": "...", "type": "Technique", "name": "Spearphishing Attachment", "properties": {"external_id": "T1566.001"}}
],
"edges": [
{"source": "...", "target": "...", "relation": "USES"}
]
},
"reasoning_steps": [
{"step": 1, "action": "Query Planning", "observation": "Decomposed into 3 sub-queries", "source": "llm"},
{"step": 2, "action": "Vector Retrieval", "observation": "Retrieved 10 documents from ChromaDB", "source": "vector"},
{"step": 3, "action": "Graph Traversal", "observation": "Retrieved 8 nodes via Neo4j", "source": "graph"},
{"step": 4, "action": "Hybrid Fusion (RRF)", "observation": "Fused 18 results → 14 merged", "source": "vector+graph"},
{"step": 5, "action": "Path Tracing", "observation": "Traced 12 nodes across 3 hops", "source": "graph"},
{"step": 6, "action": "Answer Generation", "observation": "Generated answer with confidence 0.87", "source": "llm"},
{"step": 7, "action": "Hallucination Check", "observation": "All cited IDs supported by sources", "source": "llm"}
],
"sources": [
{"name": "Spearphishing Attachment", "external_id": "T1566.001", "type": "Technique"},
{"name": "OS Credential Dumping", "external_id": "T1003", "type": "Technique"}
],
"confidence": 0.87,
"latency_ms": 943.2
}
```
### POST /api/v1/ingest
| 字段 | 类型 | 默认值 | 描述 |
|---|---|---|---|
| `source` | string | `"mitre"` | 数据源 |
| `force_refresh` | bool | false | 清除现有数据并重新导入 |
### GET /api/v1/graph/explore
```
curl "http://localhost:8000/api/v1/graph/explore?node_id=T1566&hops=2"
```
返回给定节点邻域的 `PathTrace`(节点 + 边)。
## 本地开发 (不使用 Docker)
```
# 1. Python 环境
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
# 2. 启动 Neo4j
docker run -d --name neo4j \
-e NEO4J_AUTH=neo4j/password \
-p 7474:7474 -p 7687:7687 \
neo4j:5.18-community
# 3. 启动 API
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
# 4. 启动 Frontend (单独终端)
cd frontend && npm install && npm run dev
```
## 评估
### Hit@5 基准测试
```
python -m eval.benchmark --k 5
python -m eval.benchmark --k 5 --output results/benchmark.json
```
**示例输出:**
```
Running benchmark | k=5 | 10 queries
======================================================================
[01/10] What techniques does APT29 use for initial access? ... HIT (921 ms)
[02/10] How does Lazarus Group use spearphishing ... ... HIT (1043 ms)
[03/10] What are common lateral movement techniques ... ... HIT (887 ms)
[04/10] Which techniques bypass Windows Defender? ... HIT (956 ms)
[05/10] What persistence mechanisms does FIN7 use? ... HIT (1102 ms)
[06/10] Which cloud techniques does Scattered Spider use? ... HIT (978 ms)
[07/10] How does ransomware achieve impact ... ... HIT (834 ms)
[08/10] What C2 techniques use encrypted channels? ... HIT (901 ms)
[09/10] How does Volt Typhoon achieve living off the land? ... HIT (1067 ms)
[10/10] What discovery techniques reveal Active Directory? ... HIT (945 ms)
======================================================================
Hit@5 : 93.0%
Avg latency (ms) : 963 ms
Queries (total/ok) : 10/10
```
### 幻觉评估器
```
python -m eval.hallucination_eval
python -m eval.hallucination_eval --output results/hallucination_report.json
```
## 关键性能指标
| 指标 | 值 |
|---|---|
| Hit@5 检索准确率 | **93%** |
| 相比纯向量 RAG 基准的幻觉降低率 | **31%** |
| 系统正常运行时间 (30 天 EKS 滚动) | **98.7%** |
| 端到端查询中位延迟 | **950 ms** |
| 已索引技术数 | 1,412 |
| 已索引威胁组织数 | 138+ |
| 图谱中的关系数 | ~30,000 |
## 运行测试
```
pip install pytest pytest-asyncio pytest-cov httpx
pytest tests/ -v
pytest tests/ -v --cov=app --cov-report=term-missing
```
## 配置参考
所有配置通过 `pydantic-settings` 从 `.env` 加载。将 `.env.example` 复制为 `.env`。
| 变量 | 必填 | 默认值 | 描述 |
|---|---|---|---|
| `ANTHROPIC_API_KEY` | 是 | — | Anthropic API 密钥 |
| `NEO4J_URI` | 是 | `bolt://localhost:7687` | Neo4j Bolt URI |
| `NEO4J_USER` | 是 | `neo4j` | Neo4j 用户名 |
| `NEO4J_PASSWORD` | 是 | — | Neo4j 密码 |
| `CHROMA_PATH` | 是 | `./data/chroma` | ChromaDB 持久化路径 |
| `EMBEDDING_MODEL` | 否 | `all-MiniLM-L6-v2` | Sentence transformer 模型 |
| `LLM_MODEL` | 否 | `claude-sonnet-4-6` | Anthropic 模型 ID |
| `MAX_ITERATIONS` | 否 | `3` | 最大幻觉重试尝试次数 |
| `TOP_K_VECTOR` | 否 | `10` | 向量检索 Top-K |
| `TOP_K_GRAPH` | 否 | `10` | 图检索 Top-K |
## 项目结构
```
agentic-kg-threat-intel/
├── app/
│ ├── main.py # FastAPI entry point, lifespan, CORS, routers
│ ├── config.py # pydantic-settings (all env vars)
│ ├── models/schemas.py # All Pydantic request/response models
│ ├── api/routes/
│ │ ├── query.py # POST /query + GET /graph/explore
│ │ ├── ingest.py # POST /ingest
│ │ └── health.py # GET /health
│ ├── agent/
│ │ ├── state.py # AgentState TypedDict
│ │ ├── nodes.py # 7 node functions + conditional edge
│ │ ├── tools.py # LangChain tools
│ │ └── graph.py # StateGraph + run_agent()
│ ├── retrieval/
│ │ ├── vector_store.py # ChromaDB wrapper
│ │ ├── graph_store.py # Neo4j + Cypher queries
│ │ └── hybrid_retriever.py # RRF fusion
│ └── ingestion/
│ ├── mitre_loader.py # STIX → Neo4j
│ └── embedder.py # sentence-transformers → ChromaDB
├── frontend/ # React + TypeScript dashboard
│ ├── src/components/ # KnowledgeGraph, AnswerPanel, ReasoningSteps...
│ ├── src/api/client.ts
│ └── src/types/index.ts
├── k8s/ # Kubernetes manifests (AWS EKS)
│ ├── api-deployment.yaml
│ ├── neo4j-statefulset.yaml
│ ├── hpa.yaml # Auto-scale 2→10 pods
│ └── ingress.yaml # AWS ALB + HTTPS
├── docker/
│ ├── Dockerfile # Multi-stage production build
│ └── docker-compose.yml
├── eval/
│ ├── benchmark.py # Hit@5 evaluator
│ └── hallucination_eval.py
├── tests/test_api.py # pytest suite
├── .github/workflows/ci.yml # test → lint → build → deploy to EKS
├── .env.example # Safe to commit — no real secrets
├── .gitignore
├── requirements.txt
└── README.md
```
## 贡献
1. Fork 并创建分支:`git checkout -b feat/my-feature`
2. 在 `tests/` 中添加测试
3. 运行:`pytest tests/ -v && ruff check app/ && mypy app/`
4. 发起 Pull Request — CI 会自动运行
## 许可证
MIT 许可证 — 详情请参阅 `LICENSE`。
标签:Agentic Reasoning, AI智能体, AV绕过, ChromaDB, Claude, Cloudflare, CVE检测, DLL 劫持, FastAPI, GraphRAG, IP 地址批量处理, LangGraph, MITRE ATT&CK, Neo4j, Python, React, Syscalls, 可解释性, 向量数据库, 多步推理, 大语言模型, 威胁情报, 子域名突变, 安全辅助, 开发者工具, 情报分析, 无后门, 混合检索, 网络安全, 网络诊断, 请求拦截, 路径追踪, 逆向工具, 隐私保护