mehulmorker/autonomous_incident_response_system

GitHub: mehulmorker/autonomous_incident_response_system

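A multi-agent autonomous incident response system built on LangGraph. It simulates how an SRE investigates a production alert, automating metrics analysis, log tracing, and root cause reporting.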


# Autonomous Incident Response System

This is an autonomous multi-agent system that simulates how a senior SRE investigates a production incident. Built on LangGraph, it orchestrates three specialized AI agents that analyze mock telemetry data and produce a root cause analysis (RCA) report.

## Architecture

```
Alert Payload (FastAPI)
       │
       ▼
MetricsAnalyzer ──→ LogTraceSleuth ──→ RCACommander ──→ RCA Report
       ▲                    │                │
       └────────────────────┘ (conditional   │
                              loop-back) [ChromaDB RAG]
```

## Dual-Path Design

Set `LLM_PROVIDER` in your `.env` file to choose a path:

|            | Free path (`LLM_PROVIDER=free`) | Paid path (`LLM_PROVIDER=openai`) |
| ---------- | ------------------------------- | --------------------------------- |
| LLM        | Ollama local model (llama3.2)   | OpenAI GPT-4o-mini                |
| Embeddings | sentence-transformers (local)   | OpenAI text-embedding-3-small     |
| Vector DB  | ChromaDB (local)                | ChromaDB (local)                  |

## Setup

### 1. Install uv

```
curl -LsSf https://astral.sh/uv/install.sh | sh
```

### 2. Create the virtual environment and install dependencies

```
uv venv
uv sync
```

### 3. Configure environment variables

```
cp .env.example .env
# Edit .env: set LLM_PROVIDER and your API key
```

### 4. (Free path only) Pull the Ollama model

```
ollama pull llama3.2
```

### 5. Verify the installation

```
uv run python3 -c "import langgraph; import chromadb; print('All imports OK')"
```

## Running the Demo

After completing all phases:

```
# Option 1: run directly (no HTTP)
uv run python3 scripts/run_demo.py

# Option 2: run through the API
uv run uvicorn api.main:app --reload
# then, from a second terminal:
curl -X POST http://localhost:8000/api/v1/incident \
  -H "Content-Type: application/json" \
  -d @data/mock_telemetry/alert.json
```

## Project Structure

```
backend_incident_multi_agent/
├── pyproject.toml          # uv project manifest + all dependencies
├── config.py               # Provider factory: get_llm() and get_embeddings()
├── data/
│   ├── mock_telemetry/     # Simulated metrics, logs, traces, alert (JSON)
│   └── runbooks/           # Internal runbook markdown files
├── state/                  # IncidentState TypedDict
├── tools/                  # Deterministic telemetry query functions
├── agents/                 # LangGraph agent node implementations
├── rag/                    # ChromaDB ingest and retrieval pipeline
├── models/                 # Pydantic RCA output model
├── graph/                  # LangGraph StateGraph definition
├── api/                    # FastAPI webhook layer
└── scripts/                # Demo runner
```

## Build Phases

| Phase | Description                                      |
| ----- | ------------------------------------------------ |
| 1     | Project foundation and environment setup         |
| 2     | Mock telemetry engine                            |
| 3     | Deterministic telemetry tool functions           |
| 4     | LangGraph state machine and graph architecture   |
| 5     | MetricsAnalyzer agent                            |
| 6     | LogTraceSleuth agent                             |
| 7     | RAG pipeline and runbook ingestion               |
| 8     | RCACommander agent                               |
| 9     | FastAPI trigger layer                            |
| 10    | End-to-end validation and demo run               |
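
The provider choice driven by `LLM_PROVIDER` can be illustrated with a small, dependency-free sketch. It is not the repository's `config.py`: `ProviderSettings`, `PROVIDERS`, and `resolve_provider` are hypothetical names, the model identifiers are taken from the provider table, and rejecting an unset or unknown value is an assumption, since the README does not say what the default is. A real `get_llm()` / `get_embeddings()` factory would presumably use a lookup like this to decide which client objects to build.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderSettings:
    """Hypothetical container for one column of the provider table."""
    llm: str
    embeddings: str
    vector_db: str


# Values mirror the provider table; the mapping itself is an assumption.
PROVIDERS = {
    "free": ProviderSettings(
        llm="llama3.2",                     # served locally by Ollama
        embeddings="sentence-transformers", # local embedding models
        vector_db="chromadb",
    ),
    "openai": ProviderSettings(
        llm="gpt-4o-mini",
        embeddings="text-embedding-3-small",
        vector_db="chromadb",
    ),
}


def resolve_provider(env: Optional[Mapping[str, str]] = None) -> ProviderSettings:
    """Return the settings selected by LLM_PROVIDER.

    Reads from os.environ unless an explicit mapping is passed, which keeps
    the function easy to test. Unknown or missing values raise ValueError
    (an assumption; the project may fall back to a default instead).
    """
    env = os.environ if env is None else env
    name = env.get("LLM_PROVIDER", "").strip().lower()
    if name not in PROVIDERS:
        allowed = ", ".join(sorted(PROVIDERS))
        raise ValueError(f"LLM_PROVIDER must be one of: {allowed}; got {name!r}")
    return PROVIDERS[name]
```

Passing the environment as a parameter means the selection logic can be exercised without touching the real `.env` file, for example `resolve_provider({"LLM_PROVIDER": "free"}).llm`.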

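Tags: AIOps, AI risk mitigation, DLL hijacking, LangGraph, PyRIT, multi-agent systems, large language models, root cause analysis, operations automation, reverse engineering tools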