mehulmorker/autonomous_incident_response_system
# Autonomous Incident Response System
An autonomous multi-agent system that mimics a senior SRE investigating a production incident. Built with LangGraph, it orchestrates three specialized AI agents to analyze mock telemetry data and produce a Root-Cause Analysis (RCA) report.
## Architecture
    Alert Payload (FastAPI)
           │
           ▼
    MetricsAnalyzer ──→ LogTraceSleuth ──→ RCACommander ──→ RCA Report
           ▲                   │                 │
           └───────────────────┘ (conditional    │
                                 loop-back) [ChromaDB RAG]
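The flow above can be sketched in plain Python to make the loop-back explicit. This is not the LangGraph graph itself: the `IncidentState` fields, the `needs_more_metrics` flag, and the `MAX_LOOPS` cap are all assumptions made for illustration, and the three agent functions are stubs that return canned values instead of calling an LLM or ChromaDB.

```python
from typing import TypedDict


class IncidentState(TypedDict):
    """Hypothetical shared state; the real definition lives in state/."""
    alert: dict
    findings: list[str]
    needs_more_metrics: bool
    loops: int
    report: str


MAX_LOOPS = 2  # assumed safety cap so the loop-back always terminates


def metrics_analyzer(state: IncidentState) -> IncidentState:
    # Stub: a real node would query the metrics tools and call the LLM.
    state["findings"].append(f"metrics pass {state['loops'] + 1}")
    return state


def log_trace_sleuth(state: IncidentState) -> IncidentState:
    # Stub: ask for one extra metrics pass the first time through.
    state["findings"].append("logs and traces reviewed")
    state["needs_more_metrics"] = state["loops"] == 0
    return state


def rca_commander(state: IncidentState) -> IncidentState:
    # Stub: a real node would retrieve runbook context before writing the RCA.
    state["report"] = "RCA based on: " + "; ".join(state["findings"])
    return state


def run_pipeline(alert: dict) -> IncidentState:
    state: IncidentState = {
        "alert": alert,
        "findings": [],
        "needs_more_metrics": False,
        "loops": 0,
        "report": "",
    }
    while True:
        state = log_trace_sleuth(metrics_analyzer(state))
        # Conditional edge: loop back to MetricsAnalyzer or move on.
        if state["needs_more_metrics"] and state["loops"] < MAX_LOOPS:
            state["loops"] += 1
            continue
        return rca_commander(state)


if __name__ == "__main__":
    print(run_pipeline({"service": "checkout"})["report"])
```

In a LangGraph implementation the `if` inside `run_pipeline` would typically become a routing function attached to the LogTraceSleuth node as a conditional edge.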
## Two-Route Design
Set `LLM_PROVIDER` in your `.env` to choose:
| | Free Route (`LLM_PROVIDER=free`) | Paid Route (`LLM_PROVIDER=openai`) |
| ---------- | -------------------------------- | ---------------------------------- |
| LLM | Ollama local (llama3.2) | OpenAI GPT-4o-mini |
| Embeddings | sentence-transformers (local) | OpenAI text-embedding-3-small |
| Vector DB | ChromaDB (local) | ChromaDB (local) |
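
One way to picture the switch is a small lookup keyed on `LLM_PROVIDER`. The sketch below only resolves the names from the table and does not construct any client. `ProviderConfig` and `resolve_provider` are hypothetical names, and falling back to `free` when the variable is unset is an assumption rather than documented behavior; the project's actual factory functions are `get_llm()` and `get_embeddings()` in `config.py`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderConfig:
    """Hypothetical container for the three choices in the table."""
    llm: str
    embeddings: str
    vector_db: str


# Values copied from the table above.
ROUTES = {
    "free": ProviderConfig(
        llm="Ollama local (llama3.2)",
        embeddings="sentence-transformers (local)",
        vector_db="ChromaDB (local)",
    ),
    "openai": ProviderConfig(
        llm="OpenAI GPT-4o-mini",
        embeddings="OpenAI text-embedding-3-small",
        vector_db="ChromaDB (local)",
    ),
}


def resolve_provider(env=None) -> ProviderConfig:
    """Return the route selected by LLM_PROVIDER (assumed default: free)."""
    env = os.environ if env is None else env
    name = env.get("LLM_PROVIDER", "free").strip().lower()
    if name not in ROUTES:
        raise ValueError(
            f"Unknown LLM_PROVIDER {name!r}; expected one of {sorted(ROUTES)}"
        )
    return ROUTES[name]
```

Failing loudly on an unknown value keeps a typo in `.env` from silently selecting the wrong route.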
## Setup
### 1. Install uv
    curl -LsSf https://astral.sh/uv/install.sh | sh
### 2. Create the virtual environment and install dependencies
    uv venv
    uv sync
### 3. Configure environment variables
    cp .env.example .env
    # Edit .env: set LLM_PROVIDER and, for the OpenAI route, your API key
### 4. (Free route only) Pull the Ollama model
    ollama pull llama3.2
### 5. Verify installation
    uv run python3 -c "import langgraph; import chromadb; print('All imports OK')"
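If the one-liner fails, it only reports the first missing package. A slightly more informative check, sketched here with the hypothetical helper `missing_modules`, lists every module that cannot be found. It uses `importlib.util.find_spec`, which locates a module without importing it, so it detects absent packages but not broken installs.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The two modules checked by the one-liner above.
    missing = missing_modules(["langgraph", "chromadb"])
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All imports OK")
```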
## Running the Demo
After completing all ten build phases:

    # Option 1: Run directly (no HTTP)
    uv run python3 scripts/run_demo.py

    # Option 2: Run via API
    uv run uvicorn api.main:app --reload
    curl -X POST http://localhost:8000/api/v1/incident \
      -H "Content-Type: application/json" \
      -d @data/mock_telemetry/alert.json
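The same request can be sent from Python with only the standard library. This is a minimal sketch: `build_incident_request` and `post_alert_file` are hypothetical helper names, the 60-second timeout is an arbitrary assumption, and the shape of the response body is not specified here, so it is returned as raw text.

```python
import json
import urllib.request

# Endpoint taken from the curl command above.
INCIDENT_URL = "http://localhost:8000/api/v1/incident"


def build_incident_request(payload: dict, url: str = INCIDENT_URL) -> urllib.request.Request:
    """Build (but do not send) the JSON POST request for an alert payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_alert_file(path: str, url: str = INCIDENT_URL, timeout: float = 60.0):
    """Send the alert stored at `path`; return (HTTP status, response text)."""
    with open(path, encoding="utf-8") as fh:
        request = build_incident_request(json.load(fh), url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

With the API server running, `post_alert_file("data/mock_telemetry/alert.json")` mirrors the curl call.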
## Project Structure
    backend_incident_multi_agent/
    ├── pyproject.toml          # uv project manifest + all dependencies
    ├── config.py               # Provider factory: get_llm() and get_embeddings()
    ├── data/
    │   ├── mock_telemetry/     # Simulated metrics, logs, traces, alert (JSON)
    │   └── runbooks/           # Internal runbook markdown files
    ├── state/                  # IncidentState TypedDict
    ├── tools/                  # Deterministic telemetry query functions
    ├── agents/                 # LangGraph agent node implementations
    ├── rag/                    # ChromaDB ingest and retrieval pipeline
    ├── models/                 # Pydantic RCA output model
    ├── graph/                  # LangGraph StateGraph definition
    ├── api/                    # FastAPI webhook layer
    └── scripts/                # Demo runner
## Build Phases
| Phase | Description |
| ----- | -------------------------------------------- |
| 1 | Project Foundation & Environment Setup |
| 2 | Mock Telemetry Simulation Engine |
| 3 | Deterministic Telemetry Tool Functions |
| 4 | LangGraph State Machine & Graph Architecture |
| 5 | MetricsAnalyzer Agent |
| 6 | LogTraceSleuth Agent |
| 7 | RAG Pipeline & Runbook Ingestion |
| 8 | RCACommander Agent |
| 9 | FastAPI Trigger Layer |
| 10 | End-to-End Validation & Demo Run |