mehulmorker/autonomous_incident_response_system

GitHub: mehulmorker/autonomous_incident_response_system


# Autonomous Incident Response System

An autonomous multi-agent system that mimics a senior SRE investigating a production incident. Built with LangGraph, it orchestrates three specialized AI agents to analyze mock telemetry data and produce a Root-Cause Analysis (RCA) report.

## Architecture

```
Alert Payload (FastAPI)
       │
       ▼
MetricsAnalyzer ──→ LogTraceSleuth ──→ RCACommander ──→ RCA Report
                           ▲                │  │
                           └────────────────┘  │
                         (conditional          │
                          loop-back)    [ChromaDB RAG]
```

## Two-Route Design

Set `LLM_PROVIDER` in your `.env` file to choose a route:

| Component  | Free Route (`LLM_PROVIDER=free`) | Paid Route (`LLM_PROVIDER=openai`) |
| ---------- | -------------------------------- | ---------------------------------- |
| LLM        | Ollama local (llama3.2)          | OpenAI GPT-4o-mini                 |
| Embeddings | sentence-transformers (local)    | OpenAI text-embedding-3-small      |
| Vector DB  | ChromaDB (local)                 | ChromaDB (local)                   |

## Setup

### 1. Install uv

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

### 2. Create the virtual environment and install dependencies

```bash
uv venv
uv sync
```

### 3. Configure environment variables

```bash
cp .env.example .env
# Edit .env: set LLM_PROVIDER, plus your OpenAI API key if you use the paid route
```

### 4. (Free route only) Pull the Ollama model

```bash
ollama pull llama3.2
```

### 5. Verify installation

```bash
uv run python3 -c "import langgraph; import chromadb; print('All imports OK')"
```

## Running the Demo

After completing all build phases, run the demo in either of two ways:

```bash
# Option 1: Run directly (no HTTP)
uv run python3 scripts/run_demo.py

# Option 2: Run via the API
uv run uvicorn api.main:app --reload

# Then, in a second terminal:
curl -X POST http://localhost:8000/api/v1/incident \
  -H "Content-Type: application/json" \
  -d @data/mock_telemetry/alert.json
```

## Project Structure

```
backend_incident_multi_agent/
├── pyproject.toml      # uv project manifest + all dependencies
├── config.py           # Provider factory: get_llm() and get_embeddings()
├── data/
│   ├── mock_telemetry/ # Simulated metrics, logs, traces, alert (JSON)
│   └── runbooks/       # Internal runbook markdown files
├── state/              # IncidentState TypedDict
├── tools/              # Deterministic telemetry query functions
├── agents/             # LangGraph agent node implementations
├── rag/                # ChromaDB ingest and retrieval pipeline
├── models/             # Pydantic RCA output model
├── graph/              # LangGraph StateGraph definition
├── api/                # FastAPI webhook layer
└── scripts/            # Demo runner
```

## Build Phases

| Phase | Description                                  |
| ----- | -------------------------------------------- |
| 1     | Project Foundation & Environment Setup       |
| 2     | Mock Telemetry Simulation Engine             |
| 3     | Deterministic Telemetry Tool Functions       |
| 4     | LangGraph State Machine & Graph Architecture |
| 5     | MetricsAnalyzer Agent                        |
| 6     | LogTraceSleuth Agent                         |
| 7     | RAG Pipeline & Runbook Ingestion             |
| 8     | RCACommander Agent                           |
| 9     | FastAPI Trigger Layer                        |
| 10    | End-to-End Validation & Demo Run             |
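The Two-Route Design table and the `config.py` provider factory (Phase 1) reduce to one decision: map `LLM_PROVIDER` to a set of model choices. The sketch below shows only that selection step. `ProviderConfig`, `ROUTES`, and `resolve_provider` are hypothetical names, defaulting to the free route is an assumption, and `"sentence-transformers"` is a placeholder because the README does not name the exact local embedding model.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderConfig:
    """Model choices for one route, mirroring the Two-Route Design table."""

    llm_model: str
    embedding_model: str
    vector_db: str = "chromadb"  # both routes use local ChromaDB


# Hypothetical mapping from LLM_PROVIDER values to the models named in the README.
# "sentence-transformers" is a placeholder: the exact local embedding model is not stated.
ROUTES = {
    "free": ProviderConfig(llm_model="llama3.2", embedding_model="sentence-transformers"),
    "openai": ProviderConfig(llm_model="gpt-4o-mini", embedding_model="text-embedding-3-small"),
}


def resolve_provider(env: Optional[Mapping[str, str]] = None) -> ProviderConfig:
    """Pick a route from LLM_PROVIDER; falling back to "free" is an assumption."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "free").strip().lower()
    if provider not in ROUTES:
        raise ValueError(
            f"Unsupported LLM_PROVIDER={provider!r}; expected one of {sorted(ROUTES)}"
        )
    return ROUTES[provider]
```

A `get_llm()` built on top of this would then instantiate the matching client, for example `ChatOllama(model=cfg.llm_model)` from `langchain-ollama` or `ChatOpenAI(model=cfg.llm_model)` from `langchain-openai`. Whether the project uses those particular packages is not stated here. Failing fast on an unknown value keeps a typo in `.env` from silently falling through to the wrong route.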
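The Architecture diagram's conditional loop-back (Phase 4) can be illustrated without LangGraph installed. The sketch below is a plain-Python stand-in for the graph: MetricsAnalyzer runs once, then LogTraceSleuth and RCACommander repeat until a router decides the RCA is good enough. The `IncidentState` fields, the confidence threshold, the iteration cap, and the node names are all hypothetical; the project's real state lives in `state/` and its graph in `graph/`.

```python
from typing import Callable, List, TypedDict


class IncidentState(TypedDict, total=False):
    # Hypothetical fields for illustration; the real IncidentState lives in state/.
    alert: dict
    metrics_findings: List[str]
    log_findings: List[str]
    rca_confidence: float
    iterations: int
    report: str


# Nodes take the current state and return a partial update, as LangGraph nodes do.
Node = Callable[[IncidentState], dict]

CONFIDENCE_THRESHOLD = 0.7  # assumed cut-off for accepting an RCA
MAX_ITERATIONS = 3  # assumed cap on sleuth/RCA passes


def route_after_rca(state: IncidentState) -> str:
    """Return the next node: "end" if the RCA is accepted or the cap is hit, else loop back."""
    if state.get("rca_confidence", 0.0) >= CONFIDENCE_THRESHOLD:
        return "end"
    if state.get("iterations", 0) >= MAX_ITERATIONS:
        return "end"
    return "log_trace_sleuth"


def run_pipeline(
    state: IncidentState,
    metrics_analyzer: Node,
    log_trace_sleuth: Node,
    rca_commander: Node,
) -> IncidentState:
    """Plain-Python stand-in for the graph: metrics once, then sleuth/RCA until routed to end."""
    state = {**state, **metrics_analyzer(state)}
    while True:
        state = {**state, **log_trace_sleuth(state)}
        state = {**state, **rca_commander(state)}
        # Count passes here so MAX_ITERATIONS always bounds the loop.
        state["iterations"] = state.get("iterations", 0) + 1
        if route_after_rca(state) == "end":
            return state
```

In an actual LangGraph `StateGraph`, a router like `route_after_rca` is what you would pass to `add_conditional_edges`, e.g. `builder.add_conditional_edges("rca_commander", route_after_rca, {"log_trace_sleuth": "log_trace_sleuth", "end": END})`, with the pass counter updated inside a node rather than in a driver loop. The iteration cap matters because an LLM-judged confidence score is not guaranteed to rise.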
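To trigger the FastAPI layer (Phase 9) from a Python script instead of `curl`, the standard library is enough. This sketch sends the same POST as Option 2 under Running the Demo. The helper names are hypothetical, and decoding the response as JSON assumes the endpoint returns its result as JSON, which this README does not state.

```python
import json
import urllib.request
from pathlib import Path

INCIDENT_URL = "http://localhost:8000/api/v1/incident"


def build_incident_request(alert_path: str, url: str = INCIDENT_URL) -> urllib.request.Request:
    """Build the same POST the curl command sends: JSON body, JSON content type."""
    # Parsing first surfaces a malformed alert file before any network call.
    payload = json.loads(Path(alert_path).read_text(encoding="utf-8"))
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_alert(alert_path: str, url: str = INCIDENT_URL, timeout: float = 120.0) -> dict:
    """Send the alert and decode the response body (assumed to be JSON)."""
    request = build_incident_request(alert_path, url)
    # Generous default timeout (an assumption): the agents make several LLM calls.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the uvicorn server from Option 2 running, call `post_alert("data/mock_telemetry/alert.json")` from the repository root. Splitting request construction from sending keeps the payload logic testable without a live server.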