VishalVinayRam/sentinel-framework


# Sentinel Framework

**AI-powered incident response: plug into any cloud, any git provider, any LLM.**

Sentinel watches your production alerts, confirms they are real failures, runs a multi-step root-cause analysis with an LLM of your choice, generates a runbook, and posts everything to a live dashboard, all in under two minutes.

```
Alert fired
    │
Kinesis stream
    │
┌───▼──────────────────┐
│ Validator Lambda     │  3-signal cross-check: health endpoint + smoke test + metrics
└───┬──────────────────┘
    │ confirmed real incident
┌───▼──────────────────┐
│ Log Analyzer         │  rule-based + ML severity (P1–P4), impact scope, degradation trend
└───┬──────────────────┘
    │
┌───▼──────────────────┐
│ Root Cause Agent     │  Step Functions: recent commits → RAG query → LLM RCA → runbook
│ (5-step pipeline)    │
└───┬──────────────────┘
    │
┌───▼──────────────────┐
│ Dashboard + Slack    │  FastAPI SPA auto-refreshes every 15 s; P1/P2 → Slack alert
└──────────────────────┘
```

Sentinel also ships a **PR Security Agent**: every pull request is scanned for OWASP issues, missed edge cases, and structural bugs before merge.

## Quick start (60 seconds)

```bash
git clone https://github.com/VishalVinayRam/Project-KEMM
cd Project-KEMM
./setup_demo.sh
# then open http://localhost:8501
```

`setup_demo.sh` is idempotent. It:

1. Checks prerequisites (Python 3.10+, pip, Docker)
2. Installs Python dependencies
3. Starts **Floci** (local AWS emulator) at `http://localhost:4566`
4. Starts the **Ollama** daemon, picks the best available model, and starts the KServe bridge on port 8080
5. Creates Kinesis streams, SQS queues, and DynamoDB tables
6. Seeds 6 realistic demo incidents
7. Starts the FastAPI dashboard on **port 8501**

To fire a test incident, click **"Fire Demo Incident"** in the dashboard or call the API:

```bash
curl -s -X POST http://localhost:8501/api/demo/fire \
  -H "Content-Type: application/json" \
  -d '{"severity": "P1", "service": "auth-service"}' | jq .
```
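For scripting the same call from Python, here is a minimal stdlib-only sketch that mirrors the curl request above. The `X-API-Key` header is attached only when you pass a key (the server enforces it when `SENTINEL_API_KEY` is set). `build_fire_request` and `fire_demo_incident` are hypothetical helpers, not part of the Sentinel package, and the response is decoded as generic JSON because its shape is not documented here.

```python
import json
import urllib.request
from typing import Any, Optional

# Dashboard URL and route taken from the curl example.
DEFAULT_BASE_URL = "http://localhost:8501"


def build_fire_request(
    severity: str,
    service: str,
    base_url: str = DEFAULT_BASE_URL,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build the POST /api/demo/fire request without sending it."""
    body = json.dumps({"severity": severity, "service": service}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only required when the server runs with SENTINEL_API_KEY set.
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        f"{base_url}/api/demo/fire", data=body, headers=headers, method="POST"
    )


def fire_demo_incident(severity: str, service: str, **kwargs: Any) -> Any:
    """Send the request and return the decoded JSON response body."""
    request = build_fire_request(severity, service, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the demo stack running, `fire_demo_incident("P1", "auth-service")` is the equivalent of the curl command. Keeping request construction separate from sending makes the payload easy to check without a live server.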
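The pipeline diagram names the Validator Lambda's three signals but does not say how they are combined. The sketch below assumes a simple quorum: an alert counts as a real incident when at least two of the three signals indicate failure. It also encodes the documented "P1/P2 → Slack alert" routing. `Signals`, `is_confirmed_incident`, and `should_notify_slack` are hypothetical names; the actual logic lives in `services/validator/` and may differ.

```python
from dataclasses import dataclass

# Documented routing rule: only P1 and P2 incidents go to Slack.
SLACK_SEVERITIES = frozenset({"P1", "P2"})


@dataclass(frozen=True)
class Signals:
    """One observation per validation signal; True means 'looks broken'."""

    health_endpoint_failed: bool
    smoke_test_failed: bool
    metrics_breached: bool


def is_confirmed_incident(signals: Signals, quorum: int = 2) -> bool:
    """Assumed rule: confirm when at least `quorum` of the 3 signals agree."""
    votes = sum(
        [signals.health_endpoint_failed, signals.smoke_test_failed, signals.metrics_breached]
    )
    return votes >= quorum


def should_notify_slack(severity: str) -> bool:
    """Return True for severities that trigger a Slack notification."""
    return severity in SLACK_SEVERITIES
```

Requiring a quorum instead of any single signal is one way to filter out flapping alerts; a stricter setup could pass `quorum=3`.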
## Provider configuration (`sentinel.yaml`)

Drop a `sentinel.yaml` in the project root (see `sentinel.example.yaml` for the full schema):

```yaml
llm:
  provider: kserve          # kserve | openai | anthropic | ollama | gemini
  endpoint: http://localhost:8080
  model: phi3:mini

cloud_provider:
  provider: floci           # floci | aws | gcp
  endpoint: http://localhost:4566

git_provider:
  provider: github
  token: ${GITHUB_TOKEN}
  repo: org/repo

alerting:
  provider: slack
  webhook_url: ${SLACK_WEBHOOK_URL}
```

### Supported providers

| Category | Providers |
|---|---|
| **LLM** | KServe (Ollama bridge), OpenAI, Anthropic, Gemini, Ollama (direct) |
| **Cloud / storage** | AWS (DynamoDB, Kinesis, S3, SQS, SNS), GCP, Floci (local dev) |
| **Git** | GitHub, GitLab |
| **Alerting** | Slack, PagerDuty |
| **Log ingestion** | CloudWatch Alarms (SNS→Lambda), Loki/Grafana (AlertManager webhook) |

**LLM fallback chain:** if KServe is unavailable, Sentinel automatically falls back through `gemini-1.5-flash → gpt-4o-mini → claude-haiku`. Set any combination of `GEMINI_API_KEY`, `OPENAI_API_KEY`, and `ANTHROPIC_API_KEY`; only the providers whose keys are set are tried.
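The example config references secrets as `${GITHUB_TOKEN}` and `${SLACK_WEBHOOK_URL}`. Here is a minimal sketch of how such placeholders could be expanded from the environment and then mapped onto a typed dataclass, in the spirit of the `config/` package ("YAML config loader → typed dataclasses"). `expand_env` and `LLMConfig` are illustrative names, and raising on an unset variable is a design assumption, not documented Sentinel behaviour.

```python
import os
import re
from dataclasses import dataclass
from typing import Any, Mapping

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively replace ${NAME} placeholders in strings, dicts, and lists."""
    if isinstance(value, str):
        def _substitute(match: "re.Match[str]") -> str:
            name = match.group(1)
            if name not in env:
                # Assumption: fail loudly rather than silently inserting "".
                raise KeyError(f"environment variable {name} is not set")
            return env[name]

        return _PLACEHOLDER.sub(_substitute, value)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    return value  # numbers, booleans, None pass through unchanged


@dataclass
class LLMConfig:
    """Hypothetical typed view of the `llm:` section."""

    provider: str
    endpoint: str = ""
    model: str = ""
```

In practice you would pass it the parsed file, for example `expand_env(yaml.safe_load(open("sentinel.yaml")))` with PyYAML installed, and then build `LLMConfig(**config["llm"])`.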
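The fallback behaviour described above amounts to trying providers in a fixed order and skipping any whose key is unset. A sketch under that reading: `fallback_models` derives the chain from the environment, and `complete_with_fallback` returns the first successful completion. `LLMBackend` and `FunctionBackend` are a deliberately tiny stand-in interface for this example, not Sentinel's `BaseLLMProvider`, and treating any exception as "unavailable" is an assumption.

```python
import os
from abc import ABC, abstractmethod
from typing import Callable, List, Mapping, Sequence

# Documented order after KServe; each model is tried only if its key is set.
FALLBACK_ORDER = [
    ("gemini-1.5-flash", "GEMINI_API_KEY"),
    ("gpt-4o-mini", "OPENAI_API_KEY"),
    ("claude-haiku", "ANTHROPIC_API_KEY"),
]


class LLMBackend(ABC):
    """Hypothetical minimal interface used only in this sketch."""

    name: str

    @abstractmethod
    def complete(self, prompt: str) -> str: ...


class FunctionBackend(LLMBackend):
    """Adapter that turns any callable into a backend (handy for tests)."""

    def __init__(self, name: str, fn: Callable[[str], str]) -> None:
        self.name = name
        self._fn = fn

    def complete(self, prompt: str) -> str:
        return self._fn(prompt)


def fallback_models(env: Mapping[str, str] = os.environ) -> List[str]:
    """Fallback models whose API key is present, in documented order."""
    return [model for model, key in FALLBACK_ORDER if env.get(key)]


def complete_with_fallback(prompt: str, backends: Sequence[LLMBackend]) -> str:
    """Return the first successful completion, trying backends in order."""
    errors = []
    for backend in backends:
        try:
            return backend.complete(prompt)
        except Exception as exc:  # treat any failure as "unavailable"
            errors.append(f"{backend.name}: {exc}")
    raise RuntimeError("all LLM backends failed: " + "; ".join(errors))
```

The primary KServe backend would be placed first in `backends`, followed by one backend per name returned by `fallback_models()`.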
## Key environment variables

| Variable | Default | Purpose |
|---|---|---|
| `SENTINEL_API_KEY` | _(unset = open)_ | Enforces `X-API-Key` auth on all API routes |
| `SENTINEL_ENV` | `development` | Set to `production` to restrict CORS origins |
| `SENTINEL_ALLOWED_ORIGINS` | _(none)_ | Comma-separated allowed CORS origins |
| `FLOCI_ENDPOINT` | `http://localhost:4566` | Local AWS emulator URL |
| `KSERVE_ENDPOINT` | `http://localhost:8081` | KServe / Ollama bridge URL |
| `GEMINI_API_KEY` | _(none)_ | Fallback LLM (Gemini) |
| `OPENAI_API_KEY` | _(none)_ | Fallback LLM (OpenAI) |
| `ANTHROPIC_API_KEY` | _(none)_ | Fallback LLM (Anthropic) |
| `SLACK_WEBHOOK_URL` | _(none)_ | P1/P2 Slack notifications |
| `INCIDENTS_TABLE` | `sentinel-incidents` | DynamoDB table |

## Repository layout

```
sentinel/                     Core Python package (pip-installable)
  config/                     YAML config loader → typed dataclasses
  core/                       Severity enum, Incident dataclass, PR review logic
  providers/
    base/                     Abstract base classes (LLM, Cloud, Git, Alerting)
    llm/                      anthropic · openai · gemini · kserve · ollama · fallback
    cloud/                    aws · gcp
    git/                      github · gitlab
    alerting/                 slack · pagerduty
  rag/                        Codebase indexer → pgvector similarity search
  registry.py                 ProviderRegistry.from_config(): wires everything
services/                     Lambda handlers + local servers
  dashboard/                  FastAPI REST API + single-page dashboard UI
  cloudwatch-alarm-receiver/  SNS → Lambda → incident receiver
  log-analyzer/               Kinesis consumer: rule + ML severity classification
  loki-bridge/                AlertManager webhook → Kinesis
  validator/                  3-signal alert validation
  root-cause-agent/           Step Functions: 5-step LLM RCA pipeline
  pr-security-agent/          GitHub webhook → OWASP/edge-case PR scan
  kserve-local/               Local KServe V2 bridge → Ollama
infra/                        Terraform: all AWS resources
helm/sentinel/                Helm chart for Kubernetes deployment
ml-core/                      KServe ISVC YAML, MLflow training pipeline
observability/                Prometheus values, Grafana dashboards, Loki rules
tests/                        pytest suite (189 tests, 0 dependencies on real AWS)
```

## Running tests

```bash
pip install -r requirements.txt -r requirements-dev.txt
pytest tests/                 # 189 unit tests, no infrastructure needed
python scripts/e2e_test.py    # 29 integration tests (needs Floci running)
```

## Real AWS deployment

```bash
cd infra
terraform init
terraform apply \
  -var="github_token=$GITHUB_TOKEN" \
  -var="kserve_endpoint=http://your-cluster:8080"
```

Helm chart for Kubernetes:

```bash
helm install sentinel helm/sentinel/ \
  --set sentinelApiKey=$SENTINEL_API_KEY \
  --set kserveEndpoint=http://your-kserve:8080
```

## Adding a new LLM provider

1. Implement `BaseLLMProvider` in `sentinel/providers/llm/yourprovider.py` (four methods: `complete`, `embed`, `embed_batch`, `health_check`)
2. Add a branch in `sentinel/registry.py::_build_llm()`
3. Set `llm.provider: yourprovider` in `sentinel.yaml`

## License

MIT. See [LICENSE](LICENSE).
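Steps 1 and 2 of *Adding a new LLM provider* can be sketched as follows. The abstract class is a local stand-in for Sentinel's `BaseLLMProvider`: the four method names come from this README, but their parameters and return types are assumptions, so check the real base class in `sentinel/providers/base/` before copying them. `YourProvider` is a toy that echoes prompts, and `build_llm` is a hypothetical stand-in for the branch you would add to `_build_llm()` in `sentinel/registry.py`.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseLLMProvider(ABC):
    """Stand-in for Sentinel's base class; signatures are assumed."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...

    @abstractmethod
    def embed(self, text: str) -> List[float]: ...

    @abstractmethod
    def embed_batch(self, texts: List[str]) -> List[List[float]]: ...

    @abstractmethod
    def health_check(self) -> bool: ...


class YourProvider(BaseLLMProvider):
    """Toy provider: echoes prompts and returns length-based 'embeddings'."""

    def __init__(self, endpoint: str = "", model: str = "") -> None:
        self.endpoint = endpoint
        self.model = model

    def complete(self, prompt: str) -> str:
        return f"[{self.model or 'yourprovider'}] {prompt}"

    def embed(self, text: str) -> List[float]:
        return [float(len(text))]

    def embed_batch(self, texts: List[str]) -> List[List[float]]:
        return [self.embed(text) for text in texts]

    def health_check(self) -> bool:
        return True  # a real provider would ping its endpoint here


def build_llm(llm_config: Dict[str, str]) -> BaseLLMProvider:
    """Map the `llm:` section of sentinel.yaml to a provider instance."""
    provider = llm_config["provider"]
    options = {key: value for key, value in llm_config.items() if key != "provider"}
    if provider == "yourprovider":  # the new branch from step 2
        return YourProvider(**options)
    # Existing branches (kserve, openai, anthropic, ollama, gemini) would go here.
    raise ValueError(f"unknown llm.provider: {provider!r}")
```

With `llm.provider: yourprovider` set (step 3), the registry would hand every service a `YourProvider` instance through the same interface as the built-in providers.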