VishalVinayRam/sentinel-framework

GitHub: VishalVinayRam/sentinel-framework

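AI-driven incident response framework that automatically confirms and diagnoses production alerts, and generates remediation runbooks, through a multi-step LLM root-cause analysis pipeline.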

Stars: 22 | Forks: 0

# Sentinel Framework

**AI-driven incident response. Plug in any cloud, any git provider, any LLM.**

Sentinel monitors your production alerts, confirms whether they are real incidents, runs a multi-step root-cause analysis with the LLM of your choice, generates a runbook, and publishes everything to a live dashboard, all within two minutes.

```
Alert fired
    │
Kinesis stream
    │
┌───▼──────────────────┐
│ Validator Lambda     │  3-signal cross-check: health endpoint + smoke test + metrics
└───┬──────────────────┘
    │ confirmed real incident
┌───▼──────────────────┐
│ Log Analyzer         │  rule-based + ML severity (P1–P4), impact scope, degradation trend
└───┬──────────────────┘
    │
┌───▼──────────────────┐
│ Root Cause Agent     │  Step Functions: recent commits → RAG query → LLM RCA → runbook
│ (5-step pipeline)    │
└───┬──────────────────┘
    │
┌───▼──────────────────┐
│ Dashboard + Slack    │  FastAPI SPA auto-refreshes every 15 s; P1/P2 → Slack alert
└──────────────────────┘
```

A **PR security agent** is also included: before merging, every pull request is scanned for OWASP issues, missed edge cases, and structural bugs.

## Quick start (60 seconds)

```shell
git clone https://github.com/VishalVinayRam/Project-KEMM
cd Project-KEMM
./setup_demo.sh
# then open http://localhost:8501
```

`setup_demo.sh` is idempotent. It:

1. Checks prerequisites (Python 3.10+, pip, Docker)
2. Installs the Python dependencies
3. Starts **Floci** (a local AWS emulator) at `http://localhost:4566`
4. Starts the **Ollama** daemon, picks the best available model, and launches the KServe bridge on port 8080
5. Creates the Kinesis streams, SQS queues, and DynamoDB tables
6. Seeds 6 realistic demo incidents
7. Starts the FastAPI dashboard on **port 8501**

To fire a test incident, click **"Fire Demo Incident"** in the dashboard, or call the API:

```shell
curl -s -X POST http://localhost:8501/api/demo/fire \
  -H "Content-Type: application/json" \
  -d '{"severity": "P1", "service": "auth-service"}' | jq .
```

## Provider configuration (`sentinel.yaml`)

Place a `sentinel.yaml` in the project root (see `sentinel.example.yaml` for the full schema):

```yaml
llm:
  provider: kserve        # kserve | openai | anthropic | ollama | gemini
  endpoint: http://localhost:8080
  model: phi3:mini

cloud_provider:
  provider: floci         # floci | aws | gcp
  endpoint: http://localhost:4566

git_provider:
  provider: github
  token: ${GITHUB_TOKEN}
  repo: org/repo

alerting:
  provider: slack
  webhook_url: ${SLACK_WEBHOOK_URL}
```

### Supported providers

| Category | Providers |
|---|---|
| **LLM** | KServe (Ollama bridge), OpenAI, Anthropic, Gemini, Ollama (direct) |
| **Cloud / storage** | AWS (DynamoDB, Kinesis, S3, SQS, SNS), GCP, Floci (local dev) |
| **Git** | GitHub, GitLab |
| **Alerting** | Slack, PagerDuty |
| **Log ingestion** | CloudWatch Alarms (SNS → Lambda), Loki/Grafana (AlertManager webhook) |

**LLM fallback chain:** if KServe is unavailable, Sentinel automatically falls back through `gemini-1.5-flash → gpt-4o-mini → claude-haiku`. Set any combination of `GEMINI_API_KEY`, `OPENAI_API_KEY`, and `ANTHROPIC_API_KEY`; only the providers whose keys are set are tried.

## Key environment variables

| Variable | Default | Purpose |
|---|---|---|
| `SENTINEL_API_KEY` | _(unset = open)_ | Enforces `X-API-Key` authentication on all API routes |
| `SENTINEL_ENV` | `development` | Set to `production` to restrict CORS origins |
| `SENTINEL_ALLOWED_ORIGINS` | _(none)_ | Comma-separated list of allowed CORS origins |
| `FLOCI_ENDPOINT` | `http://localhost:4566` | Local AWS emulator URL |
| `KSERVE_ENDPOINT` | `http://localhost:8081` | KServe / Ollama bridge URL |
| `GEMINI_API_KEY` | _(none)_ | Fallback LLM: Gemini |
| `OPENAI_API_KEY` | _(none)_ | Fallback LLM: OpenAI |
| `ANTHROPIC_API_KEY` | _(none)_ | Fallback LLM: Anthropic |
| `SLACK_WEBHOOK_URL` | _(none)_ | P1/P2 Slack notifications |
| `INCIDENTS_TABLE` | `sentinel-incidents` | DynamoDB table |

## Repository layout

```
sentinel/                     Core Python package (pip-installable)
  config/                     YAML config loader → typed dataclasses
  core/                       Severity enum, Incident dataclass, PR review logic
  providers/
    base/                     Abstract base classes (LLM, Cloud, Git, Alerting)
    llm/                      anthropic · openai · gemini · kserve · ollama · fallback
    cloud/                    aws · gcp
    git/                      github · gitlab
    alerting/                 slack · pagerduty
  rag/                        Codebase indexer → pgvector similarity search
  registry.py                 ProviderRegistry.from_config(): wires everything

services/                     Lambda handlers + local servers
  dashboard/                  FastAPI REST API + single-page dashboard UI
  cloudwatch-alarm-receiver/  SNS → Lambda → incident receiver
  log-analyzer/               Kinesis consumer: rule + ML severity classification
  loki-bridge/                AlertManager webhook → Kinesis
  validator/                  3-signal alert validation
  root-cause-agent/           Step Functions: 5-step LLM RCA pipeline
  pr-security-agent/          GitHub webhook → OWASP/edge-case PR scan
  kserve-local/               Local KServe V2 bridge → Ollama

infra/                        Terraform for all AWS resources
helm/sentinel/                Helm chart for Kubernetes deployment
ml-core/                      KServe ISVC YAML, MLflow training pipeline
observability/                Prometheus values, Grafana dashboards, Loki rules
tests/                        pytest suite (189 tests, 0 dependencies on real AWS)
```

## Running tests

```shell
pip install -r requirements.txt -r requirements-dev.txt
pytest tests/                  # 189 unit tests, no infrastructure needed
python scripts/e2e_test.py     # 29 integration tests (needs Floci running)
```

## Deploying to real AWS

```shell
cd infra
terraform init
terraform apply \
  -var="github_token=$GITHUB_TOKEN" \
  -var="kserve_endpoint=http://your-cluster:8080"
```

Helm chart for Kubernetes:

```shell
helm install sentinel helm/sentinel/ \
  --set sentinelApiKey=$SENTINEL_API_KEY \
  --set kserveEndpoint=http://your-kserve:8080
```

## Adding a new LLM provider

1. Implement `BaseLLMProvider` in `sentinel/providers/llm/yourprovider.py` (four methods: `complete`, `embed`, `embed_batch`, `health_check`).
2. Add a branch in `sentinel/registry.py::_build_llm()`.
3. Set `llm.provider: yourprovider` in `sentinel.yaml`.

## License

MIT. See [LICENSE](LICENSE).
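The pipeline above sends only P1 and P2 incidents to Slack. The following is a minimal sketch of that routing rule. It assumes the `Severity` enum in `core/` encodes P1 (most severe) through P4 as ascending integers. The README does not show the enum, so the member values and the helper `should_notify_slack` are hypothetical.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Assumed encoding: a lower number means a more severe incident."""

    P1 = 1
    P2 = 2
    P3 = 3
    P4 = 4


def should_notify_slack(severity: Severity) -> bool:
    """Hypothetical routing rule: only P1 and P2 incidents raise a Slack alert."""
    return severity <= Severity.P2


if __name__ == "__main__":
    for level in ("P1", "P2", "P3", "P4"):
        # Severity["P3"] looks a member up by name, e.g. from an incident payload.
        print(level, should_notify_slack(Severity[level]))
```

Using an `IntEnum` makes "at least as severe as P2" a plain integer comparison rather than a hard-coded set of names.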
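The LLM fallback chain described above can be modelled as an ordered list of candidates. Entries whose API key is unset are skipped, and the rest are tried in turn until one answers. This is a sketch under assumptions and not Sentinel's actual `fallback` provider. The function name `complete_with_fallback`, the `(name, key_var, call)` tuple shape, and the choice to treat any exception as "unavailable" are all hypothetical.

```python
import os
from typing import Callable, Mapping, Optional, Sequence, Tuple

# One candidate = (provider name, env var holding its API key or None, prompt -> text callable).
Candidate = Tuple[str, Optional[str], Callable[[str], str]]


def complete_with_fallback(
    prompt: str,
    candidates: Sequence[Candidate],
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, str]:
    """Return (provider name, completion) from the first candidate that succeeds."""
    errors = []
    for name, key_var, call in candidates:
        # Skip a fallback whose API key is unset; key_var=None means no key is needed.
        if key_var is not None and not env.get(key_var):
            continue
        try:
            return name, call(prompt)
        except Exception as exc:  # treat any error as "provider unavailable"
            errors.append(f"{name}: {exc}")
    message = "; ".join(errors) if errors else "no LLM provider configured"
    raise RuntimeError(f"all LLM providers failed: {message}")


if __name__ == "__main__":
    def kserve_down(prompt: str) -> str:
        raise ConnectionError("KServe bridge unreachable")

    # Order from the README: KServe, then gemini-1.5-flash -> gpt-4o-mini -> claude-haiku.
    chain = [
        ("kserve", None, kserve_down),
        ("gemini-1.5-flash", "GEMINI_API_KEY", lambda p: "gemini answer"),
        ("gpt-4o-mini", "OPENAI_API_KEY", lambda p: "openai answer"),
        ("claude-haiku", "ANTHROPIC_API_KEY", lambda p: "claude answer"),
    ]
    # Only OPENAI_API_KEY is set: KServe fails, Gemini is skipped, OpenAI answers.
    print(complete_with_fallback("Why is auth-service returning 500s?", chain,
                                 env={"OPENAI_API_KEY": "sk-test"}))
    # → ('gpt-4o-mini', 'openai answer')
```

Filtering on key presence before each call matches the README's rule that only the configured providers are tried. Collecting the errors makes a total outage easier to diagnose.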
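Step 1 of "Adding a new LLM provider" names the four methods that `BaseLLMProvider` requires but not their signatures. The sketch below uses `abc` to show the general shape. Every argument name and return type here is an assumption, not Sentinel's real interface. `EchoProvider` and `build_llm` are hypothetical stand-ins for a provider module and for the branch added to `_build_llm()`.

```python
from abc import ABC, abstractmethod


class BaseLLMProvider(ABC):
    """Assumed shape of the abstract base class; the real signatures may differ."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Return the model's text completion for a prompt."""

    @abstractmethod
    def embed(self, text: str) -> list[float]:
        """Return an embedding vector for one piece of text."""

    @abstractmethod
    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        """Return one embedding vector per input text."""

    @abstractmethod
    def health_check(self) -> bool:
        """Return True if the provider is reachable and usable."""


class EchoProvider(BaseLLMProvider):
    """Hypothetical toy provider: echoes prompts and builds tiny fake embeddings."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

    def embed(self, text: str) -> list[float]:
        # Toy 3-dim vector: length, vowel count, sum of code points mod 1000.
        vowels = sum(ch in "aeiou" for ch in text.lower())
        return [float(len(text)), float(vowels), float(sum(map(ord, text)) % 1000)]

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        # Naive batching: embed each text independently.
        return [self.embed(t) for t in texts]

    def health_check(self) -> bool:
        return True  # nothing remote to probe


def build_llm(provider: str) -> BaseLLMProvider:
    """Hypothetical stand-in for the new branch in registry.py::_build_llm()."""
    if provider == "echo":
        return EchoProvider()
    raise ValueError(f"unknown llm.provider: {provider!r}")


if __name__ == "__main__":
    llm = build_llm("echo")  # as if sentinel.yaml had `llm.provider: echo`
    print(llm.health_check(), llm.complete("summarise the incident"))
```

Because the methods are declared with `@abstractmethod`, a provider that forgets one of the four fails with a `TypeError` when it is instantiated, rather than later in the middle of an RCA run.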

Tags: AI risk mitigation, custom request headers, request interception, reverse-engineering tools