Jai-Gogineni/cloudwatch-sre-alerts

GitHub: Jai-Gogineni/cloudwatch-sre-alerts

An SRE alerting framework that integrates CloudWatch, Datadog, and PagerDuty, with SLO burn-rate tracking and automated incident-response runbooks.


# CloudWatch SRE Alerts

[![CI](https://static.pigsec.cn/wp-content/uploads/repos/2026/06/2c0edfde34002325.svg)](https://github.com/Jai-Gogineni/cloudwatch-sre-alerts/actions/workflows/ci.yml) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Python 3.11](https://img.shields.io/badge/Python-3.11-blue.svg)](https://www.python.org/downloads/)

A production-grade SRE alerting framework that integrates AWS CloudWatch alarms, Datadog monitors, SLO burn-rate calculation, and automated incident-response runbooks.

## Architecture

```mermaid
graph TB
    subgraph Monitoring
        CW[CloudWatch Alarms] --> AE[Alert Engine]
        DD[Datadog Monitors] --> AE
    end
    subgraph SLO Tracking
        AE --> SC[SLO Calculator]
        SC --> BR[Burn Rate Analysis]
        BR --> EB[Error Budget]
    end
    subgraph Incident Response
        AE --> RB[Runbook Executor]
        RB --> AS[Auto-Scale]
        RB --> RS[Restart Service]
        RB --> RK[Rollback]
    end
    subgraph Notifications
        AE --> SNS[AWS SNS]
        AE --> PD[PagerDuty]
        AE --> SL[Slack]
    end
```

## Quick Start

```bash
git clone https://github.com/Jai-Gogineni/cloudwatch-sre-alerts.git
cd cloudwatch-sre-alerts
pip install pytest pyyaml
pytest tests/ -v
```

## Project Structure

```
├── src/
│   ├── alerts/
│   │   ├── cloudwatch_rules.py     # CloudWatch alarm definitions
│   │   └── datadog_monitors.py     # Datadog monitor configs
│   ├── runbooks/
│   │   └── incident_response.py    # Automated runbook executor
│   └── slo/
│       └── slo_calculator.py       # SLO burn rate calculator
├── config/
│   ├── alerts.yaml                 # Alert definitions
│   └── slos.yaml                   # SLO targets
├── tests/
│   └── test_slo_calculator.py
└── .github/workflows/ci.yml
```

## Features

- **CloudWatch Alarms**: declarative alarm definitions with SNS/PagerDuty integration
- **Datadog Monitors**: programmatic monitor management
- **SLO Calculator**: burn-rate analysis and error-budget tracking
- **Incident Runbooks**: automated response with rollback capability

## License

MIT
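## Example: SLO burn-rate calculation

This README does not show how `src/slo/slo_calculator.py` computes burn rate, so the following is only a minimal sketch of the standard calculation, not the repository's implementation: the error budget is `1 - target`, and the burn rate is the observed error rate divided by that budget. The `SLO` class, the function names, and the default 14.4 paging threshold are hypothetical. The value 14.4 is the fast-burn threshold the Google SRE Workbook suggests for a 30-day window (2% of the budget spent in one hour). The sketch does not read `config/slos.yaml`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical SLO definition (not the schema used in config/slos.yaml)."""

    name: str
    target: float  # e.g. 0.999 for a 99.9% success-rate objective

    @property
    def error_budget(self) -> float:
        # Fraction of requests allowed to fail over the SLO window.
        return 1.0 - self.target


def burn_rate(slo: SLO, errors: int, total: int) -> float:
    """Observed error rate divided by the error budget.

    1.0 spends exactly the whole budget over the SLO window; 10.0 spends it
    ten times faster. Returns 0.0 when there was no traffic.
    """
    if total <= 0:
        return 0.0
    return (errors / total) / slo.error_budget


def budget_remaining(slo: SLO, errors: int, total: int) -> float:
    """Fraction of the budget left, given counts for the whole SLO window.

    Goes negative once the budget is overspent.
    """
    return 1.0 - burn_rate(slo, errors, total)


def should_page(
    slo: SLO,
    long_window: tuple[int, int],
    short_window: tuple[int, int],
    threshold: float = 14.4,  # assumed fast-burn threshold, see lead-in
) -> bool:
    """Multiwindow check: page only if both windows burn at or above threshold.

    Each window is an (errors, total) pair, e.g. the last 1 hour and the last
    5 minutes. Requiring the short window as well stops the alert from firing
    long after the problem has recovered.
    """
    return (
        burn_rate(slo, *long_window) >= threshold
        and burn_rate(slo, *short_window) >= threshold
    )


if __name__ == "__main__":
    api = SLO("api-availability", target=0.999)
    print(round(burn_rate(api, errors=20, total=1_000), 2))
    print(round(budget_remaining(api, errors=500, total=1_000_000), 2))
    print(should_page(api, long_window=(150, 10_000), short_window=(20, 1_000)))
```

With a 99.9% target, 20 failures in 1,000 requests is a 2% error rate, or a burn rate of about 20. At that pace, a 30-day budget would be exhausted in roughly a day and a half, which is why both windows in this example exceed the paging threshold.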