Jai-Gogineni/cloudwatch-sre-alerts
An SRE alerting framework that integrates CloudWatch, Datadog, and PagerDuty, with SLO burn-rate tracking and automated incident-response runbooks.
# CloudWatch SRE Alerts
A production-grade SRE alerting framework that combines AWS CloudWatch alarms, Datadog monitors, SLO burn-rate calculation, and automated incident-response runbooks.
## Architecture
```mermaid
graph TB
    subgraph Monitoring
        CW[CloudWatch Alarms] --> AE[Alert Engine]
        DD[Datadog Monitors] --> AE
    end
    subgraph SLO Tracking
        AE --> SC[SLO Calculator]
        SC --> BR[Burn Rate Analysis]
        BR --> EB[Error Budget]
    end
    subgraph Incident Response
        AE --> RB[Runbook Executor]
        RB --> AS[Auto-Scale]
        RB --> RS[Restart Service]
        RB --> RK[Rollback]
    end
    subgraph Notifications
        AE --> SNS[AWS SNS]
        AE --> PD[PagerDuty]
        AE --> SL[Slack]
    end
```
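The Runbook Executor branch fans out to actions such as auto-scaling, restarting a service, and rolling back. The sketch below shows one common way to structure such an executor: run the steps in order and, if any step fails, invoke compensating "undo" actions for the steps that already completed, in reverse order. It is a hypothetical illustration and not the code in `src/runbooks/incident_response.py`. The names `Step`, `RunbookResult`, and `execute_runbook` are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """Hypothetical runbook step: an action plus an optional compensating undo."""
    name: str
    run: Callable[[], None]
    undo: Callable[[], None] = lambda: None  # default: nothing to undo


@dataclass
class RunbookResult:
    succeeded: bool
    executed: List[str] = field(default_factory=list)
    rolled_back: List[str] = field(default_factory=list)
    error: str = ""


def execute_runbook(steps: List[Step]) -> RunbookResult:
    """Run steps in order; on the first failure, undo completed steps in reverse."""
    done: List[Step] = []
    result = RunbookResult(succeeded=True)
    for step in steps:
        try:
            step.run()
        except Exception as exc:  # any failing step triggers rollback
            result.succeeded = False
            result.error = f"{step.name}: {exc}"
            for prev in reversed(done):
                prev.undo()
                result.rolled_back.append(prev.name)
            return result
        done.append(step)
        result.executed.append(step.name)
    return result
```

For example, a runbook of `scale_out` (undo: scale back in), `restart_service`, and a `verify_health` step that raises would report `executed == ["scale_out", "restart_service"]` and roll those two steps back in reverse order. Undoing in reverse keeps each compensating action running against the state its forward action left behind.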
## Quick Start
```bash
git clone https://github.com/Jai-Gogineni/cloudwatch-sre-alerts.git
cd cloudwatch-sre-alerts
pip install pytest pyyaml
pytest tests/ -v
```
## Project Structure
```
├── src/
│   ├── alerts/
│   │   ├── cloudwatch_rules.py     # CloudWatch alarm definitions
│   │   └── datadog_monitors.py     # Datadog monitor configs
│   ├── runbooks/
│   │   └── incident_response.py    # Automated runbook executor
│   └── slo/
│       └── slo_calculator.py       # SLO burn rate calculator
├── config/
│   ├── alerts.yaml                 # Alert definitions
│   └── slos.yaml                   # SLO targets
├── tests/
│   └── test_slo_calculator.py
└── .github/workflows/ci.yml
```
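`src/alerts/cloudwatch_rules.py` holds declarative CloudWatch alarm definitions. The sketch below shows one way a declarative rule could be rendered into the keyword arguments accepted by boto3's `CloudWatch.Client.put_metric_alarm`. The `AlarmRule` dataclass, its defaults, and the `to_put_metric_alarm_kwargs` helper are hypothetical. The output keys (`AlarmName`, `Namespace`, `MetricName`, `Statistic`, `Period`, `EvaluationPeriods`, `Threshold`, `ComparisonOperator`, `Dimensions`, `AlarmActions`, `TreatMissingData`) are real `put_metric_alarm` parameters. The code only builds a dictionary, so it runs without AWS credentials.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class AlarmRule:
    """Hypothetical declarative alarm rule (e.g. one entry of an alerts config)."""
    name: str
    namespace: str
    metric: str
    threshold: float
    comparison: str = "GreaterThanThreshold"
    statistic: str = "Average"
    period_seconds: int = 60
    evaluation_periods: int = 5
    dimensions: Dict[str, str] = field(default_factory=dict)
    sns_topic_arns: List[str] = field(default_factory=list)


def to_put_metric_alarm_kwargs(rule: AlarmRule) -> Dict[str, Any]:
    """Render a rule into kwargs for boto3's cloudwatch put_metric_alarm call."""
    # CloudWatch accepts a Period of 10, 30, or any positive multiple of 60 seconds.
    p = rule.period_seconds
    if p not in (10, 30) and (p <= 0 or p % 60 != 0):
        raise ValueError("period_seconds must be 10, 30, or a multiple of 60")
    return {
        "AlarmName": rule.name,
        "Namespace": rule.namespace,
        "MetricName": rule.metric,
        "Statistic": rule.statistic,
        "Period": p,
        "EvaluationPeriods": rule.evaluation_periods,
        "Threshold": rule.threshold,
        "ComparisonOperator": rule.comparison,
        "Dimensions": [{"Name": k, "Value": v} for k, v in sorted(rule.dimensions.items())],
        "AlarmActions": list(rule.sns_topic_arns),  # SNS topics notified on ALARM
        "TreatMissingData": "notBreaching",
    }
```

With credentials configured, the result could be applied as `boto3.client("cloudwatch").put_metric_alarm(**kwargs)`. Keeping rendering separate from the API call lets tests check alarm definitions without touching AWS.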
## Features
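- **CloudWatch Alarms**: declarative alarm definitions with SNS/PagerDuty integration
- **Datadog Monitors**: programmatic monitor management
- **SLO Calculator**: burn-rate analysis and error-budget tracking
- **Incident Runbooks**: automated response with rollback support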
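Burn rate measures how fast errors consume the error budget. It is the observed error rate divided by the error rate the SLO allows, `burn_rate = error_rate / (1 - slo_target)`, so a burn rate of 1.0 uses up the budget exactly at the end of the SLO period. The sketch below is a hypothetical illustration and not the code in `src/slo/slo_calculator.py`. The names `WindowStats`, `burn_rate`, `error_budget_remaining`, and `should_page` are assumptions. The default page threshold of 14.4 is borrowed from the multi-window, multi-burn-rate pattern in Google's SRE Workbook (spending 2% of a 30-day budget in one hour: 0.02 × 720 h = 14.4), not from this repository's configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    """Total and bad (SLO-violating) event counts over one time window."""
    total: int
    bad: int

    @property
    def error_rate(self) -> float:
        return self.bad / self.total if self.total else 0.0


def burn_rate(stats: WindowStats, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return stats.error_rate / (1.0 - slo_target)


def error_budget_remaining(period: WindowStats, slo_target: float) -> float:
    """Fraction of the period's error budget left (negative once overspent)."""
    allowed_bad = period.total * (1.0 - slo_target)
    if allowed_bad == 0:
        return 1.0  # no traffic yet, so no budget has been spent
    return 1.0 - period.bad / allowed_bad


def should_page(long_window: WindowStats, short_window: WindowStats,
                slo_target: float, threshold: float = 14.4) -> bool:
    """Page only if both the long and short windows burn faster than threshold.

    The long window (e.g. 1h) shows the burn is significant; the short window
    (e.g. 5m) shows it is still happening, so the alert resets quickly.
    """
    return (burn_rate(long_window, slo_target) >= threshold
            and burn_rate(short_window, slo_target) >= threshold)
```

For a 99.9% SLO, 2,000 bad requests out of 100,000 in the last hour is a 2% error rate and a burn rate of about 20. That pages only if the last five minutes are burning that fast too.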
## License
MIT