# IT Incident Commander
[CI](https://github.com/intruderfr/it-incident-commander/actions/workflows/ci.yml)
[Python](https://www.python.org/)
[License](LICENSE)
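A **YAML-driven CLI tool for IT incident response**. Define a runbook once and execute it the same way every time, with step-by-step tracking, SLA monitoring, and automatically generated post-incident reports.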
```
$ incident start server-down
Incident created: INC-3A7F92B1
Runbook : Server Down
Severity: P1
Escalate: Head of IT / on-call engineer
Steps:
1. [verify-outage] Verify the outage — IT Operations [SLA: 5m]
2. [notify-stakeholders] Notify stakeholders — IT Operations [SLA: 10m]
3. [check-console] Access console / IPMI / ... — IT Operations [SLA: 15m]
...
Run 'incident step INC-3A7F92B1 verify-outage start' to begin.
```
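## Features
- **YAML runbooks**: define steps, teams, SLA targets, and escalation contacts
- **Step lifecycle**: `start` → `done` / `skip`, with timestamps and optional notes
- **SLA tracking**: warns when a step exceeds its target time
- **Post-incident reports**: available as plain text, Markdown, or JSON
- **4 built-in runbooks**: server down, network outage, security breach, email service down
- **Custom runbooks**: load your own YAML files
- **No external dependencies** beyond PyYAML
- **Persistent incident log**: stored at `~/.local/share/it-incident-commander/incidents.json`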
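
The SLA tracking feature listed above amounts to comparing a step's elapsed time against its target in minutes. The following is a minimal sketch of that idea, not the tool's actual implementation; the helper name `sla_status` and its return strings are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def sla_status(started_at, sla_minutes, now=None):
    """Return 'ok' or 'breached' for a step (hypothetical helper, not the tool's API)."""
    now = now or datetime.now(timezone.utc)
    elapsed = now - started_at
    # A step breaches its SLA only once elapsed time exceeds the target.
    return "breached" if elapsed > timedelta(minutes=sla_minutes) else "ok"


# Example: a step with a 5-minute SLA, checked 7 minutes after it started.
start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
check_time = start + timedelta(minutes=7)
print(sla_status(start, 5, now=check_time))  # prints "breached"
```

Passing `now` explicitly keeps the check deterministic, which makes this kind of logic easy to unit test.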
## Installation
```
pip install PyYAML
pip install git+https://github.com/intruderfr/it-incident-commander.git
```
Or clone the repository and install it locally:
```
git clone https://github.com/intruderfr/it-incident-commander.git
cd it-incident-commander
pip install -e .
```
## Quick Start
### 1. Start an incident
```
incident start server-down
# Incident created: INC-ABCD1234
```
### 2. Work through the steps
```
# Start the first step
incident step INC-ABCD1234 verify-outage start
# Mark it done and attach a note
incident step INC-ABCD1234 verify-outage done --notes "Confirmed: web01 unreachable from 3 locations"
# Skip an optional step
incident step INC-ABCD1234 change-record skip
```
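
The step commands above suggest a small state machine. The sketch below is one way to model it, under stated assumptions: the status names (`pending`, `in_progress`, `done`, `skipped`), the timestamp field names, and the rule that only a started step can be marked done are all hypothetical and may differ from the tool's internals.

```python
from datetime import datetime, timezone

# Assumed transitions: a pending step can be started or skipped,
# and only a started step can be marked done.
TRANSITIONS = {
    ("pending", "start"): "in_progress",
    ("pending", "skip"): "skipped",
    ("in_progress", "done"): "done",
    ("in_progress", "skip"): "skipped",
}


def apply_action(step, action, notes=None):
    """Return a new step dict after applying 'start', 'done', or 'skip'."""
    key = (step["status"], action)
    if key not in TRANSITIONS:
        raise ValueError(f"cannot {action!r} a step that is {step['status']!r}")
    updated = dict(step, status=TRANSITIONS[key])
    # Record when the action happened, e.g. 'start_at' or 'done_at'.
    updated[f"{action}_at"] = datetime.now(timezone.utc).isoformat()
    if notes:
        updated["notes"] = notes
    return updated


step = {"id": "verify-outage", "status": "pending"}
step = apply_action(step, "start")
step = apply_action(step, "done", notes="Confirmed: web01 unreachable")
print(step["status"])  # prints "done"
```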
### 3. Check progress
```
incident status INC-ABCD1234
```
### 4. Resolve the incident
```
incident resolve INC-ABCD1234 --notes "Root cause: disk full on /var/log. Cleared logs, added log rotation."
```
### 5. Generate a post-incident report
```
# Plain text (default)
incident report INC-ABCD1234
# Markdown (for Confluence, Notion, GitHub)
incident report INC-ABCD1234 --format markdown --output incident-report.md
# JSON (for ITSM integration)
incident report INC-ABCD1234 --format json
```
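
If you want to script report generation, for example from an ITSM integration job, the commands above can be wrapped in Python. This is a rough sketch that uses only the flags shown in this README; the helper names `report_command` and `run_report` are hypothetical, and `run_report` assumes the report is written to stdout when `--output` is omitted.

```python
import subprocess


def report_command(incident_id, fmt="text", output=None):
    """Build the argv for `incident report` from the documented flags."""
    if fmt not in ("text", "markdown", "json"):
        raise ValueError(f"unsupported format: {fmt}")
    argv = ["incident", "report", incident_id, "--format", fmt]
    if output:
        argv += ["--output", output]
    return argv


def run_report(incident_id, fmt="text"):
    """Run the CLI and return its stdout (requires `incident` on PATH)."""
    result = subprocess.run(
        report_command(incident_id, fmt),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Only builds the command; nothing is executed here.
print(report_command("INC-ABCD1234", "markdown", "incident-report.md"))
```

Passing the arguments as a list avoids shell quoting problems when incident IDs or file names come from other systems.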
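## Available Runbooks
| Runbook | Severity | Steps | Description |
|---------|----------|-------|-------------|
| `server-down` | P1 | 8 | Production server unresponsive |
| `network-outage` | P1 | 8 | Network connectivity lost |
| `security-breach` | P1 | 10 | Suspected intrusion; covers containment, eradication, and recovery |
| `email-service-down` | P2 | 9 | Corporate email unavailable |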
## Writing a Custom Runbook
```
name: Database Failover
severity: P1
category: database
description: Primary database is down — fail over to replica.
escalation_contact: "DBA Team / AWS Support"
steps:
  - id: confirm-primary-down
    title: Confirm the primary is unreachable
    team: DBA Team
    sla_minutes: 5
  - id: promote-replica
    title: Promote read replica to primary
    team: DBA Team
    sla_minutes: 15
```
Run it with:
```
incident start ./my-runbooks/db-failover.yaml
```
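
Before starting an incident from a custom runbook, it can help to sanity-check its structure. The sketch below operates on the dict that `yaml.safe_load()` would return for the example runbook. Which keys are mandatory is an assumption based on that example, and `find_problems` is a hypothetical helper, not the logic behind `incident validate`.

```python
# Assumed required keys, inferred from the example runbook above.
REQUIRED_TOP_LEVEL = ("name", "severity", "steps")
REQUIRED_STEP_KEYS = ("id", "title", "team", "sla_minutes")


def find_problems(runbook):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = [f"missing key: {key}" for key in REQUIRED_TOP_LEVEL if key not in runbook]
    seen_ids = set()
    for index, step in enumerate(runbook.get("steps") or [], start=1):
        for key in REQUIRED_STEP_KEYS:
            if key not in step:
                problems.append(f"step {index}: missing key: {key}")
        step_id = step.get("id")
        if step_id is not None:
            if step_id in seen_ids:
                problems.append(f"step {index}: duplicate id: {step_id}")
            seen_ids.add(step_id)
        sla = step.get("sla_minutes")
        if sla is not None and (not isinstance(sla, int) or sla <= 0):
            problems.append(f"step {index}: sla_minutes must be a positive integer")
    return problems


# Mirrors the parsed form of the Database Failover example.
runbook = {
    "name": "Database Failover",
    "severity": "P1",
    "steps": [
        {"id": "confirm-primary-down", "title": "Confirm the primary is unreachable",
         "team": "DBA Team", "sla_minutes": 5},
        {"id": "promote-replica", "title": "Promote read replica to primary",
         "team": "DBA Team", "sla_minutes": 15},
    ],
}
print(find_problems(runbook))  # prints []
```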
## CLI Reference
```
incident start <runbook>
incident step <incident-id> <step-id> start|done|skip [--notes TEXT]
incident status <incident-id>
incident resolve <incident-id> [--notes TEXT]
incident cancel <incident-id>
incident list [--status open|resolved|cancelled]
incident report <incident-id> [--format text|markdown|json] [--output FILE]
incident list-runbooks [--runbook-dir DIR]
incident validate <runbook-file>
```
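
For readers curious how a command layout like this maps onto Python's standard `argparse` module, here is an illustrative sketch covering three of the subcommands. It is not taken from the project's source; the positional argument names (`runbook`, `incident_id`, `step_id`, `action`) are assumptions inferred from the Quick Start examples.

```python
import argparse


def build_parser():
    """Build a parser mirroring a subset of the documented commands (illustrative only)."""
    parser = argparse.ArgumentParser(prog="incident")
    sub = parser.add_subparsers(dest="command", required=True)

    start = sub.add_parser("start")
    start.add_argument("runbook")  # built-in runbook name or path to a YAML file

    step = sub.add_parser("step")
    step.add_argument("incident_id")
    step.add_argument("step_id")
    step.add_argument("action", choices=["start", "done", "skip"])
    step.add_argument("--notes")

    report = sub.add_parser("report")
    report.add_argument("incident_id")
    report.add_argument("--format", choices=["text", "markdown", "json"], default="text")
    report.add_argument("--output")
    return parser


args = build_parser().parse_args(
    ["step", "INC-ABCD1234", "verify-outage", "done", "--notes", "web01 unreachable"]
)
print(args.command, args.incident_id, args.step_id, args.action)
```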
## Author
**Aslam Ahamed**, Head of IT, Prestige One Developments, Dubai, UAE
[LinkedIn](https://www.linkedin.com/in/aslam-ahamed/)
## License
MIT. See [LICENSE](LICENSE).