intruderfr/it-incident-commander


A YAML-driven IT incident response CLI tool that addresses inconsistent incident handling through step-by-step runbooks and SLA monitoring.


# IT Incident Commander

[![CI](https://github.com/intruderfr/it-incident-commander/actions/workflows/ci.yml/badge.svg)](https://github.com/intruderfr/it-incident-commander/actions/workflows/ci.yml) [![Python](https://img.shields.io/badge/python-3.9%2B-blue)](https://www.python.org/) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

A **YAML-driven IT incident response CLI tool**. Define your runbooks once and execute them the same way every time, with step-by-step tracking, SLA monitoring, and automatically generated post-incident reports.

```
$ incident start server-down
Incident created: INC-3A7F92B1
Runbook : Server Down
Severity: P1
Escalate: Head of IT / on-call engineer

Steps:
  1. [verify-outage] Verify the outage — IT Operations [SLA: 5m]
  2. [notify-stakeholders] Notify stakeholders — IT Operations [SLA: 10m]
  3. [check-console] Access console / IPMI / ... — IT Operations [SLA: 15m]
  ...

Run 'incident step INC-3A7F92B1 verify-outage start' to begin.
```

## Features

- **YAML runbooks**: define steps, teams, SLA targets, and escalation contacts
- **Step lifecycle**: `start` → `done` / `skip`, with timestamps and optional notes
- **SLA tracking**: warns when a step exceeds its target time
- **Post-incident reports**: plain text, Markdown, or JSON
- **4 built-in runbooks**: server down, network outage, security breach, email service down
- **Custom runbooks**: import your own YAML files
- **No external dependencies** (other than PyYAML)
- **Persistent incident log**: stored at `~/.local/share/it-incident-commander/incidents.json`

## Installation

```
pip install PyYAML
pip install git+https://github.com/intruderfr/it-incident-commander.git
```

Or clone the repository and install locally:

```
git clone https://github.com/intruderfr/it-incident-commander.git
cd it-incident-commander
pip install -e .
```

## Quick Start

### 1. Start an incident

```
incident start server-down
# Incident created: INC-ABCD1234
```

### 2. Work through the steps

```
# Start the first step
incident step INC-ABCD1234 verify-outage start

# Mark it done and add a note
incident step INC-ABCD1234 verify-outage done --notes "Confirmed: web01 unreachable from 3 locations"

# Skip an optional step
incident step INC-ABCD1234 change-record skip
```

### 3. Check progress

```
incident status INC-ABCD1234
```

### 4. Resolve the incident

```
incident resolve INC-ABCD1234 --notes "Root cause: disk full on /var/log. Cleared logs, added log rotation."
```

### 5. Generate a post-incident report

```
# Plain text (default)
incident report INC-ABCD1234

# Markdown (for Confluence, Notion, GitHub)
incident report INC-ABCD1234 --format markdown --output incident-report.md

# JSON (for ITSM integration)
incident report INC-ABCD1234 --format json
```

## Available Runbooks

| Runbook | Severity | Steps | Description |
|---------|----------|-------|-------------|
| `server-down` | P1 | 8 | Production server unresponsive |
| `network-outage` | P1 | 8 | Network connectivity lost |
| `security-breach` | P1 | 10 | Suspected intrusion; covers containment, eradication, and recovery |
| `email-service-down` | P2 | 9 | Corporate email unavailable |

## Writing a Custom Runbook

```
name: Database Failover
severity: P1
category: database
description: Primary database is down — fail over to replica.
escalation_contact: "DBA Team / AWS Support"
steps:
  - id: confirm-primary-down
    title: Confirm the primary is unreachable
    team: DBA Team
    sla_minutes: 5
  - id: promote-replica
    title: Promote read replica to primary
    team: DBA Team
    sla_minutes: 15
```

Run it with:

```
incident start ./my-runbooks/db-failover.yaml
```

## CLI Reference

```
incident start <runbook>
incident step <incident-id> <step-id> start|done|skip [--notes TEXT]
incident status <incident-id>
incident resolve <incident-id> [--notes TEXT]
incident cancel <incident-id>
incident list [--status open|resolved|cancelled]
incident report <incident-id> [--format text|markdown|json] [--output FILE]
incident list-runbooks [--runbook-dir DIR]
incident validate <runbook>
```

## Author

**Aslam Ahamed**, Head of IT at Prestige One Developments, Dubai, UAE

[LinkedIn](https://www.linkedin.com/in/aslam-ahamed/)

## License

MIT. See [LICENSE](LICENSE).
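
To make the SLA tracking feature more concrete, here is a minimal sketch of how a per-step SLA check could work. It is an illustration only and is not taken from the tool's source code. The `id`, `title`, `team`, and `sla_minutes` keys mirror the custom runbook example; the `started_at` field, the function names `sla_overrun` and `sla_message`, and the message format are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical step record. The id/title/team/sla_minutes keys mirror the
# runbook YAML; "started_at" is an assumed field added for illustration.
step = {
    "id": "confirm-primary-down",
    "title": "Confirm the primary is unreachable",
    "team": "DBA Team",
    "sla_minutes": 5,
    "started_at": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
}


def sla_overrun(step, now):
    """Return how far past its SLA target a step is, or None if within target."""
    deadline = step["started_at"] + timedelta(minutes=step["sla_minutes"])
    return now - deadline if now > deadline else None


def sla_message(step, now):
    """Format a one-line SLA status for the step (the format is illustrative)."""
    overrun = sla_overrun(step, now)
    if overrun is None:
        return f"[{step['id']}] within SLA ({step['sla_minutes']}m)"
    minutes_over = int(overrun.total_seconds() // 60)
    return (
        f"[{step['id']}] SLA exceeded by {minutes_over}m "
        f"(target {step['sla_minutes']}m)"
    )


# Eight minutes after the step started, a 5-minute SLA is 3 minutes overdue.
now = datetime(2024, 1, 1, 12, 8, tzinfo=timezone.utc)
print(sla_message(step, now))
# prints: [confirm-primary-down] SLA exceeded by 3m (target 5m)
```

Using timezone-aware UTC timestamps keeps the comparison unambiguous when incidents span a daylight-saving change or are handled by responders in different time zones.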