keephq/keep

GitHub: keephq/keep

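Keep is an open-source AIOps and alert management platform. It gives teams a single pane of glass, alert deduplication and correlation, and declarative workflows, so they can centrally manage alerts from dozens of monitoring tools and automate incident response.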

Stars: 11755 | Forks: 1336

The open-source AIOps and alert management platform


Single pane of glass, alert deduplication, enrichment, filtering and correlation, bi-directional integrations, workflows, dashboards.



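- 🔍 **Single pane of glass** - Best-in-class customizable UI for all your alerts and incidents
- 🛠️ **Swiss Army knife for alerts** - Deduplication, correlation, filtering and enrichment
- 🔄 **Deep integrations** - Bi-directional syncs with monitoring tools, customizable workflows
- ⚡ **[Automation](#workflows)** - GitHub Actions for your monitoring tools
- 🤖 **AIOps 2.0** - AI-powered correlation and summarization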
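The deduplication idea in the list above can be sketched as follows. This is a minimal Python illustration, not Keep's implementation. It makes two assumptions. First, an alert is a plain dict. Second, two alerts count as duplicates when a hypothetical set of identifying fields (`FINGERPRINT_FIELDS`: source, name, service) hashes to the same fingerprint. Keep's own deduplication logic may use different fields and rules.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical choice of identifying fields; Keep's real rules may differ.
FINGERPRINT_FIELDS = ("source", "name", "service")


def fingerprint(alert: dict, fields=FINGERPRINT_FIELDS) -> str:
    """Hash the identifying fields of an alert into a stable key."""
    # sort_keys makes the encoding independent of dict insertion order.
    key = json.dumps({f: alert.get(f) for f in fields}, sort_keys=True)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


@dataclass
class Deduplicator:
    seen: dict = field(default_factory=dict)    # fingerprint -> first alert seen
    counts: dict = field(default_factory=dict)  # fingerprint -> number of occurrences

    def ingest(self, alert: dict) -> bool:
        """Record the alert; return True if it is new, False if it is a duplicate."""
        fp = fingerprint(alert)
        self.counts[fp] = self.counts.get(fp, 0) + 1
        if fp in self.seen:
            return False
        self.seen[fp] = alert
        return True


alerts = [
    {"source": "sentry", "name": "HighErrorRate", "service": "payments", "lastReceived": "10:00"},
    {"source": "sentry", "name": "HighErrorRate", "service": "payments", "lastReceived": "10:05"},
    {"source": "datadog", "name": "HighLatency", "service": "api", "lastReceived": "10:06"},
]

dedup = Deduplicator()
unique = [a for a in alerts if dedup.ingest(a)]
print(len(unique))  # 2
```

The fingerprint deliberately excludes volatile fields such as `lastReceived`. That exclusion is what lets repeated firings of the same problem collapse into one alert, while the per-fingerprint count still records how often it fired.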

## Supported Integrations

### AI Backends for Enrichment, Correlation and Incident Context Gathering
Anthropic, OpenAI, DeepSeek, Ollama, LlamaCPP, Grok, Gemini
### Observability Tools
AppDynamics, Axiom, Azure Monitoring, Centreon, Checkmk, Cilium, Checkly, CloudWatch, Coralogix, Dash0, Datadog, Dynatrace, Elastic, GCP Monitoring, Grafana, Grafana Loki, Graylog, Icinga2, Kibana, LibreNMS, NetBox, Netdata, New Relic, OpenSearch Serverless, Parseable, Pingdom, Prometheus, Rollbar, Sentry, SignalFX, OpenObserve, Site24x7, Splunk, StatusCake, SumoLogic, ThousandEyes, UptimeKuma, VictoriaLogs, VictoriaMetrics, Wazuh, Zabbix
### Databases & Data Warehouses
BigQuery, ClickHouse, Databend, MongoDB, MySQL, PostgreSQL, Snowflake
### Communication Platforms
Discord, Google Chat, Mailgun, Mattermost, Ntfy.sh, Pushover, Resend, SendGrid, Slack, SMTP, Telegram, Twilio, Teams, Zoom, Zoom Chat
### Incident Management
Grafana Incident, Grafana OnCall, Ilert, Incident.io, AWS Incident Manager, OpsGenie, PagerDuty, Pagertree, SINGL4, Squadcast, Zenduty, Flashduty
### Ticketing Tools
Asana, GitHub, GitLab, Jira, Linear, LinearB, Microsoft Planner, Monday, Redmine, ServiceNow, Trello, YouTrack
### Container Orchestration Platforms
Azure AKS, ArgoCD, Flux CD, GKE, Kubernetes, OpenShift
### Data Enrichment
Bash, OpenAI, Python, QuickChart, SSH, Webhook
### Workflow Orchestration
Airflow
### Queues
Amazon SQS, Kafka
## Workflows

Keep is GitHub Actions for your monitoring tools.

A Keep workflow is a declarative YAML file that automates your alert and incident management. Each workflow consists of:
- **Triggers** - what starts the workflow (an alert, an incident, a schedule, or a manual run)
- **Steps** - read or fetch data (enrichment, context)
- **Actions** - execute tasks (update tickets, send notifications, restart servers)

Here's a simple workflow that handles every `critical` alert from `sentry` on the `payments` or `ftp` service. Alerts on `payments` are sent to the payments team on Slack. Alerts on `ftp` open a Jira ticket on the on-call board, unless the alert already has one. See more workflows [here](https://github.com/keephq/keep/tree/main/examples/workflows).

```yaml
workflow:
  id: sentry-alerts
  description: notify or open tickets for critical alerts from sentry
  triggers:
    - type: alert
      # customize the filter to run only on critical alerts from sentry
      filters:
        - key: source
          value: sentry
        - key: severity
          value: critical
        # regex to match specific services
        - key: service
          value: r"(payments|ftp)"
  actions:
    - name: send-slack-message-team-payments
      # if the alert is on the payments service, slack the payments team
      if: "'{{ alert.service }}' == 'payments'"
      provider:
        type: slack
        # control which Slack configuration you want to use
        config: "{{ providers.team-payments-slack }}"
        # customize the alert message with context from {{ alert }} or any other {{ step }}
        with:
          message: |
            A new alert from Sentry: Alert: {{ alert.name }} - {{ alert.description }}
            {{ alert }}
    - name: create-jira-ticket-oncall-board
      # control the flow with "if" and "foreach" statements
      if: "'{{ alert.service }}' == 'ftp' and not '{{ alert.ticket_id }}'"
      provider:
        type: jira
        config: "{{ providers.jira }}"
        with:
          board_name: "Oncall Board"
          custom_fields:
            customfield_10201: "Critical"
          issuetype: "Task"
          # customize the summary
          summary: "{{ alert.name }} - {{ alert.description }} (created by Keep)"
          description: |
            This ticket was created by Keep.
            Please check the alert details below:
            {code:json} {{ alert }} {code}
          # enrich the alert with more context: from now on, it carries the ticket id, type and url
          enrich_alert:
            - key: ticket_type
              value: jira
            - key: ticket_id
              value: results.issue.key
            - key: ticket_url
              value: results.ticket_url
```

## Enterprise Ready

- **Developer first** - Modern REST APIs, native SDKs and comprehensive documentation for seamless integration
- **[Enterprise security](https://docs.keephq.dev/deployment/authentication/overview)** - Full authentication support (SSO, SAML, OIDC, LDAP) with granular access control (RBAC, ABAC) and team management
- **Flexible deployment** - Cloud-agnostic architecture that can run on-premises or in air-gapped environments
- **[Production scale](https://docs.keephq.dev/deployment/stress-testing)** - High-availability, performance-tested infrastructure that scales horizontally for enterprise workloads

## Getting Started

Keep can run in a variety of environments and configurations. The easiest way to get started is with Keep's Docker Compose setup.
- Run Keep [locally](https://docs.keephq.dev/development/getting-started).
- Run Keep on [Kubernetes](https://docs.keephq.dev/deployment/kubernetes/installation).
- Run Keep with [Docker](https://docs.keephq.dev/deployment/docker).
- Run Keep on [AWS ECS](https://docs.keephq.dev/deployment/ecs).
- Run Keep on [OpenShift](https://docs.keephq.dev/deployment/kubernetes/openshift).

### Contributors

Thank you for contributing and making Keep better every day, you're awesome 🫶
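The following is a minimal Python model of what the Sentry example in the Workflows section does with a single alert. It is a hypothetical sketch, not Keep's workflow engine, and rests on three assumptions:

- Alerts are plain dicts.
- The `r"(payments|ftp)"` filter is represented as a compiled regex that must match the whole service name. This is an assumption about Keep's matching semantics.
- `fake_create_jira_ticket` is a made-up stand-in for the Jira provider. It returns a result shaped like the `results.issue.key` and `results.ticket_url` paths used by `enrich_alert`.

```python
import re

# Trigger filters from the example workflow. Plain strings must match exactly;
# a compiled pattern stands in for the r"(payments|ftp)" regex value.
FILTERS = {
    "source": "sentry",
    "severity": "critical",
    "service": re.compile(r"(payments|ftp)"),
}


def matches_filters(alert: dict, filters: dict = FILTERS) -> bool:
    """Return True only if every filter key matches the alert."""
    for key, expected in filters.items():
        value = alert.get(key)
        if value is None:
            return False
        if isinstance(expected, re.Pattern):
            if not expected.fullmatch(str(value)):
                return False
        elif value != expected:
            return False
    return True


def fake_create_jira_ticket(alert: dict) -> dict:
    """Hypothetical stand-in for the Jira provider; returns made-up results."""
    return {
        "issue": {"key": "ONCALL-1"},
        "ticket_url": "https://jira.example.com/browse/ONCALL-1",
    }


def run_workflow(alert: dict) -> list:
    """Return the names of the actions that fire; enriches the alert in place."""
    fired = []
    if not matches_filters(alert):
        return fired
    # Mirrors: if: "'{{ alert.service }}' == 'payments'"
    if alert["service"] == "payments":
        fired.append("send-slack-message-team-payments")
    # Mirrors: if: "'{{ alert.service }}' == 'ftp' and not '{{ alert.ticket_id }}'"
    if alert["service"] == "ftp" and not alert.get("ticket_id"):
        results = fake_create_jira_ticket(alert)
        # Mirrors enrich_alert: copy ticket details from the results onto the alert.
        alert.update(
            ticket_type="jira",
            ticket_id=results["issue"]["key"],
            ticket_url=results["ticket_url"],
        )
        fired.append("create-jira-ticket-oncall-board")
    return fired


ftp_alert = {"source": "sentry", "severity": "critical", "service": "ftp", "name": "DiskFull"}
print(run_workflow(ftp_alert))  # ['create-jira-ticket-oncall-board']
print(ftp_alert["ticket_id"])   # ONCALL-1
print(run_workflow(ftp_alert))  # []
```

Running the model twice on the same `ftp` alert shows why the `not '{{ alert.ticket_id }}'` guard and `enrich_alert` work as a pair. The first run stores the ticket id on the alert, so the second run skips ticket creation instead of opening a duplicate.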