开源 AI Agent,用于调查生产事故并查找根本原因。我们是 [Cloud Native Computing Foundation](https://www.cncf.io/) 沙箱项目。最初由 [Robusta.Dev](http://robusta.dev) 创建,并获得了 [Microsoft](https://microsoft.com/) 的重要贡献。
- **PB 级数据规模**:服务端过滤、JSON 树遍历和工具输出转换器使大型负载远离上下文窗口
- **[深度集成](https://holmesgpt.dev/data-sources/builtin-toolsets/)**:Prometheus、Grafana、Datadog、Kubernetes 以及[更多](#-data-sources)——还包括任何 [REST API](https://holmesgpt.dev/data-sources/api-toolsets/)
- **双向告警集成**:从 AlertManager、PagerDuty、OpsGenie 或 Jira 获取告警——并将分析结果回写
- **[任意 LLM 提供商](https://holmesgpt.dev/ai-providers/)**:OpenAI、Anthropic、Azure、Bedrock、Gemini 等
- **[Operator 模式](https://holmesgpt.dev/operator/)**:作为 Kubernetes Operator 按计划运行调查
## 工作原理
HolmesGPT 使用 **Agentic Loop** 从多个来源查询实时可观测性数据并识别根本原因。


### 🔗 数据源
HolmesGPT 与主流的可观测性和云平台集成。以下数据源(“toolsets”)是内置的。[添加您自己的](https://holmesgpt.dev/data-sources/custom-toolsets/)。
| 数据源 | 说明 |
|-------------|-------|
| [

**AKS**](https://holmesgpt.dev/data-sources/builtin-toolsets/aks/) | Azure Kubernetes Service 集群和节点健康诊断 |
| [

**ArgoCD**](https://holmesgpt.dev/data-sources/builtin-toolsets/argocd/) | 获取应用、项目和集群的状态、历史、清单等 |
| [

**AWS**](https://holmesgpt.dev/data-sources/builtin-toolsets/aws/) | RDS 事件、实例、慢查询日志等 (MCP) |
| [

**Azure**](https://holmesgpt.dev/data-sources/builtin-toolsets/azure-mcp/) | Azure 资源和诊断 (MCP) |
| [

**Azure SQL**](https://holmesgpt.dev/data-sources/builtin-toolsets/azure-sql/) | 数据库健康、性能、连接和慢查询 |
| [

**Confluence**](https://holmesgpt.dev/data-sources/builtin-toolsets/confluence/) | 私有 Runbook 和文档 |
| [

**Coralogix**](https://holmesgpt.dev/data-sources/builtin-toolsets/coralogix-logs/) | 检索任意资源的日志 |
| [

**Datadog**](https://holmesgpt.dev/data-sources/builtin-toolsets/datadog/) | 查询日志、指标和链路 |
| [

**Docker**](https://holmesgpt.dev/data-sources/builtin-toolsets/docker/) | 获取镜像、日志、事件、历史等 |
| [

**Elasticsearch / OpenSearch**](https://holmesgpt.dev/data-sources/builtin-toolsets/elasticsearch/) | 查询日志、集群健康、分片和索引诊断 |
| [

**GCP**](https://holmesgpt.dev/data-sources/builtin-toolsets/gcp/) | Google Cloud Platform 资源 (MCP) |
| [

**GitHub**](https://holmesgpt.dev/data-sources/builtin-toolsets/github-mcp/) | 仓库、Issue 和 Pull Request (MCP) |
| [

**Grafana**](https://holmesgpt.dev/data-sources/builtin-toolsets/grafanadashboards/) | 查询和分析 Dashboard 配置和面板 |
| [

**Helm**](https://holmesgpt.dev/data-sources/builtin-toolsets/helm/) | Release 状态、Chart 元数据和值 |
| [

**Internet**](https://holmesgpt.dev/data-sources/builtin-toolsets/internet/) | 公开 Runbook、社区文档等 |
| [

**Kafka**](https://holmesgpt.dev/data-sources/builtin-toolsets/kafka/) | 获取元数据、列出消费者和主题或查找滞后的消费者组 |
| [

**Kubernetes**](https://holmesgpt.dev/data-sources/builtin-toolsets/kubernetes/) | Pod 日志、K8s 事件和资源状态 (kubectl describe) |
| [

**Loki**](https://holmesgpt.dev/data-sources/builtin-toolsets/grafanaloki/) | 查询 Kubernetes 资源日志或任意查询 |
| [

**MariaDB**](https://holmesgpt.dev/data-sources/builtin-toolsets/mariadb-mcp/) | MariaDB 数据库查询和诊断 (MCP) |
| [

**MongoDB Atlas**](https://holmesgpt.dev/data-sources/builtin-toolsets/mongodb-atlas/) | 集群健康、慢查询和性能诊断 |
| [

**NewRelic**](https://holmesgpt.dev/data-sources/builtin-toolsets/newrelic/) | 调查告警、查询链路追踪数据 |
| [

**OpenShift**](https://holmesgpt.dev/data-sources/builtin-toolsets/openshift/) | Projects、Routes、Builds、Security Context Constraints 和 Deployment Configs |
| [

**Prometheus**](https://holmesgpt.dev/data-sources/builtin-toolsets/prometheus/) | 调查告警、查询指标并生成 PromQL 查询 |
| [

**RabbitMQ**](https://holmesgpt.dev/data-sources/builtin-toolsets/rabbitmq/) | 分区、内存/磁盘告警、排查脑裂场景等 |
| [

**Robusta**](https://holmesgpt.dev/data-sources/builtin-toolsets/robusta/) | 多集群监控、历史变更数据、Runbook、PromQL 图表等 |
| [

**ServiceNow**](https://holmesgpt.dev/data-sources/builtin-toolsets/servicenow/) | 查询表和事件记录 |
| **Sentry** | 错误跟踪、Issue 和性能监控 (MCP) |
| [

**Slab**](https://holmesgpt.dev/data-sources/builtin-toolsets/slab/) | 团队知识库和按需 Runbook |
| **Splunk** | 日志搜索和分析 (MCP) |
| [

**SQL Databases**](https://holmesgpt.dev/data-sources/builtin-toolsets/database-postgresql/) | PostgreSQL、MySQL、ClickHouse、MariaDB、SQL Server、SQLite |
| [

**Tempo**](https://holmesgpt.dev/data-sources/builtin-toolsets/grafanatempo/) | 获取链路信息,调试应用高延迟等问题 |
请参阅[内置工具集完整列表](https://holmesgpt.dev/data-sources/builtin-toolsets/),了解包括 Cilium、KubeVela、Notion、Prefect 等在内的更多集成。
### 🚀 端到端自动化
HolmesGPT 可以从外部系统获取告警/工单进行调查,然后将分析结果回写到来源或 Slack。
| 集成 | 状态 | 说明 |
|-------------------------|-----------|-------|
| Slack | ✅ | [演示。](https://www.loom.com/share/afcd81444b1a4adfaa0bbe01c37a4847) 通过 [Robusta.dev](https://home.robusta.dev/)(商业平台)提供 |
| Microsoft Teams | ✅ | 通过 [Robusta.dev](https://home.robusta.dev/)(商业平台)提供 |
| Prometheus/AlertManager | ✅ | Robusta SaaS 或 HolmesGPT CLI |
| PagerDuty | ✅ | 仅限 HolmesGPT CLI |
| OpsGenie | ✅ | 仅限 HolmesGPT CLI |
| Jira | ✅ | 仅限 HolmesGPT CLI |
| GitHub | ✅ | 仅限 HolmesGPT CLI |
## 安装
阅读 [安装文档](https://holmesgpt.dev/installation/cli-installation/) 了解如何安装 HolmesGPT。
## 支持的 LLM 提供商
阅读 [LLM 提供商文档](https://holmesgpt.dev/ai-providers/) 了解如何设置您的 LLM API Key。
## 使用 HolmesGPT
请参阅[演练文档](https://holmesgpt.dev/walkthrough/)获取使用指南,包括:
- 用于提问和后续追问的[交互模式](https://holmesgpt.dev/walkthrough/interactive-mode/)
- [调查 Prometheus 告警](https://holmesgpt.dev/walkthrough/investigating-prometheus-alerts/)
- [CI/CD 故障排查](https://holmesgpt.dev/walkthrough/cicd-troubleshooting/)
## 🔐 数据隐私
根据设计,HolmesGPT 具有**只读访问权限**并遵守 RBAC 权限。在生产环境中运行是安全的。
## 许可证
在 Apache 2.0 许可证下分发。请参阅 [LICENSE](https://github.com/HolmesGPT/holmesgpt/blob/master/LICENSE) 了解更多信息。
## 如何贡献
请阅读我们的 [CONTRIBUTING.md](CONTRIBUTING.md) 了解指南和说明。
如需帮助,请在 [Slack](https://cloud-native.slack.com/archives/C0A1SPQM5PZ) 上联系我们或向 [DeepWiki AI](https://deepwiki.com/HolmesGPT/holmesgpt) 提问。
请务必遵守 CNCF 行为准则 - [详情见此](https://github.com/HolmesGPT/holmesgpt/blob/master/CODE_OF_CONDUCT.md)。
[](https://deepwiki.com/HolmesGPT/holmesgpt)
[](https://www.bestpractices.dev/projects/11586)
[](https://scorecard.dev/viewer/?uri=github.com/HolmesGPT/holmesgpt)