HolmesGPT/holmesgpt

GitHub: HolmesGPT/holmesgpt

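An open-source, AI-powered SRE agent that automates the investigation of production incidents and pinpoints root causes from multi-source observability data.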

Stars: 1948 | Forks: 251

HolmesGPT — CNCF SRE Agent


An open-source AI agent for investigating production incidents and finding root causes. We are a [Cloud Native Computing Foundation](https://www.cncf.io/) sandbox project. Originally created by [Robusta.Dev](http://robusta.dev), with major contributions from [Microsoft](https://microsoft.com/).

- **Petabyte-scale data**: server-side filtering, JSON tree traversal, and tool output transformers keep large payloads out of the context window
- **[Deep integrations](https://holmesgpt.dev/data-sources/builtin-toolsets/)**: Prometheus, Grafana, Datadog, Kubernetes, and [many more](#-data-sources), plus any [REST API](https://holmesgpt.dev/data-sources/api-toolsets/)
- **Bidirectional alerting integrations**: fetch alerts from AlertManager, PagerDuty, OpsGenie, or Jira, and write the analysis back
- **[Any LLM provider](https://holmesgpt.dev/ai-providers/)**: OpenAI, Anthropic, Azure, Bedrock, Gemini, and more
- **[Operator mode](https://holmesgpt.dev/operator/)**: run investigations on a schedule as a Kubernetes Operator

## How it Works

HolmesGPT uses an **agentic loop** to query live observability data from multiple sources and identify root causes.

[Figure: HolmesGPT architecture diagram]

![HolmesGPT investigation demo](https://holmesgpt.dev/assets/HolmesInvestigation.gif)

### 🔗 Data Sources

HolmesGPT integrates with popular observability and cloud platforms. The following data sources ("toolsets") are built in. [Add your own](https://holmesgpt.dev/data-sources/custom-toolsets/).

| Data Source | Notes |
|-------------|-------|
| [**AKS**](https://holmesgpt.dev/data-sources/builtin-toolsets/aks/) | Azure Kubernetes Service cluster and node health diagnostics |
| [**ArgoCD**](https://holmesgpt.dev/data-sources/builtin-toolsets/argocd/) | Get status, history, manifests, and more for apps, projects, and clusters |
| [**AWS**](https://holmesgpt.dev/data-sources/builtin-toolsets/aws/) | RDS events, instances, slow query logs, and more (MCP) |
| [**Azure**](https://holmesgpt.dev/data-sources/builtin-toolsets/azure-mcp/) | Azure resources and diagnostics (MCP) |
| [**Azure SQL**](https://holmesgpt.dev/data-sources/builtin-toolsets/azure-sql/) | Database health, performance, connections, and slow queries |
| [**Confluence**](https://holmesgpt.dev/data-sources/builtin-toolsets/confluence/) | Private runbooks and documentation |
| [**Coralogix**](https://holmesgpt.dev/data-sources/builtin-toolsets/coralogix-logs/) | Retrieve logs for any resource |
| [**Datadog**](https://holmesgpt.dev/data-sources/builtin-toolsets/datadog/) | Query logs, metrics, and traces |
| [**Docker**](https://holmesgpt.dev/data-sources/builtin-toolsets/docker/) | Get images, logs, events, history, and more |
| [**Elasticsearch / OpenSearch**](https://holmesgpt.dev/data-sources/builtin-toolsets/elasticsearch/) | Query logs, cluster health, shard, and index diagnostics |
| [**GCP**](https://holmesgpt.dev/data-sources/builtin-toolsets/gcp/) | Google Cloud Platform resources (MCP) |
| [**GitHub**](https://holmesgpt.dev/data-sources/builtin-toolsets/github-mcp/) | Repositories, issues, and pull requests (MCP) |
| [**Grafana**](https://holmesgpt.dev/data-sources/builtin-toolsets/grafanadashboards/) | Query and analyze dashboard configuration and panels |
| [**Helm**](https://holmesgpt.dev/data-sources/builtin-toolsets/helm/) | Release status, chart metadata, and values |
| [**Internet**](https://holmesgpt.dev/data-sources/builtin-toolsets/internet/) | Public runbooks, community docs, and more |
| [**Kafka**](https://holmesgpt.dev/data-sources/builtin-toolsets/kafka/) | Fetch metadata, list consumers and topics, or find lagging consumer groups |
| [**Kubernetes**](https://holmesgpt.dev/data-sources/builtin-toolsets/kubernetes/) | Pod logs, K8s events, and resource status (kubectl describe) |
| [**Loki**](https://holmesgpt.dev/data-sources/builtin-toolsets/grafanaloki/) | Query logs for Kubernetes resources or run arbitrary queries |
| [**MariaDB**](https://holmesgpt.dev/data-sources/builtin-toolsets/mariadb-mcp/) | MariaDB database queries and diagnostics (MCP) |
| [**MongoDB Atlas**](https://holmesgpt.dev/data-sources/builtin-toolsets/mongodb-atlas/) | Cluster health, slow queries, and performance diagnostics |
| [**NewRelic**](https://holmesgpt.dev/data-sources/builtin-toolsets/newrelic/) | Investigate alerts and query tracing data |
| [**OpenShift**](https://holmesgpt.dev/data-sources/builtin-toolsets/openshift/) | Projects, Routes, Builds, Security Context Constraints, and Deployment Configs |
| [**Prometheus**](https://holmesgpt.dev/data-sources/builtin-toolsets/prometheus/) | Investigate alerts, query metrics, and generate PromQL queries |
| [**RabbitMQ**](https://holmesgpt.dev/data-sources/builtin-toolsets/rabbitmq/) | Partitions, memory/disk alerts, troubleshooting split-brain scenarios, and more |
| [**Robusta**](https://holmesgpt.dev/data-sources/builtin-toolsets/robusta/) | Multi-cluster monitoring, historical change data, runbooks, PromQL graphs, and more |
| [**ServiceNow**](https://holmesgpt.dev/data-sources/builtin-toolsets/servicenow/) | Query tables and incident records |
| **Sentry** | Error tracking, issues, and performance monitoring (MCP) |
| [**Slab**](https://holmesgpt.dev/data-sources/builtin-toolsets/slab/) | Team knowledge base and on-demand runbooks |
| **Splunk** | Log search and analysis (MCP) |
| [**SQL Databases**](https://holmesgpt.dev/data-sources/builtin-toolsets/database-postgresql/) | PostgreSQL, MySQL, ClickHouse, MariaDB, SQL Server, SQLite |
| [**Tempo**](https://holmesgpt.dev/data-sources/builtin-toolsets/grafanatempo/) | Fetch trace information and debug issues such as high application latency |

See the [full list of built-in toolsets](https://holmesgpt.dev/data-sources/builtin-toolsets/) for more integrations, including Cilium, KubeVela, Notion, Prefect, and others.

### 🚀 End-to-End Automation

HolmesGPT can fetch alerts and tickets from external systems, investigate them, and then write the analysis back to the source or to Slack.

| Integration | Status | Notes |
|-------------------------|-----------|-------|
| Slack | ✅ | [Demo.](https://www.loom.com/share/afcd81444b1a4adfaa0bbe01c37a4847) Available through [Robusta.dev](https://home.robusta.dev/) (commercial platform) |
| Microsoft Teams | ✅ | Available through [Robusta.dev](https://home.robusta.dev/) (commercial platform) |
| Prometheus/AlertManager | ✅ | Robusta SaaS or HolmesGPT CLI |
| PagerDuty | ✅ | HolmesGPT CLI only |
| OpsGenie | ✅ | HolmesGPT CLI only |
| Jira | ✅ | HolmesGPT CLI only |
| GitHub | ✅ | HolmesGPT CLI only |

## Installation

Read the [installation documentation](https://holmesgpt.dev/installation/cli-installation/) to learn how to install HolmesGPT.

## Supported LLM Providers

Read the [LLM providers documentation](https://holmesgpt.dev/ai-providers/) to learn how to set up your LLM API key.

## Using HolmesGPT

See the [walkthrough documentation](https://holmesgpt.dev/walkthrough/) for usage guides, including:

- [Interactive mode](https://holmesgpt.dev/walkthrough/interactive-mode/) for asking questions and follow-ups
- [Investigating Prometheus alerts](https://holmesgpt.dev/walkthrough/investigating-prometheus-alerts/)
- [CI/CD troubleshooting](https://holmesgpt.dev/walkthrough/cicd-troubleshooting/)

## 🔐 Data Privacy

By design, HolmesGPT has **read-only access** and respects RBAC permissions. It is safe to run in production environments.

## License

Distributed under the Apache 2.0 License. See [LICENSE](https://github.com/HolmesGPT/holmesgpt/blob/master/LICENSE) for more information.

## How to Contribute

Please read our [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines and instructions.

For help, contact us on [Slack](https://cloud-native.slack.com/archives/C0A1SPQM5PZ) or ask [DeepWiki AI](https://deepwiki.com/HolmesGPT/holmesgpt).

Please make sure to follow the CNCF code of conduct - [details here](https://github.com/HolmesGPT/holmesgpt/blob/master/CODE_OF_CONDUCT.md).

[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/HolmesGPT/holmesgpt) [![OpenSSF Best Practices](https://www.bestpractices.dev/projects/11586/badge)](https://www.bestpractices.dev/projects/11586) [![OpenSSF Scorecard](https://api.scorecard.dev/projects/github.com/HolmesGPT/holmesgpt/badge)](https://scorecard.dev/viewer/?uri=github.com/HolmesGPT/holmesgpt)
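
## Appendix: a toy agentic loop

The agentic loop and the tool output transformers mentioned earlier can be pictured with a small, self-contained Python sketch. This is not HolmesGPT's implementation or API. Every name here (`fetch_pod_logs`, `describe_pod`, `keep_errors`, `scripted_model`, `investigate`, the pod name) is hypothetical, the "model" is a scripted stand-in for an LLM call, and the data sources are fakes. It only illustrates the general pattern: pick a tool, run it, shrink the output, add it to the context, and repeat until an answer is produced.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


def fetch_pod_logs(pod: str) -> str:
    """Fake read-only data source: returns a large log with one error line."""
    lines = [f"{pod} INFO request ok" for _ in range(500)]
    lines.append(f"{pod} ERROR OOMKilled: memory limit exceeded")
    return "\n".join(lines)


def describe_pod(pod: str) -> str:
    """Fake read-only data source: returns a one-line pod summary."""
    return f"{pod}: restarts=7 last_state=Terminated reason=OOMKilled"


def keep_errors(output: str, max_lines: int = 20) -> str:
    """Output transformer: keep only error lines so large payloads stay small."""
    errors = [line for line in output.splitlines() if "ERROR" in line]
    return "\n".join(errors[:max_lines])


# Hypothetical registries mapping tool names to callables and transformers.
TOOLS: Dict[str, Callable[[str], str]] = {
    "fetch_pod_logs": fetch_pod_logs,
    "describe_pod": describe_pod,
}
TRANSFORMERS: Dict[str, Callable[[str], str]] = {"fetch_pod_logs": keep_errors}


@dataclass
class Step:
    """Either a tool call (tool + arg) or a final answer."""
    tool: Optional[str] = None
    arg: str = ""
    answer: Optional[str] = None


def scripted_model(context: List[str]) -> Step:
    """Stand-in for an LLM: chooses the next action from the current context."""
    joined = "\n".join(context)
    if "restarts=" not in joined:
        return Step(tool="describe_pod", arg="checkout-7d9f")
    if "ERROR" not in joined:
        return Step(tool="fetch_pod_logs", arg="checkout-7d9f")
    return Step(answer="Root cause: container exceeded its memory limit (OOMKilled).")


def investigate(question: str, max_steps: int = 5) -> Tuple[str, List[str]]:
    """Run the loop until the model answers or the step budget runs out."""
    context = [question]
    for _ in range(max_steps):
        step = scripted_model(context)
        if step.answer is not None:
            return step.answer, context
        raw = TOOLS[step.tool](step.arg)
        transform = TRANSFORMERS.get(step.tool, lambda text: text)
        context.append(f"[{step.tool}] {transform(raw)}")
    return "No conclusion within the step budget.", context


if __name__ == "__main__":
    answer, context = investigate("Why is checkout-7d9f restarting?")
    print(answer)
    for entry in context:
        print(entry)
```

The point of the transformer step is that the 501-line fake log never enters the context; only the single error line does. A real agent would replace `scripted_model` with an LLM call and the fake functions with read-only queries against actual data sources.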