weaviate/weaviate

GitHub: weaviate/weaviate

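A cloud-native, open-source vector database that combines semantic search, structured filtering, and RAG capabilities, designed for production-grade AI applications.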

Stars: 15751 | Forks: 1203

# Weaviate

[![GitHub Repo stars](https://img.shields.io/github/stars/weaviate/weaviate?style=social)](https://github.com/weaviate/weaviate)
[![Go Reference](https://pkg.go.dev/badge/github.com/weaviate/weaviate.svg)](https://pkg.go.dev/github.com/weaviate/weaviate)
[![Build Status](https://static.pigsec.cn/wp-content/uploads/repos/2026/03/156ea13f63230929.svg)](https://github.com/weaviate/weaviate/actions/workflows/.github/workflows/pull_requests.yaml)
[![Go Report Card](https://goreportcard.com/badge/github.com/weaviate/weaviate)](https://goreportcard.com/report/github.com/weaviate/weaviate)
[![Coverage Status](https://codecov.io/gh/weaviate/weaviate/branch/main/graph/badge.svg)](https://codecov.io/gh/weaviate/weaviate)
[![Slack](https://img.shields.io/badge/slack--channel-blue?logo=slack)](https://weaviate.io/slack)

**Weaviate** is an open-source, cloud-native vector database that stores both objects and vectors, enabling semantic search at scale. It combines vector similarity search, keyword filtering, retrieval-augmented generation (RAG), and reranking in a single query interface. Common use cases include RAG systems, semantic and image search, recommendation engines, chatbots, and content classification.

Weaviate supports two ways of storing vectors: automatic vectorization at import time using [integrated models](https://docs.weaviate.io/weaviate/model-providers) (OpenAI, Cohere, HuggingFace, and others), or direct import of [pre-computed vector embeddings](https://docs.weaviate.io/weaviate/starter-guides/custom-vectors). Production deployments benefit from built-in multi-tenancy, replication, RBAC authorization, and [many other features](#weaviate-features).

To get started quickly, have a look at one of these tutorials:

- [Quickstart - Weaviate Cloud](https://docs.weaviate.io/weaviate/quickstart)
- [Quickstart - local Docker instance](https://docs.weaviate.io/weaviate/quickstart/local)

## Installation

Weaviate offers several installation and deployment options:

- [Docker](https://docs.weaviate.io/deploy/installation-guides/docker-installation)
- [Kubernetes](https://docs.weaviate.io/deploy/installation-guides/k8s-installation)
- [Weaviate Cloud](https://console.weaviate.cloud)

See the [installation docs](https://docs.weaviate.io/deploy) for more deployment options, such as [AWS](https://docs.weaviate.io/deploy/installation-guides/aws-marketplace) and [GCP](https://docs.weaviate.io/deploy/installation-guides/gcp-marketplace).

## Getting started

You can start Weaviate together with a local vector embedding model using [Docker](https://docs.docker.com/desktop/).

Create a `docker-compose.yml` file:

```
services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.36.0
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      ENABLE_MODULES: text2vec-model2vec
      MODEL2VEC_INFERENCE_API: http://text2vec-model2vec:8080

  # A lightweight embedding model that will generate vectors from objects during import
  text2vec-model2vec:
    image: cr.weaviate.io/semitechnologies/model2vec-inference:minishlab-potion-base-32M
```

Start Weaviate and the embedding service with:

```
docker compose up -d
```

Install the Python client (or use another [client library](#client-libraries-and-apis)):

```
pip install -U weaviate-client
```

The following Python example shows how to populate a Weaviate database with data, create vector embeddings, and perform a semantic search:

```
import weaviate
from weaviate.classes.config import Configure, DataType, Property

# Connect to Weaviate
client = weaviate.connect_to_local()

# Create a collection
client.collections.create(
    name="Article",
    properties=[Property(name="content", data_type=DataType.TEXT)],
    vector_config=Configure.Vectors.text2vec_model2vec(),  # Use a vectorizer to generate embeddings during import
    # vector_config=Configure.Vectors.self_provided()  # If you want to import your own pre-generated embeddings
)

# Insert objects and generate embeddings
articles = client.collections.get("Article")
articles.data.insert_many(
    [
        {"content": "Vector databases enable semantic search"},
        {"content": "Machine learning models generate embeddings"},
        {"content": "Weaviate supports hybrid search capabilities"},
    ]
)

# Perform a semantic search
results = articles.query.near_text(query="Search objects by meaning", limit=1)
print(results.objects[0])

client.close()
```

This example uses the `Model2Vec` vectorizer, but you can choose any other [embedding model provider](https://docs.weaviate.io/weaviate/model-providers) or [bring your own pre-generated vectors](https://docs.weaviate.io/weaviate/starter-guides/custom-vectors).

## Client libraries and APIs

Weaviate provides client libraries for several programming languages:

- [Python](https://docs.weaviate.io/weaviate/client-libraries/python)
- [JavaScript/TypeScript](https://docs.weaviate.io/weaviate/client-libraries/typescript)
- [Java](https://docs.weaviate.io/weaviate/client-libraries/java)
- [Go](https://docs.weaviate.io/weaviate/client-libraries/go)
- [C#/.NET](https://docs.weaviate.io/weaviate/client-libraries/csharp)

There are also additional [community-maintained libraries](https://docs.weaviate.io/weaviate/client-libraries/community).

Weaviate exposes a [REST API](https://docs.weaviate.io/weaviate/api/rest), a [gRPC API](https://docs.weaviate.io/weaviate/api/grpc), and a [GraphQL API](https://docs.weaviate.io/weaviate/api/graphql) for communicating with the database server.

## Weaviate features

These features enable you to build AI-powered applications:

- **⚡ Fast search performance**: Perform complex semantic [searches](https://docs.weaviate.io/weaviate/search/similarity) over billions of vectors in milliseconds. Weaviate's architecture is built in Go for speed and reliability, keeping your AI applications responsive even under heavy load. See our [ANN benchmarks](https://docs.weaviate.io/weaviate/benchmarks/ann) for more information.
- **🔌 Flexible vectorization**: Vectorize your data at import time with [integrated vectorizers](https://docs.weaviate.io/weaviate/model-providers) from OpenAI, Cohere, HuggingFace, Google, and others. Alternatively, you can import [your own vector embeddings](https://docs.weaviate.io/weaviate/starter-guides/custom-vectors).
- **🔍 Advanced hybrid and image search**: Combine semantic search with traditional [keyword (BM25) search](https://docs.weaviate.io/weaviate/search/bm25), [image search](https://docs.weaviate.io/weaviate/search/image), and [advanced filtering](https://docs.weaviate.io/weaviate/search/filters) to get the best results with a single API call.
- **🤖 Integrated RAG and reranking**: Go beyond simple retrieval with built-in [generative search (RAG)](https://docs.weaviate.io/weaviate/search/generative) and [reranking](https://docs.weaviate.io/weaviate/search/rerank) capabilities. Power sophisticated question-answering systems, chatbots, and summarizers directly from your database without additional tooling.
- **📈 Production-ready and scalable**: Weaviate is built for mission-critical applications. Go from rapid prototyping to production at scale with native support for [horizontal scaling](https://docs.weaviate.io/deploy/configuration/horizontal-scaling), [multi-tenancy](https://docs.weaviate.io/weaviate/manage-collections/multi-tenancy), [replication](https://docs.weaviate.io/deploy/configuration/replication), and fine-grained role-based access control (RBAC).
- **💰 Cost-efficient operations**: Reduce resource consumption and operational costs with built-in [vector compression](https://docs.weaviate.io/weaviate/configuration/compression). Vector quantization and multi-vector encoding reduce memory usage with minimal impact on search performance.
- **⏱️ Object TTL**: Automatically expire and delete stale data with per-collection [time-to-live (TTL)](https://docs.weaviate.io/weaviate/manage-collections/time-to-live) settings, with full support for RBAC and multi-tenancy.

For a complete list of all features, visit the [official Weaviate documentation](https://docs.weaviate.io).

## Useful resources

### AI Agent Skills

[Weaviate Agent Skills](https://github.com/weaviate/agent-skills) is a collection of skills for AI coding agents (Claude Code, Cursor, GitHub Copilot, and others) that helps them work with Weaviate more accurately and efficiently. The skills cover search, querying, collection management, data import, and complete application blueprints (RAG, Agentic RAG, chatbots, and more).

Install with:

```
npx skills add weaviate/agent-skills
```

### Demo projects and recipes

These demos are working applications that highlight some of Weaviate's capabilities. Their source code is available on GitHub.

- [Elysia](https://elysia.weaviate.io) ([GitHub](https://github.com/weaviate/elysia)): A decision-tree-based agentic system that decides which tools to use, what results have been obtained, whether it should continue the process, and whether its goal has been completed.
- [Verba](https://weaviate.io/blog/verba-open-source-rag-app) ([GitHub](https://github.com/weaviate/verba)): A community-driven open-source application that offers an end-to-end, streamlined, and user-friendly interface for retrieval-augmented generation (RAG) out of the box.
- [Healthsearch](https://weaviate.io/blog/healthsearch-demo) ([GitHub](https://github.com/weaviate/healthsearch-demo)): An open-source project that showcases the potential of using user-written reviews and queries to retrieve supplement products based on specific health effects.
- Awesome-Moviate ([GitHub](https://github.com/weaviate-tutorials/awesome-moviate)): A movie search and recommendation engine that supports keyword (BM25), semantic, and hybrid search.

We also maintain extensive repositories of **Jupyter Notebooks** and **TypeScript code snippets** that cover how to use Weaviate features and integrations:

- [Weaviate Python Recipes](https://github.com/weaviate/recipes/)
- [Weaviate TypeScript Recipes](https://github.com/weaviate/recipes-ts/)

### Blog posts

- [What is a Vector Database](https://weaviate.io/blog/what-is-a-vector-database)
- [What is Vector Search](https://weaviate.io/blog/vector-search-explained)
- [What is Hybrid Search](https://weaviate.io/blog/hybrid-search-explained)
- [How to Choose an Embedding Model](https://weaviate.io/blog/how-to-choose-an-embedding-model)
- [What is RAG](https://weaviate.io/blog/introduction-to-rag)
- [RAG Evaluation](https://weaviate.io/blog/rag-evaluation)
- [Advanced RAG Techniques](https://weaviate.io/blog/advanced-rag)
- [What is Multimodal RAG](https://weaviate.io/blog/multimodal-rag)
- [What is Agentic RAG](https://weaviate.io/blog/what-is-agentic-rag)
- [What is Graph RAG](https://weaviate.io/blog/graph-rag)
- [Overview of Late Interaction Models](https://weaviate.io/blog/late-interaction-overview)

### Integrations

Weaviate integrates with many external services:

| Category | Description | Integrations |
| --- | --- | --- |
| **[Cloud Hyperscalers](https://docs.weaviate.io/integrations/cloud-hyperscalers)** | Large-scale computing and storage | [AWS](https://docs.weaviate.io/integrations/cloud-hyperscalers/aws), [Google](https://docs.weaviate.io/integrations/cloud-hyperscalers/google) |
| **[Compute Infrastructure](https://docs.weaviate.io/integrations/compute-infrastructure)** | Run and scale containerized applications | [Modal](https://docs.weaviate.io/integrations/compute-infrastructure/modal), [Replicate](https://docs.weaviate.io/integrations/compute-infrastructure/replicate), [Replicated](https://docs.weaviate.io/integrations/compute-infrastructure/replicated) |
| **[Data Platforms](https://docs.weaviate.io/integrations/data-platforms)** | Data ingestion and web scraping | [Airbyte](https://docs.weaviate.io/integrations/data-platforms/airbyte), [Aryn](https://docs.weaviate.io/integrations/data-platforms/aryn), [Boomi](https://docs.weaviate.io/integrations/data-platforms/boomi), [Box](https://docs.weaviate.io/integrations/data-platforms/box), [Confluent](https://docs.weaviate.io/integrations/data-platforms/confluent), [Astronomer](https://docs.weaviate.io/integrations/data-platforms/astronomer), [Context Data](https://docs.weaviate.io/integrations/data-platforms/context-data), [Databricks](https://docs.weaviate.io/integrations/data-platforms/databricks), [Firecrawl](https://docs.weaviate.io/integrations/data-platforms/firecrawl), [IBM](https://docs.weaviate.io/integrations/data-platforms/ibm), [Unstructured](https://docs.weaviate.io/integrations/data-platforms/unstructured) |
| **[LLM and Agent Frameworks](https://docs.weaviate.io/integrations/llm-agent-frameworks)** | Build agents and generative AI applications | [Agno](https://docs.weaviate.io/integrations/llm-agent-frameworks/agno), [Composio](https://docs.weaviate.io/integrations/llm-agent-frameworks/composio), [CrewAI](https://docs.weaviate.io/integrations/llm-agent-frameworks/crewai), [DSPy](https://docs.weaviate.io/integrations/llm-agent-frameworks/dspy), [Dynamiq](https://docs.weaviate.io/integrations/llm-agent-frameworks/dynamiq), [Haystack](https://docs.weaviate.io/integrations/llm-agent-frameworks/haystack), [LangChain](https://docs.weaviate.io/integrations/llm-agent-frameworks/langchain), [LlamaIndex](https://docs.weaviate.io/integrations/llm-agent-frameworks/llamaindex), [N8n](https://docs.weaviate.io/integrations/llm-agent-frameworks/n8n), [Semantic Kernel](https://docs.weaviate.io/integrations/llm-agent-frameworks/semantic-kernel) |
| **[Operations](https://docs.weaviate.io/integrations/operations)** | Tools for monitoring and analyzing generative AI workflows | [AIMon](https://docs.weaviate.io/integrations/operations/aimon), [Arize](https://docs.weaviate.io/integrations/operations/arize), [Cleanlab](https://docs.weaviate.io/integrations/operations/cleanlab), [Comet](https://docs.weaviate.io/integrations/operations/comet), [DeepEval](https://docs.weaviate.io/integrations/operations/deepeval), [Langtrace](https://docs.weaviate.io/integrations/operations/langtrace), [LangWatch](https://docs.weaviate.io/integrations/operations/langwatch), [Nomic](https://docs.weaviate.io/integrations/operations/nomic), [Patronus AI](https://docs.weaviate.io/integrations/operations/patronus), [Ragas](https://docs.weaviate.io/integrations/operations/ragas), [TruLens](https://docs.weaviate.io/integrations/operations/trulens), [Weights & Biases](https://docs.weaviate.io/integrations/operations/wandb) |

## Contributing

We welcome and appreciate contributions! See our [contributor guide](https://docs.weaviate.io/contributor-guide) for the development setup, code style guidelines, testing requirements, and the pull request process.

Join our [Slack community](https://weaviate.io/slack) or [community forum](https://forum.weaviate.io/) to discuss ideas and get help.

## License

BSD 3-Clause License. See [LICENSE](./LICENSE) for details.
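
## Illustrative sketch: blending keyword and vector scores

The features list describes hybrid search as a combination of semantic (vector) search and keyword search. The following is a minimal, dependency-free sketch of that general idea, not Weaviate's implementation: it min-max normalizes two score lists and blends them with a weight `alpha` (1.0 means vector-only, 0.0 means keyword-only). The function names, the toy term-overlap keyword score (a stand-in for BM25), and the hand-written three-dimensional vectors are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Toy keyword score: fraction of query terms found in the text (NOT BM25)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def normalize(scores):
    """Min-max normalize scores into [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(query, query_vec, docs, alpha=0.5):
    """Rank docs by alpha * vector score + (1 - alpha) * keyword score."""
    vec = normalize([cosine_similarity(query_vec, d["vector"]) for d in docs])
    kw = normalize([keyword_score(query, d["content"]) for d in docs])
    fused = [alpha * v + (1 - alpha) * k for v, k in zip(vec, kw)]
    return sorted(zip(fused, (d["content"] for d in docs)), reverse=True)


# Hypothetical data: the vectors are made up by hand, not produced by a model.
docs = [
    {"content": "Vector databases enable semantic search", "vector": [0.9, 0.1, 0.0]},
    {"content": "Machine learning models generate embeddings", "vector": [0.2, 0.9, 0.1]},
    {"content": "Weaviate supports hybrid search capabilities", "vector": [0.7, 0.2, 0.6]},
]

ranked = hybrid_rank("semantic search", [1.0, 0.0, 0.0], docs, alpha=0.5)
for score, content in ranked:
    print(f"{score:.3f}  {content}")
```

Raising `alpha` toward 1.0 makes the ranking follow vector similarity more closely, while lowering it favors exact term matches. In a real deployment the database performs this fusion server-side; the sketch only shows why a single blended score can surface documents that match either by meaning or by wording.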