quickwit-oss/tantivy

GitHub: quickwit-oss/tantivy

A full-text search engine library written in Rust. Inspired by Apache Lucene, it gives developers high-performance, embeddable search.

Stars: 15469 | Forks: 934

[![Docs](https://docs.rs/tantivy/badge.svg)](https://docs.rs/crate/tantivy/) [![Build Status](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/825be96f79135433.svg)](https://github.com/quickwit-oss/tantivy/actions/workflows/test.yml) [![codecov](https://codecov.io/gh/quickwit-oss/tantivy/branch/main/graph/badge.svg)](https://codecov.io/gh/quickwit-oss/tantivy) [![OpenSSF Scorecard](https://api.scorecard.dev/projects/github.com/quickwit-oss/tantivy/badge)](https://scorecard.dev/viewer/?uri=github.com/quickwit-oss/tantivy) [![Join the chat at https://discord.gg/MT27AG5EVE](https://shields.io/discord/908281611840282624?label=chat%20on%20discord)](https://discord.gg/MT27AG5EVE) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Crates.io](https://img.shields.io/crates/v/tantivy.svg)](https://crates.io/crates/tantivy)

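## Fast full-text search engine library written in Rust

**If you are looking for an alternative to Elasticsearch or Apache Solr, check out [Quickwit](https://github.com/quickwit-oss/quickwit), our distributed search engine built on top of Tantivy.**

Tantivy is closer to [Apache Lucene](https://lucene.apache.org/) than to [Elasticsearch](https://www.elastic.co/products/elasticsearch) or [Apache Solr](https://lucene.apache.org/solr/) in the sense that it is not an off-the-shelf search engine server, but rather a crate that can be used to build such a search engine. In fact, Tantivy's design is strongly inspired by Lucene.

## Benchmark

The following [benchmark](https://tantivy-search.github.io/bench/) breaks down performance for different types of queries and collections. Actual performance will vary depending on the nature of your queries and their load.

Details about the benchmark can be found in this [repository](https://github.com/quickwit-oss/search-benchmark-game).

## Features

- Full-text search
- Configurable tokenizer (stemming available for 17 Latin languages) with third-party support for Chinese ([tantivy-jieba](https://crates.io/crates/tantivy-jieba) and [cang-jie](https://crates.io/crates/cang-jie)), Japanese ([lindera](https://github.com/lindera-morphology/lindera-tantivy), [Vaporetto](https://crates.io/crates/vaporetto_tantivy), and [tantivy-tokenizer-tiny-segmenter](https://crates.io/crates/tantivy-tokenizer-tiny-segmenter)) and Korean ([lindera](https://github.com/lindera-morphology/lindera-tantivy) + [lindera-ko-dic-builder](https://github.com/lindera-morphology/lindera-ko-dic-builder))
- Fast (check out the :racehorse: :sparkles: [benchmark](https://tantivy-search.github.io/bench/) :sparkles: :racehorse:)
- Tiny startup time (<10ms), perfect for command-line tools
- BM25 scoring (the same as Lucene)
- Natural query language (e.g. `(michael AND jackson) OR "king of pop"`)
- Phrase query search (e.g. `"michael jackson"`)
- Incremental indexing
- Multithreaded indexing (indexing English Wikipedia takes less than 3 minutes on my desktop)
- Mmap directory
- SIMD integer compression when the platform/CPU supports the SSE2 instruction set
- Single-valued and multivalued u64, i64, and f64 fast fields (the equivalent of doc values in Lucene)
- `&[u8]` fast fields
- Text, i64, u64, f64, date, IP, bool, and hierarchical facet fields
- Compressed document store (LZ4, Zstd, none)
- Range queries
- Faceted search
- Configurable indexing (optional term frequency and position indexing)
- JSON fields
- Aggregation collectors: histogram, range buckets, average, and stats metrics
- LogMergePolicy with deletes
- Searcher Warmer API
- Cheesy logo with a horse

### Non-features

Distributed search is out of the scope of Tantivy, but if you are looking for this feature, check out [Quickwit](https://github.com/quickwit-oss/quickwit/).

## Getting started

Tantivy works on stable Rust and supports Linux, macOS, and Windows.

- [Tantivy's simple search example](https://tantivy-search.github.io/examples/basic_search.html)
- [tantivy-cli and its tutorial](https://github.com/quickwit-oss/tantivy-cli): `tantivy-cli` is an actual command-line interface that makes it easy to create a search engine, index documents, and search either via the CLI or via a small server with a REST API. It walks you through getting a Wikipedia search engine up and running in a few minutes.
- [Reference documentation for the latest released version](https://docs.rs/tantivy/)

## How can I support this project?

There are many ways to support this project:

- Use Tantivy and tell us about your experience on [Discord](https://discord.gg/MT27AG5EVE) or by email (paul.masurel@gmail.com)
- Report bugs
- Write a blog post
- Help with documentation by asking questions or submitting PRs
- Contribute code (you can join [our Discord server](https://discord.gg/MT27AG5EVE))
- Talk about Tantivy around you

## Contributing code

We use the GitHub Pull Request workflow: when opening a PR, reference a GitHub issue and/or include a comprehensive commit message. Feel free to add your contribution to CHANGELOG.md.

### Tokenizer

When implementing a tokenizer for Tantivy, depend on the `tantivy-tokenizer-api` crate.

### Clone and build locally

Tantivy compiles on stable Rust. To check out the code and run the tests, simply run:

```
git clone https://github.com/quickwit-oss/tantivy.git
cd tantivy
cargo test
```

## Companies using Tantivy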

- Etsy
- ParadeDB
- Nuclia
- Humanfirst.ai
- Element.io

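## FAQ

### Can I use Tantivy in other languages?

- Python → [tantivy-py](https://github.com/quickwit-oss/tantivy-py)
- Ruby → [tantiny](https://github.com/baygeldin/tantiny)

You can also find other bindings on [GitHub](https://github.com/search?q=tantivy), but they may be less well maintained.

### What are some examples of Tantivy use?

- [seshat](https://github.com/matrix-org/seshat/): a Matrix message database/indexer
- [tantiny](https://github.com/baygeldin/tantiny): lightweight full-text search for Ruby
- [lnx](https://github.com/lnx-search/lnx): an adaptable, typo-tolerant search engine with a REST API
- [Bichon](https://github.com/rustmailer/bichon): a lightweight, high-performance email archiver written in Rust, with a web UI
- and [more](https://github.com/search?q=tantivy)!

### On average, how much faster is Tantivy than Lucene?

- According to our [search latency benchmark](https://tantivy-search.github.io/bench/), Tantivy is approximately 2x faster than Lucene.

### Does Tantivy support incremental indexing?

- Yes.

### How can I edit documents?

- Data in Tantivy is immutable. To edit a document, delete it and then index the new version.

### When will my documents be searchable during indexing?

- Documents become searchable after `commit` is called on an `IndexWriter`. Existing `IndexReader`s also need to be reloaded to reflect the changes. Finally, changes are only visible to newly acquired `Searcher`s.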
