tobilg/polyglot

GitHub: tobilg/polyglot

基于 Rust/WASM 的高性能 SQL 转换器,支持 32+ 种数据库方言之间的解析、转换、生成和格式化。

Stars: 695 | Forks: 38

# Polyglot 基于 Rust/Wasm 的 SQL 转换器,支持 32+ 种方言,灵感来源于 [sqlglot](https://github.com/tobymao/sqlglot)。 Polyglot 可以跨 32+ 种数据库方言解析、生成、转换和格式化 SQL。它以以下形式发布: - Rust crate ([`polyglot-sql`](https://crates.io/crates/polyglot-sql/)) - TypeScript/WASM SDK ([`@polyglot-sql/sdk`](https://www.npmjs.com/package/@polyglot-sql/sdk)) - Python 包 ([`polyglot-sql`](https://pypi.org/project/polyglot-sql/)) 此外还有一个 [playground](https://polyglot-playground.gh.tobilg.com/) 供你在浏览器中试用,以及 [Rust API 文档](https://docs.rs/polyglot-sql/latest/polyglot_sql/)、[TypeScript API 文档](https://polyglot.gh.tobilg.com/) 和 [Python API 文档](https://polyglot-python.gh.tobilg.com/)。 发布说明记录在 [`CHANGELOG.md`](CHANGELOG.md) 中。 ## 功能 - **Transpile(转换)**:在任意 32 种方言对之间转换 SQL - **Parse(解析)**:将 SQL 解析为完全类型的 AST - **Generate(生成)**:从 AST 节点反向生成 SQL - **Format(格式化)**:美化打印 SQL - **Fluent builder API**:用于编程式构建查询 - **Validation(验证)**:支持语法、语义和 Schema 感知检查 - **AST visitor**:用于遍历、转换和分析查询的工具 - **C FFI**:用于多语言绑定的共享/静态库 (`polyglot-sql-ffi`) - **Python bindings**:由 PyO3 驱动(PyPI 上的 `polyglot-sql`) ## 支持的方言 (32) | | | | | | |---|---|---|---|---| | Athena | BigQuery | ClickHouse | CockroachDB | Databricks | | Doris | Dremio | Drill | Druid | DuckDB | | Dune | Exasol | Fabric | Hive | Materialize | | MySQL | Oracle | PostgreSQL | Presto | Redshift | | RisingWave | SingleStore | Snowflake | Solr | Spark | | SQLite | StarRocks | Tableau | Teradata | TiDB | | Trino | TSQL | | | | ## 快速开始 ### Rust ``` use polyglot_sql::{transpile, DialectType}; // Transpile MySQL to PostgreSQL let result = transpile( "SELECT IFNULL(a, b) FROM t", DialectType::MySQL, DialectType::Postgres, ).unwrap(); assert_eq!(result[0], "SELECT COALESCE(a, b) FROM t"); ``` ``` use polyglot_sql::builder::*; // Fluent query builder let query = select(["id", "name"]) .from("users") .where_(col("age").gt(lit(18))) .order_by(["name"]) .limit(10) .build(); ``` 查看完整的 [Rust crate README](crates/polyglot-sql/README.md) 获取更多示例。 ### TypeScript ``` npm install @polyglot-sql/sdk ``` ``` import { transpile, Dialect } from '@polyglot-sql/sdk'; // Transpile MySQL to PostgreSQL const result = transpile( 'SELECT IFNULL(a, b) FROM t', Dialect.MySQL, Dialect.PostgreSQL, ); console.log(result.sql[0]); // SELECT COALESCE(a, b) FROM t ``` ``` import { select, col, lit } from '@polyglot-sql/sdk'; // Fluent query builder const sql = select('id', 'name') .from('users') .where(col('age').gt(lit(18))) .orderBy(col('name').asc()) .limit(10) .toSql('postgresql'); ``` 查看完整的 [TypeScript SDK README](packages/sdk/README.md) 获取更多示例。 ### Python ``` pip install polyglot-sql ``` ``` import polyglot_sql result = polyglot_sql.transpile( "SELECT IFNULL(a, b) FROM t", read="mysql", write="postgres", ) print(result[0]) # SELECT COALESCE(a, b) FROM t ``` 查看完整的 [Python bindings README](crates/polyglot-sql-python/README.md)。 ## 格式化防护机制 SQL 格式化在 Rust 核心中通过防护限制运行,以防止病态输入耗尽内存: - `maxInputBytes`: `16 MiB` (默认) - `maxTokens`: `1_000_000` (默认) - `maxAstNodes`: `1_000_000` (默认) - `maxSetOpChain`: `256` (默认) 防护失败会在消息中返回错误代码 (`E_GUARD_INPUT_TOO_LARGE`, `E_GUARD_TOKEN_BUDGET_EXCEEDED`, `E_GUARD_AST_BUDGET_EXCEEDED`, `E_GUARD_SET_OP_CHAIN_EXCEEDED`)。 各运行时的配置接口: - Rust: 通过 `format_with_options` 配置。 - WASM: 通过 `format_sql_with_options` / `format_sql_with_options_value` 配置。 - TypeScript SDK: 通过 `formatWithOptions` 配置。 - C FFI: 通过 `polyglot_format_with_options` 配置。 - Python: 通过仅关键字参数 `format_sql(..., max_*)` 覆盖配置。 WASM 底层示例(来自 `polyglot-sql-wasm` 导出): ``` import init, { format_sql_with_options } from "./polyglot_sql_wasm.js"; await init(); const raw = format_sql_with_options( "SELECT a,b FROM t", "generic", JSON.stringify({ maxInputBytes: 2 * 1024 * 1024, maxTokens: 250000, maxAstNodes: 250000, maxSetOpChain: 128 }), ); const result = JSON.parse(raw); ``` ## 项目结构 ``` polyglot/ ├── crates/ │ ├── polyglot-sql/ # Core Rust library (parser, generator, builder) │ ├── polyglot-sql-function-catalogs/ # Optional dialect function catalogs (feature-gated data) │ ├── polyglot-sql-wasm/ # WASM bindings │ ├── polyglot-sql-ffi/ # C ABI bindings (.so/.dylib/.dll + .a/.lib + header) │ └── polyglot-sql-python/ # Python bindings (PyO3 + maturin, published on PyPI) ├── packages/ │ ├── sdk/ # TypeScript SDK (@polyglot-sql/sdk on npm) │ ├── documentation/ # TypeScript API documentation site │ ├── playground/ # Playground for testing the SDK (React 19, Tailwind v4, Vite) │ └── python-docs/ # Python API documentation site (Cloudflare Pages) ├── examples/ │ ├── rust/ # Rust example │ ├── typescript/ # TypeScript SDK example │ └── c/ # C FFI example └── tools/ ├── sqlglot-compare/ # Test extraction & comparison tool └── bench-compare/ # Performance benchmarks ``` ## 示例 独立的示例项目位于 [`examples/`](examples/) 目录中。每个项目都拉取最新发布的包,并且可以独立运行。 ### Rust ``` cargo run --manifest-path examples/rust/Cargo.toml ``` ### TypeScript ``` cd examples/typescript pnpm install --ignore-workspace && pnpm start ``` ## 从源码构建 ``` # 构建 Rust 核心 cargo build -p polyglot-sql # 构建 C FFI crate(共享/静态库 + 生成的头文件) cargo build -p polyglot-sql-ffi --profile ffi_release # 构建 Python 扩展 / wheel make develop-python make build-python # 构建 WASM + TypeScript SDK make build-all # 或分步执行: cd crates/polyglot-sql-wasm && wasm-pack build --target bundler --release cd packages/sdk && npm run build ``` ## C FFI Polyglot 在 `crates/polyglot-sql-ffi` 中提供了稳定的 C ABI。 - Crate README: [`crates/polyglot-sql-ffi/README.md`](crates/polyglot-sql-ffi/README.md) - 生成的头文件: `crates/polyglot-sql-ffi/polyglot_sql.h` - 示例程序: `examples/c/main.c` - Make 目标: - `make build-ffi` - `make generate-ffi-header` - `make build-ffi-example` - `make test-ffi` 对于标记版本 (`v*`),CI 还会将预构建的 FFI 构建产物和校验和附加到 GitHub Releases 上。 ## Python Bindings Polyglot 在 `crates/polyglot-sql-python` 中提供了第一方 Python bindings。 - Crate README: [`crates/polyglot-sql-python/README.md`](crates/polyglot-sql-python/README.md) - PyPI 上的包名: `polyglot-sql` - Make 目标: - `make develop-python` - `make test-python` - `make typecheck-python` - `make build-python` ## 函数目录 可选的方言函数目录通过 `crates/polyglot-sql-function-catalogs` 提供。 - Crate README: [`crates/polyglot-sql-function-catalogs/README.md`](crates/polyglot-sql-function-catalogs/README.md) - 核心 feature flags: - `function-catalog-clickhouse` - `function-catalog-duckdb` - `function-catalog-all-dialects` - 预期行为:编译时包含,核心中一次性加载,在 Schema 验证类型检查期间自动使用。 ## 测试 Polyglot 目前运行 **10,220 个 SQLGlot 固定用例** 以及额外的项目特定套件。所有严格的通过/失败套件在最新的验证运行中均达到 **100%**。 | Category | Count | Pass Rate | |----------|------:|:---------:| | SQLGlot generic identity | 956 | 100% | | SQLGlot dialect identity | 3,554 | 100% | | SQLGlot transpilation | 5,513 | 100% | | SQLGlot transpile (generic) | 145 | 100% | | SQLGlot parser | 29 | 100% | | SQLGlot pretty-print | 23 | 100% | | Lib unit tests | 835 | 100% | | Custom dialect identity | 276 | 100% | | Custom dialect transpilation | 347 | 100% | | ClickHouse parser corpus (non-skipped) | 7,047 | 100% | | FFI integration tests | 20 | 100% | | Python bindings tests (`make test-python`) | 69 | 100% | | **Total (strict Rust/FFI pass/fail case count)** | **18,745** | **100%** | ``` # 安装 fixtures(需执行一次) make setup-fixtures # 运行所有测试 make test-rust-all # All SQLGlot fixture suites make test-rust-lib # Lib unit tests (835) make test-rust-verify # Full strict verification suite make test-ffi # FFI crate integration tests # 单个测试套件 make test-rust-identity # 956 generic identity cases make test-rust-dialect # 3,554 dialect identity cases make test-rust-transpile # 5,513 transpilation cases make test-rust-transpile-generic # 145 generic transpile cases make test-rust-parser # 29 parser cases make test-rust-pretty # 23 pretty-print cases # 附加测试 make test-rust-roundtrip # Organized roundtrip unit tests make test-rust-matrix # Dialect matrix transpilation tests make test-rust-compat # SQLGlot compatibility tests make test-rust-errors # Error handling tests make test-rust-functions # Function normalization tests # TypeScript SDK 测试 cd packages/sdk && npm test # 与 Python SQLGlot 的全面对比 make test-compare ``` ### 基准测试 ``` make bench-compare # Compare polyglot-sql vs sqlglot performance make bench-rust # Rust benchmarks (JSON output) make bench-python # Python sqlglot benchmarks (JSON output) cargo bench -p polyglot-sql # Criterion benchmarks ``` ### 模糊测试 ``` cargo +nightly fuzz run fuzz_parser cargo +nightly fuzz run fuzz_roundtrip cargo +nightly fuzz run fuzz_transpile ``` ## Makefile 目标 | Target | Description | |--------|-------------| | `make help` | Show all available commands | | `make build-all` | Build core release + FFI + Python + bindings + WASM/SDK | | `make build-wasm` | Build WASM package + TypeScript SDK | | `make build-ffi` | Build C FFI crate (`ffi_release` profile) | | `make generate-ffi-header` | Generate C header via cbindgen/build.rs | | `make build-ffi-example` | Build + run C example against FFI lib | | `make develop-python` | Build/install Python extension in uv-managed env | | `make build-python` | Build Python wheels with maturin | | `make test-ffi` | Run FFI integration tests | | `make test-rust` | Run SQLGlot-named Rust tests in `polyglot-sql` | | `make test-rust-all` | Run all 10,220 SQLGlot fixture cases | | `make test-rust-lib` | Run 835 lib unit tests | | `make test-rust-verify` | Full verification suite | | `make test-rust-clickhouse-parser` | Run strict ClickHouse parser suite | | `make test-rust-clickhouse-coverage` | Run ClickHouse coverage suite (report-only) | | `make test-compare` | Compare against Python sqlglot | | `make bench-compare` | Performance comparison | | `make bench-rust-parsing-report` | Run `rust_parsing` bench and generate Markdown report | | `make bench-parse` | Core parse benchmark (polyglot vs sqlglot) | | `make bench-parse-quick` | Faster core parse benchmark mode | | `make bench-parse-full` | Parse benchmark including optional parsers | | `make extract-fixtures` | Regenerate JSON fixtures from Python | | `make setup-fixtures` | Create fixture symlink for Rust tests | | `make generate-bindings` | Generate TypeScript type bindings | | `make test-python` | Run Python bindings tests | | `make typecheck-python` | Run Python bindings type-check | | `make documentation-build` | Build documentation site | | `make documentation-deploy` | Deploy documentation to Cloudflare Pages | | `make python-docs-build` | Build Python API docs site | | `make python-docs-deploy` | Deploy Python API docs to Cloudflare Pages | | `make playground-build` | Build playground | | `make playground-deploy` | Deploy playground to Cloudflare Pages | | `make clean` | Remove all build artifacts | ## Rust 解析 Markdown 报告 如需获取一份适合发布说明的纯 Rust 解析基准测试摘要(`short`, `long`, `tpch`, `crazy`),请运行: ``` make bench-rust-parsing-report ``` 这将: - 运行 `cargo bench -p polyglot-sql --bench rust_parsing` - 从 `target/criterion/rust_parse_quick_equivalent/parse_one` 读取 Criterion 结果 - 生成包含 Markdown 表格的 `target/criterion/rust_parsing_report.md`(平均值、标准差、95% CI、基准增量(如果可用)) 最新生成结果快照: - 生成时间: 2026-02-26 13:43:10 UTC - 来源: `target/criterion/rust_parse_quick_equivalent/parse_one` | Query | Mean | Std Dev | 95% CI (mean) | Change vs baseline | |---|---:|---:|---:|---:| | short | 51.28 us | 481.03 ns | 51.09 us - 51.60 us | -8.40% | | long | 259.61 us | 666.53 ns | 259.23 us - 260.01 us | -5.21% | | tpch | 268.59 us | 776.85 ns | 268.15 us - 269.07 us | -0.03% | | crazy | 1.03 ms | 66.07 us | 992.65 us - 1.07 ms | +6.05% | ## 许可证 [MIT](LICENSE) [sqlglot MIT](licenses/SQLGLOT_LICENSE.md)
标签:AI工具, AST 抽象语法树, BigQuery, C FFI, DNS 反向解析, DuckDB, MITM代理, Polyglot, PostgreSQL, PyO3, Python 绑定, Rust, Snowflake, Spark SQL, SQL 方言, SQL 格式化, SQL 解析器, SQL 转译器, SQL 验证, TypeScript SDK, Wasm, WebAssembly, 代码示例, 代码转换, 前端后端通用, 可视化界面, 开发工具库, 数据分析, 数据库中间件, 数据库工具, 网络流量审计, 解析器生成器, 逆向工具, 通知系统