spaceandtimefdn/sxt-proof-of-sql

GitHub: spaceandtimefdn/sxt-proof-of-sql

Stars: 5434 | Forks: 577

# Proof of SQL

Proof of SQL TwitterDiscord Server

Proof of SQL is a high performance zero knowledge (ZK) prover developed by the [Space and Time](https://www.spaceandtime.io/) team, which cryptographically guarantees SQL queries were computed accurately against untampered data. It targets online latencies while proving computations over entire chain histories, an order of magnitude faster than state-of-the art zkVMs and coprocessors. Using Proof of SQL, developers can compute over both onchain and offchain datasets in a trustless manner, proving the result back to their smart contract (or offchain verifier) just-in-time during a transaction to power more sophisticated DeFi protocols with data-driven contracts. Proof of SQL can be integrated into any SQL database (such as [Google BigQuery](https://cloud.google.com/blog/topics/partners/how-space-and-times-proof-of-sql-integrates-with-bigquery)), centralized or decentralized, and is already securing some of the most prominent Web3 apps, financial institutions, and enterprises. ## For Developers Get started with Proof of SQL by using the published crate on [crates.io](https://crates.io/) or clone the repo and check out the examples. Check out the following sections of the README: * [Examples](#examples) * [Benchmarks](#benchmarks) * [Supported SQL Syntax](#supported-sql-syntax) * [Roadmap](#roadmap) * [Protocol Overview](#protocol-overview) ## Setup ### Prerequisites * Linux `x86_64` (NOTE: Most of the codebase _should_ work for most rust targets. However, proofs are accelerated using NVIDIA GPUs, so other targets would run very slowly and may require modification.) * NVIDIA GPU & Drivers (Strongly Recommended) * lld (`sudo apt install lld`) * clang (`sudo apt install clang`) * [Rust 1.81.0](https://www.rust-lang.org/tools/install)
Workaround for non-Linux and/or non-GPU machines. * Workaround #1: enable the CPU version of Blitzar by setting the `BLITZAR_BACKEND` environment variable. Example: export BLITZAR_BACKEND=cpu cargo test --all-features --all-targets * Workaround #2: disable the `blitzar` feature in the repo. Example cargo test --no-default-features --features="arrow cpu-perf"
## Examples Proof of SQL comes with example code demonstrating its usage. You can find the examples in the `crates/proof-of-sql/examples` folder. Below are explanations of how to run some of these examples: ### "Hello World" Example The "Hello World" example demonstrates generating and verifying a proof of the query `SELECT b FROM table WHERE a = 2` for the table: | a | b | |------------|-------------| | 1 | hi | | 2 | hello | | 3 | there | | 2 | world | #### Run cargo run --example hello_world #### Output Warming up GPU... 520.959485ms Loading data... 3.229767ms Parsing Query... 1.870256ms Generating Proof... 467.45371ms Verifying Proof... 7.106864ms Valid proof! Query result: OwnedTable { table: {Ident { value: "b", quote_style: None }: VarChar(["hello", "world"])} } For a detailed explanation of the example and its implementation, refer to the [README](https://github.com/spaceandtimelabs/sxt-proof-of-sql/blob/main/crates/proof-of-sql/examples/hello_world/README.md) and source code in [hello_world/main.rs](https://github.com/spaceandtimelabs/sxt-proof-of-sql/blob/main/crates/proof-of-sql/examples/hello_world/main.rs). ### CSV Database Example The CSV Database example demonstrates an implementation of a simple CSV-backed database with Proof of SQL capabilities. To install the example: cargo install --example posql_db --path crates/proof-of-sql #TODO: update once this is published to crates.io For detailed usage instructions and examples of how to create, append to, prove, and verify queries in the CSV-backed database, refer to the [README](https://github.com/spaceandtimelabs/sxt-proof-of-sql/blob/main/crates/proof-of-sql/examples/posql_db/README.md) and source code in [posql_db/main.rs](https://github.com/spaceandtimelabs/sxt-proof-of-sql/blob/main/crates/proof-of-sql/examples/posql_db/main.rs). ## Benchmarks Proof of SQL is optimized for speed and efficiency. Here's how it's so fast: 1. We use **native, precomputed commitments** to the data. In other words, when adding data to the database, we compute a "digest" of the data, which effectively "locks in" the data. Instead of using a merkle tree based commitment, like those used in most blockchains, we use the commitment scheme that is inherent to Proof of SQL itself. 2. SQL is conducive to a **natural arithmetization**, meaning that there is very little overhead compared with other proof systems that are designed around instructions/sequential compute. Instead, Proof of SQL is designed from the ground up with data processing and parallelism in mind. 3. We use **GPU acceleration** on the most expensive cryptography in the prover. We use [Blitzar](https://github.com/spaceandtimelabs/blitzar) as our acceleration framework. ### Setup We run benchmarks using NVIDIA A100 GPUs (NC A100 v4-series Azure VM). To run these benchmarks we first generate a large, randomly-filled table of data such as the following:

a (BIGINT) | b (BIGINT) | c (VARCHAR) ---|---|--- 17717 | -1 | Z 11651 | -3 | W -9563 | -2 | dS -6435 | -2 | x -8338 | -1 | jI 12420 | -2 | DX 11546 | -3 | 18292 | 2 | 6500 | -1 | C 16219 | 2 | D5

Then, we run the following 4 queries against these data, prove, and verify the results: * Filter - `SELECT b FROM bench_table WHERE a = 0` * Complex Filter - `SELECT * FROM bench_table WHERE (((a = 0) AND (b = 1)) OR ((c = 'a') AND (d = 'b')))` * Group By - `SELECT SUM(a), COUNT(*) FROM bench_table WHERE a = 0 GROUP BY b` * Join - `SELECT table_a.column, table_b.column FROM table_a JOIN table_b on table_a.column=table_b.column` ### Results The results for the `HyperKZG` commitment scheme are shown in the graphs below for a single and multiple A100 machine.

Proof Of SQL Benchmarks (200k - A100)Proof Of SQL Benchmarks (200k - Multi-A100)Proof Of SQL Benchmarks (10m - A100)Proof Of SQL Benchmarks (10m - Multi-A100)

## Supported SQL Syntax * `SELECT ... WHERE` * `GROUP BY` * Comparison operations: `=`, `>=`, `<=`, etc. * Logical operations: `AND`, `OR`, `NOT`. * Numerical operations `+`, `-`, `*`. * Aggregations: `SUM`, `COUNT` * Data Types: `BOOLEAN`, Integer types, `VARCHAR`, `DECIMAL75`, `TIMESTAMP`. ## Roadmap We are also currently undergoing robust security audits. Keep this in mind as you use this code. ## Protocol Overview See the [Space and Time Whitepaper](https://assets-global.website-files.com/642d91209f1e772d3740afa0/658edf3cf26933c4878ec965_whitepaper.pdf) for a more in-depth explanation. We will also be adding more technical documentation to this repo soon. We created this protocol with a few key goals. First, it needs to be super fast for data processing, both for verification and round-trip execution. This requires a design that is built from the ground up, as opposed to using arbitrary zkVMs. Second, we made it very developer-friendly. Using SQL, the most popular data query language, ensures a familiar experience for anyone building data-focused applications, or sophisticated data-driven contracts. Finally, our protocol is designed to handle complex data processing, not just simple serial compute or data retrieval. In this protocol, there are two main roles: the client sending the query (Verifier) and the database service returning the result (Prover). Of course, the Verifier doesn't always have to send the query; it can be any client, such as a smart contract, a dapp frontend, or a laptop. This setup is crucial for applications with limited compute or storage but still requires a security guarantee that data analytics are correctly executed and the data remains unaltered. The Prover handles heavy computations, while the Verifier is lightweight, suitable for client devices or smart contracts with limited resources. A key architectural feature is the concept of a commitment, or digest. To ensure data integrity, the Verifier maintains this commitment to detect any tampering. Think of it as a digital fingerprint—a lightweight digest representing the data in the table. ### Data Ingestion The initial interaction between the Verifier and the Prover involves data ingestion. In this process, when a service or client submits data for database inclusion, it first passes through the Verifier. Here, the Verifier generates (or updates) a commitment containing sufficient information to safeguard against tampering throughout the protocol. Once this commitment is established, the Verifier forwards the data to the database for storage, while retaining the commitment for future reference.

Data Ingestion Diagram

### Query Request The second interaction involves query requests, where the Verifier seeks data analytics on Prover-held data. When a service, client, or Verifier initiates a query request, it sends the request to the Prover. Here, the Prover parses the query, computes the result, and generates a proof, sent alongside the result to the Verifier, which is maintaining the commitment. The Verifier, armed with the proof and commitment, can verify the Prover's result against the query request.

Query Request Diagram

## License Proof of SQL is licensed under the Decentralized Open Software License 1.0. Please see the [LICENSE](https://github.com/spaceandtimelabs/sxt-proof-of-sql/blob/main/LICENSE) file for details.
标签:通知系统