GitHub: synthdb/synthdb

SynthDB is a zero-configuration synthetic data generator for PostgreSQL that automatically creates test data with referential integrity and semantic realism.

Stars: 11 | Forks: 0

# 🦀 SynthDB

### **The Universal Database Seeder**

#### Production-grade synthetic data. Zero configuration. Context-aware.

[![Crates.io](https://img.shields.io/crates/v/synthdb.svg)](https://crates.io/crates/synthdb) [![Built with Rust](https://img.shields.io/badge/built_with-Rust-d33833.svg)](https://www.rust-lang.org/) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Docs](https://img.shields.io/badge/docs-latest-blue.svg)](https://docs.rs/synthdb)

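## 📖 Overview

**SynthDB** is a next-generation database seeding engine that reads your existing PostgreSQL schema and automatically generates **statistically plausible, relational data**. Unlike traditional tools that produce random gibberish, SynthDB uses a **deep semantic engine** to understand the context and relationships in your data model, so the generated data looks and feels real.

```
-- Instead of this garbage:
INSERT INTO users VALUES ('XJ9K2', 'asdf@qwerty', '99999', 'ZZZ');

-- SynthDB generates this:
INSERT INTO users VALUES ('John Doe', 'john.doe@techcorp.com', '+1-555-0142', 'San Francisco, CA');
```

## ✨ Features

### 🧠 **Deep Semantic Intelligence**

SynthDB understands what columns *mean*, not just their types.

#### 🎯 Context-Aware Identity

If a table contains `first_name`, `last_name`, and `email`, SynthDB makes sure they match each other:

- **Name:** "Sarah Martinez"
- **Email:** "sarah.martinez@company.com"
- **Username:** "smartinez"

#### 🏷️ Smart Categorization

SynthDB automatically detects the domain of each column and generates valid data across several domains: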

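**💰 Finance**

- Credit card numbers (valid Luhn checksum)
- IBANs and SWIFT codes
- Cryptocurrency addresses
- Currency codes and amounts

**🌍 Geography**

- Coherent addresses
- City ↔ state ↔ postal code
- Latitude/longitude pairs
- Time zones

**🔬 Science**

- Chemical formulas
- DNA sequences
- Medical/ICD codes
- Lab values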

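**💻 Technology**

- IPv4 and IPv6 addresses
- MAC addresses
- User agents
- File paths and URLs

**🏢 Business**

- Company names
- Job titles
- Department names
- Stock ticker symbols

**📱 Personal Information**

- Phone numbers
- Social Security numbers
- Passport numbers
- Driver's license IDs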
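The "valid Luhn checksum" claim for credit card numbers can be made concrete with a short sketch. This is not SynthDB's actual code; the helper names `luhn_valid`, `luhn_check_digit`, and `double_digit` are hypothetical and only illustrate how a generator could append a check digit so that the resulting number passes Luhn validation.

```rust
/// Doubles a digit and folds two-digit results back into one digit (e.g. 8 -> 16 -> 7).
fn double_digit(digit: u32) -> u32 {
    let doubled = digit * 2;
    if doubled > 9 { doubled - 9 } else { doubled }
}

/// Returns true if `number` passes the Luhn checksum.
/// Spaces and dashes are skipped; any other non-digit makes the input invalid.
fn luhn_valid(number: &str) -> bool {
    let mut sum = 0u32;
    let mut count = 0usize;
    // Walk the digits right to left, doubling every second one.
    for ch in number.chars().rev() {
        if ch == ' ' || ch == '-' {
            continue;
        }
        let digit = match ch.to_digit(10) {
            Some(d) => d,
            None => return false,
        };
        sum += if count % 2 == 1 { double_digit(digit) } else { digit };
        count += 1;
    }
    count > 1 && sum % 10 == 0
}

/// Computes the digit that must be appended to `payload` (digits only)
/// so that the full number is Luhn-valid. Returns None for bad input.
fn luhn_check_digit(payload: &str) -> Option<u32> {
    if payload.is_empty() {
        return None;
    }
    let mut sum = 0u32;
    for (i, ch) in payload.chars().rev().enumerate() {
        let digit = ch.to_digit(10)?;
        // The rightmost payload digit sits next to the check digit, so it is doubled.
        sum += if i % 2 == 0 { double_digit(digit) } else { digit };
    }
    Some((10 - sum % 10) % 10)
}

fn main() {
    // Hypothetical payload: in a real generator these digits would be random.
    let payload = "411111111111111";
    if let Some(check) = luhn_check_digit(payload) {
        let card = format!("{payload}{check}");
        println!("{card} valid: {}", luhn_valid(&card));
    }
}
```

Generating the payload first and deriving the final digit avoids a generate-and-retry loop: every number produced this way is valid by construction.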

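### 🔗 **Referential Integrity**

#### 📊 Topological Sorting

SynthDB automatically analyzes foreign key dependencies and inserts data in the correct order:

```
Users → Orders → OrderItems → Shipments
```

#### ✅ Zero Broken Links

Generated foreign keys **always** reference valid, existing parent rows, so there are never any orphaned records.

```
-- Parent record created first
INSERT INTO customers (id, name) VALUES (1, 'Acme Corp');

-- Child record references existing parent
INSERT INTO orders (id, customer_id, total) VALUES (101, 1, 1299.99);
```

### 🛡️ **Production Ready**

| Feature | Description |
|---------|-------------|
| **Strict Precision** | Respects `NUMERIC(10,2)`, `VARCHAR(15)`, and all constraint types |
| **Smart Nulls** | Applies NULL values to optional fields while keeping critical data populated |
| **Unique Constraints** | Guarantees uniqueness for columns with UNIQUE or PRIMARY KEY constraints |
| **Check Constraints** | Honors CHECK constraints and enum types |
| **Zero Configuration** | No YAML files, no mapping rules. Just point it at your database |
| **Performance** | Written in Rust 🦀 for fast data generation |

## ⚡ Quick Start

### 📥 Installation

```
# Via Cargo
cargo install synthdb
```

### 🎯 Basic Usage

**Step 1:** Create a target database that contains your schema (the tables must already exist)

**Step 2:** Run SynthDB

```
synthdb clone \
  --url "postgres://user:pass@localhost:5432/my_staging_db" \
  --rows 1000 \
  --output seed.sql
```

**Step 3:** Apply the generated data

```
psql -d my_staging_db -f seed.sql
```

### 🔧 Advanced Options

```
# Generate data directly into the database (no SQL file)
synthdb clone --url "postgres://..." --rows 5000 --execute

# Specify a custom row count per table
synthdb clone --url "postgres://..." --config counts.json

# Exclude specific tables
synthdb clone --url "postgres://..." --exclude "logs,temp_*"

# Set the data locale
synthdb clone --url "postgres://..." --locale "en_GB"
```

## 💡 Examples

### 🎨 How SynthDB Handles Data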

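| Column Name | Generated Value | Logic |
|-------------|-----------------|-------|
| `merchant_name` | `'Acme Corporation'` | 🏢 Company entity detected |
| `support_email` | `'support@acmecorp.com'` | 📧 Matches the company name |
| `mac_address` | `'00:1A:2B:3C:4D:5E'` | 🔧 Valid hexadecimal format |
| `ipv6_address` | `'2001:0db8:85a3::8a2e:0370'` | 🌐 Valid IPv6 format |
| `contract_value` | `45021.50` | 💯 Respects the `NUMERIC(10,2)` constraint |
| `tracking_code` | `'TRK-9281-A02'` | 🎯 Semantic ID generation |
| `audit_log_path` | `'/var/logs/audit/2024-11.log'` | 📁 Context-aware file path |
| `birth_date` | `'1985-06-15'` | 🎂 Realistic age distribution |
| `website_url` | `'https://acmecorp.com'` | 🔗 Matches the company domain |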
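The "Logic" column above suggests that a column's name drives which generator is picked. A minimal sketch of that idea follows, assuming a simple keyword lookup on the underscore-separated parts of the column name. The `Category` enum, the `RULES` table, and `classify` are hypothetical names for illustration; SynthDB's real semantic engine is not documented here and may work quite differently.

```rust
/// Hypothetical semantic categories; a real engine would have many more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    CompanyName,
    Email,
    MacAddress,
    Ipv6Address,
    Url,
    FilePath,
    Date,
    Unknown,
}

/// Hypothetical rule table: (token in the column name, category).
/// Rules are checked in order and the first match wins.
const RULES: &[(&str, Category)] = &[
    ("email", Category::Email),
    ("mac", Category::MacAddress),
    ("ipv6", Category::Ipv6Address),
    ("url", Category::Url),
    ("path", Category::FilePath),
    ("date", Category::Date),
    ("merchant", Category::CompanyName),
    ("company", Category::CompanyName),
];

/// Classifies a column by comparing whole name tokens against the rule table.
/// Matching whole tokens (not substrings) keeps `pharmacy_name` from matching `mac`.
fn classify(column: &str) -> Category {
    let lowered = column.to_ascii_lowercase();
    let tokens: Vec<&str> = lowered.split('_').collect();
    for (keyword, category) in RULES {
        if tokens.contains(keyword) {
            return *category;
        }
    }
    Category::Unknown
}

fn main() {
    for column in ["merchant_name", "support_email", "mac_address", "tracking_code"] {
        println!("{column} -> {:?}", classify(column));
    }
}
```

Columns that fall through to `Unknown` would still need a type-based fallback (for example, a random string that fits the declared `VARCHAR` length), which this sketch leaves out.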

### 🗂️ Real-World Schema Example

```
-- Your existing schema
CREATE TABLE companies (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    website VARCHAR(255),
    industry VARCHAR(50)
);

CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    company_id INTEGER REFERENCES companies(id),
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    phone VARCHAR(20),
    job_title VARCHAR(100),
    salary NUMERIC(10,2),
    hire_date DATE NOT NULL
);
```

**SynthDB generates:**

```
-- Coherent company data
INSERT INTO companies VALUES
    (1, 'TechVision Solutions', 'https://techvision.io', 'Software'),
    (2, 'Global Logistics Inc', 'https://globallogistics.com', 'Transportation');

-- Employees with matching company context
INSERT INTO employees VALUES
    (1, 1, 'Alice', 'Chen', 'alice.chen@techvision.io', '+1-555-0123', 'Senior Software Engineer', 125000.00, '2022-03-15'),
    (2, 1, 'Bob', 'Kumar', 'bob.kumar@techvision.io', '+1-555-0124', 'Product Manager', 135000.00, '2021-08-22'),
    (3, 2, 'Carol', 'Rodriguez', 'carol.rodriguez@globallogistics.com', '+1-555-0198', 'Operations Director', 145000.00, '2020-01-10');
```

## 🏗️ Architecture

```
┌─────────────────────────────────────────────────────────┐
│                     SynthDB Engine                      │
├─────────────────────────────────────────────────────────┤
│  1. Schema Introspection                                │
│     └─ Read tables, columns, constraints, relationships │
│                                                         │
│  2. Dependency Analysis                                 │
│     └─ Build dependency graph via topological sort      │
│                                                         │
│  3. Semantic Classification                             │
│     └─ Detect column meaning from names & types         │
│                                                         │
│  4. Context-Aware Generation                            │
│     └─ Generate coherent, relational data               │
│                                                         │
│  5. Constraint Validation                               │
│     └─ Ensure all DB constraints are satisfied          │
│                                                         │
│  6. Output                                              │
│     └─ SQL file or direct database insertion            │
└─────────────────────────────────────────────────────────┘
```

## 🗺️ Roadmap

- [x] PostgreSQL support
- [x] Semantic column detection
- [x] Foreign key resolution
- [ ] MySQL/MariaDB support
- [ ] SQLite support
- [ ] Custom data providers
- [ ] GraphQL schema support
- [ ] Performance benchmark suite
- [ ] Web UI for configuration
- [ ] Machine-learning-based schema detection

## 🤝 Contributing

We love Rustaceans! 🦀 Contributions are welcome and appreciated.

### How to Contribute

1. **Fork the repository**
2. **Create a feature branch:** `git checkout -b feature/amazing-feature`
3. **Make your changes**, then run `cargo fmt`, `cargo clippy`, and `cargo test`
4. **Commit your changes:** `git commit -m 'Add an amazing feature'`
5. **Push to your fork:** `git push origin feature/amazing-feature`
6. **Open a Pull Request**

### Development Setup

```
# Clone the repository
git clone https://github.com/yourusername/synthdb.git
cd synthdb

# Build the project
cargo build

# Run the tests
cargo test

# Run an example
cargo run -- clone --url "postgres://localhost/testdb" --rows 100
```

### Code of Conduct

Please read our [Code of Conduct](CODE_OF_CONDUCT.md) before contributing.

## 🙏 Acknowledgments

Built with ❤️ on top of:

- [Rust](https://www.rust-lang.org/) - Systems programming language
- [Tokio](https://tokio.rs/) - Async runtime
- [SQLx](https://github.com/launchbadge/sqlx) - Database toolkit
- [Fake](https://github.com/cksac/fake-rs) - Data generation library

## 📄 License

Distributed under the **MIT License**. See [LICENSE](LICENSE) for details.

## 💬 Community & Support

- **Issues:** [GitHub Issues](https://github.com/synthdb/synthdb/issues)
- **Discussions:** [GitHub Discussions](https://github.com/synthdb/synthdb/discussions)
- **Support:** [Buymeacoffee](https://buymeacoffee.com/synthdb)

**If SynthDB helps your project, please consider giving it a ⭐ on GitHub!**

Made with 🦀 by the SynthDB Team
