Data-Centric-AI-Community/fg-data-synthetic
GitHub: Data-Centric-AI-Community/fg-data-synthetic
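A synthetic data generation library built on several state-of-the-art GAN architectures and Gaussian mixture models, supporting privacy-compliant generation and data augmentation for tabular and time-series data.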
Stars: 1642 | Forks: 258



> [YData Fabric](https://ydata.ai/products/synthetic_data) enables the generation of high-quality datasets within a full UI experience, from data preparation to synthetic data generation and evaluation.
> Check out the [Community Version](https://ydata.ai/ydata-fabric-free-trial).

# fg-data-synthetic

This repository contains material related to architectures and models for synthetic data, from Generative Adversarial Networks (GANs) to Gaussian Mixtures.
The repo provides a full ecosystem for synthetic data generation, including different models for generating synthetic structured data and time series.
All the Deep Learning models are implemented with TensorFlow 2.0.
Several example Jupyter Notebooks and Python scripts are included to show how to use the different architectures.

Are you ready to learn more about synthetic data and the best practices for synthetic data generation?

## Quickstart

The source code is currently hosted on GitHub at: https://github.com/Data-Centric-AI-Community/fg-data-synthetic

Binary installers for the latest released version are available at the [Python Package Index (PyPI)](https://pypi.org/project/fg-data-synthetic/).

```bash
pip install fg-data-synthetic
```

### The UI guide for synthetic data generation

YData synthetic now has a UI that guides you through the steps and inputs needed to generate structured tabular data.
The streamlit app is available from *v1.0.0* onwards and supports the following flows:
- Train a synthesizer model
- Generate and profile synthetic data samples

#### Installation

```
pip install fg-data-synthetic[streamlit]
```

#### Quickstart

Use the code snippet below in a Python file (Jupyter Notebooks are not supported):

```
from data_synthetic import streamlit_app
streamlit_app.run()
```

Or use the streamlit_app.py file that can be found in the [examples folder](https://github.com/Data-Centric-AI-Community/fg-data-synthetic/tree/master/examples/streamlit_app.py).

```
python -m streamlit_app
```

The following models are supported:
- CGAN
- WGAN
- WGANGP
- DRAGAN
- CRAMER
- CTGAN

[Watch the video](https://youtu.be/ep0PhwsFx0A)

### Examples

Here you can find usage examples of the package and its models for synthesizing tabular and time-series data.
- [Fast tabular data synthesis on the adult census income dataset](https://colab.research.google.com/github/Data-Centric-AI-Community/fg-data-synthetic/blob/master/examples/regular/models/Fast_Adult_Census_Income_Data.ipynb)
- [Tabular synthetic data generation with CTGAN on the adult census income dataset](https://colab.research.google.com/github/Data-Centric-AI-Community/fg-data-synthetic/blob/master/examples/regular/models/CTGAN_Adult_Census_Income_Data.ipynb)
- [Time-series synthetic data generation with TimeGAN on the stock dataset](https://colab.research.google.com/github/Data-Centric-AI-Community/fg-data-synthetic/blob/master/examples/timeseries/TimeGAN_Synthetic_stock_data.ipynb)
- [Time-series synthetic data generation with DoppelGANger on the FCC MBA dataset](https://colab.research.google.com/github/Data-Centric-AI-Community/fg-data-synthetic/blob/master/examples/timeseries/DoppelGANger_FCC_MBA_Dataset.ipynb)
- More examples are continuously added and can be found in the `/examples` directory.

### Datasets for you to experiment with

Here are some example datasets for you to try with the synthesizers:

#### Tabular datasets
- [Adult Census Income](https://www.kaggle.com/datasets/uciml/adult-census-income)
- [Credit card fraud](https://www.kaggle.com/mlg-ulb/creditcardfraud)
- [Cardiovascular Disease dataset](https://www.kaggle.com/datasets/sulianova/cardiovascular-disease-dataset)

#### Sequential datasets
- [Stock data](https://github.com/Data-Centric-AI-Community/fg-data-synthetic/tree/master/data)
- [FCC MBA data](https://github.com/Data-Centric-AI-Community/fg-data-synthetic/tree/master/data)

## Project Resources

In this repository you can find several GAN architectures used to create synthesizers:

### Tabular data
- [GAN](https://arxiv.org/abs/1406.2661)
- [CGAN (Conditional GAN)](https://arxiv.org/abs/1411.1784)
- [WGAN (Wasserstein GAN)](https://arxiv.org/abs/1701.07875)
- [WGAN-GP (Wasserstein GAN with Gradient Penalty)](https://arxiv.org/abs/1704.00028)
- [DRAGAN (On Convergence and Stability of GANs)](https://arxiv.org/pdf/1705.07215.pdf)
- [Cramer GAN (The Cramer Distance as a Solution to Biased Wasserstein Gradients)](https://arxiv.org/abs/1705.10743)
- [CWGAN-GP (Conditional Wasserstein GAN with Gradient Penalty)](https://cameronfabbri.github.io/papers/conditionalWGAN.pdf)
- [CTGAN (Conditional Tabular GAN)](https://arxiv.org/pdf/1907.00503.pdf)
- [Gaussian Mixture](https://towardsdatascience.com/gaussian-mixture-models-explained-6986aaf5a95)

### Sequential data
- [TimeGAN](https://papers.nips.cc/paper/2019/file/c9efe5f26cd17ba6216bbe2a7d26d490-Paper.pdf)
- [DoppelGANger](https://dl.acm.org/doi/pdf/10.1145/3419394.3423643)

## Support

For support in using this library, please join our Discord server. The community is friendly and quick to answer questions about using and developing the library. [Click here to join our Discord community!](https://tiny.ydata.ai/dcai-ydata-synthetic)

## FAQs

Have a question? Check out the [Frequently Asked Questions](https://ydata.ai/resources/10-most-asked-questions-on-ydata-synthetic) about `fg-data-synthetic`. If you feel something is missing, feel free to [book a very informal chat with us](https://meetings.hubspot.com/fabiana-clemente).

## License

[MIT License](https://github.com/Data-Centric-AI-Community/fg-data-synthetic/blob/master/LICENSE)
