opendp/smartnoise-sdk

GitHub: opendp/smartnoise-sdk

A toolkit for differentially private SQL query execution and synthetic data generation on tabular and relational data

Stars: 292 | Forks: 79

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

# SmartNoise SDK: Tools for Differential Privacy on Tabular Data

The SmartNoise SDK includes 2 packages:

* [smartnoise-sql](sql/): Run differentially private SQL queries
* [smartnoise-synth](synth/): Generate differentially private synthetic data

To get started, see the examples below. Click into each project for more detailed examples.

## SQL

[![Python](https://img.shields.io/badge/python-3.7%20%7C%203.8%20%7C3.9%20%7C%203.10-blue)](https://www.python.org/)

### Install

```
pip install smartnoise-sql
```

### Query

```
import snsql
from snsql import Privacy
import pandas as pd

csv_path = 'PUMS.csv'
meta_path = 'PUMS.yaml'

# Load the data and set the privacy budget for each query
data = pd.read_csv(csv_path)
privacy = Privacy(epsilon=1.0, delta=0.01)
reader = snsql.from_connection(data, privacy=privacy, metadata=meta_path)

# Run a differentially private query against the DataFrame
result = reader.execute('SELECT sex, AVG(age) AS age FROM PUMS.PUMS GROUP BY sex')
print(result)
```

`PUMS.csv` and `PUMS.yaml` can be found in the [datasets](datasets/) folder.

See the [SQL project](sql/README.md)

## Synthesizers

[![Python](https://img.shields.io/badge/python-3.7%20%7C%203.8%20%7C%203.9%20%7C%203.10-blue)](https://www.python.org/)

### Install

```
pip install smartnoise-synth
```

### MWEM

```
import snsynth
import pandas as pd
import numpy as np

pums_csv_path = 'PUMS.csv'  # in datasets/

pums = pd.read_csv(pums_csv_path, index_col=None)
pums = pums.drop(['income'], axis=1)
nf = pums.to_numpy().astype(int)  # MWEM is fitted on an integer array here

synth = snsynth.MWEMSynthesizer(epsilon=1.0, split_factor=nf.shape[1])
synth.fit(nf)

sample = synth.sample(10)  # get 10 synthetic rows
print(sample)
```

### PATE-CTGAN

```
import pandas as pd
import numpy as np
from snsynth.pytorch.nn import PATECTGAN
from snsynth.pytorch import PytorchDPSynthesizer

pums_csv_path = 'PUMS.csv'  # in datasets/

pums = pd.read_csv(pums_csv_path, index_col=None)
pums = pums.drop(['income'], axis=1)

# The first argument (1.0) is the epsilon privacy budget
synth = PytorchDPSynthesizer(1.0, PATECTGAN(regularization='dragan'), None)
synth.fit(pums, categorical_columns=pums.columns.values.tolist())

sample = synth.sample(10)  # synthesize 10 rows
print(sample)
```

See the [Synthesizers project](synth/README.md)

## Releases and Contributing

Please let us know if you encounter a bug by [creating an issue](https://github.com/opendp/smartnoise-sdk/issues).
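We appreciate all contributions. Please review the [contributors guide](contributing.rst). We welcome pull requests with bug fixes without prior discussion.

If you plan to contribute new features, utility functions, or extensions to this system, please first open an issue and discuss the feature with us.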
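
For readers new to the `epsilon` parameter that appears in the SQL and synthesizer examples, the sketch below shows the general idea behind it using the textbook Laplace mechanism on a bounded mean. It is an illustration only, not SmartNoise's implementation: the function names `laplace_noise` and `dp_bounded_mean`, the sample ages, and the assumption that the row count is public are all made up for this example, and naive floating-point sampling like this is not suitable for production use.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a zero-centred Laplace distribution."""
    # The difference of two independent Exponential(1) draws is Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def dp_bounded_mean(values, lower, upper, epsilon, rng):
    """Noisy mean of values clamped to [lower, upper] (hypothetical helper).

    Assumes the number of rows is public and that neighbouring datasets
    differ by changing the value of a single row.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not values:
        raise ValueError("values must not be empty")
    clamped = [min(max(v, lower), upper) for v in values]
    n = len(clamped)
    # Changing one row moves the clamped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    return sum(clamped) / n + laplace_noise(sensitivity / epsilon, rng)


# Made-up ages, used only to show how the noise shrinks as epsilon grows.
ages = [23, 35, 41, 58, 62, 29, 47, 51]
rng = random.Random(0)
for eps in (0.1, 1.0, 10.0):
    print(eps, dp_bounded_mean(ages, 0, 100, eps, rng))
```

Smaller `epsilon` values give a larger noise scale, which means stronger privacy at the cost of accuracy.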