erwanlemerrer/awesome-audit-algorithms
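A curated list of research papers on auditing black-box algorithms, collecting recent academic work on algorithmic fairness, privacy protection, model security, and large language model auditing.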
# Awesome Audit Algorithms
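A curated list of algorithms for auditing black-box algorithms.
Nowadays, many algorithms (recommendation, scoring, classification) run at third-party providers, and users or institutions have no insight into how these algorithms process their data. The audit algorithms in this list therefore apply to this so-called "black-box" setup, in which an auditor wants to gain some insight into these remote algorithms.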
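As a rough illustration of the black-box setup described above, the Python sketch below shows an auditor that can only send queries to a remote model and observe its decisions, and that uses those decisions to estimate a demographic parity gap. The `remote_model` function, its two features, and its decision rule are hypothetical stand-ins invented for this example; they are not taken from any paper in the list, and a real audit would replace the function with calls to the provider's API.

```python
import random
from typing import Callable, Dict

# Hypothetical stand-in for a remote service: the auditor may only call it,
# never inspect its internals. A real audit would issue API requests instead.
def remote_model(profile: Dict[str, float]) -> int:
    score = 0.6 * profile["income"] + 0.4 * profile["group"]
    return 1 if score >= 0.5 else 0


def demographic_parity_gap(
    query: Callable[[Dict[str, float]], int],
    n_queries: int = 2000,
    seed: int = 0,
) -> float:
    """Estimate |P(y=1 | group=1) - P(y=1 | group=0)| from queries alone."""
    rng = random.Random(seed)
    positives = {0: 0, 1: 0}
    counts = {0: 0, 1: 0}
    for i in range(n_queries):
        group = i % 2  # alternate groups so both receive the same number of queries
        profile = {"income": rng.random(), "group": float(group)}
        positives[group] += query(profile)
        counts[group] += 1
    rates = {g: positives[g] / counts[g] for g in counts}
    return abs(rates[1] - rates[0])


gap = demographic_parity_gap(remote_model)
print(f"estimated demographic parity gap: {gap:.3f}")
```

The auditor never reads the model's weights: everything it learns comes from the input-output pairs it collects, which is why query budgets and sampling strategies are recurring themes in the papers below.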
## Updates
Slow or no updates: overtaken by current technology...
## Contents
- [Papers](#papers)
- [Related Events (conferences/workshops)](#related-events)
## Papers
### 2026
- [The Fair Game: Auditing & debiasing AI algorithms over time](https://www.cambridge.org/core/journals/cambridge-forum-on-ai-law-and-governance/article/fair-game-auditing-debiasing-ai-algorithms-over-time/9E8408C67F7CE30505122DD1586D9FA2) - Journal of Cambridge Forum on AI: Law and Governance
- [Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks](https://arxiv.org/pdf/2507.20708) - (arXiv)
### 2025
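- [Auditing Pay-Per-Token in Large Language Models](https://arxiv.org/pdf/2510.05181) - (arXiv) *Develops an auditing framework based on martingale theory that enables a trusted third-party auditor to detect token misreporting by sequentially querying a provider.*
- [P2NIA: Privacy-Preserving Non-Iterative Auditing](https://arxiv.org/abs/2504.00874) - (ECAI) *Proposes a mutually beneficial collaboration between the auditor and the platform: a privacy-preserving, non-iterative audit scheme that enhances fairness assessments using synthetic or local data, avoiding the challenges associated with traditional API-based audits.*
- [The Fair Game: Auditing & debiasing AI algorithms over time](https://www.cambridge.org/core/services/aop-cambridge-core/content/view/9E8408C67F7CE30505122DD1586D9FA2/S3033373325000080a.pdf/the-fair-game-auditing-and-debiasing-ai-algorithms-over-time.pdf) - (Cambridge Forum on AI: Law and Governance) *Aims to simulate the evolution of ethical and legal frameworks in society by creating an auditor that sends feedback to a debiasing algorithm deployed around an ML system.*
- [Robust ML Auditing using Prior Knowledge](https://arxiv.org/pdf/2505.04796) - (ICML) *Formally establishes the conditions under which an auditor can use prior knowledge about the ground truth to prevent audit manipulations.*
- [CALM: Curiosity-Driven Auditing for Large Language Models](https://arxiv.org/abs/2501.02997) - (AAAI) *Frames auditing as a black-box optimization problem whose goal is to automatically uncover input-output pairs of the target LLM that exhibit illegal, immoral, or unsafe behaviors.*
- [Queries, Representation & Detection: The Next 100 Model Fingerprinting Schemes](https://arxiv.org/abs/2412.13021) - (AAAI) *Divides model fingerprinting into three core components, in order to identify about 100 previously unexplored combinations of these components and gain insight into their performance.*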
### 2024
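- [Hardware and software platform inference](https://arxiv.org/pdf/2411.05197) - (arXiv) *A method for identifying the underlying GPU architecture and software stack of a black-box machine learning model solely from its input-output behavior.*
- [Auditing Local Explanations is Hard](https://arxiv.org/abs/2407.13281) - (NeurIPS) *Gives the (prohibitive) query complexity of auditing explanations.*
- [LLMs hallucinate graphs too: a structural perspective](https://arxiv.org/abs/2409.00159) - (complex networks) *Studies topological hallucinations by querying LLMs for known graphs. Proposes a structural hallucination rank.*
- [Fairness Auditing with Multi-Agent Collaboration](https://arxiv.org/pdf/2402.08522) - (ECAI) *Considers multiple agents working together, each auditing the same platform for a different task.*
- [Mapping the Field of Algorithm Auditing: A Systematic Literature Review Identifying Research Trends, Linguistic and Geographical Disparities](https://arxiv.org/pdf/2401.11194) - (arXiv) *Systematic review of algorithm auditing studies that identifies trends in their methodological approaches.*
- [FairProof: Confidential and Certifiable Fairness for Neural Networks](https://arxiv.org/pdf/2402.12572v1.pdf) - (arXiv) *Proposes an alternative paradigm to traditional auditing using cryptographic tools such as zero-knowledge proofs; provides a system called FairProof for verifying the fairness of small neural networks.*
- [Under manipulations, are some AI models harder to audit?](https://grodino.github.io/projects/manipulated-audits/preprint.pdf) - (SATML) *Relates the difficulty of black-box audits to the capacity of the targeted models, using the Rademacher complexity.*
- [Improved Membership Inference Attacks Against Language Classification Models](https://arxiv.org/pdf/2310.07219.pdf) - (ICLR) *Presents a framework for running membership inference attacks against classifiers, in audit mode.*
- [Auditing Fairness by Betting](https://arxiv.org/pdf/2305.17570.pdf) - (NeurIPS) [[Code]](https://github.com/bchugg/auditing-fairness) *Sequential methods that allow continuous monitoring of incoming data from a black-box classifier or regressor.*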
### 2023
- [Privacy Auditing with One (1) Training Run](https://neurips.cc/virtual/2023/poster/70925) - (NeurIPS - best paper) *A scheme for auditing differentially private machine learning systems with a single training run.*
- [Auditing fairness under unawareness through counterfactual reasoning](https://www.sciencedirect.com/science/article/pii/S0306457322003259) - (Information Processing & Management) *Shows how to reveal whether a black-box model that complies with regulations is still biased.*
- [XAudit : A Theoretical Look at Auditing with Explanations](https://arxiv.org/pdf/2206.04740.pdf) - (arXiv) *Formalizes the role of explanations in auditing and investigates whether and how model explanations can help audits.*
- [Keeping Up with the Language Models: Robustness-Bias Interplay in NLI Data and Models](https://arxiv.org/pdf/2305.12620.pdf) - (arXiv) *Proposes a way to extend the shelf life of auditing datasets by using language models themselves; also finds problems with current bias auditing metrics and proposes alternatives, which show that model brittleness superficially increased the previous bias scores.*
- [Online Fairness Auditing through Iterative Refinement](https://dl.acm.org/doi/pdf/10.1145/3580305.3599454) - (KDD) *Provides an adaptive process that automates the inference of probabilistic guarantees associated with estimating fairness metrics.*
- [Stealing the Decoding Algorithms of Language Models](https://people.cs.umass.edu/~amir/papers/CCS23-LM-stealing.pdf) - (CCS) *Steals the type and hyperparameters of the decoding algorithm of an LLM.*
- [Modeling rabbit‑holes on YouTube](https://link.springer.com/epdf/10.1007/s13278-023-01105-9?sharing_token=h-O-asHI49VUWS9FxN1Gsve4RwlQNchNByi7wbcMAY6I98PKW1PqhFQJ_JqQyk3TrB05qDb3LUzMDmKOgrupccQliViDle-rwKEi2MZ8xBViaAQhyN41oZBKLLeXchoeIW2kklVHC094I5KD8pxja4-if6-iB0uAI1FnqnYoxjU%3D) - (SNAM) *Models the dynamics that trap users in rabbit holes on YouTube, and provides a measure of this enclosure.*
- [Auditing YouTube’s Recommendation Algorithm for Misinformation Filter Bubbles](https://dl.acm.org/doi/full/10.1145/3568392) - (Transactions on Recommender Systems) *What it takes to "burst the bubble", i.e., to revert the bubble enclosure created by recommendations.*
- [Auditing Yelp’s Business Ranking and Review Recommendation Through the Lens of Fairness](https://arxiv.org/pdf/2308.02129.pdf) - (arXiv) *Audits the fairness of Yelp's business ranking and review recommendation systems using demographic parity, exposure, and statistical tests such as quantile linear and logistic regression.*
- [Confidential-PROFITT: Confidential PROof of FaIr Training of Trees](https://openreview.net/pdf?id=iIfDQVyuFD) - (ICLR) *Proposes fair decision tree learning algorithms along with zero-knowledge proof protocols to obtain a proof of fairness on the audited server.*
- [SCALE-UP: An Efficient Black-box Input-level Backdoor Detection via Analyzing Scaled Prediction Consistency](https://arxiv.org/pdf/2302.03251.pdf) - (ICLR) *Considers backdoor detection under the black-box setting in machine learning as a service (MLaaS) applications.*
### 2022
- [Two-Face: Adversarial Audit of Commercial Face Recognition Systems](https://ojs.aaai.org/index.php/ICWSM/article/view/19300/19072) - (ICWSM) *Performs an adversarial audit on multiple system APIs and datasets, making a number of concerning observations.*
- [Scaling up search engine audits: Practical insights for algorithm auditing](https://journals.sagepub.com/doi/10.1177/01655515221093029) - (Journal of Information Science) [(Code)](https://github.com/gesiscss/WebBot) *Audits multiple search engines using simulated browsing behavior with virtual agents.*
- [A zest of lime: towards architecture-independent model distances](https://openreview.net/pdf?id=OUz_9TiTv9j) - (ICLR) *Measures the distance between two remote models using LIME.*
- [Active Fairness Auditing](https://proceedings.mlr.press/v162/yan22c/yan22c.pdf) - (ICML) *Studies query-based auditing algorithms that can estimate the demographic parity of ML models in a query-efficient manner.*
- [Look at the Variance! Efficient Black-box Explanations with Sobol-based Sensitivity Analysis](https://proceedings.neurips.cc/paper/2021/file/da94cbeff56cfda50785df477941308b-Paper.pdf) - (NeurIPS) *Sobol indices provide an efficient way to capture higher-order interactions between image regions and their contributions to a (black-box) neural network's prediction through the lens of variance.*
- [Your Echos are Heard: Tracking, Profiling, and Ad Targeting in the Amazon Smart Speaker Ecosystem](https://arxiv.org/pdf/2204.10920.pdf) - (arXiv) *Infers a link between the Amazon Echo system and the ad targeting algorithm.*
### 2021
- [When the Umpire is also a Player: Bias in Private Label Product Recommendations on E-commerce Marketplaces](https://arxiv.org/pdf/2102.00141.pdf) - (FAccT) *Do Amazon private label products get an unfair share of recommendations, and are they therefore advantaged compared to third-party products?*
- [Everyday Algorithm Auditing: Understanding the Power of Everyday Users in Surfacing Harmful Algorithmic Behaviors](https://arxiv.org/pdf/2105.02980.pdf) - (CHI) *Makes the case for "everyday algorithm auditing" by users.*
- [Auditing Black-Box Prediction Models for Data Minimization Compliance](https://www.cs.bu.edu/faculty/crovella/paper-archive/minimization-audit-Neurips21.pdf) - (NeurIPS) *Measures the level of data minimization satisfied by the prediction model using a limited number of queries.*
- [Setting the Record Straighter on Shadow Banning](https://arxiv.org/abs/2012.05101) - (INFOCOM) [(Code)](https://gitlab.enseeiht.fr/bmorgan/infocom-2021) *Considers the possibility of shadow banning on Twitter (i.e., the black-box moderation algorithm), and measures the probability of several hypotheses.*
- [Extracting Training Data from Large Language Models](https://arxiv.org/pdf/2012.07805.pdf) - (USENIX Security) *Extracts verbatim text sequences from the GPT-2 model's training data.*
- [FairLens: Auditing black-box clinical decision support systems](https://www.sciencedirect.com/science/article/pii/S030645732100145X?casa_token=oyjFKij269MAAAAA:w_ohScpMPNMnkDdzBqAIod5QfBgQlq5Ht9mMRSOydZpOgNG-i1yuqEmBjWN__38gOGmjNL7dVT0) - (Information Processing & Management) *Presents a pipeline to detect and explain potential fairness issues in clinical decision support systems (Clinical DSS), by comparing different multi-label classification disparity measures.*
- [Auditing Algorithmic Bias on Twitter](https://dl.acm.org/doi/abs/10.1145/3447535.3462491) - (WebSci)
- [Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information](https://proceedings.mlr.press/v139/neiswanger21a.html) - (ICML) *A Bayesian optimization procedure for extracting properties of a black-box algorithm under a budget constraint.*
### 2020
- [Black-Box Ripper: Copying black-box models using generative evolutionary algorithms](https://proceedings.neurips.cc/paper/2020/file/e8d66338fab3727e34a9179ed8804f64-Paper.pdf) - (NeurIPS) *Replicates the functionality of a black-box neural model, with no limit on the number of queries (via a teacher/student scheme and an evolutionary search).*
- [Auditing radicalization pathways on YouTube](https://dl.acm.org/doi/pdf/10.1145/3351095.3372879) - (FAT*) *Studies the reachability of radical channels from one another, using random walks on static channel recommendations.*
- [Adversarial Model Extraction on Graph Neural Networks](https://arxiv.org/abs/1912.07721) - (AAAI Workshop on Deep Learning on Graphs: Methodologies and Applications) *Introduces GNN model extraction and presents a preliminary approach to it.*
- [Remote Explainability faces the bouncer problem](https://rdcu.be/b6qB4) - (Nature Machine Intelligence volume 2, pages 529–539) [(Code)](https://github.com/erwanlemerrer/bouncer_problem) *Shows the impossibility (with one request) or the difficulty of spotting lies in the explanations of a remote AI decision.*
- [GeoDA: a geometric framework for black-box adversarial attacks](https://openaccess.thecvf.com/content_CVPR_2020/papers/Rahmati_GeoDA_A_Geometric_Framework_for_Black-Box_Adversarial_Attacks_CVPR_2020_paper.pdf) - (CVPR) [(Code)](https://github.com/thisisalirah/GeoDA) *Crafts adversarial examples to fool models in a pure black-box setup (no gradients, inferred class only).*
- [The Imitation Game: Algorithm Selection by Exploiting Black-Box Recommender](https://github.com/erwanlemerrer/erwanlemerrer.github.io/raw/master/files/imitation_blackbox_recommenders_netys-2020.pdf) - (Netys) [(Code)](https://github.com/gdamaskinos/RecRank) *Parametrizes a local recommendation algorithm by imitating the decisions of a remote, better-trained one.*
- [Auditing News Curation Systems: A Case Study Examining Algorithmic and Editorial Logic in Apple News](https://ojs.aaai.org/index.php/ICWSM/article/view/7277) - (ICWSM) *Audit study of Apple News as a sociotechnical news curation system (trending stories section).*
- [Auditing Algorithms: On Lessons Learned and the Risks of Data Minimization](https://dl.acm.org/doi/pdf/10.1145/3375627.3375852) - (AIES) *A practical audit (mostly on bias) of a well-being recommendation app developed by Telefónica.*
- [Extracting Training Data from Large Language Models](https://arxiv.org/pdf/2012.07805) - (arXiv) *Performs a training data extraction attack that recovers individual training examples by querying the language model.*
### 2019
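- [Adversarial Frontier Stitching for Remote Neural Network Watermarking](https://arxiv.org/abs/1711.01894) - (Neural Computing and Applications) [(Alternative implementation)](https://github.com/dunky11/adversarial-frontier-stitching) *Checks whether a remote machine learning model is a "leaked" one: through standard API requests to the remote model, extracts (or fails to extract) a zero-bit watermark that was inserted to mark valuable models (e.g., large deep neural networks).*
- [Knockoff Nets: Stealing Functionality of Black-Box Models](https://arxiv.org/abs/1812.02766.pdf) - (CVPR) *Asks to what extent an adversary can steal the functionality of such "victim" models based solely on black-box interactions (image in, predictions out).*
- [Opening Up the Black Box: Auditing Google's Top Stories Algorithm](https://par.nsf.gov/servlets/purl/10101277) - (Flairs-32) *Audit of Google's Top stories panel that provides insight into its algorithmic choices for selecting and ranking news publishers.*
- [Making targeted black-box evasion attacks effective and efficient](https://arxiv.org/pdf/1906.03397.pdf) - (arXiv) *Investigates how an adversary can optimally use its query budget for targeted evasion attacks against deep neural networks.*
- [Online Learning for Measuring Incentive Compatibility in Ad Auctions](https://research.fb.com/wp-content/uploads/2019/05/Online-Learning-for-Measuring-Incentive-Compatibility-in-Ad-Auctions.pdf) - (WWW) *Measures the incentive compatibility (IC) mechanisms (regret) of black-box auction platforms.*
- [TamperNN: Efficient Tampering Detection of Deployed Neural Nets](https://arxiv.org/abs/1903.00317) - (ISSRE) *Algorithms for crafting inputs that can detect whether a remotely executed classifier model has been tampered with.*
- [Neural Network Model Extraction Attacks in Edge Devices by Hearing Architectural Hints](https://arxiv.org/pdf/1903.03916.pdf) - (arXiv) *By acquiring memory access events through bus snooping, identifying the layer sequence with an LSTM-CTC model, connecting the layer topology according to the memory access pattern, and estimating layer dimensions under data volume constraints, demonstrates that one can accurately recover a network architecture similar to the attack starting point.*
- [Stealing Knowledge from Protected Deep Neural Networks Using Composite Unlabeled Data](https://ieeexplore.ieee.org/abstract/document/8851798) - (ICNN) *A composite method that can be used to attack and extract the knowledge of a black-box model even if it completely conceals its softmax output.*
- [Neural Network Inversion in Adversarial Setting via Background Knowledge Alignment](https://dl.acm.org/citation.cfm?id=3354261) - (CCS) *Model inversion approach in the adversarial setting, based on training an inversion model that acts as an inverse of the original model. Without full knowledge of the original training data, an accurate inversion is still possible by training the inversion model on auxiliary samples drawn from a more generic data distribution.*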
### 2018
- [Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR](https://arxiv.org/abs/1711.00399) - (Harvard Journal of Law & Technology) *To explain a decision on x, find a counterfactual: the closest point to x that changes the decision.*
- [Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation](https://arxiv.org/abs/1710.06169) - (AIES) *Treats black-box models as teachers, training transparent student models to mimic the risk scores assigned by the black-box models.*
- [Towards Reverse-Engineering Black-Box Neural Networks](https://arxiv.org/abs/1711.01768) - (ICLR) [(Code)](https://github.com/coallaoh/WhitenBlackBox) *Infers inner hyperparameters (e.g., number of layers, non-linear activation type) of a remote neural network model by analysing its response patterns to certain inputs.*
- [Data driven exploratory attacks on black box classifiers in adversarial domains](https://www.sciencedirect.com/science/article/pii/S092523121830136X) - (Neurocomputing) *Reverse engineers remote classifier models (e.g., for evading a CAPTCHA test).*
- [xGEMs: Generating Examplars to Explain Black-Box Models](https://arxiv.org/pdf/1806.08867.pdf) - (arXiv) *Searches for bias in a black-box model by training an unsupervised implicit generative model, then quantitatively summarizes the black-box model's behavior by perturbing data samples along the data manifold.*
- [Learning Networks from Random Walk-Based Node Similarities](https://arxiv.org/pdf/1801.07386) - (NIPS) *Reconstructs the structure of a graph by observing some random-walk commute times.*
- [Identifying the Machine Learning Family from Black-Box Models](https://rd.springer.com/chapter/10.1007/978-3-030-00374-6_6) - (CAEPIA) *Determines which kind of machine learning model is behind the returned predictions.*
- [Stealing Neural Networks via Timing Side Channels](https://arxiv.org/pdf/1812.11720.pdf) - (arXiv) *Steals/approximates a model through timing attacks and queries.*
- [Copycat CNN: Stealing Knowledge by Persuading Confession with Random Non-Labeled Data](https://arxiv.org/abs/1806.05476) - (IJCNN) [(Code)](https://github.com/jeiks/Stealing_DL_Models) *Steals the knowledge of black-box models (CNNs) by querying them with random natural images (ImageNet and Microsoft-COCO).*
- [Auditing the Personalization and Composition of Politically-Related Search Engine Results Pages](https://dl.acm.org/doi/10.1145/3178876.3186143) - (WWW) *A Chrome extension that surveys participants and collects search engine results pages (SERPs) and autocomplete suggestions, for studying personalization and composition.*
### 2017
- [Uncovering Influence Cookbooks : Reverse Engineering the Topological Impact in Peer Ranking Services](https://dl.acm.org/authorize.cfm?key=N21772) - (CSCW) *Aims at identifying which centrality metrics are in use in a peer ranking service.*
- [The topological face of recommendation: models and application to bias detection](https://arxiv.org/abs/1704.08991) - (Complex Networks) *Proposes a bias detection framework for items recommended to users.*
- [Membership Inference Attacks Against Machine Learning Models](http://ieeexplore.ieee.org/document/7958568/) - (Symposium on Security and Privacy) *Given a machine learning model and a record, determines whether this record was part of the model's training dataset.*
- [Practical Black-Box Attacks against Machine Learning](https://dl.acm.org/citation.cfm?id=3053009) - (Asia CCS) *Studies how vulnerable a remote service is to adversarial classification attacks.*
### 2016
- [Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems](https://www.andrew.cmu.edu/user/danupam/datta-sen-zick-oakland16.pdf) - (IEEE S&P) *Evaluates the individual, joint and marginal influence of features on a model using Shapley values.*
- [Auditing Black-Box Models for Indirect Influence](https://arxiv.org/abs/1602.07043) - (ICDM) *Evaluates the influence of a variable on a black-box model by "cleverly" removing it from the dataset and looking at the accuracy gap.*
- [Iterative Orthogonal Feature Projection for Diagnosing Bias in Black-Box Models](https://arxiv.org/abs/1611.04967) - (FATML Workshop) *Performs feature ranking to analyse black-box models.*
- [Bias in Online Freelance Marketplaces: Evidence from TaskRabbit](http://datworkshop.org/papers/dat16-final22.pdf) - (dat workshop) *Measures the rankings produced by TaskRabbit's search algorithm.*
- [Stealing Machine Learning Models via Prediction APIs](https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/tramer) - (USENIX Security) [(Code)](https://github.com/ftramer/Steal-ML) *Aims at extracting machine learning models in use by remote services.*
- [“Why Should I Trust You?” Explaining the Predictions of Any Classifier](https://arxiv.org/pdf/1602.04938v3.pdf) - (arXiv) [(Code)](https://github.com/marcotcr/lime-experiments) *Explains a black-box classifier model by sampling around data instances.*
- [Back in Black: Towards Formal, Black Box Analysis of Sanitizers and Filters](http://ieeexplore.ieee.org/document/7546497/) - (Security and Privacy) *Black-box analysis of sanitizers and filters.*
- [Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems](http://ieeexplore.ieee.org/document/7546525/) - (Security and Privacy) *Introduces measures that capture the degree of influence of inputs on the outputs of the observed system.*
- [An Empirical Analysis of Algorithmic Pricing on Amazon Marketplace](https://mislove.org/publications/Amazon-WWW.pdf) - (WWW) [(Code)](http://personalization.ccs.neu.edu) *Develops a methodology for detecting algorithmic pricing, and uses it to empirically analyze its prevalence and behavior on Amazon Marketplace.*
### 2015
- [Certifying and Removing Disparate Impact](https://arxiv.org/abs/1412.3756) - (SIGKDD) *Proposes SVM-based methods to certify the absence of bias, and methods to remove biases from a dataset.*
- [Peeking Beneath the Hood of Uber](https://dl.acm.org/citation.cfm?id=2815681) - (IMC) *Infers implementation details of Uber's surge pricing algorithm.*
### 2014
- [A peek into the black box: exploring classifiers by randomization]() - (Data Mining and Knowledge Discovery journal) ([code](https://github.com/tsabsch/goldeneye)) *Finds groups of features that can be permuted without changing the output label of predicted samples.*
- [XRay: Enhancing the Web's Transparency with Differential Correlation](https://www.usenix.org/node/184394) - (USENIX Security) *Audits which user profile data were used for targeting a particular ad, recommendation, or price.*
### 2013
- [Measuring Personalization of Web Search](https://dl.acm.org/citation.cfm?id=2488435) - (WWW) *Develops a methodology for measuring personalization in Web search results.*
- [Auditing: Active Learning with Outcome-Dependent Query Costs](https://www.cs.bgu.ac.il/~sabatos/papers/SabatoSarwate13.pdf) - (NIPS) *Learns a binary classifier while paying only for negative labels.*
### 2012
- [Query Strategies for Evading Convex-Inducing Classifiers](http://www.jmlr.org/papers/v13/nelson12a.html) - (JMLR) *Evasion methods for convex classifiers. Considers evasion complexity.*
### 2008
- [Privacy Oracle: a System for Finding Application Leaks with Black Box Differential Testing](https://dl.acm.org/citation.cfm?id=1455806) - (CCS) *Privacy Oracle: a system that uncovers applications' leaks of personal information in transmissions to remote servers.*
### 2005
- [Adversarial Learning](https://dl.acm.org/citation.cfm?id=1081950) - (KDD) *Reverse engineering of remote linear classifiers, using membership queries.*
## Related Events
### 2025
* [AIMLAI at ECML/PKDD 2025](https://project.inria.fr/aimlai/)
* [AAAI workshop on AI Governance: Alignment, Morality, and Law](https://aaai.org/conference/aaai/aaai-25/workshop-list/#ws06)
### 2024
* [1st International Conference on Auditing and Artificial Intelligence](https://www.ircg.msm.uni-due.de/ai/)
* [Regulatable ML Workshop (RegML'24)](https://regulatableml.github.io/)
### 2023
* [Supporting User Engagement in Testing, Auditing, and Contesting AI (CSCW User AI Auditing)](https://cscw-user-ai-auditing.github.io/)
* [Workshop on Algorithmic Audits of Algorithms (WAAA)](https://algorithmic-audits.github.io)
* [Regulatable ML Workshop (RegML'23)](https://regulatableml.github.io/)
## 更新
更新缓慢/无更新:已被当前技术取代...
## 目录
- [论文](#papers)
- [相关活动(会议/研讨会)](#related-events)
## 论文
### 2026
- [The Fair Game: Auditing & debiasing AI algorithms over time](https://www.cambridge.org/core/journals/cambridge-forum-on-ai-law-and-governance/article/fair-game-auditing-debiasing-ai-algorithms-over-time/9E8408C67F7CE30505122DD1586D9FA2) - Jouarnal of Cambridge Forum on AI: Law and Governance
- [Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks](https://arxiv.org/pdf/2507.20708) - (arXiv)
### 2025
- [Auditing Pay-Per-Token in Large Language Models](https://arxiv.org/pdf/2510.05181) - (arXiv) *开发了一个基于鞅论 (martingale theory) 的审计框架,使受信任的第三方审计员能够通过顺序查询提供商来检测 token 虚报。*
- [P2NIA: Privacy-Preserving Non-Iterative Auditing](https://arxiv.org/abs/2504.00874) - (ECAI) *为审计员和平台提出了一种互利合作方案:一种隐私保护且非迭代的审计方案,利用合成或本地数据增强公平性评估,避免了与传统基于 API 的审计相关的挑战。*
- [The Fair Game: Auditing & debiasing AI algorithms overtime](https://www.cambridge.org/core/services/aop-cambridge-core/content/view/9E8408C67F7CE30505122DD1586D9FA2/S3033373325000080a.pdf/the-fair-game-auditing-and-debiasing-ai-algorithms-over-time.pdf) - (Cambridge Forum on AI: Law and Governance) *旨在通过创建一个向部署在 ML 系统周围的去偏算法发送反馈的审计员,来模拟社会中伦理和法律框架的演变。*
- [Robust ML Auditing using Prior Knowledge](https://arxiv.org/pdf/2505.04796) - (ICML) *正式确立了审计员如何利用关于 ground truth 的先验知识来防止审计操纵的条件。*
- [CALM: Curiosity-Driven Auditing for Large Language Models](https://arxiv.org/abs/2501.02997) - (AAAI) *将审计视为一个黑盒优化问题,其目标是自动发现目标 LLM 中表现出非法、不道德或不安全行为的输入-输出对。*
- [Queries, Representation & Detection: The Next 100 Model Fingerprinting Schemes](https://arxiv.org/abs/2412.13021) - (AAAI) *将模型指纹识别划分为三个核心组件,以识别这些组件中约 100 种以前未被探索的组合,并深入了解它们的性能。*
### 2024
- [Hardware and software platform inference](https://arxiv.org/pdf/2411.05197) - (arXiv) *一种仅基于输入输出行为来识别黑盒机器学习模型底层 GPU 架构和软件栈的方法。*
- [Auditing Local Explanations is Hard](https://arxiv.org/abs/2407.13281) - (NeurIPS) *给出了审计解释的(极高难度的)查询复杂度。*
- [LLMs hallucinate graphs too: a structural perspective](https://arxiv.org/abs/2409.00159) - (complex networks) *通过向 LLM 查询已知图来研究拓扑幻觉。提出了一种结构幻觉排名。*
- [Fairness Auditing with Multi-Agent Collaboration](https://arxiv.org/pdf/2402.08522) - (ECAI) *考虑多个 agent 协同工作,每个 agent 针对不同的任务审计同一平台。*
- [Mapping the Field of Algorithm Auditing: A Systematic Literature Review
Identifying Research Trends, Linguistic and Geographical Disparities](https://arxiv.org/pdf/2401.11194) - (Arxiv) *对算法
审计研究进行了系统性回顾,并识别了其方法论途径中的趋势。*
- [FairProof: Confidential and Certifiable Fairness for Neural Networks](https://arxiv.org/pdf/2402.12572v1.pdf) - (Arxiv) *提出了一种使用零知识证明等密码学工具进行传统审计的替代范式;提供了一个名为 FairProof 的系统,用于验证小型神经网络的公平性。*
- [Under manipulations, are some AI models harder to audit?](https://grodino.github.io/projects/manipulated-audits/preprint.pdf) - (SATML) *使用 Rademacher 复杂度将黑盒审计的难度与目标模型的容量联系起来。*
- [Improved Membership Inference Attacks Against Language Classification Models](https://arxiv.org/pdf/2310.07219.pdf) - (ICLR) *提出了一个在审计模式下对分类器运行成员推理攻击的框架。*
- [Auditing Fairness by Betting](https://arxiv.org/pdf/2305.17570.pdf) - (Neurips) [[Code]](https://github.com/bchugg/auditing-fairness) *允许对来自黑盒分类器或回归器的传入数据进行持续监控的顺序方法。*
### 2023
- [Privacy Auditing with One (1) Training Run](https://neurips.cc/virtual/2023/poster/70925) - (NeurIPS - best paper) *一种通过单次训练运行来审计差分隐私机器学习系统的方案。*
- [Auditing fairness under unawareness through counterfactual reasoning](https://www.sciencedirect.com/science/article/pii/S0306457322003259) - (Information Processing & Management) *展示了如何揭示一个遵守法规的黑盒模型是否仍然存在偏见。*
- [XAudit : A Theoretical Look at Auditing with Explanations](https://arxiv.org/pdf/2206.04740.pdf) - (Arxiv) *形式化了解释在审计中的作用,并调查了模型解释是否以及如何能够帮助审计。*
- [Keeping Up with the Language Models: Robustness-Bias Interplay in NLI Data and Models](https://arxiv.org/pdf/2305.12620.pdf) - (Arxiv) *提出了一种利用语言模型自身来延长审计数据集寿命的方法;同时也发现了当前偏见审计指标存在的问题,并提出了替代方案——这些替代方案表明,模型的脆弱性表面上提高了之前的偏见得分。*
- [Online Fairness Auditing through Iterative Refinement](https://dl.acm.org/doi/pdf/10.1145/3580305.3599454) - (KDD) *提供了一个自适应流程,可自动化推断与估计公平性指标相关的概率保证。*
- [Stealing the Decoding Algorithms of Language Models](https://people.cs.umass.edu/~amir/papers/CCS23-LM-stealing.pdf) - (CCS) *窃取 LLM 解码算法的类型和超参数。*
- [Modeling rabbit‑holes on YouTube](https://link.springer.com/epdf/10.1007/s13278-023-01105-9?sharing_token=h-O-asHI49VUWS9FxN1Gsve4RwlQNchNByi7wbcMAY6I98PKW1PqhFQJ_JqQyk3TrB05qDb3LUzMDmKOgrupccQliViDle-rwKEi2MZ8xBViaAQhyN41oZBKLLeXchoeIW2kklVHC094I5KD8pxja4-if6-iB0uAI1FnqnYoxjU%3D) - (SNAM) *对 YouTube 中用户陷入信息茧房的困住动态进行建模,并提供了一种衡量这种封闭状态的方法。*
- [Auditing YouTube’s Recommendation Algorithm for Misinformation Filter Bubbles](https://dl.acm.org/doi/full/10.1145/3568392) - (Transactions on Recommender Systems) *“打破泡沫”需要什么,即从推荐中恢复被泡沫封闭的状态。*
- [Auditing Yelp’s Business Ranking and Review Recommendation Through the Lens of Fairness](https://arxiv.org/pdf/2308.02129.pdf) - (Arxiv) *利用人口统计平等性、曝光度以及分位数线性和逻辑回归等统计测试,对 Yelp 的商业排名和评论推荐系统的公平性进行审计。*
- [Confidential-PROFITT: Confidential PROof of FaIr Training of Trees](https://openreview.net/pdf?id=iIfDQVyuFD) - (ICLR) *提出了公平的决策树学习算法以及零知识证明协议,以在被审计的服务器上获得公平性证明。*
- [SCALE-UP: An Efficient Black-box Input-level Backdoor Detection via Analyzing Scaled Prediction Consistency](https://arxiv.org/pdf/2302.03251.pdf) - (ICLR) *考虑了在机器学习即服务(MLaaS)应用中黑盒设置下的后门检测。*
### 2022
- [Two-Face: Adversarial Audit of Commercial Face Recognition Systems](https://ojs.aaai.org/index.php/ICWSM/article/view/19300/19072) - (ICWSM) *对多个系统 API 和数据集进行了对抗性审计,并得出了许多令人担忧的观察结果。*
- [Scaling up search engine audits: Practical insights for algorithm auditing](https://journals.sagepub.com/doi/10.1177/01655515221093029) - (Journal of Information Science) [(Code)](https://github.com/gesiscss/WebBot) *使用具有虚拟 agent 的模拟浏览行为审计多个搜索引擎。*
- [A zest of lime: towards architecture-independent model distances](https://openreview.net/pdf?id=OUz_9TiTv9j) - (ICLR) *使用 LIME 测量两个远程模型之间的距离。*
- [Active Fairness Auditing](https://proceedings.mlr.press/v162/yan22c/yan22c.pdf) - (ICML) *研究了基于查询的审计算法,该算法能够以高效的查询方式估计 ML 模型的人口统计平等性。*
- [Look at the Variance! Efficient Black-box Explanations with Sobol-based Sensitivity Analysis](https://proceedings.neurips.cc/paper/2021/file/da94cbeff56cfda50785df477941308b-Paper.pdf) - (NeurIPS) *Sobol 指数提供了一种有效的方法,通过方差的视角来捕捉图像区域之间的高阶相互作用及其对(黑盒)神经网络预测的贡献。*
- [Your Echos are Heard: Tracking, Profiling, and Ad Targeting in the Amazon Smart Speaker Ecosystem](https://arxiv.org/pdf/2204.10920.pdf) - (arxiv) *推断 Amazon Echo 系统与广告定向算法之间的联系。*
### 2021
- [When the Umpire is also a Player: Bias in Private Label Product Recommendations on E-commerce Marketplaces](https://arxiv.org/pdf/2102.00141.pdf) - (FAccT) *Amazon 自有品牌产品是否获得了不公平的推荐份额,从而比第三方产品更具优势?*
- [Everyday Algorithm Auditing: Understanding the Power of Everyday Users in Surfacing Harmful Algorithmic Behaviors](https://arxiv.org/pdf/2105.02980.pdf) - (CHI) *为用户的“日常算法审计”提供了依据。*
- [Auditing Black-Box Prediction Models for Data Minimization Compliance](https://www.cs.bu.edu/faculty/crovella/paper-archive/minimization-audit-Neurips21.pdf) - (NeurIPS) *使用有限数量的查询来测量预测模型所满足的数据最小化水平。*
- [Setting the Record Straighter on Shadow Banning](https://arxiv.org/abs/2012.05101) - (INFOCOM) [(Code)](https://gitlab.enseeiht.fr/bmorgan/infocom-2021) *考虑了 Twitter 中影子封禁(即审核黑盒算法)的可能性,并测量了几个假设的概率。*
- [Extracting Training Data from Large Language Models](https://arxiv.org/pdf/2012.07805.pdf) - (USENIX Security) *从 GPT-2 模型的训练数据中提取逐字的文本序列。*
- [FairLens: Auditing black-box clinical decision support systems](https://www.sciencedirect.com/science/article/pii/S030645732100145X?casa_token=oyjFKij269MAAAAA:w_ohScpMPNMnkDdzBqAIod5QfBgQlq5Ht9mMRSOydZpOgNG-i1yuqEmBjWN__38gOGmjNL7dVT0) - (Information Processing & Management) *提出了一种 pipeline,通过比较不同的多标签分类差异度量来检测并解释临床决策支持系统(Clinical DSS)中潜在的公平性问题。*
- [Auditing Algorithmic Bias on Twitter](https://dl.acm.org/doi/abs/10.1145/3447535.3462491) - (WebSci).
- [Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information](https://proceedings.mlr.press/v139/neiswanger21a.html) - (ICML) *一种在预算约束下提取黑盒算法属性的贝叶斯优化程序。*
### 2020
- [Black-Box Ripper: Copying black-box models using generative evolutionary algorithms](https://proceedings.neurips.cc/paper/2020/file/e8d66338fab3727e34a9179ed8804f64-Paper.pdf) - (NeurIPS) *复制黑盒神经模型的功能,且对查询数量没有限制(通过 teacher/student 方案和进化搜索)。*
- [Auditing radicalization pathways on ](https://dl.acm.org/doi/pdf/10.1145/3351095.3372879) - (FAT*) *通过在静态频道推荐上进行随机游走,研究各个极端频道之间的可达性。*
- [Adversarial Model Extraction on Graph Neural Networks](https://arxiv.org/abs/1912.07721) - (AAAI Workshop on Deep Learning on Graphs: Methodologies and Applications) *介绍了 GNN 模型提取,并提出了一种初步的解决方法。*
- [Remote Explainability faces the bouncer problem](https://rdcu.be/b6qB4) - (Nature Machine Intelligence volume 2, pages529–539) [(Code)](https://github.com/erwanlemerrer/bouncer_problem) *展示了(通过一次请求)发现远程 AI 决策解释中存在谎言的不可能性或困难性。*
- [GeoDA: a geometric framework for black-box adversarial attacks](https://openaccess.thecvf.com/content_CVPR_2020/papers/Rahmati_GeoDA_A_Geometric_Framework_for_Black-Box_Adversarial_Attacks_CVPR_2020_paper.pdf) - (CVPR) [(Code)](https://github.com/thisisalirah/GeoDA) *在纯黑盒设置(无梯度,仅推断类别)下制作对抗样本以欺骗模型。*
- [The Imitation Game: Algorithm Selectionby Exploiting Black-Box Recommender](https://github.com/erwanlemerrer/erwanlemerrer.github.io/raw/master/files/imitation_blackbox_recommenders_netys-2020.pdf) - (Netys) [(Code)](https://github.com/gdamaskinos/RecRank) *通过模仿远程且训练更充分的推荐算法的决策,来对本地推荐算法进行参数化。*
- [Auditing News Curation Systems:A Case Study Examining Algorithmic and Editorial Logic in Apple News](https://ojs.aaai.org/index.php/ICWSM/article/view/7277) - (ICWSM) *对 Apple News 作为社会技术新闻策展系统(热门故事部分)的审计研究。*
- [Auditing Algorithms: On Lessons Learned and the Risks of DataMinimization](https://dl.acm.org/doi/pdf/10.1145/3375627.3375852) - (AIES) *对由 Telefónica 开发的一款健康推荐应用进行的实际审计(主要针对偏见)。*
- [Extracting Training Data from Large Language Models](https://arxiv.org/pdf/2012.07805) - (arxiv) *执行训练数据提取攻击,通过查询语言模型来恢复单个训练样本。*
### 2019
- [Adversarial Frontier Stitching for Remote Neural Network Watermarking](https://arxiv.org/abs/1711.01894) - (Neural Computing and Applications) [(Alternative implementation)](https://github.com/dunky11/adversarial-frontier-stitching) *检查远程机器学习模型是否为“泄露”的模型:通过对远程模型进行标准 API 请求,提取(或未提取)插入到有价值的模型(例如,大型深度神经网络)中以进行标记的零比特水印。*
- [Knockoff Nets: Stealing Functionality of Black-Box Models](https://arxiv.org/abs/1812.02766.pdf) - (CVPR) *探讨对手在仅基于黑盒交互(输入图像,输出预测)的情况下,能在多大程度上窃取此类“受害”模型的功能。*
- [Opening Up the Black Box:Auditing Google's Top Stories Algorithm](https://par.nsf.gov/servlets/purl/10101277) - (Flairs-32) *对 Google 的 Top stories 面板进行审计,深入了解其在选择和排名新闻出版商时的算法选择。*
- [Making targeted black-box evasion attacks effective andefficient](https://arxiv.org/pdf/1906.03397.pdf) - (arXiv) *调查对手如何最优地利用其查询预算,对深度神经网络进行针对性的规避攻击。*
- [Online Learning for Measuring Incentive Compatibility in Ad Auctions](https://research.fb.com/wp-content/uploads/2019/05/Online-Learning-for-Measuring-Incentive-Compatibility-in-Ad-Auctions.pdf) - (WWW) *测量黑盒拍卖平台的激励兼容 (IC) 机制(遗憾值)。*
- [TamperNN: Efficient Tampering Detection of Deployed Neural Nets](https://arxiv.org/abs/1903.00317) - (ISSRE) *用于制作能够检测远程执行分类器模型是否被篡改的输入的算法。*
- [Neural Network Model Extraction Attacks in Edge Devicesby Hearing Architectural Hints](https://arxiv.org/pdf/1903.03916.pdf) - (arxiv) *通过总线侦听获取内存访问事件,利用 LSTM-CTC 模型识别层序列,根据内存访问模式连接层拓扑结构,并在数据量限制下估计层维度,证明了人们可以准确地恢复与攻击起点相似的网络架构。*
- [Stealing Knowledge from Protected Deep Neural Networks Using Composite Unlabeled Data](https://ieeexplore.ieee.org/abstract/document/8851798) - (ICNN) *一种复合方法,即使完全隐藏了 softmax 输出,也可用于攻击并提取黑盒模型的知识。*
- [Neural Network Inversion in Adversarial Setting via Background Knowledge Alignment](https://dl.acm.org/citation.cfm?id=3354261) - (CCS) *对抗环境下的模型反演方法,基于训练一个作为原始模型逆函数的反演模型。在缺乏原始训练数据完整知识的情况下,通过在从更通用的数据分布中提取的辅助样本上训练反演模型,仍然可以进行准确的反演。*
### 2018
- [Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR](https://arxiv.org/abs/1711.00399) - (Harvard Journal of Law & Technology) *To explain a decision on x, find a counterfactual: the closest point to x that changes the decision.*
- [Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation](https://arxiv.org/abs/1710.06169) - (AIES) *Treats the black-box model as a teacher and trains a transparent student model to mimic the risk scores it assigns.*
- [Towards Reverse-Engineering Black-Box Neural Networks](https://arxiv.org/abs/1711.01768) - (ICLR) [(Code)](https://github.com/coallaoh/WhitenBlackBox) *Infers inner hyperparameters (e.g., number of layers, type of non-linear activation) of a remote neural network model by analysing its response patterns to certain inputs.*
- [Data driven exploratory attacks on black box classifiers in adversarial domains](https://www.sciencedirect.com/science/article/pii/S092523121830136X) - (Neurocomputing) *Reverse engineers remote classifier models (e.g., to evade a CAPTCHA test).*
- [xGEMs: Generating Examplars to Explain Black-Box Models](https://arxiv.org/pdf/1806.08867.pdf) - (arXiv) *Searches for bias in a black-box model by training an unsupervised implicit generative model, then quantitatively summarizes the black-box model's behavior by perturbing data samples along the data manifold.*
- [Learning Networks from Random Walk-Based Node Similarities](https://arxiv.org/pdf/1801.07386) - (NIPS) *Reverse engineers the structure of a graph by observing the commute times of some random walks.*
- [Identifying the Machine Learning Family from Black-Box Models](https://rd.springer.com/chapter/10.1007/978-3-030-00374-6_6) - (CAEPIA) *Determines which kind of machine learning model is behind the returned predictions.*
- [Stealing Neural Networks via Timing Side Channels](https://arxiv.org/pdf/1812.11720.pdf) - (arXiv) *Steals/approximates a model through timing attacks and queries.*
- [Copycat CNN: Stealing Knowledge by Persuading Confession with Random Non-Labeled Data](https://arxiv.org/abs/1806.05476) - (IJCNN) [(Code)](https://github.com/jeiks/Stealing_DL_Models) *Steals the knowledge of a black-box model (CNN) by querying it with random natural images (ImageNet and Microsoft-COCO).*
- [Auditing the Personalization and Composition of Politically-Related Search Engine Results Pages](https://dl.acm.org/doi/10.1145/3178876.3186143) - (WWW) *A Chrome extension that surveys participants and collects search engine results pages (SERPs) and autocomplete suggestions, in order to study personalization and composition.*
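
The counterfactual idea listed above (find the closest point to x that changes the decision) can be illustrated with a minimal black-box search. This is only a sketch under stated assumptions: `black_box` is a hypothetical stand-in for a remote model, and the random-direction search with bisection is a simple substitute chosen for illustration, not the optimization procedure used in the paper.

```python
import math
import random


def black_box(x):
    """Hypothetical remote model (assumption): accepts (1) when a linear score reaches 1.0."""
    income, debt = x
    return 1 if 0.6 * income - 0.4 * debt >= 1.0 else 0


def nearest_counterfactual(predict, x, n_samples=2000, radius=5.0,
                           bisect_steps=30, seed=0):
    """Search for a point close to x that receives a different decision.

    Only query access to `predict` is used, as in a black-box audit.
    """
    rng = random.Random(seed)
    original = predict(x)
    best, best_dist = None, math.inf
    for _ in range(n_samples):
        # Draw a random candidate around x and keep it only if the decision flips.
        cand = [xi + rng.uniform(-radius, radius) for xi in x]
        if predict(cand) == original:
            continue
        # Bisect on the segment x -> cand: `lo` keeps the original decision,
        # `hi` always has the flipped decision.
        lo, hi = 0.0, 1.0
        for _ in range(bisect_steps):
            mid = (lo + hi) / 2
            point = [xi + mid * (ci - xi) for xi, ci in zip(x, cand)]
            if predict(point) == original:
                lo = mid
            else:
                hi = mid
        point = [xi + hi * (ci - xi) for xi, ci in zip(x, cand)]
        dist = math.dist(x, point)
        if dist < best_dist:
            best, best_dist = point, dist
    return best, best_dist


x = [1.0, 1.0]  # rejected by the hypothetical model
counterfactual, distance = nearest_counterfactual(black_box, x)
print(counterfactual, distance)
```

The returned point is a flipped-decision input near the decision boundary. The distance is Euclidean here; other distance functions may suit an explanation better.
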
### 2017
- [Uncovering Influence Cookbooks: Reverse Engineering the Topological Impact in Peer Ranking Services](https://dl.acm.org/authorize.cfm?key=N21772) - (CSCW) *Aims to identify which centrality metrics are in use in a peer ranking service.*
- [The topological face of recommendation: models and application to bias detection](https://arxiv.org/abs/1704.08991) - (Complex Networks) *Proposes a bias detection framework for items recommended to users.*
- [Membership Inference Attacks Against Machine Learning Models](http://ieeexplore.ieee.org/document/7958568/) - (Symposium on Security and Privacy) *Given a machine learning model and a record, determines whether the record was part of the model's training dataset.*
- [Practical Black-Box Attacks against Machine Learning](https://dl.acm.org/citation.cfm?id=3053009) - (Asia CCS) *Studies how vulnerable a remote service is to adversarial classification attacks.*
### 2016
- [Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems](https://www.andrew.cmu.edu/user/danupam/datta-sen-zick-oakland16.pdf) - (IEEE S&P) *Uses Shapley values to evaluate the individual, joint and marginal influence of features on a model.*
- [Auditing Black-Box Models for Indirect Influence](https://arxiv.org/abs/1602.07043) - (ICDM) *Evaluates the influence of a variable on a black-box model by "cleverly" removing it from the dataset and observing the accuracy gap.*
- [Iterative Orthogonal Feature Projection for Diagnosing Bias in Black-Box Models](https://arxiv.org/abs/1611.04967) - (FATML Workshop) *Performs feature ranking to analyse black-box models.*
- [Bias in Online Freelance Marketplaces: Evidence from TaskRabbit](http://datworkshop.org/papers/dat16-final22.pdf) - (dat workshop) *Measures the ranking produced by TaskRabbit's search algorithm.*
- [Stealing Machine Learning Models via Prediction APIs](https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/tramer) - (Usenix Security) [(Code)](https://github.com/ftramer/Steal-ML) *Aims to extract the machine learning models in use by remote services.*
- [“Why Should I Trust You?”: Explaining the Predictions of Any Classifier](https://arxiv.org/pdf/1602.04938v3.pdf) - (arXiv) [(Code)](https://github.com/marcotcr/lime-experiments) *Explains a black-box classifier model by sampling around data instances.*
- [Back in Black: Towards Formal, Black Box Analysis of Sanitizers and Filters](http://ieeexplore.ieee.org/document/7546497/) - (Security and Privacy) *Black-box analysis of sanitizers and filters.*
- [Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems](http://ieeexplore.ieee.org/document/7546525/) - (Security and Privacy) *Introduces measures that capture the degree of influence of inputs on the observed outputs of a system.*
- [An Empirical Analysis of Algorithmic Pricing on Amazon Marketplace](https://mislove.org/publications/Amazon-WWW.pdf) - (WWW) [(Code)](http://personalization.ccs.neu.edu) *Develops a methodology for detecting algorithmic pricing, and uses it to empirically analyse its prevalence and behavior on Amazon Marketplace.*
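
The Shapley-value aggregation mentioned in the Quantitative Input Influence entry above can be sketched for a tiny model by averaging each feature's marginal contribution over all feature orderings. This is an illustration only: the `toy_model` function and the fixed `baseline` input are assumptions made for the example, and replacing absent features by a single baseline is a simplification rather than the intervention scheme defined in the paper. The exact enumeration is factorial in the number of features, so it only suits very small inputs.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature of x for a black-box `predict`.

    Features not yet "revealed" take their value from `baseline` (assumption).
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            # Reveal feature i and record how much the output moves.
            current[i] = x[i]
            value = predict(current)
            phi[i] += value - previous
            previous = value
    return [total / len(orders) for total in phi]


def toy_model(v):
    """Hypothetical scoring function (assumption): one additive and one interaction term."""
    return 2.0 * v[0] + 3.0 * v[1] * v[2]


phi = shapley_values(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print(phi)
```

The values sum to the difference between the model output on `x` and on `baseline`, and the interaction term is split equally between the two features involved.
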
### 2015
- [Certifying and Removing Disparate Impact](https://arxiv.org/abs/1412.3756) - (SIGKDD) *Proposes SVM-based methods to certify the absence of bias, and methods to remove bias from a dataset.*
- [Peeking Beneath the Hood of Uber](https://dl.acm.org/citation.cfm?id=2815681) - (IMC) *Infers implementation details of Uber's surge pricing algorithm.*
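
As a concrete reference point for the disparate impact entry above, the ratio of positive-outcome rates between a protected group and a reference group can be computed directly from observed decisions. This is a minimal sketch of that ratio only, with made-up data and hypothetical function names; the 0.8 threshold is the commonly cited "four-fifths" rule of thumb, and the SVM-based certification and the repair procedure are not reproduced here.

```python
def positive_rate(decisions, groups, group, positive=1):
    """Fraction of members of `group` that received the positive decision."""
    selected = [d for d, g in zip(decisions, groups) if g == group]
    if not selected:
        raise ValueError(f"no observations for group {group!r}")
    return sum(d == positive for d in selected) / len(selected)


def disparate_impact(decisions, groups, protected, reference, positive=1):
    """Ratio of positive rates: protected group over reference group."""
    reference_rate = positive_rate(decisions, groups, reference, positive)
    if reference_rate == 0:
        raise ValueError("reference group never receives the positive decision")
    return positive_rate(decisions, groups, protected, positive) / reference_rate


# Illustrative data (assumption): decisions returned by a black-box service.
decisions = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

ratio = disparate_impact(decisions, groups, protected="a", reference="b")
print(ratio, "flagged" if ratio < 0.8 else "not flagged")
```
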
### 2014
- [A peek into the black box: exploring classifiers by randomization]() - (Data Mining and Knowledge Discovery journal) [(Code)](https://github.com/tsabsch/goldeneye) *Finds groups of features that can be permuted without changing the output labels of predicted samples.*
- [XRay: Enhancing the Web's Transparency with Differential Correlation](https://www.usenix.org/node/184394) - (USENIX Security) *Audits which user profile data is used to target a particular ad, recommendation, or price.*
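
The randomization idea in the first entry above (permute a group of features and check whether predicted labels change) can be sketched as a fidelity score. This is a simplified illustration: `toy_classifier`, the sample rows, and the plain shuffle across all rows are assumptions for the example, and the paper's search over feature groupings is not reproduced.

```python
import random


def permutation_fidelity(predict, rows, group, seed=0):
    """Share of rows whose predicted label survives shuffling the features in `group`.

    The features in `group` are moved together from one donor row to another,
    so relations inside the group are preserved.
    """
    rng = random.Random(seed)
    original = [predict(row) for row in rows]
    donors = list(range(len(rows)))
    rng.shuffle(donors)
    unchanged = 0
    for row, donor, label in zip(rows, donors, original):
        permuted = list(row)
        for j in group:
            permuted[j] = rows[donor][j]
        unchanged += predict(permuted) == label
    return unchanged / len(rows)


def toy_classifier(row):
    """Hypothetical black box (assumption): only feature 0 matters."""
    return 1 if row[0] > 0.5 else 0


rows = [[i % 2, (i * 7) % 5, (i * 3) % 4] for i in range(20)]
print(permutation_fidelity(toy_classifier, rows, group=[1, 2]))  # unused features
print(permutation_fidelity(toy_classifier, rows, group=[0]))     # the feature in use
```

A fidelity of 1.0 for a group means the classifier's labels on these rows do not depend on how that group is permuted.
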
### 2013
- [Measuring Personalization of Web Search](https://dl.acm.org/citation.cfm?id=2488435) - (WWW) *Develops a methodology for measuring personalization in Web search results.*
- [Auditing: Active Learning with Outcome-Dependent Query Costs](https://www.cs.bgu.ac.il/~sabatos/papers/SabatoSarwate13.pdf) - (NIPS) *Learns a binary classifier while paying only for negative labels.*
### 2012
- [Query Strategies for Evading Convex-Inducing Classifiers](http://www.jmlr.org/papers/v13/nelson12a.html) - (JMLR) *Evasion methods for convex classifiers, with consideration of evasion complexity.*
### 2008
- [Privacy Oracle: a System for Finding Application Leaks with Black Box Differential Testing](https://dl.acm.org/citation.cfm?id=1455806) - (CCS) *Privacy Oracle: a system that uncovers applications' leaks of personal information in transmissions to remote servers.*
### 2005
- [Adversarial Learning](https://dl.acm.org/citation.cfm?id=1081950) - (KDD) *Reverse engineers remote linear classifiers using membership queries.*
## Related events
### 2025
* [AIMLAI at ECML/PKDD 2025](https://project.inria.fr/aimlai/)
* [AAAI workshop on AI Governance: Alignment, Morality, and Law](https://aaai.org/conference/aaai/aaai-25/workshop-list/#ws06)
### 2024
* [1st International Conference on Auditing and Artificial Intelligence](https://www.ircg.msm.uni-due.de/ai/)
* [Regulatable ML Workshop (RegML'24)](https://regulatableml.github.io/)
### 2023
* [Supporting User Engagement in Testing, Auditing, and Contesting AI (CSCW User AI Auditing)](https://cscw-user-ai-auditing.github.io/)
* [Workshop on Algorithmic Audits of Algorithms (WAAA)](https://algorithmic-audits.github.io)
* [Regulatable ML Workshop (RegML'23)](https://regulatableml.github.io/)