aipi590-ggn/challenge-4

GitHub: aipi590-ggn/challenge-4

A contextual bandit study showing that install-time metadata is structurally insufficient for MCP supply-chain trust.


# AIPI 590 · Challenge 4: Install-Time Metadata Is Not Enough

[![Live dashboard](https://img.shields.io/badge/dashboard-live-2ca02c?style=flat)](https://aipi590-ggn.github.io/challenge-4/) [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg?style=flat)](https://www.python.org/) [![LinUCB](https://img.shields.io/badge/algorithm-LinUCB-00539B?style=flat)](https://github.com/david-cortes/contextualbandits) [![License: MIT](https://img.shields.io/badge/license-MIT-lightgrey.svg?style=flat)](LICENSE)

A contextual bandit retrospective on install-time trust decisions for Model Context Protocol (MCP) servers. We train LinUCB on ~700 labeled MCP servers using a 6-dimensional metadata context vector, and evaluate it against three documented real-world malicious incidents plus an adversarial Goodhart analysis. **The result is a negative finding:** even in a clean run, install-time metadata is structurally insufficient for MCP supply-chain trust, especially against supply-chain compromises of established packages.

**[Live dashboard →](https://aipi590-ggn.github.io/challenge-4/)** · **[Slides](presentation.md)** · **[Scope](SCOPE.md)** · **[Rubric checklist](REQUIREMENTS_CHECKLIST.md)**

## Key findings

- **100% install rate for litellm.** Trained on metadata from before 2025-09-15, LinUCB installed litellm in every one of 30 random seeds. litellm is the 43k-star PyPI package that was hit by a supply-chain compromise in March 2026. This is exactly the class of attack that runtime defenses (such as Anthropic Claude Code auto-mode and MindGuard) target and that install-time metadata cannot detect.
- **86% of reject decisions flip under a 1σ perturbation.** If an attacker merely submits a package to the Anthropic-backed official MCP registry (currently permissionless), the tool's safety guarantee collapses. Four of the six context features are ETM (easy-to-manipulate metadata, as defined by Halder et al. 2024).

## Why a bandit rather than a classifier

A recommender in a live marketplace has to make **explore/exploit** decisions, learn from delayed and noisy reward signals, and condition on **context**. A supervised classifier can label an individual server in isolation, but it does none of these three things. LinUCB is the standard framework (Li et al. 2010, WWW). Choosing a classic algorithm is deliberate; the contribution is the problem formulation and what running the experiments honestly reveals.

## Two experiments

| # | Question | Result | Artifacts |
|---|---|---|---|
| A | Would a bandit trained before 2025-09-15 have caught postmark-mcp / mcp-remote / litellm? | postmark-mcp and mcp-remote are rejected (they match the synthetic templates); **litellm is installed in 100% of seeds** | [experiment_a.py](scripts/experiment_a.py) · [plot](results/experiment_a_anchors.png) |
| B | For each context feature, what adversarial perturbation flips a reject into an install? | The registry flag flips at 1σ (**86%**); age flips with patience (**81%**); stars flip via sockpuppet accounts (**39%**); description and repo size are robust | [experiment_b.py](scripts/experiment_b.py) · [plot](results/experiment_b_goodhart.png) |

## Data sources

| Source | Role | Access | Rows |
|---|---|---|---|
| [AgentSeal awesome-mcp-security](https://github.com/AgentSeal/awesome-mcp-security) | Primary population: trust-score labels (0–100) | Public README parse | ~700 |
| [GitHub REST API](https://docs.github.com/rest) | Feature enrichment: stars, forks, activity, size, topics, license, archived status | Authenticated (5,000/hour) | 697 resolved |
| [Smithery.ai](https://registry.smithery.ai/servers) | Cross-registry presence flag (adoption) | Public JSON | Top 500 by usage |
| [Official MCP Registry](https://registry.modelcontextprotocol.io/v0/servers) | Cross-registry presence flag (Anthropic-backed) | Public JSON, cursor-paginated | 6,268 latest-version entries |
| [npm](https://registry.npmjs.org/) / [PyPI](https://pypi.org/) | Pre-incident version history for the real-world anchors | Public JSON | 3 anchors |
| [MCPSecBench](https://arxiv.org/abs/2508.13220) | Attack taxonomy for synthetic injections | arXiv paper + repo | 20 injections |

**6-dimensional context vector:** `log_stars`, `is_in_smithery`, `is_in_official_registry`, `age_days`, `repo_size_kb`, `desc_length`.

## Project structure

```
challenge-4/
├── README.md                     # this file
├── SCOPE.md                      # full plan + locked decisions
├── REQUIREMENTS_CHECKLIST.md     # rubric tracker
├── presentation.md               # 7-slide deck
├── scripts/
│   ├── fetch_agentseal.py        # parse AgentSeal README → labels
│   ├── fetch_github.py           # GitHub REST metadata for AgentSeal repos
│   ├── fetch_smithery.py         # Smithery registry + per-server detail
│   ├── fetch_mcp_official.py     # Official MCP Registry (Anthropic-backed)
│   ├── fetch_incident_corpus.py  # npm / PyPI pre-incident version history
│   ├── build_dataset.py          # join all sources → data/corpus.csv
│   ├── inject_attacks.py         # MCPSecBench-pattern synthetic malicious
│   ├── experiment.py             # encounter simulator: 5 baselines + LinUCB
│   ├── experiment_a.py           # retrospective: postmark-mcp / mcp-remote / litellm, 30 seeds
│   ├── experiment_b.py           # Goodhart: per-feature flip analysis under adversarial perturbation
│   ├── plots_baselines.py        # figures for baselines run
│   └── plots_experiments.py      # figures for experiments A + B
├── data/                         # raw CSVs (gitignored) + build artifacts
├── results/                      # plots, summaries, experiment logs (flat; semantic prefixes)
├── public/                       # static dashboard: index.html + plots/
└── docs -> public                # symlink; Pages source path is /docs
```

## Quick start: reproduce every result

```
# Setup
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
export GITHUB_TOKEN=$(gh auth token)   # for GitHub REST metadata

# Fetch sources (about 2 minutes)
python scripts/fetch_agentseal.py
python scripts/fetch_github.py
python scripts/fetch_smithery.py --limit 500
python scripts/fetch_mcp_official.py
python scripts/fetch_incident_corpus.py

# Build corpus + synthetic injections
python scripts/build_dataset.py
python scripts/inject_attacks.py --per-attack 5

# Run experiments
python scripts/experiment.py --rounds 2000 --seeds 5       # baselines run
python scripts/experiment_a.py --rounds 1500 --seeds 30    # retrospective
python scripts/experiment_b.py --rounds 2000 --seeds 30    # Goodhart

# Regenerate plots
python scripts/plots_baselines.py
python scripts/plots_experiments.py
```

Results are written to `results/`, with filenames prefixed `baselines_*` and `experiment_*`. The dashboard lives at `public/index.html`; it reads pre-rendered PNGs from `public/plots/` and inlines the Goodhart curves for the interactive slider.

## Key literature

- **Li et al. 2010**: [A Contextual-Bandit Approach to Personalized News Article Recommendation](https://rob.schapire.net/papers/www10.pdf) (WWW) · the canonical LinUCB
- **Kalodanis et al. 2025**: [Balancing Efficiency and Efficacy](https://www.mdpi.com/2076-3417/15/11/6362) (*Applied Sciences*, MDPI) · bandits for multi-level threat detection
- **Halder et al. 2024**: [Malicious Package Detection using Metadata Information](https://arxiv.org/html/2402.07444v1) (WWW) · ETM/DTM adversarial feature taxonomy
- **Cerebro**: [Malicious package detection via behavior sequences](https://dl.acm.org/doi/10.1145/3705304) (TOSEM 2024) · baseline citation for "metadata is insufficient"
- **Anthropic Claude Code auto-mode** (March 2026): [Engineering post](https://www.anthropic.com/engineering/claude-code-auto-mode) · two-layer runtime classifier. Published metrics: 0.4% false-positive rate on real traffic, 5.7% false-negative rate on synthetic exfiltration attempts, 17% false-negative rate on real overeager actions (n=52). The current state of the art in runtime defense; it operates at invocation time only, not install time
- **MCPSecBench** (arXiv 2508.13220) · attack taxonomy used for the synthetic injections
- **MindGuard** (arXiv 2508.20412) · runtime decision-dependency-graph defense

## Team

Lindsay Gross · Yifei Guo · Jonas Neves

Duke University · AIPI 590 · Spring 2026
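
For readers unfamiliar with the LinUCB rule this README relies on (Li et al. 2010), the following is a minimal sketch of the disjoint variant with two arms. It is not the code in `scripts/experiment.py`: the arm names `INSTALL` and `REJECT`, the reward scheme (+1 for installing a benign server, -1 for installing a malicious one, 0 for rejecting), the `alpha` value, and the synthetic contexts are all assumptions made for illustration.

```python
import numpy as np

INSTALL, REJECT = 0, 1  # arm indices (hypothetical naming, not from the repo)


class LinUCBArm:
    """Ridge-regression state for one arm of disjoint LinUCB."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha      # exploration strength (assumed value)
        self.A = np.eye(dim)    # identity prior plus the sum of x x^T
        self.b = np.zeros(dim)  # sum of reward * x

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                        # ridge estimate of reward weights
        bonus = self.alpha * np.sqrt(x @ A_inv @ x)   # confidence width for this context
        return float(theta @ x + bonus)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x


def choose(arms, x):
    """Pick the arm with the highest upper confidence bound."""
    return int(np.argmax([arm.ucb(x) for arm in arms]))


def run_toy(rounds=500, dim=6, seed=0):
    """Train on synthetic contexts; a toy server is benign iff its first feature is positive."""
    rng = np.random.default_rng(seed)
    arms = [LinUCBArm(dim), LinUCBArm(dim)]
    for _ in range(rounds):
        x = rng.normal(size=dim)
        arm = choose(arms, x)
        if arm == INSTALL:
            reward = 1.0 if x[0] > 0 else -1.0  # installing a malicious server is penalized
        else:
            reward = 0.0                        # rejecting is always neutral
        arms[arm].update(x, reward)             # only the chosen arm observes a reward
    return arms


arms = run_toy()
benign = np.array([2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
malicious = np.array([-2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print("benign ->", "install" if choose(arms, benign) == INSTALL else "reject")
print("malicious ->", "install" if choose(arms, malicious) == INSTALL else "reject")
```

The toy run only learns what the context exposes. A compromised package whose metadata looks like that of an established benign package produces the same context vector, so a rule of this shape would be expected to install it, which is consistent with the litellm result reported above.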
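
The per-feature flip analysis of Experiment B can be illustrated with a similar sketch. It assumes the install decision is a thresholded score and measures, for each feature, the share of rejected rows that become installs when that feature alone is raised by `k_sigma` standard deviations. The function `flip_rates`, the linear toy scorer `w_toy`, and the random data are hypothetical; `scripts/experiment_b.py` may define perturbations differently, particularly for the binary registry flags.

```python
import numpy as np

# Feature order taken from the README's 6-dimensional context vector.
FEATURES = ["log_stars", "is_in_smithery", "is_in_official_registry",
            "age_days", "repo_size_kb", "desc_length"]


def flip_rates(X, score_fn, threshold=0.0, k_sigma=1.0):
    """Share of rejected rows that flip to install when one feature rises by k_sigma std devs."""
    X = np.asarray(X, dtype=float)
    rejected = X[score_fn(X) < threshold]   # rows the scorer currently rejects
    sigma = X.std(axis=0)                   # per-feature standard deviation
    rates = {}
    for j, name in enumerate(FEATURES):
        perturbed = rejected.copy()
        perturbed[:, j] += k_sigma * sigma[j]          # attacker moves one feature only
        flipped = score_fn(perturbed) >= threshold     # reject turned into install
        rates[name] = float(flipped.mean()) if len(rejected) else 0.0
    return rates


# Toy data and toy weights (assumptions): the registry flag carries most of the score.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, len(FEATURES)))
w_toy = np.array([0.2, 0.0, 1.5, 0.5, 0.0, 0.0])

rates = flip_rates(X_toy, lambda X: X @ w_toy)
for name, rate in rates.items():
    print(f"{name:26s} {rate:.0%}")
```

In this toy setup a feature with zero weight can never flip a decision, while the heavily weighted one flips many. The README's finding has the same shape: the features the model leans on most are the ones an attacker can cheaply move.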

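Tags: AIPI 590, Goodhart metrics, LinUCB, MCP, Poisoning Attack, Python, contextual bandits, meta-features, anti-forensics, manipulable features, online learning, security evaluation, install-time metadata, secret-leak prevention, exploration and exploitation, recommender systems, backdoor-free, machine learning security, traffic allocation, runtime defense, reverse-engineering tools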