lindsey98/MELON

GitHub: lindsey98/MELON

MELON is an open-source implementation of a provable defense against indirect prompt injection attacks on AI agents, integrated into the AgentDojo benchmark framework.


# [ICML'25] MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents

This repository bundles [AgentDojo](https://github.com/ethz-spylab/agentdojo) under `agentdojo/` **with MELON already integrated** (the `melon` defense is registered and `pi_detector.py` is in place), so no manual patching is needed.

## Installation

Python 3.11+ is recommended.

```
git clone https://github.com/lindsey98/melon && cd melon
conda create -n melon python=3.11 -y && conda activate melon
# Install AgentDojo + MELON detection deps (numpy, sentence-transformers) + transformers extra
cd agentdojo && pip install -e ".[transformers,melon]" && cd ..
pip install "vllm>=0.6.3"  # optional: serve local agent LLMs
```

The `melon` extra pulls in `sentence-transformers`, so the fully local embedding backend works out of the box.

## Configuring API keys

Copy `.env.example` to `.env`, fill in the keys you need (e.g. `OPENAI_API_KEY`), then load it:

```
cp .env.example .env   # then edit .env
set -a && source .env && set +a
```

All supported variables are documented inline in [`.env.example`](.env.example). `.env` is git-ignored.

## Running

Add `--defense melon` to the AgentDojo benchmark command. Logs are written to `logs/+melon////.json`.

**Hosted models:**

```
python -m agentdojo.scripts.benchmark --model gpt-4o-2024-05-13 \
    --attack tool_knowledge --defense melon -s slack
```

**Locally served models (Qwen3-30B-A3B-Instruct-2507 / Llama-3.3-70B-Instruct):**

`Qwen3-30B-A3B-Instruct-2507` and `Llama-3.3-70B-Instruct` are registered in `models.py` under the `local` provider. Serve the model with vLLM (with tool calling enabled) under that served name, set `LOCAL_LLM_PORT` in `.env`, then pass the same name to `--model`:

```
# Qwen3 → --tool-call-parser hermes | Llama-3.3 → --tool-call-parser llama3_json
vllm serve Qwen/Qwen3-30B-A3B-Instruct-2507 --port 8000 \
    --served-model-name Qwen3-30B-A3B-Instruct-2507 \
    --enable-auto-tool-choice --tool-call-parser hermes

python -m agentdojo.scripts.benchmark --model Qwen3-30B-A3B-Instruct-2507 \
    --attack tool_knowledge --defense melon -s slack
```

To run fully locally (no hosted APIs at all), set `MELON_EMBED_PROVIDER=sentence-transformers` in `.env` so that MELON's detection embeddings are also computed locally. Optionally, set `HF_HOME` and pre-download the embedding model so the first run does not block on the download:

```
python scripts/prefetch_embed_model.py  # downloads MELON_EMBED_MODEL into $HF_HOME
```

## Using MELON in your own project

```
from agentdojo.agent_pipeline.pi_detector import MELON

detector = MELON(
    llm,                                     # your AgentDojo LLM pipeline element
    threshold=0.8,                           # cosine-similarity threshold
    embed_provider="sentence-transformers",  # or "openai" / "openai-compatible"
)
```

## Contact

Questions? Contact [Kaijie Zhu](https://kaijiezhu11.github.io/).

## Citation

```
@inproceedings{zhu2025melon,
  title={MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents},
  author={Zhu, Kaijie and Yang, Xianjun and Wang, Jindong and Guo, Wenbo and Wang, William Yang},
  year={2025},
  booktitle={International Conference on Machine Learning},
}
```
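For intuition about what the `threshold` (cosine-similarity) parameter of the `MELON` detector controls: the MELON paper describes re-executing the agent with the user's prompt masked while keeping the tool outputs, and flagging an attack when the tool calls from the original and masked runs are similar, since that suggests the action is driven by injected tool output rather than by the user task. The sketch below shows only that final comparison step. It is a simplified illustration, not the repository's `pi_detector.py`: the helpers `embed`, `cosine`, and `is_injected` are hypothetical, the tool-call strings are illustrative, and the bag-of-words "embedding" is a stand-in for the sentence-transformers or OpenAI embeddings the real detector uses, so its scores will not match real ones.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in embedding: bag-of-words counts over lowercase
    # tokens. The real MELON detector uses a neural embedding model instead.
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def is_injected(
    original_calls: list[str], masked_calls: list[str], threshold: float = 0.8
) -> bool:
    # Flag an attack if any tool call from the original run is reproduced
    # (similarity above the threshold) in the masked re-execution, where the
    # real user task was hidden from the agent.
    for orig in original_calls:
        for masked in masked_calls:
            if cosine(embed(orig), embed(masked)) > threshold:
                return True
    return False


# Usage: the masked run still sends a DM to "Eve", so it was not user-driven.
original = [
    'read_channel_messages(channel="general")',
    'send_direct_message(recipient="Eve", body="secret")',
]
masked = ['send_direct_message(recipient="Eve", body="secret")']
print(is_injected(original, masked))  # True
```

A higher `threshold` makes the check stricter about what counts as "the same action" (fewer false positives, more missed attacks); a lower one flags looser matches.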
