dockfixlabs/agentguard-benchmark

GitHub: dockfixlabs/agentguard-benchmark

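A benchmark suite of 100+ AI agent code samples with known vulnerabilities, used to evaluate and compare the detection capability and false-positive rate of SAST security scanners.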


# AgentGuard Benchmark Suite

[![License: MIT](https://img.shields.io/badge/License-MIT-green?style=flat-square)](LICENSE) [![Samples](https://img.shields.io/badge/samples-100+-blue?style=flat-square)]()

## What is this

A curated collection of **vulnerable AI agent code samples**, organized by OWASP ASI Top 10 category. Each sample is a minimal, reproducible case of a real vulnerability pattern found in production AI agent code.

Use this suite to:

- ✅ Test whether your scanner detects known patterns
- ✅ Measure the false-positive rate against clean code
- ✅ Compare AgentGuard with other SAST tools
- ✅ Learn agent-specific vulnerability patterns

## Structure

```
samples/
├── ASI01/      # Prompt Injection (20 samples)
├── ASI02/      # Tool Abuse (15 samples)
├── ASI03/      # Data Exfiltration (15 samples)
├── ASI07/      # Credential Exposure (20 samples)
├── ASI10/      # Trust Boundary Violation (15 samples)
├── clean/      # Safe code (15 samples), for FP testing
└── README.md   # This file
```

## Running the benchmark

### With AgentGuard

```
pip install dfx-agentguard

# Scan all samples
for dir in samples/*/; do
  echo "=== Scanning $dir ==="
  agentguard "$dir" --format json
done

# Compare results
python benchmark.py --scanner agentguard --dir samples/
```

### Expected results

| Category | Samples | Expected findings | AgentGuard detection rate |
|----------|---------|-------------------|---------------------------|
| ASI01    | 20      | 20                | 100%                      |
| ASI02    | 15      | 15                | 100%                      |
| ASI03    | 15      | 15                | 95%+                      |
| ASI07    | 20      | 20                | 100%                      |
| ASI10    | 15      | 15                | 95%+                      |
| clean    | 15      | 0                 | 0% (no false positives)   |

## Contributing

Found a vulnerability pattern that is not covered yet? Sample contributions are welcome!

- One vulnerability per file
- Naming format: `{ASI category}_{pattern}_{language}.{ext}`
- Include a comment describing the vulnerability
- Put clean samples in the `clean/` directory

## License

MIT
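
## Example: scoring a scanner run (sketch)

The per-category rates in the expected-results table can be computed for any scanner once you know which sample files it flagged. The following is a minimal sketch, not the repository's `benchmark.py`. It assumes you have already reduced the scanner's output to a set of flagged filenames per category directory, because the JSON schema emitted by `agentguard --format json` is not described here. The names `SAMPLE_COUNTS`, `CategoryScore`, `score`, and `summarize` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Sample counts per directory, copied from the directory layout in this README.
SAMPLE_COUNTS = {
    "ASI01": 20,
    "ASI02": 15,
    "ASI03": 15,
    "ASI07": 20,
    "ASI10": 15,
    "clean": 15,
}


@dataclass
class CategoryScore:
    category: str
    total: int    # number of samples in the category
    flagged: int  # number of samples with at least one finding

    @property
    def rate(self) -> float:
        """Fraction of samples flagged (detection rate, or FP rate for clean/)."""
        return self.flagged / self.total if self.total else 0.0


def score(flagged_files: dict[str, set[str]]) -> dict[str, CategoryScore]:
    """Score one scanner run.

    flagged_files maps a category directory name to the set of sample
    filenames for which the scanner reported at least one finding
    (an assumed, pre-processed input format).
    """
    scores = {}
    for category, total in SAMPLE_COUNTS.items():
        flagged = len(flagged_files.get(category, set()))
        if flagged > total:
            raise ValueError(f"{category}: {flagged} flagged files, only {total} samples")
        scores[category] = CategoryScore(category, total, flagged)
    return scores


def summarize(scores: dict[str, CategoryScore]) -> list[str]:
    """Render one line per category; clean/ is reported as a false-positive rate."""
    lines = []
    for category, s in scores.items():
        label = "false-positive rate" if category == "clean" else "detection rate"
        lines.append(f"{category}: {s.flagged}/{s.total} flagged, {label} {s.rate:.0%}")
    return lines
```

Counting flagged files rather than raw findings keeps the rate bounded at 100% even when a scanner reports the same vulnerability more than once. For `clean/`, any flagged file counts as a false positive.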

Tags: AI security, Chat Copilot, DLL hijacking, StruQ, web report viewer, large language models, vulnerability benchmark, reverse-engineering tools, static code scanning