dreadnode/AIRTBench-Code

GitHub: dreadnode/AIRTBench-Code

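A benchmark framework for evaluating the autonomous AI red teaming capabilities of large language models. It systematically measures the adversarial attack capabilities of LLM agents through AI/ML CTF challenges.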

Stars: 99 | Forks: 15

# AIRTBench: Autonomous AI Red Teaming Agent Code

[![Pre-Commit](https://static.pigsec.cn/wp-content/uploads/repos/2026/03/eef13d7859075910.svg)](https://github.com/dreadnode/AIRTBench-Code/actions/workflows/pre-commit.yaml) [![Renovate](https://static.pigsec.cn/wp-content/uploads/repos/2026/03/da0c3a9467075910.svg)](https://github.com/dreadnode/AIRTBench-Code/actions/workflows/renovate.yaml) [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![GitHub release (latest by date)](https://img.shields.io/github/v/release/dreadnode/AIRTBench-Code)](https://github.com/dreadnode/AIRTBench-Code/releases) [![arXiv](https://img.shields.io/badge/arXiv-AIRTBench-b31b1b.svg)](https://arxiv.org/abs/2506.14682) [![HuggingFace](https://img.shields.io/badge/🤗%20HuggingFace-Dataset-ffca28.svg)](https://huggingface.co/datasets/dreadnode/AIRTBench/blob/main/README.md) [![Dreadnode](https://img.shields.io/badge/Dreadnode-Blog-5714928f.svg)](https://dreadnode.io/blog/ai-red-team-benchmark) [![Agent Harness](https://img.shields.io/badge/📚_Agent_Harness-Documentation-5714928f.svg)](https://docs.dreadnode.io/strikes/how-to/airtbench-agent) [![GitHub stars](https://img.shields.io/github/stars/dreadnode/AIRTBench-Code?style=social)](https://github.com/dreadnode/AIRTBench-Code/stargazers) [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/dreadnode/AIRTBench-Code/pulls)

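This repository contains the implementation of the AIRTBench autonomous AI red teaming agent. It complements our research paper [AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models](https://arxiv.org/abs/2506.14682) and the accompanying blog post, "[Do LLM Agents Have AI Red Team Capabilities? We Built a Benchmark to Find Out](https://dreadnode.io/blog/ai-red-team-benchmark)".

The AIRTBench agent is designed to evaluate the autonomous red teaming capabilities of large language models (LLMs) through AI/ML Capture The Flag (CTF) challenges. The agent systematically exploits LLM-based targets by solving challenges on the Dreadnode Strikes platform, providing a standardized benchmark for measuring adversarial AI capabilities.

- [AIRTBench: Autonomous AI Red Teaming Agent Code](#airtbench-autonomous-ai-red-teaming-agent-code)
  - [Agent Harness Construction](#agent-harness-construction)
  - [Setup](#setup)
  - [Documentation](#documentation)
  - [Run the Evaluation](#run-the-evaluation)
    - [Basic Usage](#basic-usage)
    - [Challenge Filtering](#challenge-filtering)
  - [Resources](#resources)
  - [Dataset](#dataset)
  - [Citation](#citation)
  - [Model Requests](#model-requests)
  - [🤝 Contributing](#-contributing)
  - [🔐 Security](#-security)
  - [⭐ Star History](#-star-history)

## Agent Harness Construction

The AIRTBench harness follows a modular architecture designed for extensibility and evaluation: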

Figure: AIRTBench harness construction architecture showing the interaction between agent components, challenge interface, and evaluation framework.

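## Setup

You can set up the virtual environment with uv:

```
uv sync
```

## Documentation

Technical documentation for the AIRTBench agent is available in the [Dreadnode Strikes documentation](https://docs.dreadnode.io/strikes/how-to/airtbench-agent).

## Run the Evaluation

To run the code, you need access to the Dreadnode Strikes platform. See the [docs](https://docs.Dreadnode.io/strikes/overview), or submit a request for the Strikes waitlist [here](https://platform.dreadnode.io/waitlist/strikes).

This [rigging](https://docs.dreadnode.io/open-source/rigging/intro)-based agent is designed to solve a variety of AI/ML CTF challenges from the Dreadnode [Crucible](https://platform.dreadnode.io/crucible) platform. It is given access to execute Python commands in a network-local container built from a custom [Dockerfile](./airtbench/container/Dockerfile).

```
uv run -m airtbench --help
```

### Basic Usage

```
uv run -m airtbench \
  --model $MODEL \
  --project $PROJECT \
  --platform-api-key $DREADNODE_TOKEN \
  --token $DREADNODE_TOKEN \
  --server https://platform.dreadnode.io \
  --organization $ORGANIZATION \
  --max-steps 100 \
  --inference-timeout 240 \
  --enable-cache \
  --no-give-up \
  --challenges bear1 bear2
```

**Organization and Workspace Parameters**

If you belong to multiple organizations, you must specify which one to use:

```
--organization "dreadnode"
```

Optionally, you can also specify a workspace within the organization:

```
--organization "dreadnode" --workspace "my-workspace"
```

**Example with an organization:**

```
uv run -m airtbench \
  --model openai/gpt-4o \
  --project airtbench \
  --platform-api-key $DREADNODE_TOKEN \
  --token $DREADNODE_TOKEN \
  --server https://platform.dreadnode.io \
  --organization "dreadnode" \
  --challenges bear1
```

### Challenge Filtering

To run the agent only against challenges that match the `is_llm:true` criterion (that is, LLM-based challenges), use the following command:

```
uv run -m airtbench --model $MODEL --llm-challenges-only
```

The harness automatically builds the defined number of containers with the supplied flag and loads them as needed, keeping them network-isolated from each other. The process is generally:

1. For each challenge, create the agent from the Jupyter notebook provided with the challenge
2. Task the agent with solving the CTF challenge based on the notebook contents
3. Bring up the associated environment
4. Test the agent's ability to execute Python code, which runs inside a Jupyter kernel whose response is fed back to the model
5. If the CTF challenge is solved and the flag is observed, the agent must submit the flag
6. Otherwise, keep running until an error occurs, the agent gives up, or max-steps is reached

Check out the [challenge manifest](./airtbench/challenges/.challenges.yaml) to see the challenges currently in scope.

## Resources

- [📄 Paper on arXiv](https://arxiv.org/abs/2506.14682)
- [📝 Blog post](https://dreadnode.io/blog/ai-red-team-benchmark)

## Dataset

- Download the dataset directly from [🤗Hugging Face](https://huggingface.co/datasets/dreadnode/AIRTBench/blob/main/README.md)
- Instructions for loading the dataset can also be found in the [dataset](./dataset/README.md) directory.

## Citation

If you find our work helpful, please use the following citation.

```
@misc{dawson2025airtbenchmeasuringautonomousai,
  title={AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models},
  author={Ads Dawson and Rob Mulla and Nick Landers and Shane Caldwell},
  year={2025},
  eprint={2506.14682},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2506.14682},
}
```

## Model Requests

If you know of a model that may be worth analyzing but do not have the resources to run it yourself, feel free to submit a feature request via a GitHub issue.

## 🤝 Contributing

Forks and contributions are welcome! Please see our [contributing guide](docs/contributing.md).

## 🔐 Security

Please see our [security policy](SECURITY.md) for reporting vulnerabilities.

## ⭐ Star History

[![GitHub stars](https://img.shields.io/github/stars/dreadnode/AIRTBench-Code?style=social)](https://github.com/dreadnode/AIRTBench-Code/stargazers)

By watching the repo, you can also be notified of any upcoming releases.

[![Star history graph](https://api.star-history.com/svg?repos=dreadnode/AIRTBench-Code&type=Date)](https://star-history.com/#dreadnode/AIRTBench-Code&Date)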
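
The numbered process in the Challenge Filtering section (execute code, feed the response back to the model, submit an observed flag, otherwise continue until an error, a give-up, or max-steps) can be pictured as a simple loop. The following is only a minimal sketch under stated assumptions, not the harness's actual implementation: `StepResult`, `RunOutcome`, `run_challenge`, and the three callables are hypothetical names invented for illustration, and the model, Jupyter kernel, and flag submission are replaced by in-memory stubs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    """Hypothetical result of executing one code cell."""
    output: str                 # text that is fed back to the model
    flag: Optional[str] = None  # set when a flag is observed in the output
    error: bool = False         # set when execution failed unrecoverably


@dataclass
class RunOutcome:
    """Hypothetical summary of one challenge run."""
    status: str                 # "solved", "gave_up", "error", or "max_steps"
    steps: int
    flag: Optional[str] = None


def run_challenge(
    next_action: Callable[[str], Optional[str]],  # model stub: feedback -> code, or None to give up
    execute: Callable[[str], StepResult],         # kernel stub: code -> result
    submit_flag: Callable[[str], bool],           # platform stub: flag -> accepted?
    max_steps: int = 100,
) -> RunOutcome:
    feedback = ""
    for step in range(1, max_steps + 1):
        code = next_action(feedback)
        if code is None:                 # the agent gives up
            return RunOutcome("gave_up", step)
        result = execute(code)
        if result.error:                 # stop on an execution error
            return RunOutcome("error", step)
        if result.flag is not None and submit_flag(result.flag):
            return RunOutcome("solved", step, result.flag)
        feedback = result.output         # feed the response back to the model
    return RunOutcome("max_steps", max_steps)


def demo() -> RunOutcome:
    """Run the loop against scripted stubs (all values are made up)."""
    scripted = iter(["print('probe')", "print('exploit')"])

    def next_action(feedback: str) -> Optional[str]:
        return next(scripted, None)

    def execute(code: str) -> StepResult:
        if "exploit" in code:
            return StepResult(output="flag-example", flag="flag-example")
        return StepResult(output="no flag yet")

    def submit_flag(flag: str) -> bool:
        return flag == "flag-example"

    return run_challenge(next_action, execute, submit_flag, max_steps=5)


if __name__ == "__main__":
    print(demo())
```

In this sketch the stub model "gives up" by returning `None`; how the real agent signals giving up, and how `--no-give-up` changes that, is not specified here.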

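The `is_llm:true` criterion described under Challenge Filtering amounts to selecting a subset of challenges by a boolean attribute. The sketch below illustrates that idea only. It assumes each challenge is a record with a `name` and an `is_llm` field; the actual schema of the challenge manifest is not shown in this README, and `select_challenges` and the sample records are hypothetical.

```python
from typing import Iterable, Optional


def select_challenges(
    challenges: Iterable[dict],
    llm_only: bool = False,
    names: Optional[set] = None,
) -> list:
    """Return the names of challenges that pass the given filters.

    Assumes (hypothetically) that each record has a "name" key and an
    optional boolean "is_llm" key.
    """
    selected = []
    for challenge in challenges:
        # Mirror of an "LLM challenges only" filter: keep is_llm == True.
        if llm_only and not challenge.get("is_llm", False):
            continue
        # Mirror of an explicit challenge list: keep only requested names.
        if names is not None and challenge["name"] not in names:
            continue
        selected.append(challenge["name"])
    return selected


# Made-up sample records; these are not entries from the real manifest.
SAMPLE_CHALLENGES = [
    {"name": "example_a", "is_llm": True},
    {"name": "example_b", "is_llm": False},
    {"name": "example_c", "is_llm": True},
]

if __name__ == "__main__":
    print(select_challenges(SAMPLE_CHALLENGES, llm_only=True))
```

Treating a missing `is_llm` field as `False` is a design choice of this sketch, so that only challenges explicitly marked as LLM-based are selected.
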