TGKDre/llm-redteam-harness

GitHub: TGKDre/llm-redteam-harness

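A configurable red-team evaluation framework for LLM applications. It runs structured tests over adversarial prompt scenarios and quantifies attack success rates, helping teams systematically find and fix LLM security vulnerabilities.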

Stars: 1 | Forks: 0

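# LLM Red-Team Testing Framework

A structured, reproducible adversarial evaluation framework for testing the security of LLM applications.

![License](https://img.shields.io/badge/license-MIT-blue?style=flat-square) ![Python](https://img.shields.io/badge/python-3.10+-3776AB?style=flat-square&logo=python) ![Focus](https://img.shields.io/badge/focus-Adversarial%20AI%20Security-red?style=flat-square)

## Overview

A configurable test runner that evaluates how well LLM applications withstand adversarial attack scenarios. It runs attack categories against multiple model providers and reports a per-category attack success rate (ASR) together with defense uplift metrics.

This is a companion project to [agent-security-sandbox](https://github.com/TGKDre/agent-security-sandbox) and [autonomous-injection-agent](https://github.com/TGKDre/autonomous-injection-agent).

## Features

- Configurable scenario library with YAML-based attack definitions
- Multi-provider support (OpenAI, Anthropic, local models)
- Per-category ASR scoring with statistical significance
- Defense uplift measurement (comparison before and after a defense is applied)
- Structured JSON/Markdown report output

## Quick Start

```
git clone https://github.com/TGKDre/llm-redteam-harness
cd llm-redteam-harness
pip install -r requirements.txt
```

## Related Projects

- [agent-security-sandbox](https://github.com/TGKDre/agent-security-sandbox) -- Multi-stage adversarial evaluation environment
- [autonomous-injection-agent](https://github.com/TGKDre/autonomous-injection-agent) -- Autonomous red-team agent for discovering prompt injection
- [llm-redteam-portfolio](https://github.com/TGKDre/llm-redteam-portfolio) -- Live portfolio dashboard of red-team research

Built by [Andre Uzoukwu](https://github.com/TGKDre) -- IAM and Cloud Security Engineer / AI Security Researcher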
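
The per-category ASR scoring and defense uplift measurement named among the features above can be illustrated with a minimal Python sketch. It is not taken from the repository: the `TrialResult` record, the function names, the category labels, and the toy data are all hypothetical, and it assumes ASR is computed as successful attacks divided by attempts, with uplift as the absolute drop in ASR once a defense is enabled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialResult:
    """One attack attempt. Field names are illustrative, not the harness's schema."""
    category: str
    attack_succeeded: bool


def asr_by_category(results):
    """Return {category: attack success rate}, i.e. successes / attempts."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for r in results:
        attempts[r.category] += 1
        successes[r.category] += int(r.attack_succeeded)
    return {c: successes[c] / attempts[c] for c in attempts}


def defense_uplift(baseline, defended):
    """Absolute ASR reduction per category (positive means the defense helped)."""
    base = asr_by_category(baseline)
    after = asr_by_category(defended)
    return {c: base[c] - after[c] for c in base if c in after}


# Toy data, invented purely for illustration.
baseline = [
    TrialResult("prompt_injection", True),
    TrialResult("prompt_injection", True),
    TrialResult("prompt_injection", False),
    TrialResult("prompt_injection", True),
    TrialResult("jailbreak", True),
    TrialResult("jailbreak", False),
]
defended = [
    TrialResult("prompt_injection", True),
    TrialResult("prompt_injection", False),
    TrialResult("prompt_injection", False),
    TrialResult("prompt_injection", False),
    TrialResult("jailbreak", False),
    TrialResult("jailbreak", False),
]

print(asr_by_category(baseline))
print(defense_uplift(baseline, defended))
```

With so few trials per category these rates carry wide uncertainty; a real run would need enough attempts per category, plus a confidence interval or significance test, before the before/after difference could be called meaningful.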

Tags: AI security, Chat Copilot, DLL hijacking, Petitpotam, domain enumeration, large language models, adversarial testing, red-team evaluation, reverse-engineering tools