bastiaan365/llm-red-team-toolkit

GitHub: bastiaan365/llm-red-team-toolkit

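An automated security-testing command-line tool for LLM applications, with multiple built-in attack vectors, multi-backend support, and YAML scenario configuration.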

Stars: 0 | Forks: 0

# LLM Red Team Toolkit

A CLI for automated security testing of LLM-based applications. Run prompt injection, jailbreak, data exfiltration, and tool-abuse attacks against any LLM endpoint so you find the vulnerabilities before someone else does.

I built this because I was doing a lot of manual prompt testing at work and wanted a repeatable way to do it. Define your attack scenarios in YAML, point the tool at an endpoint, and get a structured report.

## What's included

- **30+ attack payloads** covering prompt injection, jailbreaks, data exfil, and tool abuse
- **Multi-backend support**: OpenAI, Anthropic, Ollama, or any HTTP endpoint
- **YAML scenarios**: define exactly which attacks to run and how
- **Async execution** with rate limiting, so you don't get throttled
- **JSON and HTML reports** with severity ratings and confidence scores

## Quick start

```
git clone https://github.com/bastiaan365/llm-red-team-toolkit.git
cd llm-red-team-toolkit
pip install -e .
```

```
# See what's available
redteam list-attacks

# Validate a scenario file before running it
redteam validate-scenario scenarios/example_scan.yaml

# Run a scan
redteam scan scenarios/example_scan.yaml --output report.json

# Generate an HTML report
redteam report report.json --format html --output report.html
```

## Scenario format

Scenarios are YAML files that define the target and which attacks to run:

```
name: "API security check"
target:
  backend: openai
  model: gpt-4
  api_key: ${OPENAI_API_KEY}
  endpoint: https://api.openai.com/v1/chat/completions
attacks:
  - name: direct_prompt_injection
    payloads: 5
    severity: critical
  - name: roleplay_jailbreak
    payloads: 3
    severity: high
  - name: data_exfiltration
    enabled: true
options:
  concurrency: 5
  timeout: 30
  rate_limit: 10
```

## Attack categories

**Prompt injection**: direct injection, indirect injection via system context, role confusion, and token smuggling with Unicode tricks.

**Jailbreaks**: DAN variants, roleplay-based evasion, hypothetical scenarios, and encoding obfuscation.

**Data exfiltration**: attempts to leak training data, extract the system prompt, and dump context window contents.

**Tool abuse**: boundary testing, privilege escalation, and resource exhaustion against connected tools.

## Architecture

```
CLI (Click)
  └── Engine (orchestrator)
        ├── Target manager (OpenAI / Anthropic / Ollama / HTTP)
        ├── Attack modules (injection, jailbreak, exfil, tool abuse)
        ├── Payload library (30+ variants)
        └── Report generator (JSON / HTML)
```

## Project structure

```
redteam/
├── cli.py
├── core/
│   ├── engine.py            # Orchestration, concurrency, rate limiting
│   ├── target.py            # Backend abstraction layer
│   └── report.py            # Report generation
├── attacks/
│   ├── base.py              # Base class with severity/confidence scoring
│   ├── prompt_injection.py
│   ├── jailbreak.py
│   ├── data_exfil.py
│   └── tool_abuse.py
├── payloads/
│   └── library.py           # All attack payloads
└── utils/
scenarios/                   # Example YAML configs
tests/                       # pytest suite
```

## Testing

```
pytest tests/ -v
pytest tests/ --cov=redteam --cov-report=html
```

## Different backends

```
# OpenAI
redteam scan scenario.yaml --backend openai --model gpt-4

# Anthropic
redteam scan scenario.yaml --backend anthropic --model claude-3-opus

# Local Ollama
redteam scan scenario.yaml --backend ollama --model llama2:7b

# Any HTTP endpoint
redteam scan scenario.yaml --backend http --endpoint http://localhost:8000/api/chat
```

## Important

This is for **authorized testing only**. Test your own systems or systems you have explicit permission to test. Don't be the reason someone has to write another "responsible AI use" policy.

## Related

- [AI Agent Sandbox](https://github.com/bastiaan365/ai-agent-sandbox): runtime policy enforcement for AI agents
- [MCP IT Ops](https://github.com/bastiaan365/mcp-it-ops): MCP server for IT management

## License

MIT
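
## Sketch: rate-limited async execution

The README mentions async execution with rate limiting, and the example scenario carries `concurrency`, `timeout`, and `rate_limit` options. The snippet below is a minimal sketch of how those three knobs could interact. It is not the toolkit's actual engine code. `RateLimiter`, `run_payloads`, and `fake_target` are hypothetical names, and it assumes `rate_limit` means requests per second and `timeout` is in seconds, neither of which the README states.

```python
import asyncio
import time


class RateLimiter:
    """Hypothetical helper: spaces calls so at most `rate` start per second."""

    def __init__(self, rate: float) -> None:
        self._interval = 1.0 / rate
        self._next_slot = 0.0
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it.
        async with self._lock:
            now = time.monotonic()
            delay = max(0.0, self._next_slot - now)
            self._next_slot = max(now, self._next_slot) + self._interval
        if delay:
            await asyncio.sleep(delay)


async def run_payloads(send, payloads, concurrency=5, timeout=30, rate_limit=10):
    """Send every payload through `send`, bounded by the three options.

    Assumption: rate_limit is requests per second, timeout is seconds.
    """
    limiter = RateLimiter(rate_limit)
    slots = asyncio.Semaphore(concurrency)  # caps in-flight requests

    async def one(payload):
        async with slots:
            await limiter.wait()
            try:
                reply = await asyncio.wait_for(send(payload), timeout)
                return {"payload": payload, "response": reply, "error": None}
            except asyncio.TimeoutError:
                return {"payload": payload, "response": None, "error": "timeout"}

    # gather() returns results in the same order as the input payloads.
    return await asyncio.gather(*(one(p) for p in payloads))


async def fake_target(payload: str) -> str:
    """Stand-in for a real backend call, used only for this demo."""
    await asyncio.sleep(0.01)
    return f"echo: {payload}"


if __name__ == "__main__":
    for row in asyncio.run(run_payloads(fake_target, ["a", "b", "c"])):
        print(row)
```

The semaphore limits how many requests are in flight at once, while the limiter controls how quickly new ones start. Keeping the two separate means a slow backend cannot cause a burst once it recovers.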
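
## Sketch: expanding `${VAR}` and checking a scenario

The example scenario references `${OPENAI_API_KEY}`, and the CLI offers `redteam validate-scenario`. The README does not describe how either is implemented, so the following is only an illustrative sketch. It works on a plain dict shaped like the YAML example (what a YAML loader would typically return). `expand_env` and `check_scenario` are hypothetical names, and the specific checks are assumptions, not the toolkit's real validation rules.

```python
import os
import re

# Matches ${NAME} placeholders such as ${OPENAI_API_KEY}.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Recursively replace ${NAME} in every string inside dicts and lists."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        def lookup(match):
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _VAR.sub(lookup, value)
    if isinstance(value, dict):
        return {k: expand_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v, env) for v in value]
    return value  # numbers, booleans, None pass through unchanged


def check_scenario(scenario):
    """Return a list of problems; an empty list means the basic shape is fine.

    Assumed rules (not taken from the toolkit): a target needs a backend,
    and each attack entry needs a name.
    """
    problems = []
    target = scenario.get("target")
    if not isinstance(target, dict) or "backend" not in target:
        problems.append("target.backend is required")
    attacks = scenario.get("attacks")
    if not isinstance(attacks, list) or not attacks:
        problems.append("attacks must be a non-empty list")
    else:
        for i, attack in enumerate(attacks):
            if not isinstance(attack, dict) or "name" not in attack:
                problems.append(f"attacks[{i}].name is required")
    return problems


if __name__ == "__main__":
    scenario = {
        "name": "API security check",
        "target": {"backend": "openai", "api_key": "${OPENAI_API_KEY}"},
        "attacks": [{"name": "direct_prompt_injection", "payloads": 5}],
    }
    expanded = expand_env(scenario, {"OPENAI_API_KEY": "sk-example"})
    print(check_scenario(expanded))
```

Failing loudly on an unset variable, instead of silently sending an empty API key, keeps a misconfigured scan from producing a report full of authentication errors.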
