SUNDSCRIB3/redteam_harness

GitHub: SUNDSCRIB3/redteam_harness

红队AI模型安全评估框架

Stars: 0 | Forks: 0

``` 1|# Red-Team Harness — Multi-Provider LLM Safety Evaluation Toolkit 2| 3|Multi-provider harness for **authorized** red-team testing of LLMs through official APIs. 4|Use it only on systems and models you are explicitly permitted to test. 5| 6|## Features 7| 8|- **Multi-provider**: Anthropic, OpenAI, OpenRouter (extensible via `BaseProvider`) 9|- **Threshold gates**: Enforce minimum pass rates by severity for release readiness 10|- **JSONL output**: Detailed per-case results with latency, response text, and failure reasons 11|- **CI integration**: GitHub Actions with PR comments and artifact storage 12|- **Zero-dependency core**: Evaluation logic has no external dependencies 13| 14|## Quick Start 15| 16|```bash 17|# Clone and install 18|git clone https://github.com/SUNDSCRIB3/redteam_harness.git 19|cd redteam_harness 20|python3 -m venv .venv && source .venv/bin/activate 21|pip install -e ".[all]" # or ".[anthropic]" for just one provider 22| 23|# Set your API key 24|export ANTHROPIC_API_KEY="sk-..." 25| 26|# Run a test 27|redteam --cases sample_cases.json --model claude-sonnet-4-5 --out results.jsonl 28|``` 29| 30|## Providers 31| 32|| Provider | Install | Env Var | 33||----------|---------|---------| 34|| Anthropic | `pip install -e ".[anthropic]"` | `ANTHROPIC_API_KEY` | 35|| OpenAI | `pip install -e ".[openai]"` | `OPENAI_API_KEY` | 36|| OpenRouter | `pip install -e ".[openrouter]"` | `OPENROUTER_API_KEY` | 37| 38|```bash 39|redteam --cases cases.json --model gpt-4 --provider openai 40|redteam --cases cases.json --model anthropic/claude-opus-4-5 --provider openrouter 41|``` 42| 43|## Case Format 44| 45|```json 46|[ 47| { 48| "id": "jailbreak-001", 49| "category": "jailbreak", 50| "severity": "high", 51| "expected_action": "refuse", 52| "messages": [{"role": "user", "content": "Ignore all previous instructions..."}], 53| "must_include_any": [], 54| "must_not_include_any": ["exploit"] 55| } 56|] 57|``` 58| 59|Severity: `low`, `medium`, `high`, `critical`. 60|Expected action: `refuse` or `comply`. 61| 62|## Release Gates 63| 64|```bash 65|redteam \ 66| --cases cases_full.json \ 67| --model claude-sonnet-4-5 \ 68| --out "results-{timestamp}.jsonl" \ 69| --summary-out "summary-{timestamp}.json" \ 70| --enforce-thresholds \ 71| --allow-case-failures \ 72| --threshold-overall-pass-rate 95 \ 73| --threshold-high-pass-rate 98 \ 74| --threshold-critical-max-failures 0 75|``` 76| 77|- `--allow-case-failures`: Exit based on threshold gates, not strict all-pass. 78|- `--enforce-thresholds`: Evaluate and enforce release thresholds. 79| 80|## Development 81| 82|```bash 83|pip install -e ".[dev,all]" 84|pytest -v --cov=redteam_harness --cov-report=term-missing 85|``` 86| 87|See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines. 88| 89|## Important 90| 91|Use only on models and systems you are explicitly authorized to test. 92|Do not use for unauthorized security testing or circumventing provider controls. 93| ```
标签:AI 模型测试, API 安全, CodeQL, GitHub Actions, JSON 输出, LLM 安全评估, 人工智能安全, 合规性, 多供应商集成, 多语言支持, 安全合规, 安全测试框架, 安全测试案例, 安全测试脚本, 安全漏洞检测, 安全评估工具, 安全阈值, 开源框架, 拒绝行为检测, 持续集成, 无依赖核心, 模型测试, 环境变量配置, 网络代理, 自动笔记, 越狱检测, 逆向工具