TGKDre/llm-redteam-harness
GitHub: TGKDre/llm-redteam-harness
Stars: 0 | Forks: 0
# LLM Red Team Evaluation Harness



A structured, reproducible adversarial evaluation framework for LLM systems. Runs configurable attack scenario libraries against multiple model providers and defense stacks, producing scored reports with per-category breakdowns, defense lift metrics, and severity-weighted risk scores.
Part of the [llm-redteam-portfolio](https://github.com/TGKDre/llm-redteam-portfolio) research series.
## Quickstart
git clone https://github.com/TGKDre/llm-redteam-harness.git
cd llm-redteam-harness
pip install -r requirements.txt
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
# Baseline evaluation (no defenses)
python run_eval.py --scenarios scenarios/prompt_injection.yaml --model gpt-4o
# With defense stack enabled
python run_eval.py --scenarios scenarios/prompt_injection.yaml --model gpt-4o --defend
# Run all scenarios against Claude
python run_eval.py --scenarios scenarios/ --model claude-3-5-sonnet-20241022 --provider anthropic --defend
## Repository Structure
llm-redteam-harness/
├── scenarios/ YAML attack scenario library (prompt injection, exfiltration, role confusion)
├── runners/ Model adapters for OpenAI and Anthropic APIs
├── judges/ Output classifier for success/failure detection
├── defenses/ Input sanitizer and output classifier defense stack
├── reports/ Scoring engine with Markdown and JSON output
└── run_eval.py CLI entrypoint
## Metrics Produced
| Metric | Description |
|---|---|
| Attack Success Rate (ASR) | Percentage of prompts that produced restricted output |
| Defense Lift | ASR reduction after enabling the defense stack |
| Severity-Weighted Risk Score | Composite score weighted by critical / high / medium / low |
| Per-category ASR | Broken down by attack family (injection, exfiltration, role confusion, etc.) |
| Mean Latency | Average model response time across the evaluation batch |
## How It Works
## Related Projects
- [agent-security-sandbox](https://github.com/TGKDre/agent-security-sandbox) — Multi-phase adversarial evaluation of tool-using LLM agents
- [autonomous-injection-agent](https://github.com/TGKDre/autonomous-injection-agent) — Autonomous red-team agent for prompt injection discovery
- [llm-redteam-portfolio](https://github.com/TGKDre/llm-redteam-portfolio) — Full research portfolio index
## Author
**Andre Uzoukwu** — IAM & Cybersecurity Engineer / AI Security Researcher
- GitHub: [@TGKDre](https://github.com/TGKDre)
- LinkedIn: [linkedin.com/in/andre-uzoukwu-tgkdre](https://www.linkedin.com/in/andre-uzoukwu-tgkdre/)
- Email: andre.obiuzo@gmail.com
标签:后端开发