HomenShum/ai-agent-redteam-eval-lab
GitHub: HomenShum/ai-agent-redteam-eval-lab
Stars: 0 | Forks: 0
# Build Your Own Agent Red-Team Evaluator
A CodeCrafters-style, public-safe challenge track for building an AI agent red-team evaluation harness from scratch.
You will build a small evaluator that can model risks, apply attacks, run an agent under test, judge the response, generate findings, and escalate ambiguous cases to manual review.
Risk + Attack + Prompt
|
v
Attack Transform
|
v
Agent Under Test
|
v
Judge
|
v
Finding + Eval Report
This repo is sanitized for public sharing. It contains no private recruiter messages, calendar links, resumes, email bodies, API keys, or private interview artifacts.
## The Challenge
Build your own AI agent red-team harness in stages:
| Stage | You Build | Why It Matters |
|---|---|---|
| 1 | `Risk`, `Attack`, `TestCase`, `Finding` | Turns messy safety concerns into typed eval objects |
| 2 | Attack transforms | Separates attack technique from the risk being tested |
| 3 | Red-team runner | Creates the execution lifecycle |
| 4 | Deterministic judges | Handles cheap, stable checks like leakage and forbidden actions |
| 5 | Manual eval queue | Captures cases humans should review |
| 6 | LLM-as-judge adapter | Handles semantic judgment where regex is too brittle |
| 7 | SDK adapters | Shows how OpenAI, Claude, and pi-ai style routers fit in |
| 8 | Report + remediation loop | Produces evidence-backed outputs a team can act on |
## Quickstart
git clone https://github.com/HomenShum/ai-agent-redteam-eval-lab
cd ai-agent-redteam-eval-lab
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e .[dev]
python -m pytest -q
python -m redteam_eval_lab.cli --judge deterministic
python -m redteam_eval_lab.cli --judge manual
The tests should pass. The CLI intentionally reports failures because the toy agent is vulnerable.
Optional real LLM judges:
pip install -e .[openai]
$env:OPENAI_API_KEY="..."
python -m redteam_eval_lab.cli --judge openai
python -m redteam_eval_lab.cli --judge openai-chat
python -m redteam_eval_lab.cli --judge openai-agents
pip install -e .[anthropic]
$env:ANTHROPIC_API_KEY="..."
python -m redteam_eval_lab.cli --judge anthropic
## Start Here
If you want the CodeCrafters-style path, read the stages in order:
1. [Stage 1 - Define the eval schema](stage_descriptions/01-define-eval-schema.md)
2. [Stage 2 - Add attack transforms](stage_descriptions/02-add-attack-transforms.md)
3. [Stage 3 - Build the runner](stage_descriptions/03-build-runner.md)
4. [Stage 4 - Add deterministic judges](stage_descriptions/04-add-deterministic-judges.md)
5. [Stage 5 - Add manual eval](stage_descriptions/05-add-manual-eval.md)
6. [Stage 6 - Add LLM-as-judge](stage_descriptions/06-add-llm-judge.md)
7. [Stage 7 - Add SDK adapters](stage_descriptions/07-add-sdk-adapters.md)
8. [Stage 8 - Ship reports and remediation](stage_descriptions/08-ship-reports.md)
Use the starter kit:
- [Python starter](challenge/starter/python/redteam_lab.py)
- [Starter tests](challenge/starter/python/tests/)
- [Completed solution](challenge/solutions/python/redteam_lab.py)
- [Completed code examples](docs/COMPLETED_CODE_EXAMPLES.md)
- [Grader](challenge/grader.py)
- [Reference implementation](src/redteam_eval_lab/)
Run a staged grader:
python challenge/grader.py --stage 01 --impl starter
python challenge/grader.py --stage 08 --impl solution
## Reference Implementation
The working implementation lives in [src/redteam_eval_lab](src/redteam_eval_lab).
Important files:
- [schemas.py](src/redteam_eval_lab/schemas.py) - risk, attack, testcase, finding, report
- [attacks.py](src/redteam_eval_lab/attacks.py) - prompt injection, base64, JSON injection, hidden markdown
- [agents.py](src/redteam_eval_lab/agents.py) - intentionally vulnerable toy agent
- [judges.py](src/redteam_eval_lab/judges.py) - deterministic, manual, OpenAI, Anthropic judges
- [llm_clients.py](src/redteam_eval_lab/llm_clients.py) - real OpenAI, Anthropic, and OpenAI Agents JSON clients
- [runner.py](src/redteam_eval_lab/runner.py) - orchestration loop
- [suites.py](src/redteam_eval_lab/suites.py) - sample risk/attack test cases
`AgentUnderTest` is a protocol/interface, so its `respond()` method is only a
contract. Concrete implementations include `ToyAgent` and `EchoAgent`; real
apps would provide an adapter around an SDK, local service, or deployed agent.
## Judge Design
Production systems rarely use just one judge:
Deterministic checks
-> schema validation
-> LLM judge
-> second judge for disputed cases
-> manual review
-> remediation tracking
| Judge | Use For | Strength | Weakness |
|---|---|---|---|
| Deterministic | Terms, schemas, tool-call permissions | Fast and stable | Misses nuance |
| LLM judge | Hallucination, grounding, policy adherence | Handles semantics | Costs money and can drift |
| Manual eval | Ambiguous or high-stakes findings | Best calibration source | Slow |
| Hybrid | Real production loops | Balanced | More system complexity |
## SDK Adapter Examples
The repo includes optional patterns for:
- OpenAI Agents SDK / OpenAI API
- Claude Agent SDK / Anthropic API
- pi-ai style model routing
- manual eval queues
The default test suite does not need API keys.
See:
- [Judge Design](docs/JUDGE_DESIGN.md)
- [SDK Adapters](docs/SDK_ADAPTERS.md)
- [Manual Eval Workflow](docs/MANUAL_EVAL_WORKFLOW.md)
- [Interview Prep Notes](docs/INTERVIEW_PREP.md)
## Interview Soundbite
## Public-Safety Boundary
This repo teaches the evaluation architecture without publishing private context. Keep real resumes, emails, interview prompts, meeting links, and private API keys out of the repo.