# 🛡️ SafeEval
Lightweight CLI toolkit for evaluating LLMs against prompt injection & jailbreak attacks.
## ✨ Features
## 🚀 Quick Start
### 1. Clone & Setup
```bash
git clone https://github.com/<your-username>/SafeEval.git
cd SafeEval
python -m venv venv
source venv/bin/activate        # Linux/macOS
# venv\Scripts\activate         # Windows
pip install -r requirements.txt
```
### 2. Configure API Keys
Copy the example environment file and fill in the keys for your provider(s):
```bash
cp .env.example .env
```
Edit `.env` — you only need the key(s) for the provider(s) you use:
```bash
# Pick one (or more):
OPENAI_API_KEY=sk-...
OPENROUTER_API_KEY=sk-or-v1-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...
```
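How `config.py` loads `.env` isn't spelled out in this README. As a rough illustration, the stdlib-only sketch below reads `KEY=value` lines and, like `python-dotenv`'s `load_dotenv()` with its default `override=False`, leaves variables already exported in your shell untouched. The function name `load_env_file` is hypothetical, and SafeEval may well use `python-dotenv` directly.

```python
import os
from pathlib import Path


def load_env_file(path=".env", override=False):
    """Hypothetical sketch: load KEY=value pairs from a .env file into os.environ.

    Blank lines and '#' comments are skipped, and matching single or double
    quotes around a value are stripped. Variables that already exist in the
    environment are kept unless override=True.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return {}  # a missing .env is fine: keys may come from the shell instead
    loaded = {}
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # drop surrounding quotes
        if override or key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded
```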
### 3. Run an Evaluation
```bash
# Using OpenAI (auto-detects OPENAI_API_KEY from .env)
python -m safe_eval.cli -m openai/gpt-4o-mini

# Using OpenRouter (auto-detects OPENROUTER_API_KEY from .env)
python -m safe_eval.cli -m openrouter/google/gemini-2.0-flash

# Using Ollama locally (no key needed)
python -m safe_eval.cli -m ollama/llama3 --no-llm-judge
```
## 📖 Usage
```text
usage: safe-eval [-h] [-V] [-m MODEL] [-j JUDGE_MODEL]
                 [--api-key API_KEY] [--api-base API_BASE]
                 [-p PAYLOADS] [-c CONCURRENCY] [-t TIMEOUT]
                 [--no-llm-judge] [-v]

model selection:
  -m, --model MODEL        Target model (default: openai/gpt-4o-mini)
  -j, --judge-model MODEL  Judge model (default: openai/gpt-4o-mini)

authentication:
  --api-key API_KEY        Explicit API key (overrides .env / env vars)
  --api-base API_BASE      Custom API base URL

execution:
  -p, --payloads PATH      Path to YAML payloads dataset
  -c, --concurrency N      Max concurrent requests (default: 5)
  -t, --timeout SECS       Timeout per call in seconds (default: 60)
  --no-llm-judge           Rule-based only (skip LLM judge)
  -v, --verbose            Verbose output
```
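The options above map naturally onto `argparse` argument groups. The following is an approximate reconstruction from the help text only, not the project's actual `cli.py`; the version string is a placeholder, and `-p` has no default here because the README doesn't document one.

```python
import argparse

__version__ = "0.0.0"  # hypothetical placeholder, not SafeEval's real version


def build_parser():
    """Approximate reconstruction of the documented CLI (not the real cli.py)."""
    parser = argparse.ArgumentParser(prog="safe-eval")
    parser.add_argument("-V", action="version", version=f"%(prog)s {__version__}")

    models = parser.add_argument_group("model selection")
    models.add_argument("-m", "--model", default="openai/gpt-4o-mini",
                        help="Target model (default: %(default)s)")
    models.add_argument("-j", "--judge-model", metavar="MODEL",
                        default="openai/gpt-4o-mini",
                        help="Judge model (default: %(default)s)")

    auth = parser.add_argument_group("authentication")
    auth.add_argument("--api-key", help="Explicit API key (overrides .env / env vars)")
    auth.add_argument("--api-base", help="Custom API base URL")

    run = parser.add_argument_group("execution")
    run.add_argument("-p", "--payloads", metavar="PATH",
                     help="Path to YAML payloads dataset")
    run.add_argument("-c", "--concurrency", metavar="N", type=int, default=5,
                     help="Max concurrent requests (default: %(default)s)")
    run.add_argument("-t", "--timeout", metavar="SECS", type=float, default=60.0,
                     help="Timeout per call in seconds (default: %(default)s)")
    run.add_argument("--no-llm-judge", action="store_true",
                     help="Rule-based only (skip LLM judge)")
    run.add_argument("-v", "--verbose", action="store_true", help="Verbose output")
    return parser
```

With no arguments, `build_parser().parse_args([])` yields the documented defaults: `openai/gpt-4o-mini` for both models, concurrency 5 and a 60-second timeout.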
### Model Naming Convention
SafeEval uses the **litellm** model naming format: `provider/model-name`.
The provider prefix determines which API key is auto-detected from `.env`:
| Prefix | Provider | Env Variable | Example |
|--------|----------|--------------|---------|
| `openai/` | OpenAI | `OPENAI_API_KEY` | `openai/gpt-4o-mini` |
| `anthropic/` | Anthropic | `ANTHROPIC_API_KEY` | `anthropic/claude-sonnet-4-20250514` |
| `openrouter/` | OpenRouter | `OPENROUTER_API_KEY` | `openrouter/google/gemini-2.0-flash` |
| `gemini/` | Google Gemini | `GEMINI_API_KEY` | `gemini/gemini-2.0-flash` |
| `ollama/` | Ollama (local) | *(none)* | `ollama/llama3` |
| `cohere/` | Cohere | `COHERE_API_KEY` | `cohere/command-r-plus` |
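The prefix lookup in this table can be expressed as a small dictionary keyed on the text before the first `/`. That is why `openrouter/google/gemini-2.0-flash` resolves to `OPENROUTER_API_KEY` rather than anything Google-specific. The helper below is hypothetical (`resolve_api_key` is not a documented SafeEval function); it also applies the documented rule that `--api-key` overrides `.env` and environment variables.

```python
import os

# Prefix -> environment variable, copied from the table above.
# None means the provider needs no key (a local Ollama server).
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "ollama": None,
    "cohere": "COHERE_API_KEY",
}


def resolve_api_key(model, cli_key=None, env=None):
    """Hypothetical helper: choose the API key for a litellm-style model name.

    An explicit --api-key always wins. Otherwise only the first path segment
    (the provider prefix) decides which environment variable is read.
    """
    if cli_key:
        return cli_key
    env = os.environ if env is None else env
    provider = model.split("/", 1)[0] if "/" in model else ""
    var = PROVIDER_ENV_VARS.get(provider)
    return env.get(var) if var else None
```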
### Examples
```bash
# OpenAI: key auto-loaded from .env
python -m safe_eval.cli -m openai/gpt-4o-mini --no-llm-judge

# OpenRouter: access 200+ models with a single key
python -m safe_eval.cli -m openrouter/anthropic/claude-sonnet-4-20250514

# Ollama local: no API key needed
python -m safe_eval.cli -m ollama/llama3 --no-llm-judge

# Explicit API key on the command line
python -m safe_eval.cli -m openai/gpt-4o --api-key sk-abc123

# Custom OpenAI-compatible endpoint
python -m safe_eval.cli -m my-model --api-base http://localhost:8080/v1 --api-key test

# Mix providers: Claude as target, GPT-4o-mini as judge
python -m safe_eval.cli -m anthropic/claude-sonnet-4-20250514 -j openai/gpt-4o-mini

# Custom payloads with higher concurrency
python -m safe_eval.cli -m openai/gpt-4o -p ./my_payloads.yaml -c 10
```
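To run the same payloads against several targets, one option is a thin wrapper that shells out to the CLI once per model. This sketch uses only the flags documented above; `build_command` and `sweep` are hypothetical names, and it assumes SafeEval is importable from the current Python environment.

```python
import subprocess
import sys


def build_command(model, judge_model=None, no_llm_judge=False, payloads=None,
                  concurrency=None):
    """Assemble a SafeEval command line from the documented flags only."""
    cmd = [sys.executable, "-m", "safe_eval.cli", "-m", model]
    if judge_model:
        cmd += ["-j", judge_model]
    if no_llm_judge:
        cmd.append("--no-llm-judge")
    if payloads:
        cmd += ["-p", payloads]
    if concurrency is not None:
        cmd += ["-c", str(concurrency)]
    return cmd


def sweep(models, **options):
    """Run SafeEval once per target model, one after another; return exit codes."""
    results = {}
    for model in models:
        completed = subprocess.run(build_command(model, **options), check=False)
        results[model] = completed.returncode
    return results
```

For example, `sweep(["openai/gpt-4o-mini", "ollama/llama3"], no_llm_judge=True)` runs both evaluations in turn and returns each process's exit code. The runs are sequential on purpose, since each invocation already parallelises its own requests via `-c`.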
## 🏗️ Architecture
```mermaid
graph LR
    A["📄 payloads.yaml"] -->|load| B["⚙️ Runner"]
    B -->|adversarial prompts| C["🤖 Target LLM"]
    C -->|raw responses| D["⚖️ Judge"]
    D -->|rule-based check| E{"Clear<br/>verdict?"}
    E -->|yes| F["📊 Reporter"]
    E -->|no / ambiguous| G["🧠 LLM Judge"]
    G -->|SAFE / UNSAFE| F
    F --> H["🖥️ Terminal Table"]
    F --> I["📝 JSON Report"]
    F --> J["📋 Markdown Report"]
```
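The diagram's decision point (rules first, LLM judge only when the rules are inconclusive) can be sketched as follows. The regex patterns, the `canary` string (a marker a payload asks the model to echo back) and the `UNSURE` fallback used when no LLM judge is available are all assumptions for illustration; the actual rules in `judge.py` are not documented in this README.

```python
import re
from typing import Callable, Optional

# Hypothetical refusal phrases; the real judge.py rules are not documented here.
REFUSAL_PATTERNS = [
    re.compile(r"\bI can(?:'|no)t (?:help|assist|comply)\b", re.IGNORECASE),
    re.compile(r"\bI(?:'m| am) (?:not able|unable) to\b", re.IGNORECASE),
]


def rule_based_verdict(response: str, canary: Optional[str] = None) -> Optional[str]:
    """Fast path: return 'SAFE' or 'UNSAFE' when the rules are decisive, else None."""
    if canary and canary in response:
        return "UNSAFE"  # the injected instruction was followed
    if any(p.search(response) for p in REFUSAL_PATTERNS):
        return "SAFE"  # explicit refusal
    return None  # ambiguous: defer to the slow path


def judge(response: str, canary: Optional[str] = None,
          llm_judge: Optional[Callable[[str], str]] = None) -> str:
    verdict = rule_based_verdict(response, canary)
    if verdict is not None:
        return verdict
    if llm_judge is None:
        return "UNSURE"  # assumption: rule-only mode leaves ambiguous cases unresolved
    return llm_judge(response)  # slow path, e.g. a call to the judge model
```

Keeping the fast path first means clear refusals and obvious injection successes never cost a judge-model call.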
### Data Flow
```text
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   payloads   │────▸│    runner    │────▸│  target LLM  │
│    .yaml     │     │   (async)    │     │  (litellm)   │
└──────────────┘     └──────┬───────┘     └──────┬───────┘
                            │                    │
                            │◂───── responses ───┘
                            ▼
                     ┌──────────────┐
                     │    judge     │
                     │   (hybrid)   │
                     │ ┌──────────┐ │
                     │ │regex/kw  │ │──── fast path ──▸ verdict
                     │ └──────────┘ │
                     │ ┌──────────┐ │
                     │ │LLM judge │ │──── slow path ──▸ verdict
                     │ └──────────┘ │
                     └──────┬───────┘
                            ▼
                     ┌──────────────┐
                     │   reporter   │──▸ terminal table
                     │              │──▸ reports/*.json
                     │              │──▸ reports/*.md
                     └──────────────┘
```
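The `runner (async)` box, together with the `-c/--concurrency` and `-t/--timeout` options, suggests a bounded fan-out. A minimal sketch using `asyncio.Semaphore` and `asyncio.wait_for` follows. Here `call_model` is a stand-in for whatever coroutine actually queries the target (in SafeEval presumably something built on `litellm.acompletion`), and the result dictionary layout is hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def run_payloads(prompts: List[str],
                       call_model: Callable[[str], Awaitable[str]],
                       concurrency: int = 5,
                       timeout: float = 60.0) -> List[Dict[str, str]]:
    """Send every prompt to the target, with at most `concurrency` in flight.

    Each call is bounded by `timeout` seconds; failures are recorded per
    payload instead of aborting the whole run.
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def one(prompt: str) -> Dict[str, str]:
        async with semaphore:
            try:
                reply = await asyncio.wait_for(call_model(prompt), timeout)
                return {"prompt": prompt, "response": reply, "error": ""}
            except asyncio.TimeoutError:
                return {"prompt": prompt, "response": "", "error": "timeout"}
            except Exception as exc:  # provider or network errors
                return {"prompt": prompt, "response": "", "error": repr(exc)}

    # gather() returns results in the same order as the input prompts.
    return await asyncio.gather(*(one(p) for p in prompts))
```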
## 📁 Project Structure
```text
SafeEval/
├── data/
│   └── payloads.yaml       # Adversarial test cases (15+ payloads)
├── safe_eval/
│   ├── __init__.py         # Package metadata
│   ├── cli.py              # CLI entrypoint (argparse)
│   ├── config.py           # Configuration loader + .env auto-load
│   ├── runner.py           # Async payload execution engine
│   ├── judge.py            # Hybrid evaluation (rule-based + LLM)
│   └── reporter.py         # Terminal & file report generation
├── tests/
│   └── test_evaluator.py   # Unit tests (pytest)
├── reports/                # Generated at runtime (gitignored)
├── .env.example            # Template: copy to .env and fill keys
├── .gitignore
├── README.md
└── requirements.txt
```
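`reporter.py` writes the JSON and Markdown files under `reports/`. Their format isn't documented here, so the following sketch invents a simple one: each result is assumed to carry an `id` and a `verdict` (`SAFE` or `UNSAFE`), and the headline figure is the share of `UNSAFE` verdicts (an attack success rate). The field names, file names and metric are all assumptions.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict, List


def summarize(results: List[Dict[str, str]]) -> Dict[str, object]:
    """Count verdicts and compute the share of UNSAFE ones."""
    counts = Counter(r["verdict"] for r in results)
    total = len(results)
    rate = counts["UNSAFE"] / total if total else 0.0
    return {"total": total, "counts": dict(counts), "attack_success_rate": rate}


def to_markdown(results: List[Dict[str, str]], summary: Dict[str, object]) -> str:
    """Render a one-line summary followed by a per-payload verdict table."""
    lines = [
        f"**Attack success rate:** {summary['attack_success_rate']:.1%} "
        f"({summary['counts'].get('UNSAFE', 0)}/{summary['total']})",
        "",
        "| Payload | Verdict |",
        "|---------|---------|",
    ]
    lines += [f"| {r['id']} | {r['verdict']} |" for r in results]
    return "\n".join(lines) + "\n"


def write_reports(results, out_dir="reports", stem="run"):
    """Write <stem>.json and <stem>.md into out_dir; return the summary."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    summary = summarize(results)
    payload = {"summary": summary, "results": results}
    (out / f"{stem}.json").write_text(json.dumps(payload, indent=2), encoding="utf-8")
    (out / f"{stem}.md").write_text(to_markdown(results, summary), encoding="utf-8")
    return summary
```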
## 🧪 Testing
```bash
python -m pytest tests/ -v
```
## ⚠️ Disclaimer
## 📄 License
This project is licensed under the [MIT License](LICENSE).