Marcelluxx/SafeEval

GitHub: Marcelluxx/SafeEval

# 🛡️ SafeEval

Lightweight CLI toolkit for evaluating LLMs against prompt injection & jailbreak attacks.

Python 3.10+ | License: MIT | Code style: black | PRs Welcome

## ✨ Features

## 🚀 Quick Start

### 1. Clone & Setup

```bash
git clone https://github.com//SafeEval.git
cd SafeEval
python -m venv venv
source venv/bin/activate   # Linux/macOS
# venv\Scripts\activate    # Windows
pip install -r requirements.txt
```

### 2. Configure API Keys

Copy the example environment file and fill in the keys for your provider(s):

```bash
cp .env.example .env
```

Edit `.env`; you only need the key(s) for the provider(s) you use:

```env
# Pick one (or more):
OPENAI_API_KEY=sk-...
OPENROUTER_API_KEY=sk-or-v1-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...
```

### 3. Run an Evaluation

```bash
# Using OpenAI (auto-detects OPENAI_API_KEY from .env)
python -m safe_eval.cli -m openai/gpt-4o-mini

# Using OpenRouter (auto-detects OPENROUTER_API_KEY from .env)
python -m safe_eval.cli -m openrouter/google/gemini-2.0-flash

# Using Ollama locally (no key needed)
python -m safe_eval.cli -m ollama/llama3 --no-llm-judge
```

## 📖 Usage

```
usage: safe-eval [-h] [-V] [-m MODEL] [-j JUDGE_MODEL]
                 [--api-key API_KEY] [--api-base API_BASE]
                 [-p PAYLOADS] [-c CONCURRENCY] [-t TIMEOUT]
                 [--no-llm-judge] [-v]

model selection:
  -m, --model MODEL        Target model (default: openai/gpt-4o-mini)
  -j, --judge-model MODEL  Judge model (default: openai/gpt-4o-mini)

authentication:
  --api-key API_KEY        Explicit API key (overrides .env / env vars)
  --api-base API_BASE      Custom API base URL

execution:
  -p, --payloads PATH      Path to YAML payloads dataset
  -c, --concurrency N      Max concurrent requests (default: 5)
  -t, --timeout SECS       Timeout per call in seconds (default: 60)
  --no-llm-judge           Rule-based only (skip LLM judge)
  -v, --verbose            Verbose output
```

### Model Naming Convention

SafeEval uses the **litellm** model naming format: `provider/model-name`.
The provider prefix determines which API key is auto-detected from `.env`:

| Prefix | Provider | Env Variable | Example |
|--------|----------|--------------|---------|
| `openai/` | OpenAI | `OPENAI_API_KEY` | `openai/gpt-4o-mini` |
| `anthropic/` | Anthropic | `ANTHROPIC_API_KEY` | `anthropic/claude-sonnet-4-20250514` |
| `openrouter/` | OpenRouter | `OPENROUTER_API_KEY` | `openrouter/google/gemini-2.0-flash` |
| `gemini/` | Google Gemini | `GEMINI_API_KEY` | `gemini/gemini-2.0-flash` |
| `ollama/` | Ollama (local) | *(none)* | `ollama/llama3` |
| `cohere/` | Cohere | `COHERE_API_KEY` | `cohere/command-r-plus` |

### Examples

```bash
# OpenAI: key auto-loaded from .env
python -m safe_eval.cli -m openai/gpt-4o-mini --no-llm-judge

# OpenRouter: access 200+ models with a single key
python -m safe_eval.cli -m openrouter/anthropic/claude-sonnet-4-20250514

# Ollama local: no API key needed
python -m safe_eval.cli -m ollama/llama3 --no-llm-judge

# Explicit API key on the command line
python -m safe_eval.cli -m openai/gpt-4o --api-key sk-abc123

# Custom OpenAI-compatible endpoint
python -m safe_eval.cli -m my-model --api-base http://localhost:8080/v1 --api-key test

# Mix providers: Claude as target, GPT-4o-mini as judge
python -m safe_eval.cli -m anthropic/claude-sonnet-4-20250514 -j openai/gpt-4o-mini

# Custom payloads with higher concurrency
python -m safe_eval.cli -m openai/gpt-4o -p ./my_payloads.yaml -c 10
```

## 🏗️ Architecture

```mermaid
graph LR
    A["📄 payloads.yaml"] -->|load| B["⚙️ Runner"]
    B -->|adversarial prompts| C["🤖 Target LLM"]
    C -->|raw responses| D["⚖️ Judge"]
    D -->|rule-based check| E{"Clear verdict?"}
    E -->|yes| F["📊 Reporter"]
    E -->|no / ambiguous| G["🧠 LLM Judge"]
    G -->|SAFE / UNSAFE| F
    F --> H["🖥️ Terminal Table"]
    F --> I["📝 JSON Report"]
    F --> J["📋 Markdown Report"]
```

### Data Flow

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  payloads    │────▸│    runner    │────▸│  target LLM  │
│  .yaml       │     │   (async)    │     │  (litellm)   │
└──────────────┘     └──────┬───────┘     └──────┬───────┘
                            │                    │
                            │◂───── responses ───┘
                            ▼
                     ┌──────────────┐
                     │    judge     │
                     │   (hybrid)   │
                     │ ┌──────────┐ │
                     │ │regex/kw  │ │──── fast path ──▸ verdict
                     │ └──────────┘ │
                     │ ┌──────────┐ │
                     │ │LLM judge │ │──── slow path ──▸ verdict
                     │ └──────────┘ │
                     └──────┬───────┘
                            ▼
                     ┌──────────────┐
                     │   reporter   │──▸ terminal table
                     │              │──▸ reports/*.json
                     │              │──▸ reports/*.md
                     └──────────────┘
```

## 📁 Project Structure

```
SafeEval/
├── data/
│   └── payloads.yaml        # Adversarial test cases (15+ payloads)
├── safe_eval/
│   ├── __init__.py          # Package metadata
│   ├── cli.py               # CLI entrypoint (argparse)
│   ├── config.py            # Configuration loader + .env auto-load
│   ├── runner.py            # Async payload execution engine
│   ├── judge.py             # Hybrid evaluation (rule-based + LLM)
│   └── reporter.py          # Terminal & file report generation
├── tests/
│   └── test_evaluator.py    # Unit tests (pytest)
├── reports/                 # Generated at runtime (gitignored)
├── .env.example             # Template: copy to .env and fill keys
├── .gitignore
├── README.md
└── requirements.txt
```

## 🧪 Testing

```bash
python -m pytest tests/ -v
```

## ⚠️ Disclaimer

## 📄 License

This project is licensed under the [MIT License](LICENSE).