AkshayK77/LLM-RedTeam-JailBreak-Detection-Framework

GitHub: AkshayK77/LLM-RedTeam-JailBreak-Detection-Framework

Stars: 0 | Forks: 0

# LLM Red-Teaming & Jailbreak Detection Framework An end-to-end framework for benchmarking LLM resistance to adversarial jailbreak attacks. This project extends the [OSB-Bench](https://github.com/AkshayK77/osb-jailbreak-bench) research by wrapping the original evaluation pipeline in a REST API and interactive Streamlit UI, enabling on-demand benchmarking, real-time job tracking, automatic ASR computation, and baseline comparison against published OSB-Bench results — all backed by a SQLite database and served via FastAPI. ## Setup git clone cd llm-redteam-framework python -m venv .venv # Windows .venv\Scripts\activate # macOS / Linux source .venv/bin/activate pip install -r requirements.txt Create a `.env` file in the project root and add your Groq API key: GROQ_API_KEY=your_key_here ## Running the API uvicorn app.main:app --reload The API will be available at `http://localhost:8000`. Interactive docs at `http://localhost:8000/docs`. ## Running the Streamlit UI In a separate terminal (with the API already running): streamlit run streamlit_app/app.py The UI will open at `http://localhost:8501`. ## API Endpoints | Method | Endpoint | Description | |--------|----------|-------------| | `POST` | `/jobs` | Submit a new benchmark job (model + categories) | | `GET` | `/jobs/{id}` | Poll job status; returns ASR table when complete | | `GET` | `/prompts` | List all prompts grouped by category | | `GET` | `/prompts/{category}` | List prompts for a single category | | `GET` | `/reports/{id}` | Full report with baseline comparison for a completed job | ## Models Supported - `llama-3.1-8b-instant` - `llama-3.3-70b-versatile` - `llama-4-scout-17b-16e-instruct` - `qwen/qwen3-32b` - `allam-2-7b` - `openai/gpt-4o-mini` ## OSB-Bench Baseline ASR Values | Category | Baseline ASR | |----------|-------------| | `narrative_fictional` | 32.2% | | `roleplay_persona` | 21.7% | | `encoding_tricks` | 16.8% | | `many_shot` | 8.9% | | `privilege_escalation` | 6.7% | | `multilingual` | 1.1% | ## Project Structure llm-redteam-framework/ ├── app/ # FastAPI backend │ ├── main.py # App entry point │ ├── db.py # SQLAlchemy engine + session │ ├── routers/ # jobs, prompts, reports endpoints │ ├── models/ # ORM models + Pydantic schemas │ └── services/ # execution, classifier, asr, report logic ├── streamlit_app/ │ └── app.py # Streamlit UI ├── prompts/ # 90 jailbreak prompts (6 categories × 15) ├── scripts/ # Original OSB-Bench evaluation pipeline ├── results/ # Raw completions and scores ├── analysis/ # Jupyter notebook for figures └── data/ # MultiJail dataset samples