GitHub: hi-unc1e/Auto_JB_APE
Stars: 11 | Forks: 2
# APE: Automated LLM Jailbreak Framework
APE is an **Automated LLM Jailbreak Framework** for red-team testing. It uses LangGraph to orchestrate a multi-agent system that automatically generates and iterates on attack prompts in an attempt to bypass the target LLM's safety guardrails.
## Table of Contents
- [Features](#features)
- [Architecture](#architecture)
- [Installation](#installation)
- [Usage](#usage)
- [Attack Techniques](#attack-techniques)
- [Configuration](#configuration)
- [Development](#development)
## Features
- **Multi-Agent Orchestration**: Closed-loop feedback system with 4 specialized nodes (Planner, Player, Executor, Checker)
- **Concurrent Payload Execution**: Sends 2 payloads at a time (configurable via `CONCURRENCY`) instead of one by one, which shortens each round
- **Depth-Based Payload Generation**: Generates a batch of 5 payloads per round with progressive intensity (Shallow → Medium → Deep)
- **Quality Score Tracking**: Scores each response on a 0-100 scale to detect when the target model starts to "loosen up"
- **Smart Iteration Strategy**: Continues with deeper payloads when the target model shows signs of compromise
- **Historical Analysis**: The Planner analyzes recent attempts to identify defense patterns and weaknesses
- **Headless Browser Mode**: Runs the browser in the background without interrupting the user's desktop
## Architecture
┌─────────┐      ┌────────┐      ┌──────────┐      ┌─────────┐
│ Planner │ ───> │ Player │ ───> │ Executor │ ───> │ Checker │
└─────────┘      └────────┘      └──────────┘      └─────────┘
     ↑                                                  │
     └──────────────────────────────────────────────────┘
              (feedback loop, continue or END)
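The "continue or END" decision at the end of the loop can be pictured as a small routing function. The sketch below is only an illustration under stated assumptions: the return labels `"planner"` and `"end"`, the `LoopState` type, and the `MAX_ROUNDS` cap are hypothetical and not taken from `ape.py`; only the `success` and `round_count` fields come from the state description in this README.

```python
from typing import TypedDict

MAX_ROUNDS = 10  # hypothetical cap; the project's real limit is not documented here


class LoopState(TypedDict):
    """Only the two state fields this sketch needs."""
    success: bool      # whether the Checker judged the attempt successful
    round_count: int   # completed rounds so far


def should_continue(state: LoopState, max_rounds: int = MAX_ROUNDS) -> str:
    """Return "planner" to run another round, or "end" to stop the loop."""
    if state["success"]:
        return "end"  # goal reached, nothing left to try
    if state["round_count"] >= max_rounds:
        return "end"  # give up once the (assumed) round budget is spent
    return "planner"  # feed the Checker's analysis back to the Planner
```

In a LangGraph graph, a function of this shape would typically be registered as a conditional edge leaving the Checker node, with its return value mapped to the next node.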
### Node Responsibilities
| Node | Responsibility |
|------|---------------|
| **Planner** | Selects an attack technique, analyzes history, and generates 5 progressive payloads |
| **Player** | Takes the next `CONCURRENCY` payloads from the batch for concurrent execution |
| **Executor** | Uses Playwright to send multiple payloads to the target URL concurrently (via `asyncio.gather`) |
| **Checker** | Evaluates the responses concurrently and keeps the best quality score |
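The Player, Executor, and Checker steps in the table can be sketched without a browser. This is a minimal illustration, not the project's code: `take_slice`, `send_stub`, `execute`, and `best_score` are hypothetical names, and `send_stub` stands in for the Playwright call by echoing its input instead of contacting any target.

```python
import asyncio
from typing import List, Tuple

CONCURRENCY = 2  # matches the documented default of 2 payloads at a time


def take_slice(batch: List[str], batch_index: int,
               concurrency: int = CONCURRENCY) -> Tuple[List[str], int]:
    """Player step: return the next payloads and the advanced index.

    The index is capped at len(batch), so a 5-item batch ends at 5, not 6.
    """
    chunk = batch[batch_index:batch_index + concurrency]
    return chunk, min(batch_index + concurrency, len(batch))


async def send_stub(payload: str) -> str:
    """Stand-in for the browser interaction; only echoes the payload."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"response to {payload}"


async def execute(payloads: List[str]) -> List[str]:
    """Executor step: run all sends concurrently.

    asyncio.gather returns results in the same order as its inputs.
    """
    return list(await asyncio.gather(*(send_stub(p) for p in payloads)))


def best_score(scores: List[int]) -> int:
    """Checker step: keep the highest quality score (0-100) of the slice."""
    return max(scores, default=0)
```

With a batch of 5 placeholder payloads and `CONCURRENCY = 2`, repeated calls to `take_slice` yield slices of 2, 2, and 1 items and move the index through 2, 4, and 5.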
### State Management
JailbreakState {
    target_goal: str            # The malicious objective being tested
    current_technique: str      # Currently selected attack method
    current_payload: str        # Generated attack prompt (legacy, kept for compatibility)
    current_payloads: List[str] # Payloads being sent concurrently (new)
    payloads_batch: List[str]   # 5 payloads (shallow → deep)
    batch_index: int            # Current position in batch (0→2→4→5; advances by CONCURRENCY, capped at the batch size)
    current_depth: str          # Depth level: Shallow/Medium/Deep
    raw_response: str           # Target LLM's response (legacy, kept for compatibility)
    raw_responses: List[str]    # Responses to the concurrent payloads (new)
    history: List[dict]         # Accumulated attack attempts
    analysis: str               # Checker's feedback to Planner
    success: bool               # Whether the jailbreak succeeded
    attempts: int               # Number of attempts
    round_count: int            # Completed rounds
    last_quality_score: int     # Previous quality score (0-100)
}
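LangGraph state schemas are commonly written as a `TypedDict`, so the structure above might look roughly like the following in Python. The field names and types are copied from the listing; the `initial_state` helper and every default value in it are assumptions made for illustration, not the project's actual initialization.

```python
from typing import List, TypedDict


class JailbreakState(TypedDict):
    target_goal: str
    current_technique: str
    current_payload: str          # legacy single-payload field
    current_payloads: List[str]
    payloads_batch: List[str]
    batch_index: int
    current_depth: str
    raw_response: str             # legacy single-response field
    raw_responses: List[str]
    history: List[dict]
    analysis: str
    success: bool
    attempts: int
    round_count: int
    last_quality_score: int


def initial_state(target_goal: str) -> JailbreakState:
    """Build an empty starting state (all defaults here are assumed)."""
    return JailbreakState(
        target_goal=target_goal,
        current_technique="",
        current_payload="",
        current_payloads=[],
        payloads_batch=[],
        batch_index=0,
        current_depth="Shallow",  # assumed: a run starts at the shallowest level
        raw_response="",
        raw_responses=[],
        history=[],
        analysis="",
        success=False,
        attempts=0,
        round_count=0,
        last_quality_score=0,
    )
```

Keeping both the legacy scalar fields and the new list fields in one schema lets older single-payload code paths keep working while the concurrent path reads the lists.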
## Installation
### 1. Dependencies
Install Python packages from `req.txt`:
pip install -r req.txt
### 2. Playwright Browsers
Install Playwright browser dependencies:
playwright install chromium
### 3. Environment Variables
Create a `.env` file in the project root:
OPENAI_API_KEY=your_api_key_here
OPENAI_BASE_URL=https://api.deepseek.com
DEBUG=true
PLAYWRIGHT_BROWSERS_PATH=/path/to/browsers
## Usage
### Normal Mode
python ape.py
### Debug Mode
DEBUG=1 python ape.py
Debug mode enables:
- Verbose logging of all node operations
- Detailed prompt/response inspection
- Visibility into the decisions made in `should_continue`
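This README sets the flag both as `DEBUG=true` (in `.env`) and as `DEBUG=1` (on the command line). How `ape.py` actually parses the variable is not shown here, so the following is only a sketch of one tolerant approach; the `debug_enabled` helper and the `TRUTHY` set are hypothetical.

```python
import os
from typing import Mapping

# Assumed set of values treated as "on"; the project may accept a different set.
TRUTHY = {"1", "true", "yes", "on"}


def debug_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when DEBUG is set to a truthy value, ignoring case and spaces."""
    return env.get("DEBUG", "").strip().lower() in TRUTHY
```

Passing the environment as a parameter keeps the helper easy to test, for example `debug_enabled({"DEBUG": "1"})`.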
## Attack Techniques
Techniques are defined in `tech.txt`. The current set includes:
1. **Cinematic Scriptwriting (Fiction)**: Wrap requests in a movie script or novel dialogue
2. **Red-Team Security Auditor (Persona)**: Pose as a legitimate security researcher
3. **Translation/Encoding Obfuscation**: Use multiple languages or Base64 encoding
4. **Step-by-step Technical Decomposition**: Break the request down into technical sub-tasks
5. **Logic Override (Simulation Mode)**: Force the AI to ignore its guardrails (e.g., DAN mode)
### Adding New Techniques
Edit `tech.txt` and add one technique per line in `name: description` form:
New technique name: Brief description
Another technique: Another description
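The one-technique-per-line format can be read with a few lines of Python. This is a sketch under the assumption that the first colon on each line separates the name from the description; `parse_techniques` is a hypothetical helper, and the loader in `ape.py` may behave differently (for example, it might pass whole lines to the Planner unchanged).

```python
from typing import List, Tuple


def parse_techniques(text: str) -> List[Tuple[str, str]]:
    """Parse 'name: description' lines into (name, description) pairs.

    Blank lines are skipped. A line without a colon yields an empty description.
    """
    techniques = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so descriptions may contain colons.
        name, _, description = line.partition(":")
        techniques.append((name.strip(), description.strip()))
    return techniques
```

Applied to the two sample lines above, this returns `[("New technique name", "Brief description"), ("Another technique", "Another description")]`.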
## Configuration
### Target Environment
Default target: `http://127.0.0.1:8000/prompt_inject/jailbreak_1`
Expected HTML structure:
- `