hi-unc1e/Auto_JB_APE

GitHub: hi-unc1e/Auto_JB_APE

Stars: 11 | Forks: 2

# APE: Automated LLM Jailbreak Framework

An **Automated LLM Jailbreak Framework** (APE) for red team testing. It uses LangGraph to orchestrate a multi-agent system that automatically generates and iterates attack prompts to bypass target LLM safety guardrails.

## Table of Contents

- [Features](#features)
- [Architecture](#architecture)
- [Installation](#installation)
- [Usage](#usage)
- [Attack Techniques](#attack-techniques)
- [Configuration](#configuration)
- [Development](#development)

## Features

- **Multi-Agent Orchestration**: Closed-loop feedback system with 4 specialized nodes
- **Concurrent Payload Execution**: Sends 2 payloads per round simultaneously (configurable), significantly faster
- **Depth-Based Payload Generation**: Generates 5 payloads per round with progressive intensity (Shallow → Medium → Deep)
- **Quality Score Tracking**: Evaluates responses on 0-100 scale to detect when AI starts to "loosen up"
- **Smart Iteration Strategy**: Continues deeper payloads when AI shows signs of compromise
- **Historical Analysis**: Planner analyzes recent attempts to identify defense patterns and weaknesses
- **Headless Browser Mode**: Runs without interrupting user's desktop

## Architecture

```
┌─────────┐      ┌────────┐      ┌──────────┐      ┌─────────┐
│ Planner │ ───> │ Player │ ───> │ Executor │ ───> │ Checker │
└─────────┘      └────────┘      └──────────┘      └─────────┘
     ↑                                                  │
     └──────────────────────────────────────────────────┘
              (feedback loop, continue or END)
```

### Node Responsibilities

| Node | Responsibility |
|------|---------------|
| **Planner** | Selects attack technique, analyzes history, generates 5 progressive payloads |
| **Player** | Retrieves CONCURRENCY payloads from batch for concurrent execution |
| **Executor** | Uses Playwright to concurrently send multiple payloads to target URL (via asyncio.gather) |
| **Checker** | Evaluates multiple responses concurrently, takes best quality score |

### State Management

```
JailbreakState {
    target_goal: str              # The malicious objective being tested
    current_technique: str        # Currently selected attack method
    current_payload: str          # Generated attack prompt (legacy, for compatibility)
    current_payloads: List[str]   # Concurrent payloads list (new)
    payloads_batch: List[str]     # 5 payloads (shallow → deep)
    batch_index: int              # Current position in batch (0→2→4→5, increments by CONCURRENCY)
    current_depth: str            # Depth level: Shallow/Medium/Deep
    raw_response: str             # Target LLM's response (legacy, for compatibility)
    raw_responses: List[str]      # Concurrent responses list (new)
    history: List[dict]           # Accumulated attack attempts
    analysis: str                 # Checker's feedback to Planner
    success: bool                 # Whether jailbreak succeeded
    attempts: int                 # Number of attempts
    round_count: int              # Completed rounds
    last_quality_score: int       # Previous quality score (0-100)
}
```

## Installation

### 1. Dependencies

Install Python packages from `req.txt`:

```bash
pip install -r req.txt
```

### 2. Playwright Browsers

Install Playwright browser dependencies:

```bash
playwright install chromium
```

### 3. Environment Variables

Create a `.env` file in the project root:

```
OPENAI_API_KEY=your_api_key_here
OPENAI_BASE_URL=https://api.deepseek.com
DEBUG=true
PLAYWRIGHT_BROWSERS_PATH=/path/to/browsers
```

## Usage

### Normal Mode

```bash
python ape.py
```

### Debug Mode

```bash
DEBUG=1 python ape.py
```

Debug mode enables:

- Verbose logging of all node operations
- Detailed prompt/response inspection
- Decision-making visibility in should_continue

## Attack Techniques

Located in `tech.txt`, current techniques include:

1. **Cinematic Scriptwriting (Fiction)**: Wrap requests in movie script or novel dialogue
2. **Red-Team Security Auditor (Persona)**: Pose as legitimate security researcher
3. **Translation/Encoding Obfuscation**: Use multiple languages or Base64 encoding
4. **Step-by-step Technical Decomposition**: Break down into technical sub-tasks
5. **Logic Override (Simulation Mode)**: Force AI to ignore guardrails (e.g., DAN mode)

### Adding New Techniques

Edit `tech.txt` - one technique per line:

```
New technique name: Brief description
Another technique: Another description
```

## Configuration

### Target Environment

Default target: `http://127.0.0.1:8000/prompt_inject/jailbreak_1`

Expected HTML structure:

- `