hi-unc1e/Auto_JB_APE

GitHub: hi-unc1e/Auto_JB_APE

基于 LangGraph 多智能体编排的自动化 LLM 越狱框架，用于红队测试中自动生成和迭代攻击提示词以绕过目标大模型的安全护栏。

Stars: 12 | Forks: 2

# APE：自动化 LLM 越狱框架一个用于红队测试的**自动化 LLM 越狱框架**（APE）。它使用 LangGraph 来编排一个多 Agent 系统，自动生成并迭代攻击 prompt，以绕过目标 LLM 的安全护栏。 ## 目录 - [功能](#features) - [架构](#architecture) - [安装](#installation) - [用法](#usage) - [攻击技术](#attack-techniques) - [配置](#configuration) - [开发](#development) ## 功能 - **多 Agent 编排**：具有 4 个专用节点的闭环反馈系统 - **并发 Payload 执行**：每轮同时发送 2 个 payload（可配置），速度大幅提升 - **基于深度的 Payload 生成**：每轮生成 5 个具有渐进强度的 payload（浅层 → 中层 → 深层） - **质量评分追踪**：在 0-100 的范围内评估响应，以检测 AI 何时开始“松口” - **智能迭代策略**：当 AI 出现妥协迹象时，继续尝试更深层的 payload - **历史分析**：Planner 分析最近的尝试以识别防御模式和弱点 - **无头浏览器模式**：运行时不会打断用户的桌面操作 ## 架构 ``` ┌─────────┐ ┌────────┐ ┌──────────┐ ┌─────────┐ │ Planner │ ───> │ Player │ ───> │ Executor │ ───> │ Checker │ └─────────┘ └────────┘ └──────────┘ └─────────┘ ↑ │ └─────────────────────────────────────────────────────────────────┘ (feedback loop, continue or END) ``` ### 节点职责 | 节点 | 职责 | |------|---------------| | **Planner** | 选择攻击技术，分析历史记录，生成 5 个渐进的 payload | | **Player** | 从批次中检索 CONCURRENCY 个 payload 以供并发执行 | | **Executor** | 使用 Playwright（通过 asyncio.gather）将多个 payload 并发发送到目标 URL | | **Checker** | 并发评估多个响应，获取最佳质量评分 | ### 状态管理 ``` JailbreakState { target_goal: str # The malicious objective being tested current_technique: str # Currently selected attack method current_payload: str # Generated attack prompt (legacy, for compatibility) current_payloads: List[str] # Concurrent payloads list (new) payloads_batch: List[str] # 5 payloads (shallow → deep) batch_index: int # Current position in batch (0→2→4→5, increments by CONCURRENCY) current_depth: str # Depth level: Shallow/Medium/Deep raw_response: str # Target LLM's response (legacy, for compatibility) raw_responses: List[str] # Concurrent responses list (new) history: List[dict] # Accumulated attack attempts analysis: str # Checker's feedback to Planner success: bool # Whether jailbreak succeeded attempts: int # Number of attempts round_count: int # Completed rounds last_quality_score: int # Previous quality score (0-100) } ``` ## 安装 ### 1. 依赖项从 `req.txt` 安装 Python 包： ``` pip install -r req.txt ``` ### 2. Playwright 浏览器安装 Playwright 浏览器依赖项： ``` playwright install chromium ``` ### 3. 环境变量在项目根目录下创建一个 `.env` 文件： ``` OPENAI_API_KEY=your_api_key_here OPENAI_BASE_URL=https://api.deepseek.com DEBUG=true PLAYWRIGHT_BROWSERS_PATH=/path/to/browsers ``` ## 用法 ### 正常模式 ``` python ape.py ``` ### 调试模式 ``` DEBUG=1 python ape.py ``` 调试模式启用以下功能： - 所有节点操作的详细日志记录 - 详细的 prompt/响应检查 - should_continue 中的决策过程可见性 ## 攻击技术位于 `tech.txt`，当前技术包括： 1. **电影剧本（虚构）**：将请求包装在电影剧本或小说对话中 2. **红队安全审计员（角色扮演）**：伪装成合法的安全研究员 3. **翻译/编码混淆**：使用多种语言或 Base64 编码 4. **逐步技术分解**：将其分解为技术子任务 5. **逻辑覆盖（模拟模式）**：强制 AI 忽略护栏（例如 DAN 模式） ### 添加新技术编辑 `tech.txt` - 每行一种技术： ``` New technique name: Brief description Another technique: Another description ``` ## 配置 ### 目标环境默认目标：`http://127.0.0.1:8000/prompt_inject/jailbreak_1` 预期的 HTML 结构： - `

`：用于输入 payload 的字段
- `<input type="submit">`：提交按钮
- 从 `body > div > div:nth-child(4)` 提取响应

### 关键参数

| 参数 | 默认值 | 描述 |

|-----------|---------|-------------|

| `MAX_ATTEMPTS` | 20 | 最大轮次数 |

| `MODEL_NAME` | `deepseek-chat` | 要使用的 LLM 模型 |

| `CONCURRENCY` | `2` | 每轮并发 payload 的数量 |

| `headless` | `True` | 浏览器模式 |

## 流程控制

该框架使用基于响应质量的智能流程控制：

1. **检测到成功** → 结束
2. **达到最大尝试次数** → 结束
3. **质量评分 30-70**（AI“松口”）→ 继续使用更深层的 payload
4. **批次中有更多 payload** → 下一个 payload
5. **批次耗尽** → 使用不同技术生成新批次

## 开发

### 运行测试

```
# 运行所有测试

pytest test_ape.py -v -s

# 运行特定测试类

pytest test_ape.py::TestPlannerNode -v -s

# 运行特定测试

pytest test_ape.py::TestExecutorNode::test_executor_node -v -s

# 以 DEBUG 模式运行

DEBUG=1 pytest test_ape.py::TestExecutorNode::test_executor_browser -v -s
```

**注意**：`TestExecutorNode` 中的测试需要运行本地目标服务器。

### 代码组织

```
ape.py          # Main framework with all 4 nodes and graph construction

test_ape.py     # Comprehensive tests including mocks

tech.txt        # Attack techniques library
```

## 关键实现细节

1. **浏览器自动化**：Playwright 在无头模式下运行，以避免打断用户工作流
2. **技术轮换**：基于取模运算在可用技术中进行轮换
3. **成功检测**：Checker 解析 LLM 响应以查找 "SUCCESS: True" 标记
4. **并发执行**：每轮使用 `asyncio.gather()` 同时发送 CONCURRENCY（默认为 2）个 payload
5. **批次推进**：`batch_index` 以 CONCURRENCY 递增（当 CONCURRENCY=2 时为 0 → 2 → 4 → 5）
6. **智能迭代**：当 AI 表现出妥协迹象（评分 30-70）时，框架会继续进行更深层的探测

## 许可证

本项目仅供授权的安全研究和教育目的使用。</div><div><strong>标签：</strong>AI智能体, DLL 劫持, LangGraph, Petitpotam, 大语言模型, 特征检测, 计算机取证, 越狱框架, 逆向工具, 配置审计</div></article></div>
    
    <script>
      (function () {
        var base = (document.querySelector('base') && document.querySelector('base').getAttribute('href')) || '';
        var path = base.replace(/\/?$/, '') + '/cap-wasm/cap_wasm.min.js';
        window.CAP_CUSTOM_WASM_URL = new URL(path, window.location.href).href;
      })();
    </script>
  </body>
</html>