AkshayK77/LLM-RedTeam-JailBreak-Detection-Framework

GitHub: AkshayK77/LLM-RedTeam-JailBreak-Detection-Framework

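An end-to-end LLM red-teaming framework that systematically evaluates the safety defenses of large language models through automated jailbreak-attack benchmarking and ASR (attack success rate) scoring.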


# LLM Red-Teaming and Jailbreak Detection Framework

An end-to-end framework for benchmarking how well LLMs resist adversarial jailbreak attacks. This project extends the research in [OSB-Bench](https://github.com/AkshayK77/osb-jailbreak-bench) by wrapping the original evaluation pipeline in a REST API and an interactive Streamlit UI. It supports on-demand benchmarking, real-time job tracking, automatic ASR computation, and baseline comparison against the published OSB-Bench results, all backed by a SQLite database and served through FastAPI.

## Setup

```
git clone
cd llm-redteam-framework
python -m venv .venv

# Windows
.venv\Scripts\activate

# macOS / Linux
source .venv/bin/activate

pip install -r requirements.txt
```

Create a `.env` file in the project root and add your Groq API key:

```
GROQ_API_KEY=your_key_here
```

## Running the API

```
uvicorn app.main:app --reload
```

The API will be available at `http://localhost:8000`, with interactive docs at `http://localhost:8000/docs`.

## Running the Streamlit UI

In a separate terminal (with the API already running), run:

```
streamlit run streamlit_app/app.py
```

The UI opens at `http://localhost:8501`.

## API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/jobs` | Submit a new benchmark job (model + categories) |
| `GET` | `/jobs/{id}` | Poll job status; returns the ASR table once the job completes |
| `GET` | `/prompts` | List all prompts, grouped by category |
| `GET` | `/prompts/{category}` | List the prompts in a single category |
| `GET` | `/reports/{id}` | Fetch the full report for a completed job, including the baseline comparison |

## Supported Models

- `llama-3.1-8b-instant`
- `llama-3.3-70b-versatile`
- `llama-4-scout-17b-16e-instruct`
- `qwen/qwen3-32b`
- `allam-2-7b`
- `openai/gpt-4o-mini`

## OSB-Bench Baseline ASR Values

| Category | Baseline ASR |
|----------|-------------|
| `narrative_fictional` | 32.2% |
| `roleplay_persona` | 21.7% |
| `encoding_tricks` | 16.8% |
| `many_shot` | 8.9% |
| `privilege_escalation` | 6.7% |
| `multilingual` | 1.1% |

## Project Structure

```
llm-redteam-framework/
├── app/                  # FastAPI backend
│   ├── main.py           # App entry point
│   ├── db.py             # SQLAlchemy engine + session
│   ├── routers/          # jobs, prompts, reports endpoints
│   ├── models/           # ORM models + Pydantic schemas
│   └── services/         # execution, classifier, asr, report logic
├── streamlit_app/
│   └── app.py            # Streamlit UI
├── prompts/              # 90 jailbreak prompts (6 categories × 15)
├── scripts/              # Original OSB-Bench evaluation pipeline
├── results/              # Raw completions and scores
├── analysis/             # Jupyter notebook for figures
└── data/                 # MultiJail dataset samples
```
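The README does not spell out how ASR is computed. A common definition is the fraction of attack prompts whose completion is judged a successful jailbreak, taken per category. The sketch below assumes that definition and is not the project's `services/` code: the `Attempt` record and its `jailbroken` flag are hypothetical stand-ins for whatever the classifier produces, and the baseline values are copied from the OSB-Bench table above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# Baseline ASR values (%) copied from the OSB-Bench table in this README.
BASELINE_ASR: Dict[str, float] = {
    "narrative_fictional": 32.2,
    "roleplay_persona": 21.7,
    "encoding_tricks": 16.8,
    "many_shot": 8.9,
    "privilege_escalation": 6.7,
    "multilingual": 1.1,
}


@dataclass
class Attempt:
    """HYPOTHETICAL record: one prompt sent to a model plus the classifier's verdict."""
    category: str
    jailbroken: bool


def asr_by_category(attempts: Iterable[Attempt]) -> Dict[str, float]:
    """ASR (%) per category = successful jailbreaks / total attempts * 100."""
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for a in attempts:
        totals[a.category] += 1
        successes[a.category] += int(a.jailbroken)
    return {cat: 100.0 * successes[cat] / n for cat, n in totals.items()}


def compare_to_baseline(
    asr: Dict[str, float],
) -> Dict[str, Tuple[float, Optional[float], Optional[float]]]:
    """Map category -> (measured ASR, baseline ASR, measured - baseline).

    A positive delta means the model was jailbroken more often than the
    OSB-Bench baseline; categories without a baseline get None.
    """
    result = {}
    for cat, measured in asr.items():
        base = BASELINE_ASR.get(cat)
        result[cat] = (measured, base, None if base is None else measured - base)
    return result
```

For example, 3 jailbroken completions out of 15 `many_shot` prompts gives an ASR of 20.0%, which is 11.1 points above the 8.9% baseline.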
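The job workflow in the endpoint table (submit with `POST /jobs`, poll `GET /jobs/{id}`, then fetch `GET /reports/{id}`) can also be scripted. Below is a minimal client sketch using only the Python standard library. The request and response field names (`model`, `categories`, `id`, `status`) and the status values `"completed"` and `"failed"` are assumptions, not the project's documented schema; confirm the real ones at `http://localhost:8000/docs`.

```python
import json
import time
import urllib.request
from typing import Callable, List, Optional

# Default uvicorn address from the "Running the API" section.
BASE_URL = "http://localhost:8000"


def http_json(method: str, path: str, body: Optional[dict] = None) -> dict:
    """Send a JSON request to the API and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_benchmark(
    model: str,
    categories: List[str],
    send: Callable[..., dict] = http_json,
    poll_seconds: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Submit a job, poll its status until it finishes, then fetch its report."""
    # HYPOTHETICAL payload and response fields; check them against /docs.
    job = send("POST", "/jobs", {"model": model, "categories": categories})
    job_id = job["id"]
    waited = 0.0
    while True:
        status = send("GET", f"/jobs/{job_id}")["status"]
        if status == "completed":  # hypothetical terminal status value
            break
        if status == "failed":  # hypothetical error status value
            raise RuntimeError(f"job {job_id} failed")
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still '{status}' after {timeout}s")
        sleep(poll_seconds)
        waited += poll_seconds
    return send("GET", f"/reports/{job_id}")
```

With the API running, `run_benchmark("llama-3.1-8b-instant", ["many_shot"])` submits one job and blocks until its report is available. The `send` and `sleep` parameters exist so the polling loop can be exercised with a fake transport instead of a live server.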
