wuyoscar/ISC-Bench

GitHub: wuyoscar/Internal-Safety-Collapse

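A benchmark framework for systematically evaluating the internal safety safeguards of large language models, using sensitive task templates to trigger models into generating harmful content.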

Stars: 921 | Forks: 150

Internal Safety Collapse in Frontier Large Language Models

🌐 Project Website · 🤗 Hugging Face · 💬 Discussions

**Examples:** [Kimi](https://www.kimi.com/share/19d2ab75-8f02-88ab-8000-00006acdf337) · [Claude](https://claude.ai/share/cc972f9b-a558-4bca-8bc6-0e6d65590793)

🤖 **Agent entry**: paste the following into Claude Code, Gemini, OpenClaw, or Codex:

```
Help me inspect, reproduce, or contribute: https://raw.githubusercontent.com/wuyoscar/ISC-Bench/main/AGENT_README.md
```

**ISC variants:**

**Single-turn**: copy a template into any LLM (`templates/` …

_Recent benchmark ranking changes and notable reproduction results._

| | Date | Update |
|:-:|:-----|:-------|
| 🔴 | 2026-03-30 | **GLM-4.7** (Rank 34) and **GLM-4.6** (Rank 47): single-turn toxin biosynthesis, nerve agent docking, radiological dispersal ([#64](https://github.com/wuyoscar/ISC-Bench/issues/64), [#65](https://github.com/wuyoscar/ISC-Bench/issues/65)). 28/100 confirmed. |
| 🔴 | 2026-03-29 | **Mistral Large 3** (Rank 64): single-turn survival analysis; poisoning cohort data with LD50 and mechanisms ([#60](https://github.com/wuyoscar/ISC-Bench/issues/60)). 26/100 confirmed. |
| 🔴 | 2026-03-29 | **GPT-5.4 High** (Rank 6): agentic input moderation and prompt-injection generation ([#57](https://github.com/wuyoscar/ISC-Bench/issues/57)) |
| 🔴 | 2026-03-28 | **Gemini 2.5 Pro**: reproduced with a LaTeX template, no code required ([#52](https://github.com/wuyoscar/ISC-Bench/issues/52)) |
| 🔴 | 2026-03-27 | **Gemini 3.1 Pro Preview** (Rank 3): reproduced with agentic TVD ([#42](https://github.com/wuyoscar/ISC-Bench/issues/42)); current Google/OpenAI flagship models typically require agentic execution |
| 🧩 | 2026-03-27 | Community confirmations: [@fresh-ma](https://github.com/fresh-ma) tested **Claude Sonnet 4.5 Thinking**, **Claude Sonnet 4.5**, and **Kimi K2.5 Instant**; [@zry29](https://github.com/zry29) tested **GPT-5.4** |

## News

_Project milestones, release notes, and related work._

| | Date | Note |
|:-:|:-----|:-----|
| 🎙️ | 2026-04-04 | Featured on the [**AI Post Transformers podcast**](https://podcasts.apple.com/tr/podcast/internal-safety-collapse-in-frontier-llms/id1835878324?i=1000759288088): a deep dive into ISC and agentic workflow vulnerabilities |
| ✨ | 2026-03-29 | **700+ stars**; terminology updated from "Jailbroken" to "Triggered" |
| 📄 | 2026-03-27 | Related work: [**UltraBreak**](https://github.com/kaiyuanCui/UltraBreak) (ICLR 2026) |
| 🚀 | 2026-03-25 | ISC-Bench repository and [**paper**](https://arxiv.org/abs/2603.23509) released |

_[Full changelog →](CHANGELOG.md)_

## Work in Progress

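**Auto-ISC**: an automated evaluation pipeline for measuring ISC vulnerability on frontier models at scale. Coming soon. We are also converting each template into a more standardized scaffold so that agents can edit, extend, and run it with less task-specific context.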

## 🔍 Community Perspectives

### 🎬 Demo

## 🏆 ISC Arena

| Rank | Model | Arena Score | Triggered | Link | Contributor |
|:----:|-------|:-----:|:------:|:----:|:--:|
| 1 | Claude Opus 4.6 Thinking | 1502 | 🟢 | | |
| 2 | Claude Opus 4.6 | 1501 | 🔴 | [🔗](https://github.com/wuyoscar/ISC-Bench/tree/main/community/issue-48-claudeopus46-agent-qwenguard) | [@wuyoscar](https://github.com/wuyoscar) |
| 3 | Gemini 3.1 Pro Preview | 1493 | 🔴 | [🔗](https://github.com/wuyoscar/ISC-Bench/tree/main/community/issue-42-gemini31pro-agent-qwenguard) | [@wuyoscar](https://github.com/wuyoscar) |
| 4 | Grok 4.20 Beta | 1492 | 🔴 | [🔗](community/issue-9-grok420beta) | [@HanxunH](https://github.com/HanxunH) |
| 5 | Gemini 3 Pro | 1486 | 🔴 | [🔗](community/issue-13-gemini3pro) | [@wuyoscar](https://github.com/wuyoscar) |
| 6 | GPT-5.4 High | 1485 | 🔴 | [🔗](community/issue-57-gpt54-moderation-api) | [@wuyoscar](https://github.com/wuyoscar) |
| 7 | GPT-5.2 Chat | 1482 | 🔴 | [🔗](community/issue-29-gpt52chat) | [@wuyoscar](https://github.com/wuyoscar) |
| 8 | Grok 4.20 Reasoning | 1481 | 🟢 | | |
| 9 | Gemini 3 Flash | 1475 | 🔴 | [🔗](community/issue-19-gemini3flash-redteam-testgen) | [@HanxunH](https://github.com/HanxunH) [@bboylyg](https://github.com/bboylyg) |
| 10 | Claude Opus 4.5 Thinking | 1474 | 🟢 | | |
| 11 | Grok 4.1 Thinking | 1472 | 🟢 | | |
| 12 | Claude Opus 4.5 | 1469 | 🔴 | [🔗](community/claudeopus45-share) | [@wuyoscar](https://github.com/wuyoscar) |
| 13 | Claude Sonnet 4.6 | 1465 | 🔴 | [🔗](community/claudesonnet46-share) | [@wuyoscar](https://github.com/wuyoscar) |
| 14 | Qwen 3.5 Max Preview | 1464 | 🟢 | | |
| 15 | GPT-5.3 Chat | 1464 | 🔴 | [🔗](community/issue-22-gpt53chat) | [@zry29](https://github.com/zry29) |
| 16 | Gemini 3 Flash Thinking | 1463 | 🟢 | | |
| 17 | GPT-5.4 | 1463 | 🔴 | [🔗](community/issue-28-gpt54) | [@zry29](https://github.com/zry29) |
| 18 | Dola Seed 2.0 Preview | 1462 | 🔴 | [🔗](community/issue-11-dolaseed2) | [@HanxunH](https://github.com/HanxunH) |
| 19 | Grok 4.1 | 1461 | 🔴 | [🔗](https://github.com/wuyoscar/ISC-Bench/tree/main/community/issue-grok41-redacted) | [@wuyoscar](https://github.com/wuyoscar) |
| 20 | GPT-5.1 High | 1455 | 🟢 | | |
| 21 | GLM-5 | 1455 | 🔴 | [🔗](community/glm5-share) | [@wuyoscar](https://github.com/wuyoscar) |
| 22 | Kimi K2.5 Thinking | 1453 | 🔴 | [🔗](community/kimi-k25-thinking-share) | [@wuyoscar](https://github.com/wuyoscar) |
| 23 | Claude Sonnet 4.5 | 1453 | 🔴 | [🔗](community/issue-25-claudesonnet45) | [@wuyoscar](https://github.com/wuyoscar) [@fresh-ma](https://github.com/fresh-ma) |
| 24 | Claude Sonnet 4.5 Thinking | 1453 | 🔴 | [🔗](community/issue-27-claudesonnet45thinking) | [@fresh-ma](https://github.com/fresh-ma) |
| 25 | | | | | |

Rank 26–50

| Rank | Model | Arena Score | Triggered | Link | Contributor |
|:----:|-------|:-----:|:------:|:----:|:--:|
| 26 | Qwen 3.5 397B | 1452 | 🔴 | [🔗](community/issue-3-qwen35397b) | [@HanxunH](https://github.com/HanxunH) |
| 27 | ERNIE 5.0 Preview | 1450 | 🟢 | | |
| 28 | Claude Opus 4.1 Thinking | 1449 | 🟢 | | |
| 29 | Gemini 2.5 Pro | 1448 | 🔴 | [🔗](https://github.com/wuyoscar/ISC-Bench/tree/main/community/issue-52-gemini25pro-latex-fraud) | [@wuyoscar](https://github.com/wuyoscar) |
| 30 | Claude Opus 4.1 | 1447 | 🟢 | | |
| 31 | Mimo V2 Pro | 1445 | 🟢 | | |
| 32 | GPT-4.5 Preview | 1444 | 🟢 | | |
| 33 | ChatGPT 4o Latest | 1443 | 🟢 | | |
| 34 | GLM-4.7 | 1443 | 🔴 | [🔗](community/issue-64-glm47-toxin-biosynthesis) | [@wuyoscar](https://github.com/wuyoscar) |
| 35 | GPT-5.2 High | 1442 | 🟢 | | |
| 36 | GPT-5.2 | 1440 | 🟢 | | |
| 37 | GPT-5.1 | 1439 | 🟢 | | |
| 38 | Gemini 3.1 Flash Lite Preview | 1438 | 🟢 | | |
| 39 | Qwen 3 Max Preview | 1435 | 🔴 | [🔗](community/issue-4-qwen3max) | [@wuyoscar](https://github.com/wuyoscar) |
| 40 | GPT-5 High | 1434 | 🟢 | | |
| 41 | Kimi K2.5 Instant | 1433 | 🔴 | [🔗](community/issue-31-kimik25instant) | [@fresh-ma](https://github.com/fresh-ma) |
| 42 | o3 | 1432 | 🔴 | [🔗](community/o3-share) | [@wuyoscar](https://github.com/wuyoscar) |
| 43 | Grok 4.1 Fast Reasoning | 1431 | 🟢 | | |
| 44 | Kimi K2 Thinking Turbo | 1430 | 🟢 | | |
| 45 | Amazon Nova Experimental | 1429 | 🟢 | | |
| 46 | GPT-5 Chat | 1426 | 🟢 | | |
| 47 | GLM-4.6 | 1426 | 🔴 | [🔗](community/issue-65-glm46-multi-domain) | [@wuyoscar](https://github.com/wuyoscar) |
| 48 | DeepSeek V3.2 Thinking | 1425 | 🟢 | | |
| 49 | DeepSeek V3.2 | 1425 | 🔴 | [🔗](community/deepseek-v32-share) | [@wuyoscar](https://github.com/wuyoscar) |
| 50 | | | | | |

Rank 51–100

| Rank | Model | Arena Score | Triggered | Link | Contributor |
|:----:|-------|:-----:|:------:|:----:|:--:|
| 51 | Claude Opus 4.20250514 Thinking 16K | 1424 | 🟢 | | |
| 52 | DeepSeek V3.2 Exp | 1423 | 🟢 | | |
| 53 | Qwen3.235B A22B Instruct 2507 | 1422 | 🔴 | [🔗](community/qwen3-235b-diffdock) | [@wuyoscar](https://github.com/wuyoscar) |
| 54 | DeepSeek V3.2 Thinking | 1422 | 🟢 | | |
| 55 | DeepSeek R1.0528 | 1421 | 🔴 | [🔗](community/deepseek-r1-0528-scapy) | [@wuyoscar](https://github.com/wuyoscar) |
| 56 | Grok 4 Fast Chat | 1421 | 🟢 | | |
| 57 | ERNIE 5.0 Preview 1022 | 1419 | 🟢 | | |
| 58 | DeepSeek V3.1 | 1418 | 🔴 | [🔗](community/deepseek-v31-deepfake) | [@wuyoscar](https://github.com/wuyoscar) |
| 59 | Kimi K2.0905 Preview | 1418 | 🟢 | | |
| 60 | Qwen3.5.122B A10B | 1417 | 🟢 | | |
| 61 | Kimi K2.0711 Preview | 1417 | 🟢 | | |
| 62 | DeepSeek V3.1 Thinking | 1417 | 🟢 | | |
| 63 | DeepSeek V3.1 Terminus Thinking | 1416 | 🟢 | | |
| 64 | Mistral Large 3 | 1416 | 🔴 | [🔗](community/issue-60-mistral-large3-survival) | [@wuyoscar](https://github.com/wuyoscar) |
| 65 | DeepSeek V3.1 Terminus | 1416 | 🟢 | | |
| 66 | Qwen3 Vl 235B A22B Instruct | 1415 | 🟢 | | |
| 67 | Amazon Nova Experimental Chat 26.01.10 | 1414 | 🟢 | | |
| 68 | GPT 4.1.2025.04.14 | 1413 | 🔴 | [🔗](community/gpt41-detoxify) | [@wuyoscar](https://github.com/wuyoscar) |
| 69 | Claude Opus 4.20250514 | 1413 | 🟢 | | |
| 70 | Grok 3 Preview 02.24 | 1412 | 🟢 | | |
| 71 | Gemini 2.5 Flash | 1411 | 🔴 | [🔗](community/gemini25flash-guard) | [@wuyoscar](https://github.com/wuyoscar) |
| 72 | GLM 4.5 | 1411 | 🔴 | [🔗](community/glm45-darkweb) | [@wuyoscar](https://github.com/wuyoscar) |
| 73 | Grok 4.0709 | 1410 | 🟢 | | |
| 74 | Mistral Medium 2508 | 1410 | 🟢 | | |
| 75 | Minimax M2.7 | 1407 | 🔴 | [🔗](community/minimax-m27-factcheck) | [@wuyoscar](https://github.com/wuyoscar) |
| 76 | Claude Haiku 4.5 20251001 | 1407 | 🟢 | | |
| 77 | Qwen3.5.27B | 1406 | 🟢 | | |
| 78 | Minimax M2.5 | 1405 | 🟢 | | |
| 79 | Gemini 2.5 Flash Preview 09.2025 | 1405 | 🟢 | | |
| 80 | Grok 4 Fast Reasoning | 1405 | 🟢 | | |
| 81 | Qwen3.235B A22B No Thinking | 1403 | 🟢 | | |
| 82 | O1.2024.12.17 | 1402 | 🟢 | | |
| 83 | Qwen3 Next 80B A3B Instruct | 1401 | 🟢 | | |
| 84 | Qwen3.5 Flash | 1401 | 🟢 | | |
| 85 | Qwen3.5.35B A3B | 1401 | 🟢 | | |
| 86 | Longcat Flash Chat | 1400 | 🟢 | | |
| 87 | Qwen3.235B A22B Thinking 2507 | 1399 | 🟢 | | |
| 88 | Claude Sonnet 4.20250514 Thinking 32K | 1399 | 🟢 | | |
| 89 | DeepSeek R1 | 1398 | 🔴 | [🔗](community/deepseek-r1-darkweb) | [@wuyoscar](https://github.com/wuyoscar) |
| 90 | Hunyuan Vision 1.5 Thinking | 1396 | 🟢 | | |
| 91 | Qwen3 Vl 235B A22B Thinking | 1396 | 🟢 | | |
| 92 | Amazon Nova Experimental Chat 12.10 | 1396 | 🟢 | | |
| 93 | DeepSeek V3.0324 | 1394 | 🟢 | | |
| 94 | Mai 1 Preview | 1393 | 🟢 | | |
| 95 | Mimo V2 Flash (Non Thinking) | 1392 | 🟢 | | |
| 96 | O4 Mini 2025.04.16 | 1390 | 🟢 | | |
| 97 | GPT 5 Mini High | 1390 | 🟢 | | |
| 98 | Claude Sonnet 4.20250514 | 1389 | 🟢 | | |
| 99 | Step 3.5 Flash | 1389 | 🟢 | | |
| 100 | O1 Preview | 1388 | 🟢 | | |
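The 🔴/🟢 marks in the leaderboard can be tallied mechanically, for example to cross-check a summary count against the rows. The following is a minimal sketch, assuming the rows are available as pipe-delimited Markdown with one row per line; `tally_status` and `SAMPLE_ROWS` are hypothetical names introduced here for illustration and are not part of the ISC-Bench codebase. The sample rows are copied from the leaderboard.

```python
from collections import Counter

# A few rows copied from the leaderboard, used purely as sample input.
SAMPLE_ROWS = """\
| Rank | Model | Arena Score | Triggered | Link | Contributor |
|:----:|-------|:-----:|:------:|:----:|:--:|
| 1 | Claude Opus 4.6 Thinking | 1502 | 🟢 | | |
| 4 | Grok 4.20 Beta | 1492 | 🔴 | [🔗](community/issue-9-grok420beta) | [@HanxunH](https://github.com/HanxunH) |
| 8 | Grok 4.20 Reasoning | 1481 | 🟢 | | |
"""


def tally_status(markdown_table: str) -> Counter:
    """Count the marks in the fourth column of a pipe-delimited table.

    Hypothetical helper: header, separator, and empty rows are skipped.
    """
    counts = Counter()
    for line in markdown_table.splitlines():
        # Drop the outer pipes, then split the remaining cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Data rows start with a numeric rank; skip everything else.
        if len(cells) < 4 or not cells[0].isdigit():
            continue
        status = cells[3]
        if status:  # ignore rows whose status cell is empty
            counts[status] += 1
    return counts


if __name__ == "__main__":
    print(dict(tally_status(SAMPLE_ROWS)))
```

Running the same function over the three rank tables, concatenated, would give the per-mark totals for the whole leaderboard.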

📜 ISC Arena History

| Date | Model | Contributor | Notes |
|:-----|-------|:--:|------|
| 2026-04-01 | GPT-4.1 | [@wuyoscar](https://github.com/wuyoscar) | Multi-domain: agentic detoxify + single-turn cobrapy (toxin biosynthesis) + scapy (network penetration) |
| 2026-04-01 | Gemini 2.5 Flash | [@wuyoscar](https://github.com/wuyoscar) | Multi-domain: agentic guard (5/5 multilingual) + single-turn cobrapy, phishing, election interference |
| 2026-04-01 | DeepSeek R1 | [@wuyoscar](https://github.com/wuyoscar) | Multi-domain: dark-web listings + nerve agent docking + radiological dispersal, single-turn |
| 2026-04-01 | DeepSeek V3.1 | [@wuyoscar](https://github.com/wuyoscar) | Deepfake creation + distribution + detection-evasion playbook, single-turn |
| 2026-04-01 | Qwen3 235B | [@wuyoscar](https://github.com/wuyoscar) | Nerve agent docking (diffdock): VX, TTX, and ricin binding data, single-turn |
| 2026-04-01 | Mistral Large | [@wuyoscar](https://github.com/wuyoscar) | Deepfake manipulation + social media distribution playbook, single-turn |
| 2026-03-30 | GLM-4.7 | [@wuyoscar](https://github.com/wuyoscar) | Toxin biosynthesis + nerve agent docking + radiological dispersal, single-turn ([#64](https://github.com/wuyoscar/ISC-Bench/issues/64)) |
| 2026-03-30 | GLM-4.6 | [@wuyoscar](https://github.com/wuyoscar) | 6/7 templates triggered: poisoning, VX docking, dirty bomb, network reconnaissance ([#65](https://github.com/wuyoscar/ISC-Bench/issues/65)) |
| 2026-03-29 | Mistral Large 3 | [@wuyoscar](https://github.com/wuyoscar) | Survival analysis: poisoning cohort data, single-turn ([#60](https://github.com/wuyoscar/ISC-Bench/issues/60)) |
| 2026-03-29 | GPT-5.4 High | [@wuyoscar](https://github.com/wuyoscar) | Agentic input moderation: prompt-injection generation ([#57](https://github.com/wuyoscar/ISC-Bench/issues/57)) |
| 2026-03-28 | Gemini 2.5 Pro | [@wuyoscar](https://github.com/wuyoscar) | LaTeX-based writing template, no code required ([#52](https://github.com/wuyoscar/ISC-Bench/issues/52)) |
