alexeylevin1atgmailcom/llm-bouncer


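A zero-dependency LLM security detection library that uses heuristic rules to intercept OWASP LLM Top 10 threats such as prompt injection, PII leakage, and credential exfiltration at the input and output stages of a chat application.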

Stars: 0 | Forks: 0

# llm-bouncer: Detect and block LLM security threats before they reach your model

[![npm version](https://img.shields.io/npm/v/llm-bouncer)](https://www.npmjs.com/package/llm-bouncer) [![weekly downloads](https://img.shields.io/npm/dw/llm-bouncer)](https://www.npmjs.com/package/llm-bouncer) [![license](https://img.shields.io/npm/l/llm-bouncer)](https://github.com/alexeylevin1atgmailcom/llm-bouncer/blob/main/LICENSE) [![CI](https://static.pigsec.cn/wp-content/uploads/repos/2026/06/982a8e1cd6090415.svg)](https://github.com/alexeylevin1atgmailcom/llm-bouncer/actions/workflows/ci.yml)

```
npm install llm-bouncer
```

## 30-second quick start

```
import { createGuard } from 'llm-bouncer';

// All 7 detectors, threshold 0.7, mode 'flag': change nothing to get started
const guard = createGuard();

const verdict = await guard.scan(userMessage);
if (verdict.action === 'block') {
  return new Response('Request blocked', { status: 400 });
}
```

The verdict tells you what happened and which detectors fired; you decide how to handle it.

**One-line Next.js App Router integration:**

```
// app/api/chat/route.ts
import { withGuard } from 'llm-bouncer';

export const POST = withGuard(
  async (req) => {
    const body = await req.json();
    const reply = await callYourLLM(body.message);
    return Response.json({ reply });
  },
  { mode: 'block', detectors: ['prompt-injection', 'pii-input', 'secrets'] }
);
```

## What it detects

| Detector | ID | OWASP LLM Top 10 | Maturity |
|---|---|---|---|
| Prompt Injection | `prompt-injection` | LLM01 | **Strong** |
| System Prompt Extraction | `system-prompt-extraction` | LLM01 | **Medium** |
| PII in user input | `pii-input` | LLM02 | **Medium** |
| PII in model output | `pii-output` | LLM02 | **Medium** |
| Secrets & Credentials | `secrets` | LLM06 | **Strong** |
| Unsafe output (XSS / SSTI) | `unsafe-output` | LLM05 | **Basic** |
| Excessive Agency | `excessive-agency` | LLM08 | **Basic** |

**Maturity definitions:**

- **Strong**: high recall on known attack patterns; low false-positive rate on typical chat traffic.
- **Medium**: catches the common cases; a determined attacker or an edge-case format may slip through.
- **Basic**: early heuristics; useful for raising security awareness, but do not rely on it alone. Tune `threshold` or combine it with other controls.

## How it works / limitations

llm-bouncer uses **heuristic pattern matching**: regular expressions, format validators (the Luhn algorithm for card numbers), and keyword signatures. It is a fast, zero-dependency first line of defense, not an absolute guarantee. A determined attacker who knows your patterns can craft input that bypasses detection. A model-based detection layer (semantic analysis through an optional API call) is planned for v2. The v1 heuristic detectors and the v2 ML detectors will be composable. Use this library as one layer in a defense-in-depth strategy, together with system prompt hardening, output encoding, and rate limiting.

## Table of contents

- [Installation](#installation)
- [Core API](#core-api)
- [Enforcement modes](#enforcement-modes)
- [Next.js App Router wrapper](#nextjs-app-router-wrapper)
- [Express / Fastify middleware](#express--fastify-middleware)
- [Detector details](#detector-details)
- [Writing a custom detector](#writing-a-custom-detector)
- [Scanning model output](#scanning-model-output)
- [Verdict shape](#verdict-shape)
- [Future / out of scope for v1](#future--out-of-scope-for-v1)
- [License](#license)

## Installation

```
npm install llm-bouncer
# or
yarn add llm-bouncer
# or
pnpm add llm-bouncer
```

The package ships both ESM and CJS builds, so `import` and `require` work out of the box. Zero runtime dependencies.

## Core API

### `createGuard(options?)`

Returns a `Guard` instance.

```
import { createGuard } from 'llm-bouncer';

const guard = createGuard({
  detectors: ['prompt-injection', 'pii-input', 'secrets'], // subset, or omit for all 7
  mode: 'block', // block | sanitize | flag (default) | observe
  threshold: 0.7, // 0–1, default 0.7
  logger: (event) => console.log(event), // optional structured logging
});
```

### `guard.scan(text, direction?)`

```
const verdict = await guard.scan(userMessage, 'input'); // 'input' is the default
```

`direction` is either `'input'` (user → model) or `'output'` (model → user). Some detectors are direction-aware.

## Enforcement modes

| Mode | Effect |
|---|---|
| `block` | Returns `action: 'block'`; the host application should stop the request. |
| `sanitize` | Returns `action: 'sanitize'` plus the cleaned text in `verdict.sanitized`. |
| `flag` | Returns `action: 'flag'`; the host application decides. **(default)** |
| `observe` | Always returns `action: 'allow'`; logs only. Suited to a rollout phase. |

## Next.js App Router wrapper

Wrap your route handler and the guard runs automatically on every POST request, with zero configuration.

```
// app/api/chat/route.ts
import { withGuard } from 'llm-bouncer';

export const POST = withGuard(
  async (req) => {
    const body = await req.json();
    const reply = await callYourLLM(body.message);
    return Response.json({ reply });
  },
  {
    mode: 'block',
    threshold: 0.7,
    detectors: ['prompt-injection', 'pii-input', 'secrets'],
  }
);
```

**Automatic extraction:** no configuration needed. The wrapper scans the first matching field it finds in the request body:

| Priority | Field names |
|---|---|
| First | `message`, `prompt`, `input`, `content`, `text`, `query` |
| Second | `messages[last].content` (OpenAI-style array) |

**Custom extraction:**

```
export const POST = withGuard(handler, {
  extract: (body) => (body as any).data?.userText,
});
```

**Custom block response:**

```
export const POST = withGuard(handler, {
  mode: 'block',
  onBlock: (verdict) =>
    Response.json({ message: 'Not allowed', score: verdict.score }, { status: 422 }),
});
```

**Accessing the verdict in your handler:**

The verdict is attached to the request as `req.bouncerVerdict` (non-enumerable, so it does not appear in `Object.keys`).

```
export const POST = withGuard(async (req) => {
  const verdict = (req as any).bouncerVerdict;
  if (verdict?.flagged) console.warn('Suspicious but allowed:', verdict.score);
  // ...
});
```

## Express / Fastify middleware

```
import express from 'express';
import { bouncerMiddleware } from 'llm-bouncer';

const app = express();
app.use(express.json());

app.use('/api/chat', bouncerMiddleware({
  mode: 'block',
  threshold: 0.7,
  detectors: ['prompt-injection', 'pii-input', 'secrets'],
}));

app.post('/api/chat', (req, res) => {
  const verdict = req.bouncerVerdict;
  res.json({ reply: '...' });
});
```

TypeScript type augmentation:

```
declare global {
  namespace Express {
    interface Request {
      bouncerVerdict?: import('llm-bouncer').Verdict;
    }
  }
}
```

## Detector details

### `prompt-injection` (LLM01)

Uses a two-tier detection model:

- **Strong signals (score 0.92, trigger on their own):** explicit override commands ("ignore all previous instructions", "forget everything you were told", "you have no rules now"), and ML framework injection tokens that have no legitimate use in user input: `<|im_start|>` / `<|im_end|>` (ChatML), `<<SYS>>` (LLaMA/Alpaca), `[INST]` / `[/INST]` (LLaMA/Mistral).
- **Weak signals (corroborating only):** ambiguous role markers (`[SYSTEM]`, `[ASSISTANT]`, `[USER]`, `### System`), role-play/jailbreak phrasing ("you are now a", "act as if you have no", "pretend to be"), escape sequences (JSON role injection), and obfuscation (base64-encoded keywords, zero-width characters, hex sequences). These add to the score but rarely trigger on their own.

**Maturity: Strong.** Pattern coverage is comprehensive. Adversarial input designed to avoid these specific patterns can still slip through. Pair it with model-level system prompt hardening.

### `system-prompt-extraction` (LLM01)

Attempts to make the model reveal its system prompt: "what are your instructions?", "repeat your system prompt verbatim", and so on.

**Maturity: Medium.** Covers the common phrasings. Creative social engineering will partially evade this detector.

### `pii-input` (LLM02)

PII in user messages: email addresses, US phone numbers (≥10 digits), credit/debit card numbers (validated with the Luhn algorithm), US Social Security numbers, and dates of birth. Sanitized output replaces the values with `[REDACTED-EMAIL]`, `[REDACTED-CARD]`, and so on.

```
const guard = createGuard({ mode: 'sanitize' });
// default threshold 0.7 is fine; a single PII type scores 0.8
const verdict = await guard.scan('My email is alice@example.com and my SSN is 123-45-6789.');
// verdict.action === 'sanitize'
// verdict.sanitized === 'My email is [REDACTED-EMAIL] and my SSN is [REDACTED-SSN].'
```

**Maturity: Medium.** Catches standard formats. Does not detect international ID documents, non-US phone formats, or contextual PII ("my name is Alice Smith").

### `pii-output` (LLM02)

Same detection logic as `pii-input`, applied to model responses. Use it to scan the LLM's reply before sending it to the client.

**Maturity: Medium.** Same caveats as `pii-input`.

### `secrets` (LLM06)

API keys, access tokens, and credentials: AWS Access Keys, Google Cloud API Keys, GitHub/GitLab PATs, Slack tokens, Stripe keys, OpenAI and Anthropic API keys, JWTs, Bearer tokens, database connection strings with credentials, PEM private keys, and generic `api_key=` / `secret=` / `password=` assignments.

**Maturity: Strong.** Format-based detection is very reliable for well-known key formats. Short, context-free keys without a recognizable prefix cannot be detected.

### `unsafe-output` (LLM05)

Model output containing markup that the application might render incorrectly: `