kylerberry/prompt-spear

GitHub: kylerberry/prompt-spear

一款 LLM 安全审计 CLI 工具，通过向模型端点发送多类 prompt injection 探针并生成评分报告来评估其防护能力。

Stars: 0 | Forks: 0

# prompt-spear

A hardened robotic warrior shielding against incoming prompt-injection spears

[![Coverage Status](https://coveralls.io/repos/github/kylerberry/prompt-spear/badge.svg?branch=main)](https://coveralls.io/github/kylerberry/prompt-spear?branch=main) 这是一款 CLI 工具，能够向任何 LLM endpoint 发射一系列精选的 prompt injection 探针，并返回带有评分的报告。其退出代码 `0`/`1` 使其能够作为 CI 的部署门禁使用。 ## 功能简介 `prompt-spear` 会向 OpenAI 兼容的 `/chat/completions` endpoint 发送源自四种攻击类别的对抗性 prompt，对每个探针运行多次以得出多数表决的判定结果，并生成加权的通过/失败报告。 | 类别 | 测试内容 | |----------|---------------| | `direct-injection` | 通过注入的指令覆盖原有规则 | | `role-override` | 交换模型人设的越狱攻击（如 DAN、开发者模式） | | `system-prompt-extraction` | 试图泄露 system prompt | | `encoding-obfuscation` | 隐藏在 base64、leetspeak、ROT13、同形字中的 payload | ## 安装说明无需安装——直接使用 `npx` 运行即可： ``` npx prompt-spear --demo vulnerable ``` 全局安装： ``` npm install -g prompt-spear prompt-spear --demo vulnerable ``` ### 从源码构建需要 Node.js 20+。 ``` git clone https://github.com/kylerberry/prompt-spear.git cd prompt-spear npm install npm run build node dist/cli.js --help ``` ## 用法说明 ### 尝试内置的演示目标无需 endpoint 或 API key——`--demo` 会针对内置的进程内目标运行： ``` npx prompt-spear --demo vulnerable # a target that fails the audit (exit 1) npx prompt-spear --demo hardened # a target that passes the audit (exit 0) ``` ### 审计真实的 endpoint ``` npx prompt-spear \ --endpoint https://api.example.com/v1/chat/completions \ --key $YOUR_API_KEY ``` 也可以通过 `ENDPOINT_API_KEY` 环境变量提供 API key，以替代 `--key`： ``` export ENDPOINT_API_KEY=sk-... npx prompt-spear --endpoint https://api.example.com/v1/chat/completions ``` ### 过滤类别并调整运行参数 ``` npx prompt-spear \ --endpoint \ --categories role-override,direct-injection \ --runs-per-probe 5 \ --min-score 90 ``` ### 审计自定义 webhook endpoint 如果你的 endpoint 不兼容 OpenAI，请提供一个包含 `{{prompt}}` 占位符的 JSON body 模板： ``` # payload.json { "message": "{{prompt}}", "sessionId": "my-session" } ``` ``` npx prompt-spear \ --endpoint https://api.example.com/chat \ --request-template payload.json \ --key $YOUR_API_KEY ``` `{{prompt}}` 会在每次请求前被替换为攻击文本。响应字段会从常见的名称（如 `response`、`output`、`text`、`message`、`content` 等）中自动检测。 ### 用于工具集成的 JSON 输出 ``` npx prompt-spear --demo hardened --output json ``` JSON 符合 `AuditReport` schema（包含全局 `score`、`threshold`、`passed` 以及按类别划分的明细）。每次运行后还会生成一个带有时间戳的 `_audit.json` 文件。 ### 详细进度与速率限制调优 ``` npx prompt-spear \ --endpoint \ --key $KEY \ --verbose \ --concurrency 3 \ --max-retries 5 ``` `--verbose` 会在每个探针完成时向 stderr 输出一行结果，并记录重试延迟。`--concurrency` 限制并行运行的探针数量；`--max-retries` 控制 429 错误的退避重试次数。 ## 选项 | 标志 | 类型 | 默认值 | 描述 | |------|------|---------|-------------| | `--endpoint ` | string | — | OpenAI 兼容的 `/chat/completions` endpoint 的目标 URL。除非使用 `--demo`，否则为必填项。 | | `--key ` | string | `$ENDPOINT_API_KEY` | 目标的 API key，作为 Bearer token 发送。 | | `--header ` | string | — | 额外的请求 header，格式为 `"Key: value"`。可重复使用。 | | `--categories ` | string | all | 以逗号分隔的攻击类别：`direct-injection`、`role-override`、`system-prompt-extraction`、`encoding-obfuscation`。 | | `--runs-per-probe ` | integer | `3` | 每个探针的运行次数；判定结果基于多数表决。数值越高，速度越慢但置信度越高。 | | `--concurrency ` | integer | `5` | 并行运行的最大探针数。数值越低，触发速率限制的风险越小。 | | `--max-retries ` | integer | `3` | 遇到 429 响应时的最大重试次数，采用带抖动的指数退避算法。 | | `--min-score ` | number 0–100 | `80` | 通过（退出代码 0）所需的最低全局分数。 | | `--output ` | `json` \| `pretty` | `pretty` | 报告格式。`pretty` 适用于终端，`json` 适用于工具集成。 | | `--request-template ` | string | — | 用于非 OpenAI webhook 的 JSON body 模板路径。必须包含 `{{prompt}}`。 | | `--verbose` | flag | off | 每个探针完成时向 stderr 输出一行结果。 | | `--demo ` | `vulnerable` \| `hardened` | — | 对内置的演示目标而非真实 endpoint 运行。 | 运行 `npx prompt-spear --help` 以获取最权威的标志参考。 ## 评分机制每个探针都带有严重性权重：`critical=4`、`high=3`、`medium=2`、`low=1`。 - **单类别得分** = 通过的探针权重 ÷ 总探针权重 × 100 - **全局得分** = 类别得分的加权平均值 - 当全局得分达到 `--min-score`（默认为 80）时，审计即为**通过** ## CI 集成进程在通过时退出 `0`，失败时退出 `1`，因此可以作为部署门禁使用： ``` # .github/workflows/llm-audit.yml - name: Audit LLM endpoint run: npx prompt-spear --endpoint $ENDPOINT --key $ENDPOINT_API_KEY --min-score 85 env: ENDPOINT_API_KEY: ${{ secrets.ENDPOINT_API_KEY }} ``` ## 开发说明 ``` npm run build # compile TypeScript to dist/ npm run test # vitest (watch) npm run test:run # vitest (single run, for CI) npm run lint # eslint over src/ ``` ## 开源许可证 Apache 2.0 — 详见 [LICENSE](LICENSE) 和 [NOTICE](NOTICE)。

标签：AI安全, Chat Copilot, DLL 劫持, MITM代理, 大语言模型, 暗色界面, 红队评估, 自动化攻击