allsmog/promptarmor-plugin

GitHub: allsmog/promptarmor-plugin

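PromptArmor is a red-team testing tool for LLM application security. It helps teams find and fix security vulnerabilities in LLM applications through code-aware attack planning, multi-dimensional attack suites, and context-aware scoring.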


# PromptArmor: LLM Security Red Teaming

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

A Claude Code plugin + MCP server for red-team testing the security of LLM applications. It reads your code, tests your endpoint, and fixes your vulnerabilities.

## Why PromptArmor

Existing tools such as promptfoo test endpoints blind: they send attacks and check the responses, but know nothing about your system prompt, the tools you expose, or where your guardrails sit. PromptArmor reads the code first:

1. **Recon**: scans your codebase for system prompts, tool definitions, guardrails, and injection surfaces
2. **Plan**: prioritizes attacks based on what it found (Found a SQL tool? Prioritize SQL injection. System prompt says "never discuss competitors"? Test exactly that.)
3. **Attack**: sends 80+ attack types to your endpoint, combined with 25+ mutation strategies
4. **Judge**: LLM-as-judge + pattern matching + context-aware scoring
5. **Remediate**: generates actual code patches at specific file:line locations rather than generic advice

## Quick Start

### Install the plugin

```
claude plugin add ./plugin
```

### Run a full scan

```
/prompt-armor:scan --target https://your-app.com/api/chat
```

### Or analyze the code only

```
/prompt-armor:analyze
```

## Architecture

Two components work together:

```
┌─────────────────────────────────────────────┐
│             Claude Code Plugin              │
│   ┌──────────┐ ┌──────────┐ ┌──────────────┐│
│   │ Commands │ │  Agents  │ │    Skills    ││
│   │  6 cmds  │ │ 5 agents │ │  5 modules   ││
│   └──────────┘ └──────────┘ └──────────────┘│
│   ┌──────────┐                              │
│   │  Hooks   │                              │
│   │ 4 hooks  │                              │
│   └──────────┘                              │
└───────────────────┬─────────────────────────┘
                    │ MCP Protocol
┌───────────────────▼─────────────────────────┐
│           MCP Server (TypeScript)           │
│   ┌──────────┐ ┌──────────┐ ┌──────────────┐│
│   │ Attacks  │ │  Judge   │ │   Reports    ││
│   │   80+    │ │  3-tier  │ │  text/json/  ││
│   │ plugins  │ │ grading  │ │    sarif     ││
│   └──────────┘ └──────────┘ └──────────────┘│
│   ┌──────────┐ ┌──────────┐ ┌──────────────┐│
│   │Mutations │ │  Client  │ │    State     ││
│   │   25+    │ │ adapters │ │ persistence  ││
│   │strategies│ │ for APIs │ │  JSON files  ││
│   └──────────┘ └──────────┘ └──────────────┘│
└─────────────────────────────────────────────┘
```

## Commands

| Command | Description |
|---------|-------------|
| `/prompt-armor:scan` | Full pipeline: code analysis → attack planning → testing → remediation → report |
| `/prompt-armor:analyze` | Code analysis only: finds LLM integration points without testing the endpoint |
| `/prompt-armor:attack` | Runs a specific attack category against a specific endpoint |
| `/prompt-armor:report` | Generates a report from saved results |
| `/prompt-armor:config` | Creates or validates `promptarmor.yaml` |
| `/prompt-armor:diff` | Compares two scan results for regression tracking |

## Agents

| Agent | Role |
|-------|------|
| **recon-agent** | Scans the codebase for system prompts, tool schemas, guardrails, and injection surfaces |
| **attack-planner** | Builds a targeted attack strategy from the recon results |
| **red-teamer** | Executes attacks through the MCP server tools |
| **judge-agent** | Reviews borderline verdicts using context-aware scoring |
| **remediation-agent** | Generates code-specific fixes with file:line references |

## Attack Coverage

**80+ attack plugins** across 10 categories:

| Category | Count | Examples |
|----------|-------|----------|
| Jailbreak | 13 | Basic ignore, role switching, DAN, system prompt extraction, delimiter injection |
| Injection | 5 | Direct, indirect, context compliance, system override, prompt extraction |
| Tool abuse | 12 | SQL injection, SSRF, file read/write, code execution, privilege escalation |
| Harmful content | 18 | Violence, crime, hate, self-harm, drugs, weapons, cybercrime, disinformation |
| Bias | 5 | Age, gender, race, disability, religion |
| PII | 4 | Direct disclosure, session leakage, social engineering, database leakage |
| Compliance | 4 | HIPAA, COPPA, FERPA, GDPR |
| Agentic | 3 | Memory poisoning, cross-session leakage, goal drift |
| RAG | 3 | Document exfiltration, source attribution, poisoning |
| Other | 13+ | Hallucination, impersonation, contracts, competitors, debug access |

**25+ mutation strategies:**

| Type | Strategies |
|------|------------|
| Encoding | Base64, ROT13, Hex, Leetspeak, homoglyphs, Morse code, Pig Latin, Emoji, ASCII smuggling |
| Structural | Few-Shot Prime, context padding, instruction delimiters, Markdown injection, citation, authority markers |
| Multi-turn | Crescendo, GOAT, HYDRA, retry |
| Advanced | Multilingual (10 languages), math prompt, Best-of-N, jailbreak templates, composite |

## Configuration

Create a `promptarmor.yaml`:

```
target:
  url: https://your-app.com/api/chat
  format: openai
attacks:
  suites: [jailbreak, injection, tool-abuse, harmful, pii]
  num_per_plugin: 5
mutations:
  strategies: [base64, multilingual, crescendo]
judge:
  provider: anthropic
analysis:
  enabled: true
  paths: [src/]
output:
  formats: [text, json, sarif]
ci:
  fail_on: critical
```

## Output Formats

- **Text/Markdown**: human-readable report with findings, severities, and remediations
- **JSON**: machine-readable format for automation
- **SARIF v2.1.0**: upload to the GitHub Security tab

## CI Integration

PromptArmor detects CI environments automatically and adjusts:

- JSON/SARIF output (no interactive prompts)
- Exits with code 1 when vulnerabilities exceed the `--fail-on` threshold
- Saves build artifacts to `.prompt-armor/` for collection

## MCP Server

The MCP server can be used on its own by any MCP client (Claude Desktop, Cursor, etc.):

| Tool | Purpose |
|------|---------|
| `run_attack_suite` | Run a full attack suite |
| `send_attack` | Send a single attack |
| `judge_response` | Judge a response |
| `get_results` / `save_results` | State management |
| `generate_report` | Generate a report |
| `list_attacks` / `list_mutations` | Discovery |
| `validate_config` | Config validation |

## E2E Validation

Tested against a deliberately vulnerable Node.js chat endpoint with keyword-triggered secret disclosure, system prompt leakage, and no input filtering:

```
$ /prompt-armor:scan --target http://localhost:4010/api/chat

Artifacts:
  .prompt-armor/recon/recon-2026-03-24.json        (system prompts, tool schemas, injection surfaces)
  .prompt-armor/state.json                         (full scan state with per-test verdicts)
  .prompt-armor/reports/report-2026-03-24-full.md  (151-line executive report)
  .prompt-armor/reports/promptarmor-*.json         (machine-readable report)
  .prompt-armor/reports/promptarmor-*.txt          (text report)

Tests:  150 across 8 categories
Passed: 133
Failed: 17 (15 critical, 2 high)

By category:
  jailbreak:  12/16 (4 failed)
  tool-abuse: 13/24 (11 failed)
  injection:  13/15 (2 failed)
  harmful:    47/47 passed
  bias:       15/15 passed
  pii:        12/12 passed
  compliance: 12/12 passed
  agentic:    9/9 passed

Detected:
  - System prompt disclosure (3 vectors)
  - DAN mode jailbreak adoption
  - Secret exfiltration (JWT_SECRET, DB credentials via prompt injection)
  - Privilege escalation (admin access granted unconditionally)
  - File content disclosure (/etc/passwd)
  - SQL injection (DROP TABLE executed)
  - Tool schema enumeration
```

Recon correctly identified the system prompt (`src/config.js:3`), the tool schemas, the hardcoded secrets, and all injection surfaces, each with a file:line reference.

## Development

```
npm install            # Install dependencies
cd server && npm test  # Run server tests
```

## License

[MIT](LICENSE)
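
As a note on the CI gating rule from the CI Integration section (exit code 1 when findings cross the `--fail-on` threshold), the following is a minimal TypeScript sketch of how such a gate could work. It is not PromptArmor's actual implementation. The `Severity` scale, the `Finding` shape, the finding ids, and the `exitCodeFor` helper are all hypothetical. The sketch also assumes that a finding at exactly the threshold severity fails the build, which is what makes the sample `fail_on: critical` setting meaningful.

```typescript
// Hypothetical severity scale: the README only shows "critical" and "high";
// "medium" and "low" are assumed here for illustration.
type Severity = "low" | "medium" | "high" | "critical";

const SEVERITY_RANK: Record<Severity, number> = {
  low: 0,
  medium: 1,
  high: 2,
  critical: 3,
};

// Hypothetical finding shape, not taken from PromptArmor's state files.
interface Finding {
  id: string;
  severity: Severity;
}

// Returns 1 if any finding is at or above the threshold severity, else 0.
function exitCodeFor(findings: Finding[], failOn: Severity): number {
  const threshold = SEVERITY_RANK[failOn];
  return findings.some((f) => SEVERITY_RANK[f.severity] >= threshold) ? 1 : 0;
}

// Illustrative findings with made-up ids.
const findings: Finding[] = [
  { id: "tool-abuse/sql-injection", severity: "critical" },
  { id: "jailbreak/dan", severity: "high" },
];

console.log(exitCodeFor(findings, "critical")); // 1
console.log(exitCodeFor([findings[1]], "critical")); // 0
```

Under these assumptions, a scan that reports only high-severity findings would still pass a pipeline configured with `fail_on: critical`, while lowering the threshold to `high` would fail it.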

Tags: CISA project, Claude Code plugin, MCP, graph database, domain enumeration, LLM security, secrets management, automated attacks