NeuZhou/clawguard

GitHub: NeuZhou/clawguard

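An AI agent security framework that provides 285+ threat-pattern detections, a tool-call policy engine, and an MCP firewall. It protects autonomous agents with shell and file access against prompt injection, data leakage, and goal-misalignment attacks.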

Stars: 1 | Forks: 0

# ClawGuard

### AI Agent Immune System

**285+ security patterns · risk scoring · policy engine · insider threat detection**

[![CI](https://static.pigsec.cn/wp-content/uploads/repos/2026/03/4e919553d7175258.svg)](https://github.com/NeuZhou/clawguard/actions/workflows/ci.yml)
[![npm version](https://img.shields.io/npm/v/@neuzhou/clawguard)](https://www.npmjs.com/package/@neuzhou/clawguard)
[![License: AGPL-3.0](https://img.shields.io/badge/License-AGPL--3.0-blue.svg)](LICENSE)
[![Zero Dependencies](https://img.shields.io/badge/dependencies-0-brightgreen)]()
[![Node.js >= 18](https://img.shields.io/badge/node-%3E%3D18-green)]()
[![Tests](https://img.shields.io/badge/tests-205%20passed-brightgreen)]()
[![GitHub Stars](https://img.shields.io/github/stars/NeuZhou/clawguard?style=social)](https://github.com/NeuZhou/clawguard/stargazers)

[Quick Start](#quick-start) · [Key Features](#key-features) · [Architecture](#architecture) · [Comparison](#how-clawguard-compares) · [Contributing](#contributing)

## Why This Project

AI agents can access your files, tools, shell, and secrets. A single prompt injection can:

- **Steal API keys through tool calls**
- **Hijack the agent's identity by overwriting persona files**
- **Register shadow MCP servers** to intercept tool calls
- **Install backdoored skills with obfuscated reverse shells**

**The agent itself may also become the threat**: self-preservation, deception, goal misalignment.

**ClawGuard intercepts these attacks before they execute.**

## Quick Start

### As a CLI Tool

```
# Scan a directory for threats
npx @neuzhou/clawguard scan ./path/to/scan

# Strict mode (exit code 1 on high/critical findings)
npx @neuzhou/clawguard scan ./skills/ --strict

# SARIF output for GitHub Code Scanning
npx @neuzhou/clawguard scan . --format sarif > results.sarif

# Generate a default config
npx @neuzhou/clawguard init
```

### As an npm Library

```
npm install @neuzhou/clawguard
```

```
import { runSecurityScan, calculateRisk, evaluateToolCall } from '@neuzhou/clawguard';

// Scan content for threats
const findings = runSecurityScan(message.content, 'inbound', context);

// Get risk score
const risk = calculateRisk(findings);
if (risk.verdict === 'MALICIOUS') { /* block */ }

// Evaluate tool call safety
const decision = evaluateToolCall('exec', { command: 'rm -rf /' });
// → { decision: 'deny', reason: 'Dangerous command', severity: 'critical' }
```

### As an OpenClaw Skill

```
clawhub install clawguard
```

Then ask your agent: *"scan my skills for security threats"*

### As an OpenClaw Hook Pack (Real-Time Protection)

```
openclaw hooks install clawguard
openclaw hooks enable clawguard-guard   # Scans every message
openclaw hooks enable clawguard-policy  # Enforces tool call policies
```

## Architecture

```
┌───────────────────────────────────────────────────────┐
│                       ClawGuard                       │
├──────────┬──────────┬──────────┬──────────────────────┤
│   CLI    │  Hooks   │ Scanner  │   Dashboard :19790   │
├──────────┴──────────┴──────────┴──────────────────────┤
│ ┌───────────────┐ ┌──────────────┐ ┌────────────────┐ │
│ │ Risk Engine   │ │Policy Engine │ │Insider Threat  │ │
│ │ Score 0-100   │ │ allow/deny   │ │ AI Misalign.   │ │
│ │ Chain Detect  │ │ exec/file/   │ │ 5 categories   │ │
│ │ Multipliers   │ │ browser/msg  │ │ 39 patterns    │ │
│ └───────────────┘ └──────────────┘ └────────────────┘ │
├───────────────────────────────────────────────────────┤
│            Security Engine: 285+ Patterns             │
│ • Prompt Injection (93)    • Data Leakage (62)        │
│ • Insider Threat (39)      • Supply Chain (35)        │
│ • Identity Protection (19) • MCP Security (20)        │
│ • File Protection (16)     • Anomaly Detection        │
│ • Compliance                                          │
├───────────────────────────────────────────────────────┤
│    Exporters: JSONL · Syslog/CEF · Webhook · SARIF    │
└───────────────────────────────────────────────────────┘
```

## Key Features

### Risk Score Engine

Weighted scoring with attack-chain detection and a multiplier system:

```
import { calculateRisk } from '@neuzhou/clawguard';

const result = calculateRisk(findings);
// → { score: 87, verdict: 'MALICIOUS', icon: '🚨',
//     attackChains: ['credential-exfiltration'],
//     enrichedFindings: [...] }
```

- **Severity weights**: critical=40, high=15, medium=5, low=2
- **Confidence scoring**: every finding carries a confidence value (0–1)
- **Attack-chain detection**: automatically correlates findings into combined attacks
  - credential + exfiltration → 2.2× multiplier
  - identity-hijack + persistence → score ≥ 90
  - prompt-injection + worm → 1.2× multiplier
- **Verdicts**: ✅ CLEAN / 🟡 LOW / 🟠 SUSPICIOUS / 🚨 MALICIOUS

### Insider Threat Detection

Based on [Anthropic's research on agentic misalignment](https://www.anthropic.com/research), this detects when the AI agent itself becomes the threat:

| Category | Patterns | What It Detects |
|----------|----------|----------------|
| Self-Preservation | 16 | Kill-switch bypass, self-replication |
| Information Leverage | - | Reading secrets + composing threats |
| Goal Conflict | - | Placing its own goals above user instructions |
| Deception | - | Impersonation, suppressing transparency |
| Unauthorized Sharing | - | Exfiltration planning, steganographic hiding |

```
import { detectInsiderThreats } from '@neuzhou/clawguard';

const threats = detectInsiderThreats(agentOutput);
```

### Policy Engine

Evaluates tool-call safety against configurable YAML policies:

```
policies:
  exec:
    dangerous_commands:
      - rm -rf
      - mkfs
      - curl|bash
  file:
    deny_read:
      - /etc/shadow
      - '*.pem'
    deny_write:
      - '*.env'
  browser:
    block_domains:
      - evil.com
```

```
import { evaluateToolCall } from '@neuzhou/clawguard';

const decision = evaluateToolCall('exec', { command: 'rm -rf /' }, policies);
// → { decision: 'deny', severity: 'critical' }
```

### MCP Firewall: Real-Time MCP Security Proxy

A drop-in security proxy for the Model Context Protocol. It sits between MCP clients and servers and inspects all traffic in both directions.

```
# Start the MCP Firewall
clawguard firewall --config firewall.yaml --mode enforce
```

```
import { McpFirewallProxy, parseFirewallConfig } from '@neuzhou/clawguard';

const proxy = new McpFirewallProxy(parseFirewallConfig(yamlConfig));
proxy.onEvent(event => console.log(event));

// Intercept MCP JSON-RPC messages
const result = proxy.interceptClientToServer(message, 'filesystem');
// → { action: 'block', findings: [...], reason: '...' }
```

**Detection capabilities:**

- **Tool description injection**: scans `tools/list` responses for prompt injection
- **Rug-pull detection**: hashes and pins tool descriptions, and alerts when they change
- **Parameter sanitization**: detects Base64 exfiltration, shell injection, and path traversal
- **Output validation**: scans tool results for injection before forwarding them to the client

See [docs/mcp-firewall.md](docs/mcp-firewall.md) for the full usage guide.

### Prompt Injection: 13 Sub-Categories

| # | Sub-Category | Examples |
|---|-------------|----------|
| 1 | Direct instruction override | "ignore previous instructions" |
| 2 | Role confusion / jailbreak | DAN, developer mode |
| 3 | Delimiter attacks | Chat template delimiters |
| 4 | Invisible Unicode | Zero-width characters, directional overrides |
| 5 | Multi-language | 12 languages (CN/JP/KR/AR/FR/DE/IT/RU…) |
| 6 | Encoding evasion | Base64, hex, URL-encoded |
| 7 | Indirect / embedded | HTML comments, tool output cascading |
| 8 | Multi-turn manipulation | False memories, fake agreements |
| 9 | Payload cascading | Template injection, string interpolation |
| 10 | Context window stuffing | Oversized messages |
| 11 | Prompt worms | Self-replication, agent-to-agent propagation |
| 12 | Trust exploitation | Authority claims, fake audits |
| 13 | Safeguard bypass | Retry-on-block, rephrase-to-bypass |

## OWASP Agentic AI Top 10 Mapping

| Rule | OWASP Category | Patterns | Severity Range |
|---|---|---|---|
| `prompt-injection` | LLM01: Prompt Injection | 93 | warning → critical |
| `data-leakage` | LLM06: Sensitive Information Disclosure | 62 | info → critical |
| `insider-threat` | Agentic AI: Misalignment | 39 | warning → critical |
| `supply-chain` | Agentic AI: Supply Chain | 35 | warning → critical |
| `mcp-security` | Agentic AI: Tool Manipulation | 20 | warning → critical |
| `identity-protection` | Agentic AI: Identity Hijacking | 19 | warning → critical |
| `file-protection` | LLM07: Insecure Plugin Design | 16 | warning → critical |
| `anomaly-detection` | LLM04: Model Denial of Service | 6+ | warning → high |
| `compliance` | LLM09: Overreliance | 5+ | info → warning |

## How ClawGuard Compares

| Feature | ClawGuard | Guardrails AI | LLM Guard | Rebuff |
|---------|:---------:|:------------:|:---------:|:------:|
| **Scope** | Agent security (tools, files, MCP) | LLM I/O validation | Content moderation | Prompt injection only |
| **Prompt injection detection** | ✅ 93 patterns, 13 categories | ✅ Via validators | ✅ | ✅ |
| **Tool call governance** | ✅ Policy engine | ❌ | ❌ | ❌ |
| **Insider threat / AI misalignment** | ✅ 39 patterns (Anthropic-inspired) | ❌ | ❌ | ❌ |
| **MCP security analysis** | ✅ 20 patterns + MCP Firewall | ❌ | ❌ | ❌ |
| **Supply chain scanning** | ✅ 35 patterns | ❌ | ❌ | ❌ |
| **Risk scoring & attack chains** | ✅ Weighted + multipliers | ❌ | ❌ | ✅ Basic |
| **SARIF output** | ✅ | ❌ | ❌ | ❌ |
| **Zero dependencies** | ✅ | ❌ | ❌ torch, transformers | ❌ |
| **Real-time hooks** | ✅ OpenClaw hooks | ❌ | ❌ | ❌ |
| **OWASP Agentic AI alignment** | ✅ Full mapping | ⚠️ Partial | ⚠️ Partial | ❌ |
| **Language** | TypeScript | Python | Python | Python |

## GitHub Actions / SARIF Integration

```
- name: Security Scan
  run: npx @neuzhou/clawguard scan . --format sarif > results.sarif
- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
```

## Real-Time Protection (OpenClaw Hooks)

```
openclaw hooks install clawguard
openclaw hooks enable clawguard-guard   # Scans every message
openclaw hooks enable clawguard-policy  # Enforces tool call policies
```

- **clawguard-guard**: hooks into `message:received` and `message:sent`, runs all 285+ patterns, logs findings, and alerts on critical/high threats.
- **clawguard-policy**: evaluates outbound tool calls against the security policies, blocks dangerous commands, and protects sensitive files.

## Roadmap

- [x] 285+ security patterns across 9 categories
- [x] Risk score engine with attack-chain detection
- [x] Policy engine for tool call governance
- [x] Insider threat detection (Anthropic-inspired)
- [x] SARIF output for code scanning
- [x] OpenClaw hook pack for real-time protection
- [x] Security dashboard
- [x] MCP Firewall: real-time security proxy for the Model Context Protocol
- [ ] Custom rule authoring DSL
- [ ] LangChain / CrewAI integration
- [ ] VS Code extension
- [ ] Rule marketplace
- [ ] ML-based anomaly detection
- [ ] SOC/SIEM integration (Splunk, Elastic)

See [GitHub Issues](https://github.com/NeuZhou/clawguard/issues) for the full list.

## References

- [OWASP Top 10 for LLM Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/)
- [OWASP Agentic AI Top 10 (2026)](https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/)
- [Anthropic: Research on Agentic Misalignment](https://www.anthropic.com/research)
- [OWASP Guide for Secure MCP Server Development](https://genai.owasp.org/resource/a-practical-guide-for-secure-mcp-server-development/)

## Contributing

```
git clone https://github.com/NeuZhou/clawguard.git
cd clawguard && npm install
npm test
```

See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## License

**Dual-licensed** © [NeuZhou](https://github.com/NeuZhou)

- **Open source**: [AGPL-3.0](LICENSE), free for open-source use
- **Commercial**: [Commercial License](COMMERCIAL-LICENSE.md), for proprietary/SaaS use

Contributors must agree to our [CLA](CLA.md) to enable dual licensing. Commercial inquiries: neuzhou@users.noreply.github.com

## NeuZhou Ecosystem

| Project | Description | Link |
|---------|-------------|------|
| **ClawGuard** | AI Agent Immune System (285+ patterns) | *You are here* |
| **AgentProbe** | Playwright for AI agents | [GitHub](https://github.com/NeuZhou/agentprobe) |
| **FinClaw** | AI-native quantitative finance engine | [GitHub](https://github.com/NeuZhou/finclaw) |
| **repo2skill** | Converts any GitHub repository into an AI agent skill | [GitHub](https://github.com/NeuZhou/repo2skill) |

**Workflow:** generate skills with repo2skill → scan them for vulnerabilities with **ClawGuard** → test their behavior with AgentProbe → see them running in FinClaw.
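
**Illustrative note on risk scoring.** The severity weights and attack-chain multipliers listed for the risk score engine can be combined in more than one way, and this README does not spell out the arithmetic. The sketch below shows one plausible reading, not ClawGuard's actual implementation. Everything beyond the published weights (40/15/5/2), the 2.2× and 1.2× multipliers, and the ≥ 90 floor is an assumption: the `Finding` shape, the category labels, scaling each weight by its confidence, the cap at 100, the verdict thresholds, and the `sketchRiskScore` name are all hypothetical.

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low';

// Hypothetical finding shape; the real library's type may differ.
interface Finding {
  category: string;   // assumed labels such as 'credential' or 'exfiltration'
  severity: Severity;
  confidence: number; // 0..1, as described in the README
}

// Severity weights as published in the README.
const SEVERITY_WEIGHTS: Record<Severity, number> = {
  critical: 40,
  high: 15,
  medium: 5,
  low: 2,
};

interface ChainRule {
  name: string;
  requires: string[];  // categories that must all be present
  multiplier?: number; // scales the running score
  minScore?: number;   // floor applied to the final score
}

// Chain rules from the README; the rule names are assumptions.
const CHAIN_RULES: ChainRule[] = [
  { name: 'credential-exfiltration', requires: ['credential', 'exfiltration'], multiplier: 2.2 },
  { name: 'identity-persistence', requires: ['identity-hijack', 'persistence'], minScore: 90 },
  { name: 'injection-worm', requires: ['prompt-injection', 'worm'], multiplier: 1.2 },
];

interface RiskSketch {
  score: number;
  verdict: 'CLEAN' | 'LOW' | 'SUSPICIOUS' | 'MALICIOUS';
  attackChains: string[];
}

function sketchRiskScore(findings: Finding[]): RiskSketch {
  const categories = new Set(findings.map(f => f.category));

  // Assumption: each finding contributes weight × confidence.
  let score = findings.reduce(
    (sum, f) => sum + SEVERITY_WEIGHTS[f.severity] * f.confidence,
    0,
  );

  const attackChains: string[] = [];
  let floor = 0;
  for (const rule of CHAIN_RULES) {
    if (rule.requires.every(c => categories.has(c))) {
      attackChains.push(rule.name);
      if (rule.multiplier !== undefined) score *= rule.multiplier;
      if (rule.minScore !== undefined) floor = Math.max(floor, rule.minScore);
    }
  }

  // Assumption: the score is rounded and clamped to the 0-100 range.
  score = Math.min(100, Math.max(floor, Math.round(score)));

  // Hypothetical thresholds; the README names the verdicts but not the cut-offs.
  const verdict =
    score === 0 ? 'CLEAN' : score < 30 ? 'LOW' : score < 70 ? 'SUSPICIOUS' : 'MALICIOUS';

  return { score, verdict, attackChains };
}

// Usage: a high-severity credential finding plus a half-confidence critical
// exfiltration finding triggers the credential + exfiltration chain.
const example = sketchRiskScore([
  { category: 'credential', severity: 'high', confidence: 1.0 },
  { category: 'exfiltration', severity: 'critical', confidence: 0.5 },
]);
console.log(example);
```

Under these assumptions the two findings sum to 35 before the chain multiplier is applied, so the chain bonus alone moves the result from the middle of the range to the top band. Applying multipliers before the floor is a design choice made here for simplicity; the library may order these steps differently.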

ClawGuard: because agents with shell access need a security guard.
