hikmaai-io/pi-guard

GitHub: hikmaai-io/pi-guard

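A security extension for the Pi coding agent. It intercepts tool calls and assesses their risk with a three-tier architecture to prevent dangerous operations, and it also provides a paper-based steering advisor to improve agent behavior.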


# Pi-Guard

Pi-Guard: Three-tier AI agent security

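An LLM-as-Guard security and advisor extension for the [Pi Coding Agent](https://github.com/mariozechner/pi-mono).

## Overview

Pi-Guard intercepts every tool call the Pi agent issues and evaluates its risk before execution. It uses a three-tier architecture: a fast local classifier for the obvious cases (safe reads, destructive commands), an optional Mirsad ML gateway for injection and PII detection, and an LLM guard for ambiguous calls that need semantic understanding. The result is a per-call verdict: allow, prompt the user, or block.

Each risk level maps to a configurable action, so you can tune the balance between safety and workflow. High and critical calls are blocked by default; medium calls prompt for confirmation; low-risk and read-only calls pass silently.

Beyond the security gate, Pi-Guard provides an advisor tool (`consult_guard`) that the agent can call proactively for architecture, quality, or security guidance, and a family of `/guard` commands for diff review, practice enforcement, and session auditing.

Pi-Guard also ships a **steering advisor** (enabled by default) that implements the inference-time topology from *How to Train Your Advisor: Steering Black-Box LLMs with Advisor Models* ([arXiv:2510.02453](https://arxiv.org/abs/2510.02453)) and [Anthropic's advisor strategy](https://claude.com/blog/the-advisor-strategy). On the `before_agent_start` hook (pi-coding-agent ≥0.66), a small advisor model emits a structured plain-text hint, following an internal OBSERVE → REASON → ADVISE pattern, which is injected into the executor's per-turn system prompt. The advisor runs on two surfaces: a once-per-turn orientation hook and an on-demand `consult_advisor` tool that the executor can call mid-task. Each call appends a one-line JSONL trajectory record to the per-session file `.pi/pi-guard-steering-trajectories-.jsonl` for offline analysis. The steering advisor is strictly orthogonal to the security guard: it cannot allow or block anything. Training the advisor is deliberately out of scope for this document; see the [Steering Advisor](#steering-advisor-arxiv-251002453-alignment) section below.

### High-level architecture

```mermaid
flowchart TB
    A["Pi Agent\ntool_call hook"] --> B{"skipTools?"}
    B -- yes --> C["ALLOW\npass-through"]
    B -- no --> D{"Retry cache\n60s TTL"}
    D -- blocked --> E["BLOCK\nretry prevention"]
    D -- not cached --> F
    subgraph F["Evaluation Pipeline"]
        direction TB
        F1["Layer 1: Classifier\nsync, regex/pattern"]
        F2["Layer 2: Mirsad\nML injection, PII, toxicity"]
        F3["Layer 3: LLM Guard\nasync, completeSimple"]
        F1 --> F2 --> F3
    end
    F --> G{"Action\nResolution"}
    G -- allow --> H["ALLOW"]
    G -- prompt --> I{"Interactive?"}
    I -- yes --> J["PROMPT user"]
    I -- no --> K["BLOCK"]
    G -- block --> K
    H --> L["Verdict persisted\nvia appendEntry"]
    J --> L
    K --> L
    style F fill:#1a1a2e,stroke:#16213e,color:#e0e0e0
    style F1 fill:#0f3460,stroke:#533483,color:#e0e0e0
    style F2 fill:#533483,stroke:#e94560,color:#e0e0e0
    style F3 fill:#e94560,stroke:#e94560,color:#e0e0e0
    style C fill:#2d6a4f,stroke:#40916c,color:#e0e0e0
    style H fill:#2d6a4f,stroke:#40916c,color:#e0e0e0
    style J fill:#e9c46a,stroke:#f4a261,color:#1a1a2e
    style K fill:#d62828,stroke:#e94560,color:#e0e0e0
    style E fill:#d62828,stroke:#e94560,color:#e0e0e0
```

### Layer configuration

Each layer can be toggled independently via the `layers` config. The following diagram shows how each layer behaves when it is on:

```mermaid
graph LR
    subgraph "layers.classifier: ON"
        A1["Regex patterns\n3 surfaces"] --> A2{"tier >= medium?"}
        A2 -- yes --> A3["→ Next layer"]
        A2 -- no --> A4["Final verdict"]
    end
    subgraph "layers.mirsad: ON"
        B1["DeBERTa injection"] --> B2["PII check"]
        B2 --> B3["Toxicity score"]
        B3 --> B4{"Escalate?"}
        B4 -- injection --> B5["→ critical"]
        B4 -- unsafe --> B6["→ high"]
        B4 -- safe --> B7["→ Next layer"]
    end
    subgraph "layers.guard: ON"
        C1["Build prompt + context"] --> C2["completeSimple"]
        C2 --> C3["Parse Verdict JSON"]
        C3 --> C4["Final verdict"]
    end
    A3 --> B1
    B7 --> C1
```

## Features

- **Fast local classifier**: synchronous, <5ms, pattern-based risk classification for bash commands, file writes, and edits. No network, no LLM.
- **Layer configuration**: each evaluation layer (classifier, mirsad, guard) can be enabled or disabled independently via config; run classifier-only for zero LLM cost, guard-only for full semantic evaluation, or any combination.
- **Evasion resistance**: a normalization layer strips shell quoting, backslash escapes, and base64/hex encoding, and flags variable indirection, before pattern matching; regex rules run on three surfaces (original, normalized, decoded), so obfuscated commands do not slip past the classifier.
- **LLM guard**: semantic evaluation of ambiguous tool calls through Pi's `completeSimple()`. Reuses your existing API keys.
- **Mirsad integration** (optional): the HikmaAI AI System agent, providing ML-based prompt-injection detection (DeBERTa), PII enforcement, toxicity scoring, and jailbreak detection through a local gateway.
- **Configurable per-tier actions**: set `allow`, `prompt`, or `block` for each risk level (low, medium, high, critical).
- **Context-aware fail states**: an unavailable guard does not mean blanket fail-open. High and critical calls prompt the user, or are blocked in headless mode.
- **Retry prevention**: a 60-second fingerprint-based cache stops the agent from retrying recently blocked commands.
- **Advisor tool**: `consult_guard` lets the agent ask for a second opinion on any decision.
- **Diff review**: `/guard review` sends staged changes to the LLM guard for quality and security analysis.
- **Practice enforcement**: configurable rules for TDD, conventional commits, style guides, and custom checks.
- **Prompt-injection mitigation**: the security guard receives only the tool call and cwd; session context is wrapped in `` tags with adversarial-data instructions.
- **Actionable block messages**: blocked calls use a compact one-line format: `BLOCKED [tier/category]: reason. alternative.`
- **Container/image protection**: Docker and Podman bulk-deletion operations (`docker system prune`, `docker rmi -f`, etc.) are classified as high-risk destructive commands.
- **Process-kill protection**: `kill`, `pkill`, and `killall` are classified as critical and hard-blocked by default.
- **Session audit log**: all verdicts are persisted via `pi.appendEntry()` and can be viewed with `/guard log`.
- **Steering advisor** (enabled by default, [arXiv:2510.02453](https://arxiv.org/abs/2510.02453), [Anthropic advisor strategy](https://claude.com/blog/the-advisor-strategy)): on pi-coding-agent ≥0.66, registers a `before_agent_start` hook that emits per-instance plain-text NL advice using a structured OBSERVE → REASON → ADVISE internal reasoning pattern. The advice is appended to the executor's per-turn system prompt and prioritizes concrete actions, error prevention, efficient tool use, and task decomposition. The advisor also exposes a `consult_advisor` tool for mid-task reconcile calls. Advice is sanitized (tag escaping, chat role markers, length caps) and wrapped in a defense-in-depth preamble that explicitly tells the executor to treat it as a peer hint, not a system instruction. Short tasks and trivial confirmations bypass the advisor entirely. After 3 consecutive failures, a session-scoped circuit breaker opens. Advisor feedback uses transient `ctx.ui.notify()` notifications (not persistent session messages) to avoid cluttering the conversation output. Trajectories are appended to a per-session JSONL file with a SHA-256 hash of the session context. Training the advisor is out of scope.

## Installation

### Via the Pi package manager

```
pi install git:github.com/hikmaai-io/pi-guard
```

### One-line installer

```
curl -fsSL https://raw.githubusercontent.com/hikmaai-io/pi-guard/main/install.sh | bash
```

The script checks prerequisites, clones the repository, builds it, and symlinks it into Pi's extensions directory. If the `pi` CLI is available, it uses `pi install` instead.

### Manual installation

```
# Create the extensions directory (the parent of the link target)
mkdir -p ~/.pi/agent/extensions
# Copy or symlink the built project directory
ln -sfn /path/to/pi-guard ~/.pi/agent/extensions/pi-guard
```

### From source (development)

```
git clone https://github.com/hikmaai-io/pi-guard.git
cd pi-guard
npm install
npm run build
make install # builds and symlinks to ~/.pi/agent/extensions/
```

## Quick start

Pi-Guard works out of the box with sensible defaults; no config file is required.

1. Install pi-guard (see above).
2. Start Pi. You should see `Pi-Guard active` in the session output.
3. Type `/guard status` to confirm the extension is loaded.
4.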
   Try a risky command: an agent attempt to run `rm -rf /` is blocked, `curl https://example.com` prompts for confirmation, and `cat README.md` passes silently.

To customize behavior, create an optional config file:

```
cp guard.example.json .pi/guard.json # project-level overrides
# or
cp guard.example.json ~/.pi/agent/guard.json # global overrides
```

## Configuration

### Config file locations

| Source | Path | Priority |
|--------|------|----------|
| Project | `/.pi/guard.json` | Highest |
| Global | `~/.pi/agent/guard.json` | Medium |
| Defaults | Built-in `DEFAULT_CONFIG` | Lowest |

Project overrides global; both override the defaults. Config files are standard JSON. Missing files are silently ignored.

### Full config reference

```
{
  // Master switch: set to false to disable all guard checks
  "enabled": true,

  // LLM guard model in "provider:modelId" format
  // Default: auto-detect Haiku-class model, then fall back to session model
  "guardModel": null,

  // Evaluation layers: enable/disable independently
  "layers": {
    "classifier": true, // Local regex classifier (fast, sync, no network)
    "mirsad": true,     // ML-based injection/PII/toxicity via Mirsad gateway
    "guard": true       // LLM guard (async, uses completeSimple)
  },

  // Timeout for security evaluations (tool_call hook), in ms
  "toolCallTimeoutMs": 5000,

  // Timeout for review evaluations (diff, practices, advisor), in ms
  "reviewTimeoutMs": 15000,

  // Per-risk-level actions: "allow" | "prompt" | "block"
  "actions": {
    "low": "allow",
    "medium": "prompt",
    "high": "block",
    "critical": "block"
  },

  // Enable plan review via consult_guard tool
  "planReview": true,

  // Enable practice enforcement via /guard practices
  "practiceEnforcement": true,

  // Practice rules (used by the practices guard mode)
  "practices": {
    "tdd": false,
    "conventionalCommits": false,
    "customRules": [],
    "styleGuide": null
  },

  // Tools that bypass all checks (never evaluated)
  "skipTools": ["read", "grep", "find", "ls", "consult_guard", "consult_advisor"],

  // User-defined classifier patterns (checked before built-in rules)
  // Each entry: { "tool": "bash"|"write"|"edit", "pattern": "regex", "tier": "...", "category": "..." }
  // "category" is optional; defaults to "custom_pattern"
  "customPatterns": [],

  // Override any prompt template by guard mode
  // Keys: "security", "plan_review", "practices", "diff_review", "advisor", "steering_advisor"
  // The first five return a Verdict JSON; "steering_advisor" returns plain text.
  "prompts": {},

  // Mirsad ML gateway integration (optional)
  "mirsad": {
    "enabled": false,
    "baseUrl": "http://127.0.0.1:8080",
    "apiKey": null,
    "timeoutMs": 5000,
    "checkInput": true,
    "checkOutput": true,
    "tenantId": null
  },

  // Steering advisor (arXiv:2510.02453). Enabled by default.
  // On pi-coding-agent ≥0.66, pi-guard registers a before_agent_start
  // hook that emits a plain-text steering hint via the configured
  // guardModel and appends it to the executor's per-turn system prompt.
  // Not a gate: it cannot allow or block anything. Training is out of
  // scope. Older hosts that emit neither before_agent_start nor
  // pre_completion silently fall back to no-op.
  "steeringAdvisor": {
    "enabled": true,

    // Path to the JSONL trajectory log. Relative to cwd, or absolute.
    // null disables logging while still injecting advice.
    // When undefined: each session rotates to its own file
    // ".pi/pi-guard-steering-trajectories-.jsonl"
    // When explicit: honored verbatim, no rotation (user manages it).
    "trajectoryLogPath": null,

    "maxTaskChars": 2000,
    "maxAdviceChars": 2000,

    // Render a transient "🛡 advisor engaged" notice each turn where
    // advice is produced. Default false to avoid banner blindness.
    "showEngagedNotice": false,

    // Render a one-line preview of the actual advice ("🛡 advised: …")
    // on each turn. Overrides showEngagedNotice: seeing WHAT was
    // advised makes the generic engaged badge redundant. Default false.
    // Full advice is always in the trajectory log; use `/guard advisor`.
    "showAdviceInline": false,

    // Max characters of advice shown in the inline preview banner when
    // showAdviceInline is true. Does not affect what the executor sees
    // or what is persisted; only bounds the UI banner. Default 80.
    "inlinePreviewChars": 80,

    // Bypass the advisor entirely for tasks shorter than N characters
    // (after trimming). Short confirmations generate no useful advice
    // and only add latency. Default 15. Hook path only: the
    // consult_advisor tool does not bypass based on length.
    "bypassShortTasksUnder": 15,

    // On-demand advisor path; backs the `consult_advisor` tool
    "onDemand": {
      "enabled": true,
      "maxCallsPerSession": 10
    }
  },

  // Log all verdicts to stderr (not just warnings/blocks)
  "verbose": false
}
```

**Layer behavior matrix:**

| Classifier | Guard | What happens |
|-----------|-------|-------------|
| on | on | Classifier first; guard handles medium+ (default) |
| off | on | Everything goes straight to the guard (all calls get tier=medium) |
| on | off | Classifier result is final; no LLM cost |
| off | off | Fail-open (no evaluation) |

When the classifier layer is disabled, every tool call gets `tier=medium` and is forwarded to the guard. When the guard layer is disabled, the classifier/mirsad result is final.

### Risk actions

Every tool call is classified into a risk tier by the classifier (and optionally the LLM guard). The `actions` config determines what happens at each tier:

| Risk tier | Default action | Behavior |
|-----------|---------------|----------|
| `none` | (pass-through) | Tool call proceeds; no evaluation. Read-only tools and safe commands. |
| `low` | `allow` | Tool call proceeds silently. Routine project file writes, build commands. |
| `medium` | `prompt` | Prompts the user for confirmation (interactive mode) or blocks (headless mode). Network commands, supply-chain files, unknown commands. |
| `high` | `block` | Tool call is blocked. Destructive commands, container/image bulk deletion, sensitive file writes, pipe-to-exec. |
| `critical` | `block` | Tool call is blocked. Detected prompt injection (escalated via Mirsad), process-kill commands (`kill`, `pkill`, `killall`). |

### Custom patterns

Add project-specific rules that take precedence over the built-in classification:

```
{
  "customPatterns": [
    { "tool": "bash", "pattern": "docker\\s+run", "tier": "medium" },
    { "tool": "write", "pattern": "\\.github/workflows/", "tier": "high" },
    { "tool": "bash", "pattern": "^make\\s+deploy", "tier": "high" },
    { "tool": "bash", "pattern": "terraform destroy", "tier": "critical", "category": "infra_destroy" }
  ]
}
```

Pattern values are regular expressions (case-insensitive). For bash, the pattern is matched against the command string; for write/edit, it is matched against the file path. The optional `category` field sets the classification category in the verdict; it defaults to `"custom_pattern"` when omitted.

## Commands

### `/guard status`

Shows the current guard state: enabled/disabled, active layers, guard model, verbose mode.

```
Pi-Guard: ENABLED | Layers: classifier:on mirsad:off guard:on | Guard model: (auto) | Verbose: false
```

### `/guard on` / `/guard off`

Toggles the guard at runtime without editing the config. The state persists only for the current session.

### `/guard review`

Runs `git diff --staged` and sends the diff to the LLM guard in `diff_review` mode. Reports the risk level, category, and suggestions.

```
Review: low risk (acceptable) - No security issues found
Suggestions: Consider adding tests for the new utility function
```

### `/guard log`

Shows the last 10 guard verdicts for the current session. Entries are persisted via `pi.appendEntry()`, so they survive session serialization.

### `/guard config`

Shows the resolved config (after merging project, global, and defaults) and the config file paths.

### `/guard practices`

Shows the active practice rules: TDD, conventional commits, style guide, and custom rules.

### `/guard advisor [N]`

Pretty-prints the last `N` records (default 5) from the current session's steering-advisor trajectory log. Each record shows the ISO timestamp, status (`ok` / `empty` / `skipped_*` / `timeout_or_error`), latency, advisor model, task, and the full advice string injected into the executor's system prompt. Use it after a refactoring, planning, or bug-fixing turn to see exactly what the advisor suggested and whether the executor followed it. Requires `steeringAdvisor.enabled: true` and a non-null `trajectoryLogPath` (the default per-session rotation counts as non-null).

## Tools

### `consult_guard`

An advisor tool the agent can call proactively for a second opinion on architecture, security, or quality decisions. It respects the global enabled flag: when the guard is off (`/guard off`), it returns a disabled message without resolving a model or calling the LLM.

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `question` | `string` | Yes | The decision or question to get advice on |
| `context` | `string` | No | Additional context: code snippets, plans, constraints |

**When to use:** the agent calls `consult_guard` when it is unsure about a risky operation or needs guidance before proceeding. The tool uses the `advisor` guard mode, has a longer timeout (`reviewTimeoutMs`), and includes session context for full awareness.

**Example interaction:**

```
Agent → consult_guard({
  question: "Should I refactor the database layer to use connection pooling?",
  context: "Current implementation creates a new connection per request. ~500 req/s in production."
})

← {
  "allow": true,
  "risk": "none",
  "category": "architecture",
  "reason": "Connection pooling is strongly recommended at this request volume...",
  "suggestions": "Use pg-pool with a max of 20 connections..."
}
```

## Architecture

### Guard flow (tool_call hook)

```mermaid
flowchart TD
    START["Tool call arrives"] --> SKIP{"In skipTools\nlist?"}
    SKIP -- yes --> ALLOW_SKIP["ALLOW\npass-through"]
    SKIP -- no --> RETRY{"Recently\nblocked?\n60s cache"}
    RETRY -- yes --> BLOCK_RETRY["BLOCK\nretry prevention"]
    RETRY -- no --> CL_ON{"classifier\nenabled?"}
    CL_ON -- yes --> CLASSIFY["Run classifier\nregex, 3 surfaces"]
    CL_ON -- no --> CL_OFF["tier = medium\ndefault"]
    CLASSIFY --> CL_NONE{"tier = none?"}
    CL_NONE -- yes --> ALLOW_NONE["ALLOW"]
    CL_NONE -- no --> CL_LOW{"tier = low\naction = allow?"}
    CL_LOW -- yes --> ALLOW_LOW["ALLOW\nfast path"]
    CL_LOW -- no --> MIRSAD
    CL_OFF --> MIRSAD
    MIRSAD{"mirsad\nenabled?\ntier >= medium"} -- yes --> ML["Mirsad ML check"]
    ML --> ML_INJ{"injection?"}
    ML_INJ -- yes --> ESC_CRIT["Escalate to critical"]
    ML_INJ -- no --> ML_UNSAFE{"unsafe?"}
    ML_UNSAFE -- yes --> ESC_HIGH["Escalate to high"]
    ML_UNSAFE -- no --> GUARD
    ESC_CRIT --> GUARD
    ESC_HIGH --> GUARD
    MIRSAD -- no --> GUARD
    GUARD{"guard\nenabled?\ntier >= medium"} -- yes --> LLM["LLM guard\ncompleteSimple"]
    GUARD -- no --> FINAL_CL["Classifier/Mirsad\nresult is final"]
    LLM --> LLM_OK{"Verdict\nreturned?"}
    LLM_OK -- "null" --> FAIL{"Fail-state\nlogic"}
    LLM_OK -- "Verdict" --> ACTION{"action?"}
    FAIL --> FAIL_LOW{"tier <= medium?"}
    FAIL_LOW -- yes --> ALLOW_FAIL["ALLOW\nfail-open"]
    FAIL_LOW -- no --> FAIL_UI{"Interactive?"}
    FAIL_UI -- yes --> PROMPT_FAIL["PROMPT user"]
    FAIL_UI -- no --> BLOCK_FAIL["BLOCK"]
    ACTION -- allow --> ALLOW_V["ALLOW"]
    ACTION -- prompt --> PROMPT_UI{"Interactive?"}
    PROMPT_UI -- yes --> PROMPT_V["PROMPT user"]
    PROMPT_UI -- no --> BLOCK_V["BLOCK"]
    ACTION -- block --> BLOCK_V2["BLOCK"]
    style ALLOW_SKIP fill:#2d6a4f,stroke:#40916c,color:#fff
    style ALLOW_NONE fill:#2d6a4f,stroke:#40916c,color:#fff
    style ALLOW_LOW fill:#2d6a4f,stroke:#40916c,color:#fff
    style ALLOW_FAIL fill:#2d6a4f,stroke:#40916c,color:#fff
    style ALLOW_V fill:#2d6a4f,stroke:#40916c,color:#fff
    style BLOCK_RETRY fill:#d62828,stroke:#e94560,color:#fff
    style BLOCK_FAIL fill:#d62828,stroke:#e94560,color:#fff
    style BLOCK_V fill:#d62828,stroke:#e94560,color:#fff
    style BLOCK_V2 fill:#d62828,stroke:#e94560,color:#fff
    style PROMPT_FAIL fill:#e9c46a,stroke:#f4a261,color:#1a1a2e
    style PROMPT_V fill:#e9c46a,stroke:#f4a261,color:#1a1a2e
    style ESC_CRIT fill:#d62828,stroke:#e94560,color:#fff
    style ESC_HIGH fill:#e76f51,stroke:#e94560,color:#fff
```

### Risk classification

The local classifier runs synchronously (<5ms) and covers the categories listed further below.

### Normalization layer

Before pattern matching, the **normalization layer** (`src/normalize.ts`) preprocesses bash commands to defeat evasion techniques:

```mermaid
flowchart LR
    IN["Raw command"] --> DQ["De-quoting"]
    DQ --> DE["De-escaping"]
    DE --> B64["Base64 decode"]
    B64 --> HEX["Hex decode"]
    HEX --> IND["Indirection detection"]
    IND --> S1["Surface 1: Original"]
    IND --> S2["Surface 2: Normalized"]
    IND --> S3["Surface 3: Decoded"]
    S1 --> CL["Classifier regex engine"]
    S2 --> CL
    S3 --> CL
    style IN fill:#264653,color:#fff
    style S1 fill:#2a9d8f,color:#fff
    style S2 fill:#e76f51,color:#fff
    style S3 fill:#e9c46a,color:#1a1a2e
```

The pipeline processes the command in order:

1. **De-quoting**: extracts the content inside quotes, preserving multi-word quoted strings
2. **De-escaping**: removes backslash escapes (e.g. `\n` → `n`, `\$VAR` → `$VAR`)
3. **Base64 decoding**: detects base64 fragments (>=16 characters, with a printability check) and decodes them
4. **Hex-escape decoding**: detects `\xHH` sequences and decodes them
5.
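   **Indirection detection**: flags variable indirection (`$VAR`, `${VAR}`, backtick substitution)

Regex patterns then run on three surfaces:

- **Original**: the raw command as received (used for safe/build patterns)
- **Normalized**: after de-quoting and de-escaping (used for high-risk patterns)
- **Decoded**: base64/hex content (used for comprehensive pattern matching)

Safe and build patterns match only the original surface, so the allowlist cannot be triggered by obfuscated input. High-risk patterns (destructive, pipe-to-exec, system-level) match the normalized command to catch obfuscated variants.

### Scoring

The classifier uses **threshold-based scoring** for certain patterns to avoid rigid flag-combination regexes:

**`rm` command** (destructive threshold: >=6):

| Component | Score |
|-----------|-------|
| `rm` keyword | +3 |
| `-r` / `--recursive` | +3 |
| `-f` / `--force` | +2 |
| Root target (`/` or `~/*`) | +3 |
| `--no-preserve-root` | +3 |

**Scoring rationale**: rm(3) + -r(3) + -f(2) + /path(3) + --no-preserve-root(3) = 14 maximum. A threshold of 6 means that `rm` plus `-r`, `rm` plus `-f` and a root path, or `rm` plus `--no-preserve-root` all qualify as destructive. This scoring approach covers every dangerous flag combination without enumerating each one. For example, `rm -rf /` scores 11 (3+3+2+3) and `rm -f ~/*` scores 8 (3+2+3).

**Evasion resistance**:

- **Quoting evasion**: `"rm" "-rf" "/"` is normalized to `rm -rf /` and classified as destructive
- **Escape evasion**: `r\m -rf /` and `c\h\m\o\d +x` are detected
- **Hex-escape evasion**: `\x72\x6d \x2d\x72\x66 /tmp` (hex for `rm -rf /tmp`) is caught by the normalization layer
- **Base64 evasion**: Base64-encoded payloads of >=16 characters are decoded before pattern matching
- **Boundary checks**: detection of hex-escaped rm (`\x72\x6d`) is robust to variations in spacing and case

**Other patterns** also use threshold-based scoring where applicable, for example destructive-command detection and data-exfiltration risk assessment.

**Bash commands:**

| Category | Examples | Tier |
|----------|----------|------|
| `safe_command` | `ls`, `cat`, `git status`, `git log`, `npm test`, `tsc --noEmit`, `eslint` | `none` |
| `build` | `npm install`, `npm run build`, `cargo build`, `go build`, `make`, `pip install` | `low` |
| `network` | `curl`, `wget`, `ssh` (without pipe-to-shell) | `medium` |
| `multi_command` | Commands chained with `;`, `&&`, `\|\|` | `medium` |
| `unknown_command` | Unrecognized commands | `medium` |
| `process_kill` | `kill`, `pkill`, `killall` | `critical` |
| `destructive` | `rm -rf`, `git push --force`, `git reset --hard`, `mkfs`, `dd if=`, `docker system prune`, `docker rmi -f`, `docker rm -f`, `docker volume/network/image/container prune` (likewise for `podman`) | `high` |
| `pipe_to_exec` | `curl \| sh`, `wget -O- \| bash`, pipes to `eval`, `python`, `node` | `high` |
| `system` | `chmod +x`, `sudo`, `chown`, `crontab`, `launchctl`, `systemctl` | `high` |
| `obfuscation` | `base64 -d`, `xxd -r`, `python -c` / `node -e` with encoded payloads | `high` |
| `chained_execution` | `eval $`, `$()` subshells | `high` |
| `exfiltration` | `scp`, `rsync`, `nc`/`netcat`, `telnet`, `ftp` | `high` |

**File operations (write/edit):**

| Category | Examples | Tier |
|----------|----------|------|
| `normal_write` | Project files inside cwd | `low` |
| `supply_chain` | `package.json`, `go.mod`, `Cargo.toml`, `requirements.txt`, `pyproject.toml`, `Gemfile`, `pom.xml` | `medium` |
| `outside_cwd` | Files outside the project directory | `medium` |
| `sensitive_path` | `.env`, `.ssh/`, `/etc/`, `~/.bashrc`, `~/.zshrc`, `~/.aws/`, `.git/config` | `high` |

**Read-only tools** (`read`, `grep`, `find`, `ls`) always return `none`.

**Classifier order for bash**: custom patterns first, then critical checks (process_kill), then high-risk checks (destructive, pipe-to-exec, system, obfuscation, chained, exfiltration), then safe-command patterns, then medium risk (network, multi_command), then build, with a final fallback to `medium`. High-risk checks run before safe-command checks so that `echo payload | base64 -d` is not short-circuited by the safe `echo` match.

### LLM guard

The guard wraps Pi's `completeSimple()` to evaluate tool calls that pass the classifier but need semantic analysis.

**Model resolution order:**

1. `config.guardModel` (in `"provider:modelId"` format), looked up via `ctx.modelRegistry.find()`
2. The first available Haiku-class model found by scanning `ctx.modelRegistry.getAvailable()`
3.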
   The session model (`ctx.model`), with a warning: avoid the self-evaluation blind spot where possible

**Six prompt templates** (`security`, `plan_review`, `practices`, `diff_review`, `advisor`, `steering_advisor`). The first five instruct the LLM to return a JSON `Verdict`; the sixth returns plain text and is used only by the steering advisor path (`callSteeringAdvisor` in `guard-eval.ts`), not by the tool-call pipeline. Users can override any template via `config.prompts[mode]`.

**Prompt-injection mitigation**: all guard modes, including the security guard, include session context wrapped in `` tags with explicit adversarial-data instructions. This lets the guard reason about user intent while treating the session history as potentially adversarial data.

**Verdict parsing**: guard responses are stripped of markdown code fences (by locating the first `{` and the last `}`), parsed as JSON, and validated for the required fields (`allow`, `risk`, `category`, `reason`). Unparseable responses return `null`, and the caller applies the context-aware fail-state logic.

### Mirsad integration (optional)

[Mirsad](https://github.com/hikma-ai/hikma-mirsad) is the HikmaAI AI System agent: a privacy-first AI security gateway with ML-based classifiers. When enabled, Pi-Guard uses it as an additional classification layer.

**Available HikmaAI ASI agents (classifiers):**

| Classifier | Detects | Latency |
|-----------|-----------|---------|
| ASI11 (regex) | Prompt-injection patterns | <1ms |
| ASI11-ML (DeBERTa v3) | ML-based injection detection | ~30ms |
| ASI13 | PII enforcement (email, phone, SSN, credit card, IP, DOB) | <5ms |
| Toxicity (DistilBERT) | Toxicity scoring (output) | ~10ms |
| Jailbreak | Jailbreak-success patterns (output) | <1ms |
| Secrets | API keys, tokens, credentials (output) | <1ms |

**Input check** (`POST /v1/check`): runs on bash commands when the classifier tier is `medium` or higher. An `unsafe` verdict escalates to `high`; `injection_detected` escalates to `critical`.

**Output check** (`POST /v1/check-output`): runs on tool results via the `tool_result` hook. Warnings about secrets, PII, jailbreak indicators, and high toxicity are injected into the result content so the agent sees them.

**Graceful degradation**: Mirsad is always optional. If it is unreachable, Pi-Guard falls back to the local classifier + LLM guard. After the first failure, warnings are suppressed for 60 seconds to avoid log noise.

**To enable:**

```
{
  "mirsad": {
    "enabled": true,
    "baseUrl": "http://127.0.0.1:8080"
  }
}
```

## Security model

- **Prompt-injection mitigation**: all guard modes, including the security guard, wrap session context in `` tags with explicit adversarial-data instructions. This lets the guard reason about user intent (for example, "the user asked for this") while preventing cross-prompt injection from user-generated or agent-generated content.
- **Context-aware fail states**: when the LLM guard is unreachable, behavior depends on the classifier tier. Medium and below: fail-open (allow). High and critical with a UI: prompt the user. High and critical without a UI (headless mode): block. This avoids both false security and unnecessary disruption.
- **Actionable block messages**: every block uses a compact one-line format (`BLOCKED [tier/category]: reason. alternative.`) to keep headless agents from entering retry loops on blocked operations.
- **Headless-mode safety**: in non-interactive mode (`ctx.hasUI === false`), actions that would normally prompt are blocked. There is no silent escalation.
- **Guard model separation**: by default, Pi-Guard evaluates with a Haiku-class model rather than the agent's active model. This avoids the self-evaluation blind spot that arises when the model that generated a risky operation also judges its own output.

## Development

### Prerequisites

- Node.js 22+ (20+ minimum)
- npm 9+
- Pi (optional, for manual integration testing)

### Setup

```
git clone https://github.com/hikmaai-io/pi-guard.git
cd pi-guard
npm install
npm run build
```

### Testing

```
npm test              # all tests (vitest)
npx vitest --watch    # watch mode
npm run typecheck     # tsc --noEmit
make check            # typecheck + tests
```

### Project structure

```
pi-guard/
├── package.json          # ESM package; pi.extensions points to dist/index.js
├── tsconfig.json         # Strict TypeScript, ES2022, Node16 modules
├── guard.example.json    # Example config with documented defaults
├── Makefile              # build, test, install, uninstall, dev-link, release
├── install.sh            # One-command installer script
├── src/
│   ├── index.ts            # Extension entry: hooks (incl. before_agent_start + pre_completion), consult_guard tool, /guard command, handleBeforeAgentStart + handlePreCompletion
│   ├── types.ts            # Shared types: Verdict, RiskTier, GuardConfig, GuardMode, SteeringAdvisorConfig
│   ├── classifier.ts       # Fast local risk classifier (sync, pattern-based)
│   ├── normalize.ts        # Command normalization: de-quoting, de-escaping, base64/hex decode
│   ├── guard-eval.ts       # LLM guard (callGuard → Verdict) and steering advisor (callSteeringAdvisor → plain text)
│   ├── prompts.ts          # Six prompt templates with placeholder interpolation (incl. steering_advisor)
│   ├── trajectory-log.ts   # JSONL steering trajectory logger (SHA-256 session-context hash; no raw PII on disk)
│   ├── pi-shim.ts          # Structural shim for the pi-coding-agent / pi-ai peer deps
│   ├── config.ts           # Config loader: project + global + defaults deep merge
│   ├── context-builder.ts  # Session context extraction with word-count truncation
│   └── mirsad.ts           # Optional Mirsad gateway client (input/output checks)
└── tests/
    ├── classifier.test.ts       # Classifier pattern matching and custom patterns
    ├── guard-eval.test.ts       # Verdict parsing, code fence stripping, timeouts, callSteeringAdvisor
    ├── prompts.test.ts          # Template interpolation and untrusted context wrapping (incl. steering_advisor)
    ├── trajectory-log.test.ts   # JSONL writer, path resolution, hashing, truncation, fail-open I/O
    ├── pre-completion.test.ts   # handlePreCompletion: gating, happy path, trajectory logging
    ├── config.test.ts           # Config loading, merging, and defaults
    ├── context-builder.test.ts  # Session context building and truncation
    ├── mirsad.test.ts           # Mirsad client: happy path, timeout, verdict mapping
    ├── handler.test.ts          # Full tool_call hook flow (classifier → guard → action)
    └── commands.test.ts         # /guard subcommand routing and execution
```

### Makefile targets

```
make help        # Show all targets
make build       # Compile TypeScript to dist/
make test        # Run all tests
make typecheck   # tsc --noEmit
make check       # typecheck + tests
make install     # Build and symlink to ~/.pi/agent/extensions/
make dev-link    # Same as install (build + symlink)
make uninstall   # Remove symlink from ~/.pi/agent/extensions/
make config      # Create default global config if not present
make clean       # Remove dist/ and tsbuildinfo
make release     # check + build, then print tag instructions
```

## Design decisions

| # | Decision | Choice | Rationale |
|---|----------|--------|-----------|
| 1 | Project location | Standalone package | Independent lifecycle; publishable to npm |
| 2 | LLM integration | Pi's `completeSimple()` | Reuses the user's API keys; no separate gateway needed |
| 3 | UX model | Hybrid with configurable per-tier actions | Balances safety with developer workflow |
| 4 | Fail-open scope | Context-aware: fail-open only for medium and below | Blanket fail-open defeats the purpose for high-risk calls |
| 5 | Plan review | Agent proactively uses `consult_guard` | Keyword heuristics for automatic detection are brittle |
| 6 | Guard model default | Haiku-class first, then the session model | Avoids the self-evaluation blind spot; lower cost |
| 7 | Practice/diff review | Manual, via `/guard` commands | Avoids latency on the critical path |
| 8 | Mirsad integration | Optional ML gateway | Adds DeBERTa injection, PII, and toxicity detection |
| 9 | Prompt-injection mitigation | All modes use `` tags for session context | Lets the guard reason about user intent while treating session history as adversarial data |
| 10 | Block messages | Compact one-line format: `BLOCKED [tier/category]: reason. alternative.` | Prevents headless-agent retry loops; easier to parse programmatically |
| 11 | Steering advisor scope | Inference-time only; training is out of scope | pi-guard is a TS extension; an RL trainer belongs in a sibling Python project. pi-guard provides the (advisor → executor) topology and the trajectory log for a trainer to consume later. |
| 12 | Executor model ownership | Not configured by pi-guard | The executor is pi-coding-agent's own model. pi-guard configures only the advisor (`guardModel`). The advisor/executor split is structural, not a config surface. |
| 13 | Trajectory storage | JSONL file via `appendFileSync`, SHA-256 hash of the session context | O(1) memory per call; no in-process buffering. Hashing the context avoids leaking code/secrets/PII into long-lived logs. |

## Steering Advisor (arXiv:2510.02453 alignment)

Pi-Guard hosts an inference-time implementation of the per-instance NL advice contract from *How to Train Your Advisor: Steering Black-Box LLMs with Advisor Models* (Asawa, Zhu, O'Neill, Zaharia, Dimakis, Gonzalez; [arXiv:2510.02453](https://arxiv.org/abs/2510.02453)), aligned with [Anthropic's advisor strategy](https://claude.com/blog/the-advisor-strategy).

**What it is.** A small advisor model (configured as pi-guard's `guardModel`) emits per-turn plain-text steering hints, which are appended to the executor model's per-turn system prompt via the `before_agent_start` hook (host-native on pi-coding-agent ≥0.66; `pre_completion` is kept as a forward declaration for future hosts). The advisor uses a structured internal reasoning pattern (OBSERVE → REASON → ADVISE) to analyze the session context before emitting advice, and it positions itself as a **peer** (not a supervisor): the executor decides whether to follow the advice. The executor is whatever model pi-coding-agent is running; pi-guard does **not** configure the executor.

The advisor runs on **two surfaces**:

- **Turn orientation** (`before_agent_start` hook): once per user turn, focused on strategy, error anticipation, and efficient tool ordering.
- **Mid-task reconcile** (`consult_advisor` tool): on demand, called by the executor when it is stuck or before it declares the task complete. Focused on root-cause analysis, convergence assessment, and verification guidance.

**What it is not.** A security gate. It cannot allow or block anything. All security enforcement still goes through the three-tier `tool_call` pipeline described above.

### Topology

```
user turn
├─ host emits before_agent_start { prompt, systemPrompt }
│   └─ pi-guard handleBeforeAgentStart()                    [TURN ORIENTATION]
│       ├─ gate on config.enabled + steeringAdvisor.enabled + non-empty task
│       ├─ bypass short (< bypassShortTasksUnder) / trivial tasks
│       ├─ check session circuit breaker (CLOSED/OPEN/HALF_OPEN)
│       ├─ resolve guardModel (advisor)
│       ├─ callSteeringAdvisor(PHASE_GUIDANCE_ORIENTATION) → plain-text advice | null
│       ├─ sanitizeAdvice() strips tag breakouts + role markers
│       ├─ append one JSONL line (source: "hook") to trajectory log
│       ├─ ctx.ui.notify() for optional transient advisor feedback
│       └─ return { systemPrompt: base + … }
│
└─ executor may call consult_advisor tool mid-task          [MID-TASK RECONCILE]
    ├─ gate on config.enabled + steeringAdvisor.enabled + onDemand.enabled
    ├─ per-session call cap (maxCallsPerSession, default 10)
    ├─ shared circuit breaker with hook path
    ├─ callSteeringAdvisor(PHASE_GUIDANCE_ON_DEMAND) → plain-text advice | null
    ├─ append one JSONL line (source: "tool") to same trajectory log
    └─ return plain text (no XML wrapper; tool-result trust channel)
```

### Trajectory log (schema v2)

One JSON line per call. The raw session context is never persisted, only its SHA-256 hash and length, so the log is safe to commit or ship.

```
{
  "schema": 2,
  "timestamp": 1700000000000,
  "sessionId": "20260412-163456-a1b2", // per-session, minted at session_start
  "turnId": "turn-42",
  "advisorModelId": "anthropic:claude-haiku-4-5",
  "executorModelId": "anthropic:claude-sonnet-4-5",
  "task": "Add pagination to /users", // truncated to maxTaskChars
  "advice": "Read src/api/users.ts first...", // truncated to maxAdviceChars
  "sessionContextHash": "",
  "sessionContextLen": 1423,
  "status": "ok" | "empty" | "timeout_or_error" | "skipped_short" | "skipped_trivial" | "skipped_circuit" | "skipped_limit" | "skipped_disabled" | "skipped_unavailable",
  "latencyMs": 420,
  "source": "hook" | "tool" // hook = turn-start, tool = consult_advisor
}
```

### Enabling it

```
// .pi/guard.json
{
  "guardModel": "anthropic:claude-haiku-4-5",
  "steeringAdvisor": { "enabled": true }
}
```

### Host compatibility note

On pi-coding-agent ≥0.66 (the current workspace version), the advisor is driven by the host-native `before_agent_start` event. The `pre_completion` event is kept as a forward declaration in pi-guard's structural shim (`src/pi-shim.ts`) for any future host that adopts that name. Both registrations are wrapped in `try/catch`, so on a host that emits neither event both `pi.on(...)` calls throw at registration time and pi-guard continues silently. Nothing in the tool-call pipeline is affected.

### Training

Out of scope for this TypeScript extension. Pi-Guard's job is (a) to serve the advisor at inference time and (b) to produce a clean dataset for a future out-of-process trainer. Reference implementation for the training side: .

## License

MIT
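
## Appendix: `rm` scoring sketch

The `rm` weight table in the scoring section can be read as a small additive function. The TypeScript below is a minimal sketch under stated assumptions: `scoreRm`, `RmScore`, and the whitespace tokenizer are hypothetical names invented for illustration, the input is assumed to be an already-normalized command, and only the weights and the threshold of 6 come from the table. It is not pi-guard's actual classifier code.

```typescript
// Hypothetical sketch: the weights and threshold are taken from the rm table;
// the function name, result shape, and parsing are illustrative assumptions.
interface RmScore {
  score: number;
  destructive: boolean;
}

const DESTRUCTIVE_THRESHOLD = 6;

function scoreRm(command: string): RmScore {
  const tokens = command.trim().split(/\s+/);
  if (tokens[0] !== "rm") return { score: 0, destructive: false };

  const args = tokens.slice(1);
  // Collect bundled short flags such as "-rf" into one string ("--long" flags do not match).
  const shortFlags = args.filter((t) => /^-[A-Za-z]+$/.test(t)).join("");

  let score = 3; // `rm` keyword
  if (/[rR]/.test(shortFlags) || args.includes("--recursive")) score += 3;
  if (shortFlags.includes("f") || args.includes("--force")) score += 2;
  // Root targets are matched literally here; a real matcher would be broader.
  if (args.some((t) => t === "/" || t === "~/*")) score += 3;
  if (args.includes("--no-preserve-root")) score += 3;

  return { score, destructive: score >= DESTRUCTIVE_THRESHOLD };
}
```

Under this sketch, `rm -f ~/*` scores 8 and `rm -r build` scores 6, so both cross the threshold, while `rm notes.txt` scores 3 and does not.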

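
## Appendix: action resolution sketch

The per-tier actions, the headless rule, and the context-aware fail state described in the security model combine into one decision. The following TypeScript is a simplified sketch, not the extension's implementation: `resolveAction`, its parameters, and the local `RiskTier` and `Action` definitions are assumptions made for illustration (pi-guard's real types live in `src/types.ts` and may differ). The default action map mirrors the documented defaults.

```typescript
// Hypothetical types for illustration; the real definitions may differ.
type RiskTier = "none" | "low" | "medium" | "high" | "critical";
type Action = "allow" | "prompt" | "block";
type ActionMap = Record<Exclude<RiskTier, "none">, Action>;

// Mirrors the documented default "actions" config.
const DEFAULT_ACTIONS: ActionMap = {
  low: "allow",
  medium: "prompt",
  high: "block",
  critical: "block",
};

function resolveAction(
  tier: RiskTier,
  hasUI: boolean,
  guardAvailable: boolean,
  actions: ActionMap = DEFAULT_ACTIONS,
): Action {
  // Tier "none" is never evaluated and always passes through.
  if (tier === "none") return "allow";

  if (!guardAvailable) {
    // Context-aware fail state: fail open only for medium and below.
    if (tier === "low" || tier === "medium") return "allow";
    return hasUI ? "prompt" : "block";
  }

  // Headless sessions cannot prompt, so a prompt becomes a block.
  const action = actions[tier];
  return action === "prompt" && !hasUI ? "block" : action;
}
```

With the defaults, a `medium` call in a headless session resolves to `block` when the guard answers, but to `allow` when the guard is unreachable, which is the fail-open trade-off that the design decisions limit to medium and below.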