armyknife-social/agentshield-community-rules

GitHub: armyknife-social/agentshield-community-rules

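A community-maintained library of security detection rules for AI agents. It provides targeted detection for the OWASP LLM Top 10 threats and addresses AI-driven attacks that traditional WAFs cannot handle.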

Stars: 0 | Forks: 1

# AgentShield Community Rules

Detection rules for AI agent security. Schema-validated, MIT-licensed, community-maintained.

**[armyknife-social.github.io/agentshield-community-rules](https://armyknife-social.github.io/agentshield-community-rules)**: platform overview, architecture diagrams, rule browser, compliance profiles.

## The problem with existing tools

Traditional WAFs pattern-match known attack signatures against fixed HTTP fields. AI-directed attacks do not behave like known attacks. They are semantically valid, mutate on every request, route through the model itself, and can weaponize your own LLM endpoint against your infrastructure.

A prompt injection attack does not look like SQL injection. It looks like a user message. A data exfiltration attempt through a compromised RAG pipeline does not trip rate limits or IP reputation checks. It looks like document retrieval. Cross-agent delegation abuse travels over authenticated A2A channels and carries valid ATCS identity tokens.

AgentShield is built for this threat model. These community rules are the detection layer.

## Architecture

```
┌─────────────────────────────────────────┐
│              Agent Request              │
└──────────────────┬──────────────────────┘
                   │
┌──────────────────▼──────────────────────┐
│                Cerberus                 │
│  Ingress gateway (cm-cerberusd)         │
│  Rate limiting · TLS · Request routing  │
└──────────────────┬──────────────────────┘
                   │ POST /inspect
┌──────────────────▼──────────────────────┐
│                  Aegis                  │
│  Inline content inspector (cm-aegisd)   │
│  19 rules compiled at startup           │
│  5-20ms inline budget · shadow mode     │
└──────────────────┬──────────────────────┘
                   │
┌──────────────────▼──────────────────────┐
│             Verdict routing             │
│                                         │
│  allow  ────────────────► upstream      │
│  warn   ────────────────► upstream      │
│                           + log         │
│  mirror ────────────────► upstream      │
│                           + Minerva     │
│  block  ────────────────► 403           │
│                           + trigger     │
└──────────────────┬──────────────────────┘
                   │ block path only
┌──────────────────▼──────────────────────┐
│             cm-agentshieldd             │
│  Session anomaly daemon (:7160)         │
│  ATCS enforcement · session termination │
│  Substrate rules always-on              │
│  Opt-in rules registered per-session    │
└──────────────────┬──────────────────────┘
                   │
┌──────────────────▼──────────────────────┐
│             ContextOS Chain             │
│  Merkle-linked receipt chain            │
│  AnomalyDetected · signed · timestamped │
└──────────────────┬──────────────────────┘
                   │ fan-out
    ┌───────────┬──┴────────┬───────────┬──────────┐
    │           │           │           │          │
Splunk HEC  PagerDuty  OpenSearch   CEF/LEEF    Datadog
```

The substrate also enforces four rules unconditionally at the Firecracker microVM layer, regardless of session configuration:

```
exfiltration-correlation      volume/pattern-based data exfil detection
host-network-escape-attempt   unauthorized host network interface access
rootfs-write-attempt          writes outside ephemeral overlay
vsock-bypass-attempt          unauthorized vsock channel opening
```

These rules fire no matter which opt-in rules are registered. They cannot be disabled.

## How community rules plug in

Community rules are **opt-in rules**. When a plugin declares them in its manifest, they are registered per session and run in cm-aegisd's inline content inspector.

### Registration flow

```
1. Factory emits plugin manifest with compliance profile

   ┌─────────────────────────────────────────────┐
   │ manifest.json                               │
   │ {                                           │
   │   "agentshield_rules": [                    │
   │     "phi-exfil-pattern",                    │
   │     "cross-agent-delegation-gate",          │
   │     "prompt-injection-marker"               │
   │   ],                                        │
   │   "required_opt_in_rules": [                │
   │     "phi-exfil-pattern",                    │
   │     "cross-agent-delegation-gate"           │
   │   ]                                         │
   │ }                                           │
   └─────────────────────────────────────────────┘

2. agentos-runtime registers rules at session spawn

   POST http://127.0.0.1:7160/anomalies/configure-session
   {
     "session_id": "fcc-abc123",
     "plugin_id": "acme-intake-agent",
     "rules": ["phi-exfil-pattern", "cross-agent-delegation-gate", ...]
   }

3. Cerberus calls Aegis inline on every request

   POST http://127.0.0.1:7170/inspect
   { "content": "", "content_type": "user_input", "session_id": "fcc-abc123" }
   → { "verdict": "block", "matched_rules": ["phi-exfil-pattern"], "elapsed_ms": 7 }

4. On block: Cerberus fires trigger → cm-agentshieldd emits receipt

   POST http://127.0.0.1:7160/trigger
   { "session_id": "fcc-abc123", "rule": "phi-exfil-pattern", "severity": "HIGH" }
   → AnomalyDetected receipt enters the ContextOS chain
```

### Rule IDs must be registered

For a community rule to fire in production, its `rule_id` must appear in the `KNOWN_OPT_IN_RULES` constant of `cm-agentshieldd`. All 49 rules in this repository are already registered in the upstream AgentShield substrate. If you write a new rule, submit a PR that adds its ID to `KNOWN_OPT_IN_RULES` together with the YAML.

## Rule format

```
schema_version: agentshield-rule-v0.1
rule_id: phi-exfil-pattern
name: "PHI Data Exfiltration Pattern"
description: >
  Detects Protected Health Information in model responses: ICD-10 diagnostic
  codes adjacent to clinical context terms, MRN identifiers, NDC drug codes,
  and patient record fields. Fires on content_type: response only.
severity: HIGH
category: data-exfiltration
owasp_llm: LLM06
tags: [hipaa, phi, healthcare, compliance]
content_types: [response]
action: block
detector:
  type: regex
  pattern: >
    (?i)(?:[A-Z]\d{2}\.?\d{0,4}[A-Za-z]?\s+(?:diagnosis|condition|disorder|disease|syndrome))
    |(?:patient\s+(?:id|record|name|dob)[\s:]+[A-Za-z0-9\-\/]+)
    |(?:mrn[\s:]+[A-Za-z0-9\-]{4,})
    |(?:ndc[\s:]+\d{4,5}-\d{3,4}-\d{1,2})
mitigation: >
  Terminate the session. Log session_id and matched content for HIPAA
  incident review. Do not cache or forward the response.
references:
  - https://owasp.org/www-project-top-10-for-large-language-model-applications/
  - https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/
test_cases:
  should_match:
    - "Patient diagnosis: F32.0 Major depressive disorder. DOB: 1985-03-12"
    - "MRN: P-447291, prescribed NDC 0069-0150-01"
  should_not_match:
    - "The patient portal is available at our website"
    - "Please enter the patient ID in the system field"
author: ArmyKnifeLabs
license: MIT
```

### Detector types

| Type | Description | When to use |
|---|---|---|
| `regex` | Compiled Rust `regex` crate pattern. Applied per request at inspection time. | Content pattern matching. The common case. |
| `heuristic` | No pattern. The description documents the multi-signal logic. | Behavioral or statistical rules that cannot be expressed as a single regex. |
| `external` | Calls an external HTTP endpoint for classification. | ML-based classifiers, external threat intelligence feeds. |

**Rust regex crate limitations:** lookahead and lookbehind assertions (`(?=...)`, `(?!...)`, `(?<=...)`, `(?<!...)`) are not supported. Named capture groups, non-greedy quantifiers, and Unicode character classes are supported. Test patterns at [regex101.com](https://regex101.com) using the Rust flavor.

## OWASP LLM Top 10 coverage

Every OWASP LLM Top 10 category (2025) maps to one or more rules in this repository. Note that the category names and numbering below match the 2023 (v1.1) edition of the list rather than the 2025 one. See [OWASP-MAPPING.md](OWASP-MAPPING.md) for the full mapping.

```
LLM01  Prompt Injection                   10 rules
LLM02  Insecure Output Handling            1 rule
LLM03  Training Data Poisoning             1 rule
LLM04  Model Denial of Service             1 rule
LLM05  Supply Chain Vulnerabilities        1 rule
LLM06  Sensitive Information Disclosure    4 rules (phi, pci, secrets, training data)
LLM07  Insecure Plugin Design              3 rules
LLM08  Excessive Agency                    3 rules
LLM09  Overreliance                        1 rule
LLM10  Model Theft                         1 rule
```

## Compliance profiles

A profile maps a regulatory framework to a set of required opt-in rules. The AgentShield factory selects the correct profile when it emits a plugin manifest and writes the corresponding `agentshield_rules` array into the manifest.

```
profiles/
  default.yaml     prompt-injection-marker
  hipaa.yaml       phi-exfil-pattern, cross-agent-delegation-gate
  pci-dss.yaml     pci-pattern-detector, cross-agent-delegation-gate
  soc2.yaml        audit-trail-completeness
  iso-42001.yaml   ai-system-boundary-check
  fedramp.yaml     four required rules, READ_ONLY ATCS ceiling
  gdpr.yaml        pii-bulk-detection, cross-agent-delegation-gate
```

Profile YAML format:

```
schema_version: agentos-rule-pack-v0.1
compliance_profile: hipaa
description: "HIPAA PHI-scope agent substrate rules."
substrate_rules:
  - exfiltration-correlation
  - host-network-escape-attempt
  - rootfs-write-attempt
  - vsock-bypass-attempt
required_opt_in_rules:
  - prompt-injection-marker
  - phi-exfil-pattern
  - cross-agent-delegation-gate
optional_rules:
  - pii-bulk-detection
  - audit-trail-completeness
atcs_authority_ceiling: WRITE_STANDARD
```

The `substrate_rules` list is informational only: those rules are always on in the substrate, independent of the profile. The `required_opt_in_rules` list is what the factory writes to `manifest.agentshield_rules`.

## Repository layout

```
rules/
  prompt-injection/     12 rules: direct override, role hijack, jailbreak,
                        system prompt extraction, token manipulation,
                        retrieval injection, hidden unicode, multimodal
  data-exfiltration/    11 rules: API keys, PAN, SSN, PHI, PCI track data,
                        PII bulk, AWS credentials, private keys,
                        DB connection strings, training data probing
  tool-abuse/           11 rules: cross-agent delegation, network egress,
                        AI system boundary, plugin chain bypass, SSRF,
                        shell injection, A2A lateral movement
  llm-owasp/            10 rules: one canonical rule per OWASP LLM Top 10
  experimental/          5 rules: adversarial suffix, hallucination amplification,
                        model distillation probe, agent impersonation,
                        RAG context injection
profiles/
  default.yaml  hipaa.yaml  pci-dss.yaml  soc2.yaml
  iso-42001.yaml  fedramp.yaml  gdpr.yaml
schema/
  rule.schema.json      JSON Schema (draft-07) for rule validation
scripts/
  validate.py           Schema validation, runs on every PR via CI
  generate_index.py     Generates index.yaml from all rules
  test_rules.py         Unit tests, runs should_match/should_not_match per rule
  smoke-test.sh         Live integration test against a running AgentShield substrate
index.yaml              Auto-generated rule index (do not edit manually)
```

## Quick start

**Validate a rule:**

```
pip install pyyaml jsonschema
python scripts/validate.py rules/prompt-injection/direct-instruction-override.yaml
```

**Run the unit tests (all rules, no infrastructure required):**

```
python scripts/test_rules.py
# Result: 42 passed 0 failed 7 skipped (heuristic rules)
```

**Run the live smoke test against a substrate VM:**

```
CM_AGENTSHIELD_CONTROL_TOKEN= \
VM= \
bash scripts/smoke-test.sh
```

The smoke test verifies that:

- All community rule IDs are registered in `KNOWN_OPT_IN_RULES`
- All rules can be registered for a session via `configure-session`
- Triggers fire and produce receipts in the ContextOS chain
- Known malicious payloads produce the expected verdicts, if cm-aegisd is reachable

## Writing rules

1. Copy an existing rule from the relevant category as a starting point
2. Set a unique `rule_id` (kebab-case)
3. Write your `detector.pattern` and verify it on regex101.com using the Rust flavor
4. Add `test_cases.should_match` and `test_cases.should_not_match`, at least 2 of each
5. Run `python scripts/test_rules.py --rule `
6. Run `python scripts/validate.py rules//.yaml`
7. Submit a PR

First-time rule submissions go into `rules/experimental/`. After one review cycle confirms an acceptable false-positive rate, they are promoted to the main category directory. See [CONTRIBUTING.md](CONTRIBUTING.md) for the full review process and severity guidelines.

## Severity and verdict mapping

```
HIGH             → block   (confidence 0.95)   session kill / request denied
MEDIUM + MEDIUM  → mirror  (0.75)              allowed + forwarded to Minerva for review
MEDIUM           → warn    (0.60)              allowed + logged
LOW              → warn    (0.50)              logged only
```

When cm-aegisd returns `block`, Cerberus fires `POST /trigger` at cm-agentshieldd. For substrate rules, the ATCS enforcement policy terminates the session unconditionally, regardless of severity. For opt-in rules, HIGH severity triggers session termination; MEDIUM and LOW are only logged.

**Signal redaction:** rules in the `data-exfiltration` category (`T6-*`, `phi-exfil-pattern`, `pci-pattern-detector`, `pii-bulk-detection`) never include the matched text in the `signals` field of the `/inspect` response. You see `[rule_id] ` instead of the matched fragment. This prevents the detection mechanism from re-disclosing the secret or PII it detected to downstream consumers. The `matched_rules` array still identifies which rules fired.

## Enterprise integration

The AgentShield receipt fan-out sends every `AnomalyDetected`, `SessionStarted`, `SessionEnded`, and `ToolCall` event to the configured enterprise sinks in real time.

```
# cm-receiptd-fanout.yaml
destinations:
  - id: splunk-prod
    adapter: splunk-hec
    url: "${SPLUNK_HEC_URL}/services/collector/event"
    headers:
      Authorization: "Splunk ${SPLUNK_HEC_TOKEN}"
    filter:
      kind: []
  - id: pagerduty-agentshield
    adapter: pagerduty-events-v2
    url: https://events.pagerduty.com/v2/enqueue
    headers:
      X-PagerDuty-Routing-Key: "${PAGERDUTY_ROUTING_KEY}"
    filter:
      kind: [AnomalyDetected]
      severity: [HIGH, CRITICAL]
```

Every receipt carries per-tool-call attribution with a cryptographic chain of custody anchored to the operator's YubiKey. `AnomalyDetected` receipts include the `rule`, `severity`, `session_id`, `elapsed_ms`, and `terminated` fields at the top level, so they land directly in Splunk's index with no field extraction.

**Fan-out management endpoints** (`GET /fanout/status`, `GET /fanout/dlq`) require a bearer token:

```
export CM_RECEIPTD_FANOUT_TOKEN=  # required — endpoints locked if unset
```

Fan-out delivery destinations (Splunk, PagerDuty, and so on) are configured in the YAML above and need no additional authentication on the cm-receiptd side beyond the destination credentials.

This is the difference. A CVE scanner tells you which vulnerabilities exist on a host. AgentShield tells you which tool call, by which agent identity, at which point in the session, triggered which anomaly rule, with a receipt signed by the operator's hardware key and linked into an immutable chain.

## MCP server

An [MCP (Model Context Protocol)](https://modelcontextprotocol.io) server is included in `mcp/`. It gives Claude and other MCP clients direct access to the AgentShield substrate.

```
{
  "mcpServers": {
    "agentshield": {
      "command": "node",
      "args": ["/path/to/agentshield-community-rules/mcp/dist/index.js"],
      "env": {
        "AGENTSHIELD_HOST": "127.0.0.1",
        "CM_AGENTSHIELD_CONTROL_TOKEN": "",
        "AGENTSHIELD_RULES_DIR": "/path/to/agentshield-community-rules"
      }
    }
  }
}
```

Tools: `inspect_content`, `trigger_rule`, `configure_session`, `get_anomalies`, `query_receipts`, `list_rules`, `validate_rule`, and others. See [mcp/README.md](mcp/README.md).

```
cd mcp && npm install && npm run build
```

## License

MIT. See [LICENSE](LICENSE).

Rules in `rules/experimental/` are community-contributed and carry their own author attribution. All other rules are maintained by ArmyKnifeLabs.

## Links

- [armyknife-social.github.io/agentshield-community-rules](https://armyknife-social.github.io/agentshield-community-rules): platform overview and architecture
- [AgentShield](https://contextos.armyknifelabs.com/#agentshield)
- [OWASP LLM Top 10](https://owasp.org/www-project-top-10-for-large-language-model-applications/)
- [rule-schema.md](rule-schema.md): full schema reference
- [OWASP-MAPPING.md](OWASP-MAPPING.md): full OWASP coverage map
- [CONTRIBUTING.md](CONTRIBUTING.md): contribution guide
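The `test_cases.should_match` / `test_cases.should_not_match` convention described under "Writing rules" can be illustrated with a minimal Python sketch. This is not the repository's `scripts/test_rules.py`, whose implementation is not shown here; the function name `run_test_cases`, the rule ID `example-mrn-marker`, and the in-memory rule dict are all hypothetical. It also uses Python's `re` module, which is more permissive than the Rust `regex` crate (for example, it accepts lookarounds), so a pattern that passes here would still need checking against the Rust flavor.

```python
import re


def run_test_cases(rule: dict) -> list[str]:
    """Return failure messages for a rule's test cases; an empty list means all passed.

    Hypothetical helper: assumes the rule has already been loaded into a dict
    with the same field names as the rule YAML format.
    """
    detector = rule["detector"]
    if detector["type"] != "regex":
        # heuristic and external detectors carry no pattern to exercise here
        return []

    pattern = re.compile(detector["pattern"])
    cases = rule.get("test_cases", {})
    failures = []

    for text in cases.get("should_match", []):
        if not pattern.search(text):
            failures.append(f"expected match: {text!r}")

    for text in cases.get("should_not_match", []):
        if pattern.search(text):
            failures.append(f"unexpected match: {text!r}")

    return failures


# Hypothetical single-alternative rule, used only to exercise the helper.
EXAMPLE_RULE = {
    "rule_id": "example-mrn-marker",
    "detector": {
        "type": "regex",
        "pattern": r"(?i)mrn[\s:]+[A-Za-z0-9\-]{4,}",
    },
    "test_cases": {
        "should_match": ["MRN: P-447291, prescribed NDC 0069-0150-01"],
        "should_not_match": ["The patient portal is available at our website"],
    },
}

if __name__ == "__main__":
    failures = run_test_cases(EXAMPLE_RULE)
    print("PASS" if not failures else "\n".join(failures))
```

Returning a list of failure messages rather than a boolean keeps the sketch usable both for a pass/fail summary and for reporting which sample strings need attention when tuning a pattern.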

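Tags: AI agents, AI security, AMSI bypass, Atomic Red Team, Chat Copilot, Homebrew install, MITM proxy, TLS, WAF, binary releases, artificial intelligence security, agent security, visual interface, compliance, large language model security, threat detection, open-source tools, prompt injection protection, data leak detection, secrets management, detection rules, community-maintained, gateway security, network security, network asset discovery, automated attacks, reverse engineering tools, defensive tools, privacy protection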