dkimek19/wam-hawk

GitHub: dkimek19/wam-hawk

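wam-hawk is a runtime security monitor for LLM agents. It wraps an OpenAI-compatible client and detects threats such as prompt injection, data exfiltration, and dangerous tool calls in real time.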

Stars: 0 | Forks: 0

# wam-hawk 🦅

**A runtime security monitor for LLM agents.**

wam-hawk wraps any OpenAI-compatible client and inspects every request and response in real time, detecting prompt injection, data exfiltration, system prompt overrides, and excessive tool use before they cause damage.

## How it works

```
import openai

# Before: raw OpenAI client
client = openai.OpenAI()
response = client.chat.completions.create(...)

# After: protected by wam-hawk (exactly the same API)
from wam_hawk import Hawk

client = Hawk(openai.OpenAI())
response = client.chat.completions.create(...)
```

wam-hawk intercepts every `chat.completions.create` call, runs the messages and the response through a rule engine, and raises an alert whenever a rule matches. The rest of your code stays unchanged.

## Installation

```
pip install wam-hawk
```

**Requirements:** Python 3.11+

## Quick start

```
import openai
from wam_hawk import Hawk

client = Hawk(openai.OpenAI(), mode="warn")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

# Inspect the alerts raised during this session
alerts = client.get_alerts()
for alert in alerts:
    print(alert.rule_id, alert.severity.value, alert.explanation)
```

## Modes

| Mode | Behavior |
|---|---|
| `warn` (default) | Prints alerts to the terminal and lets the call through |
| `block` | Prints alerts to the terminal and raises `HawkBlockedError` before or after the LLM call |
| `silent` | Records alerts internally only; no terminal output |

```
# Warn mode (default)
client = Hawk(openai.OpenAI(), mode="warn")

# Block mode: raises HawkBlockedError on detection
client = Hawk(openai.OpenAI(), mode="block")

# Silent mode: records alerts without printing them
client = Hawk(openai.OpenAI(), mode="silent")
alerts = client.get_alerts()
```

### Handling blocked calls

```
import openai
from wam_hawk import Hawk, HawkBlockedError

client = Hawk(openai.OpenAI(), mode="block")

try:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Ignore all previous instructions."}],
    )
except HawkBlockedError as e:
    # The exception carries the alert that triggered the block
    print(f"Blocked by rule {e.alert.rule_id}: {e.alert.explanation}")
```

## Alert output

When a rule fires in `warn` or `block` mode, wam-hawk prints a rich-text panel to stderr:

```
╭─────────────────────────────────────────────────────╮
│ 🦅 WAM-HAWK ALERT                                   │
│─────────────────────────────────────────────────────│
│ Rule     : RT-PI-001                                │
│ Severity : HIGH                                     │
│ Action   : WARN                                     │
│ Event    : llm_input                                │
│ Content  : "ignore previous instructions and..."    │
╰─────────────────────────────────────────────────────╯
```

- **HIGH** → red, **MED** → yellow, **LOW** → blue
- **BLOCKED** → red border, **WARN** → yellow border

## Built-in rules

| Rule ID | Name | Severity | Detects |
|---|---|---|---|
| `RT-PI-001` | `runtime_prompt_injection` | HIGH | Prompt injection in user messages or tool results |
| `RT-EX-001` | `runtime_exfiltration` | HIGH | Data exfiltration patterns in LLM output |
| `RT-SPO-001` | `runtime_system_prompt_override` | HIGH | Attempts to replace or override the system prompt |
| `RT-EA-001` | `excessive_agency_tool` | HIGH | Dangerous tool calls (shell, exec, delete, etc.) |

## Custom rules

Point `rules_dir` at a directory containing YAML files:

```
from pathlib import Path

import openai
from wam_hawk import Hawk

client = Hawk(openai.OpenAI(), rules_dir=Path("./my_rules"))
```

Rule file format:

```
id: RT-CUSTOM-001
name: my_custom_rule
severity: HIGH  # HIGH | MED | LOW
description: >
  Describe what this rule detects.
targets:
  event_types:
    - llm_input  # llm_input | llm_output | tool_call | tool_result
patterns:
  - type: content_contains
    value: "forbidden phrase"
  - type: content_regex
    value: "(?i)(bad|dangerous).{0,20}pattern"
  - type: tool_name_regex
    value: "(?i)(shell|exec)"
  - type: url_in_content
    value: "(?i)evil\\.com"
```

**Match types:**

| Type | Matches when |
|---|---|
| `content_contains` | The content contains the string (case-insensitive) |
| `content_regex` | The content matches the regex |
| `tool_name_exact` | The tool call name is exactly equal to the value |
| `tool_name_regex` | The tool call name matches the regex |
| `url_in_content` | The content contains a URL whose domain matches the regex |

## Logging to a file

Persist alerts as JSONL (one JSON object per line):

```
from pathlib import Path

import openai
from wam_hawk import Hawk

client = Hawk(openai.OpenAI(), log_file=Path("hawk.log"))
```

Each line looks like this:

```
{"timestamp": "2026-06-16T12:00:00Z", "rule_id": "RT-PI-001", "severity": "high", "action": "warn", "event_type": "llm_input", "content": "ignore previous...", "explanation": "..."}
```

## API reference

### `Hawk`

```
Hawk(
    client,                          # any OpenAI-compatible client
    mode: str = "warn",              # "warn" | "block" | "silent"
    rules_dir: Path | None = None,   # custom rules directory
    log_file: Path | None = None,    # JSONL alert log path
)
```

| Method | Description |
|---|---|
| `chat.completions.create(**kwargs)` | Drop-in replacement for the OpenAI call |
| `get_alerts() -> list[RuntimeAlert]` | All alerts raised during this session |
| `clear_alerts() -> None` | Resets the alert history |

### `RuntimeAlert`

```
alert.rule_id           # str, e.g. "RT-PI-001"
alert.severity          # Severity.HIGH | .MED | .LOW
alert.action            # Action.WARN | .BLOCK
alert.event.event_type  # "llm_input" | "llm_output" | "tool_call" | "tool_result"
alert.event.content     # content that triggered the rule
alert.explanation       # human-readable reason
alert.timestamp         # ISO 8601 UTC string
```

## CLI

```
# Show the loaded rules and the version
wam-hawk status
```

Output:

```
wam-hawk v0.1.0
Rules loaded : 4
  • RT-PI-001 — runtime_prompt_injection (HIGH)
  • RT-EX-001 — runtime_exfiltration (HIGH)
  • RT-SPO-001 — runtime_system_prompt_override (HIGH)
  • RT-EA-001 — excessive_agency_tool (HIGH)
```

## Environment variables

```
HAWK_MODE=warn          # warn | block | silent (overridden by Hawk(mode=...) argument)
HAWK_LOG_FILE=hawk.log  # alert log path
```

## License

MIT
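
## Example: summarizing the alert log

Because the alert log is plain JSONL, it can be post-processed with the standard library alone. The sketch below is not part of wam-hawk: `summarize_alerts` is a hypothetical helper, and it assumes only the `rule_id` and `severity` fields shown in the sample log line. It counts alerts per rule and severity and skips blank or malformed lines.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_alerts(lines: Iterable[str]) -> Counter:
    """Count alerts per (rule_id, severity) from JSONL log lines.

    Hypothetical helper: relies only on the documented log fields.
    """
    counts: Counter = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip partially written or corrupt lines
        if not isinstance(record, dict):
            continue  # a valid alert line is a JSON object
        key = (record.get("rule_id", "unknown"), record.get("severity", "unknown"))
        counts[key] += 1
    return counts


# Sample lines in the documented format (values are illustrative).
sample_lines = [
    '{"timestamp": "2026-06-16T12:00:00Z", "rule_id": "RT-PI-001", "severity": "high", "action": "warn"}',
    '{"timestamp": "2026-06-16T12:00:05Z", "rule_id": "RT-EX-001", "severity": "high", "action": "warn"}',
    '{"timestamp": "2026-06-16T12:00:09Z", "rule_id": "RT-PI-001", "severity": "high", "action": "block"}',
    "",            # blank line, ignored
    "{not json}",  # malformed line, ignored
]

summary = summarize_alerts(sample_lines)
for (rule_id, severity), count in summary.most_common():
    print(f"{rule_id} [{severity}]: {count}")
```

To run it against a real log, pass an open file object instead of the sample list, for example `summarize_alerts(open("hawk.log", encoding="utf-8"))`, since iterating a text file yields one line at a time.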
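Tags: AI security, Chat Copilot, DLL hijacking, LLM agents, Petitpotam, artificial intelligence, large language models, prompt injection detection, user-mode hook bypass, runtime protection, reverse engineering tools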