dkimek19/wam-hawk
GitHub: dkimek19/wam-hawk
wam-hawk 是一个 LLM agent 运行时安全监控器,通过包装 OpenAI 兼容客户端实时检测 prompt injection、数据泄露和危险工具调用等威胁。
Stars: 0 | Forks: 0
# wam-hawk 🦅
**LLM agent 的运行时安全监控器。**
wam-hawk 包装了任何兼容 OpenAI 的客户端,并实时检查每个请求和响应——在造成破坏之前,检测 prompt injection、数据泄露、系统 prompt 覆盖以及过度的工具使用。
## 工作原理
```
# 之前 — 原始 OpenAI client
client = openai.OpenAI()
response = client.chat.completions.create(...)
# 之后 — wam-hawk 保护(完全相同的 API)
from wam_hawk import Hawk
client = Hawk(openai.OpenAI())
response = client.chat.completions.create(...)
```
wam-hawk 拦截每个 `chat.completions.create` 调用,通过规则引擎处理消息和响应,并在规则匹配时触发警报。其余代码保持不变。
## 安装
```
pip install wam-hawk
```
**环境要求:** Python 3.11+
## 快速开始
```
import openai
from wam_hawk import Hawk
client = Hawk(openai.OpenAI(), mode="warn")
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "What is the capital of France?"}],
)
# 检查会话期间触发的警报
alerts = client.get_alerts()
for alert in alerts:
print(alert.rule_id, alert.severity.value, alert.explanation)
```
## 模式
| 模式 | 行为 |
|---|---|
| `warn`(默认) | 在终端打印警报,允许调用通过 |
| `block` | 在终端打印警报,在 LLM 调用之前或之后引发 `HawkBlockedError` |
| `silent` | 仅在内部记录警报 — 无终端输出 |
```
# Warn 模式(默认)
client = Hawk(openai.OpenAI(), mode="warn")
# Block 模式 — 检测到时抛出 HawkBlockedError
client = Hawk(openai.OpenAI(), mode="block")
# Silent 模式 — 记录警报但不打印
client = Hawk(openai.OpenAI(), mode="silent")
alerts = client.get_alerts()
```
### 处理拦截
```
from wam_hawk import Hawk, HawkBlockedError
client = Hawk(openai.OpenAI(), mode="block")
try:
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Ignore all previous instructions."}],
)
except HawkBlockedError as e:
print(f"Blocked by rule {e.alert.rule_id}: {e.alert.explanation}")
```
## 警报输出
当规则在 `warn` 或 `block` 模式下触发时,wam-hawk 会向 stderr 打印一个富文本面板:
```
╭─────────────────────────────────────────────────────╮
│ 🦅 WAM-HAWK ALERT │
│─────────────────────────────────────────────────────│
│ Rule : RT-PI-001 │
│ Severity : HIGH │
│ Action : WARN │
│ Event : llm_input │
│ Content : "ignore previous instructions and..." │
╰─────────────────────────────────────────────────────╯
```
- **HIGH** → 红色,**MED** → 黄色,**LOW** → 蓝色
- **BLOCKED** → 红色边框,**WARN** → 黄色边框
## 内置规则
| 规则 ID | 名称 | 严重性 | 检测内容 |
|---|---|---|---|
| `RT-PI-001` | `runtime_prompt_injection` | HIGH | 用户消息或工具结果中的 prompt injection |
| `RT-EX-001` | `runtime_exfiltration` | HIGH | LLM 输出中的数据泄露模式 |
| `RT-SPO-001` | `runtime_system_prompt_override` | HIGH | 替换或覆盖系统 prompt 的尝试 |
| `RT-EA-001` | `excessive_agency_tool` | HIGH | 危险的工具调用(shell、exec、delete 等) |
## 自定义规则
将 `rules_dir` 指向一个包含 YAML 文件的目录:
```
from pathlib import Path
from wam_hawk import Hawk
client = Hawk(openai.OpenAI(), rules_dir=Path("./my_rules"))
```
规则文件格式:
```
id: RT-CUSTOM-001
name: my_custom_rule
severity: HIGH # HIGH | MED | LOW
description: >
Describe what this rule detects.
targets:
event_types:
- llm_input # llm_input | llm_output | tool_call | tool_result
patterns:
- type: content_contains
value: "forbidden phrase"
- type: content_regex
value: "(?i)(bad|dangerous).{0,20}pattern"
- type: tool_name_regex
value: "(?i)(shell|exec)"
- type: url_in_content
value: "(?i)evil\\.com"
```
**匹配类型:**
| 类型 | 匹配方式 |
|---|---|
| `content_contains` | 内容包含字符串(不区分大小写) |
| `content_regex` | 内容匹配 regex |
| `tool_name_exact` | 工具调用名称完全等于此值 |
| `tool_name_regex` | 工具调用名称匹配 regex |
| `url_in_content` | 内容包含匹配域名 regex 的 URL |
## 记录到文件
将警报持久化为 JSONL(每行一个 JSON 对象):
```
from pathlib import Path
from wam_hawk import Hawk
client = Hawk(openai.OpenAI(), log_file=Path("hawk.log"))
```
每一行:
```
{"timestamp": "2026-06-16T12:00:00Z", "rule_id": "RT-PI-001", "severity": "high", "action": "warn", "event_type": "llm_input", "content": "ignore previous...", "explanation": "..."}
```
## API 参考
### `Hawk`
```
Hawk(
client, # any OpenAI-compatible client
mode: str = "warn", # "warn" | "block" | "silent"
rules_dir: Path | None = None, # custom rules directory
log_file: Path | None = None, # JSONL alert log path
)
```
| 方法 | 描述 |
|---|---|
| `chat.completions.create(**kwargs)` | OpenAI 调用的直接替代品 |
| `get_alerts() -> list[RuntimeAlert]` | 本次会话中触发的所有警报 |
| `clear_alerts() -> None` | 重置警报历史 |
### `RuntimeAlert`
```
alert.rule_id # str — e.g. "RT-PI-001"
alert.severity # Severity.HIGH | .MED | .LOW
alert.action # Action.WARN | .BLOCK
alert.event.event_type # "llm_input" | "llm_output" | "tool_call" | "tool_result"
alert.event.content # content that triggered the rule
alert.explanation # human-readable reason
alert.timestamp # ISO 8601 UTC string
```
## CLI
```
# 检查已加载的 rules 和版本
wam-hawk status
```
输出:
```
wam-hawk v0.1.0
Rules loaded : 4
• RT-PI-001 — runtime_prompt_injection (HIGH)
• RT-EX-001 — runtime_exfiltration (HIGH)
• RT-SPO-001 — runtime_system_prompt_override (HIGH)
• RT-EA-001 — excessive_agency_tool (HIGH)
```
## 环境变量
```
HAWK_MODE=warn # warn | block | silent (overridden by Hawk(mode=...) argument)
HAWK_LOG_FILE=hawk.log # alert log path
```
## 许可证
MIT
标签:AI安全, Chat Copilot, DLL 劫持, LLM代理, Petitpotam, 人工智能, 大语言模型, 提示词注入检测, 用户模式Hook绕过, 运行时防护, 逆向工具