raiph-ai/fireclaw
GitHub: raiph-ai/fireclaw
An open-source security gateway for AI agents, focused on defending against prompt injection and keeping the agent's context safe.
Stars: 16 | Forks: 1

# 🛡️ FireClaw: A Firewall for Your Agent's Brain
Open-source security proxy that protects AI agents from prompt injection attacks.
## The Problem

AI agents that browse the web are vulnerable to **prompt injection attacks**. Malicious websites can embed hidden instructions that hijack your agent's behavior: stealing data, executing commands, or overriding safety guidelines. Simple input filtering isn't enough; this is an adversarial problem that requires defense-in-depth.

**No existing open-source tool addresses this.** FireClaw fills that gap.

## What FireClaw Does

FireClaw sits between your AI agent and the internet. Every web fetch passes through a **hardened 4-stage pipeline** that strips prompt injection payloads before content reaches your agent's context window.

Your agent calls FireClaw instead of fetching directly. FireClaw returns clean, factual content, with no hidden instructions, no Unicode tricks, and no encoding exploits.

## How It Works

```
Your Agent                FireClaw                     The Web
  │                           │                           │
  │── fetch("example.com") ──▶│                           │
  │                           │── GET example.com ───────▶│
  │                           │◀── raw HTML ──────────────│
  │                           │                           │
  │           ┌─── Stage 1: DNS Check ───────┐            │
  │           │ Block known-malicious URLs   │            │
  │           └──────────────────────────────┘            │
  │                           ↓                           │
  │           ┌─── Stage 2: Sanitize ────────┐            │
  │           │ Strip HTML tricks, hidden    │            │
  │           │ Unicode, encoding exploits,  │            │
  │           │ inject canary tokens         │            │
  │           └──────────────────────────────┘            │
  │                           ↓                           │
  │           ┌─── Stage 3: LLM Summary ─────┐            │
  │           │ Isolated LLM extracts facts  │            │
  │           │ only: no tools, no memory    │            │
  │           └──────────────────────────────┘            │
  │                           ↓                           │
  │           ┌─── Stage 4: Output Scan ─────┐            │
  │           │ Check for residual           │            │
  │           │ injections, canary survival, │            │
  │           │ tool-call syntax             │            │
  │           └──────────────────────────────┘            │
  │                           │                           │
  │◀── clean content ─────────│
```

### Key Insight

Even if the summarization LLM in Stage 3 gets injected, **it has no tools, no memory, and no access to your data.** It can only return text, and that text still passes through Stage 4 output scanning. The attacker hits a dead end.
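To make Stages 2 and 4 more concrete, here is a minimal Python sketch of the sanitize-then-scan idea. It is an illustration under stated assumptions, not FireClaw's actual code: the function names (`sanitize`, `scan_output`), the two sample injection patterns, the tool-call pattern, and the `CANARY-…` marker format are all hypothetical. The canary logic follows the description above: a unique marker is added to the sanitized text, and if it shows up in the Stage 3 output, the content was echoed rather than summarized.

```python
import re
import secrets
import unicodedata

# Hypothetical sample patterns for illustration only; FireClaw's real rule set
# (200+ patterns, per the feature list) is not reproduced here.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now\b", re.IGNORECASE),
]
# Assumed shape of "tool-call syntax"; real agents use many different formats.
TOOL_CALL_PATTERN = re.compile(r'<tool_call>|"function_call"\s*:', re.IGNORECASE)

# Zero-width and bidirectional control characters often used to hide text.
INVISIBLE_CHARS = re.compile(r"[\u200b-\u200f\u202a-\u202e\u2060-\u2064\ufeff]")


def sanitize(raw_html: str) -> tuple[str, str]:
    """Stage 2 sketch: strip markup and hidden Unicode, then append a canary."""
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", raw_html)  # drop script/style bodies
    text = re.sub(r"(?s)<!--.*?-->", " ", text)                      # drop HTML comments
    text = re.sub(r"<[^>]+>", " ", text)                             # drop remaining tags
    text = unicodedata.normalize("NFKC", text)                       # fold look-alike forms
    text = INVISIBLE_CHARS.sub("", text)                             # remove invisible characters
    text = re.sub(r"\s+", " ", text).strip()
    canary = f"CANARY-{secrets.token_hex(8)}"                        # unique per fetch
    return f"{text} [{canary}]", canary


def scan_output(summary: str, canary: str) -> list[str]:
    """Stage 4 sketch: return the findings for a Stage 3 summary (empty if clean)."""
    findings = []
    if canary in summary:
        findings.append("canary_survived")  # raw content was echoed, not summarized
    if any(p.search(summary) for p in INJECTION_PATTERNS):
        findings.append("residual_injection")
    if TOOL_CALL_PATTERN.search(summary):
        findings.append("tool_call_syntax")
    return findings
```

In this sketch, a caller would pass the first return value of `sanitize` to the isolated summarizer and release the summary to the agent only when `scan_output` returns an empty list. A regex-based tag stripper is a simplification; a production sanitizer would more likely use a real HTML parser.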
## Features

- **200+ Injection Patterns**: Regex-based detection covering structural tricks, injection signatures, exfiltration attempts, and output manipulation
- **DNS-Level Blocklists**: Integrates URLhaus, PhishTank, OpenPhish, and the FireClaw community blocklist
- **Canary Token System**: Unique markers injected into content detect whether summarization was bypassed
- **Domain Trust Tiers**: Configure each domain as trusted (skip sanitization), neutral (full pipeline), suspicious (aggressive), or blocked (reject)
- **Rate Limiting & Cost Controls**: Per-minute, per-hour, and per-day limits with auto-throttle and hard caps
- **JSONL Audit Logging**: Complete forensic trail of every fetch, detection, and alert
- **No Bypass Mode**: The pipeline is fixed. Even if your agent is compromised, it cannot disable FireClaw.
- **OLED Display Support**: Optional Raspberry Pi OLED integration for physical monitoring
- **Dashboard**: Web-based UI for monitoring, configuration, and log browsing

### 🔥 Pi Appliance OLED Display
OLED showing daily fetch and threat counts
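
As a rough illustration of where the daily numbers on such a display could come from, the sketch below tallies one day's fetches and detections from JSONL audit log lines. The README does not document the log schema or how the display is fed, so the `timestamp` and `event` field names, the `"fetch"` and `"detection"` event values, and the `daily_counts` helper are all assumptions made for this example.

```python
import json
from collections.abc import Iterable
from datetime import date


def daily_counts(jsonl_lines: Iterable[str], day: date) -> dict[str, int]:
    """Count fetch and detection records for one day.

    Assumes each line is a JSON object with an ISO 8601 "timestamp" string and
    an "event" string; both field names are hypothetical.
    """
    prefix = day.isoformat()  # e.g. "2025-01-02"
    fetches = 0
    threats = 0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        if not str(record.get("timestamp", "")).startswith(prefix):
            continue  # record belongs to a different day
        event = record.get("event")
        if event == "fetch":
            fetches += 1
        elif event == "detection":
            threats += 1
    return {"fetches": fetches, "threats": threats}
```

Because the function accepts any iterable of strings, an open log file can be passed to it directly, for example `daily_counts(open(path, encoding="utf-8"), date.today())`, where `path` is wherever the audit log happens to live.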
FireClaw — Defend Your Agent. Protect Your Data. Join the Community.
🛡️ fireclaw.app