SkintLabs/WonderwallAi

GitHub: SkintLabs/WonderwallAi

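An open-source AI firewall SDK for large language model applications, providing two-way protection: prompt injection detection, topic enforcement, and sensitive-data filtering.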

Stars: 2 | Forks: 0

# WonderwallAi

[![CI](https://github.com/SkintLabs/WonderwallAi/actions/workflows/ci.yml/badge.svg)](https://github.com/SkintLabs/WonderwallAi/actions/workflows/ci.yml) [![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/) [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)

An AI firewall SDK for LLM applications. It guards against prompt injection, data leakage, and off-topic abuse.

## Why WonderwallAi?

| | WonderwallAi | Hosted APIs (Lakera, etc.) | Heavy frameworks |
|---|---|---|---|
| **Latency** | <2ms (fast path) | 50-200ms round trip | Varies |
| **Privacy** | Messages never leave your server | Sent to a third party | Varies |
| **Integration** | 3 lines of code | API key + HTTP calls | Wrap your entire pipeline |
| **Cost** | Free SDK, hosted API from $0/month | $0.001+ per request | Free but complex |
| **Offline** | Works without internet (semantic router) | Requires internet | Varies |

## What It Does

WonderwallAi sits between your users and your LLM and scans messages in both directions.

**Inbound (user to LLM):**

- **Semantic Router**: blocks off-topic queries using vector similarity against your allowed topics
- **Sentinel Scan**: detects prompt injection attacks using a fast LLM classifier (Groq)

**Outbound (LLM to user):**

- **Egress Filter**: catches leaked API keys, PII, and canary tokens before they reach the user
- **File Sanitizer**: validates uploads via magic bytes and strips EXIF metadata

All layers are fail-open by default: if a layer errors, the message is allowed through rather than blocking a legitimate user.

## Installation

```
# Lightweight (egress filter only, no ML dependencies)
pip install wonderwallai

# Full install (all layers, including semantic routing + sentinel)
pip install "wonderwallai[all]"

# Individual layers
pip install "wonderwallai[semantic]"   # + sentence-transformers + torch
pip install "wonderwallai[sentinel]"   # + groq
pip install "wonderwallai[files]"      # + Pillow + filetype
```

## Quick Start

```
from wonderwallai import Wonderwall
from wonderwallai.patterns.topics import ECOMMERCE_TOPICS

wall = Wonderwall(
    topics=ECOMMERCE_TOPICS,
    sentinel_api_key="gsk_...",
    bot_description="a customer service chatbot for an online store",
)

# The steps below run inside an async request handler.

# Scan user input before it reaches your LLM
verdict = await wall.scan_inbound(user_message)
if not verdict.allowed:
    return verdict.message  # User-friendly rejection

# Generate a canary token and inject it into your LLM system prompt
canary = wall.generate_canary(session_id)
system_prompt += wall.get_canary_prompt(canary)

# Scan LLM output before it reaches the user
verdict = await wall.scan_outbound(llm_response, canary)
response_text = verdict.message  # Cleaned text (API keys/PII redacted)
```

## Configuration

Every parameter has a sensible default. Pass parameters as keyword arguments, or use a `WonderwallConfig` object:

```
from wonderwallai import Wonderwall, WonderwallConfig

# Keyword arguments
wall = Wonderwall(
    topics=["Order tracking", "Returns", "Product questions"],
    similarity_threshold=0.35,
    sentinel_api_key="gsk_...",
    sentinel_model="llama-3.1-8b-instant",
    bot_description="a customer service chatbot",
    canary_prefix="MYAPP-",
    fail_open=True,
    block_message="I can only help with topics I'm designed for.",
)

# Or use a config object
config = WonderwallConfig(topics=["..."])  # other parameters omitted
wall = Wonderwall(config=config)
```

### Key Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `topics` | `[]` | Allowed conversation topics. Empty disables semantic routing. |
| `similarity_threshold` | `0.35` | Cosine similarity threshold (0.0-1.0). |
| `embedding_model` | `None` | Pre-loaded SentenceTransformer instance (saves memory). |
| `sentinel_api_key` | `""` | Groq API key. Falls back to the `GROQ_API_KEY` environment variable. |
| `sentinel_model` | `"llama-3.1-8b-instant"` | Model used for the sentinel classifier. |
| `bot_description` | `"an AI assistant"` | Used in the sentinel system prompt. |
| `canary_prefix` | `"WONDERWALL-"` | Prefix for generated canary tokens. |
| `fail_open` | `True` | Allow messages through on error (instead of blocking). |
| `block_message` | Generic | Message shown when the semantic router blocks. |
| `block_message_injection` | Generic | Message shown when the sentinel blocks. |

## Pre-built Topic Sets

```
from wonderwallai import Wonderwall
from wonderwallai.patterns.topics import (
    ECOMMERCE_TOPICS,  # 18 shopping/order topics
    SUPPORT_TOPICS,    # 13 technical support topics
    SAAS_TOPICS,       # 14 SaaS product topics
)

# Combine topic sets
wall = Wonderwall(topics=ECOMMERCE_TOPICS + SUPPORT_TOPICS)
```

## Custom Patterns

Extend the built-in API key and PII detection patterns:

```
import re

from wonderwallai import Wonderwall
from wonderwallai.patterns.api_keys import DEFAULT_API_KEY_PATTERNS

wall = Wonderwall(
    api_key_patterns=[re.compile(r'myapp_[a-zA-Z0-9]{32}')],
    pii_patterns={"employee_id": re.compile(r'EMP-\d{6}')},
    include_default_patterns=True,  # Merge with built-in patterns
)
```

## How Verdicts Work

Every scan returns a `Verdict` object:

```
verdict = await wall.scan_inbound(message)

verdict.allowed     # bool: True if the message passes
verdict.action      # "allow" | "block" | "redact"
verdict.blocked_by  # "semantic_router" | "sentinel_scan" | "egress_filter" | None
verdict.message     # The (possibly cleaned) text or block message
verdict.violations  # List of violation codes
verdict.scores      # Layer scores, e.g. {"semantic": 0.72}
```

## Architecture

```
User Message
     |
     v
[Semantic Router] ---> cosine similarity vs allowed topics (sub-ms)
     |
     v
[Sentinel Scan] -----> LLM binary classifier via Groq (~100ms)
     |
     v
Your LLM (GPT, Claude, Llama, etc.)
     |
     v
[Egress Filter] -----> canary tokens, API keys, PII detection
     |
     v
User Response (cleaned)
```

## Hosted API

Don't want to self-host? Use the WonderwallAi hosted API:

```
curl -X POST https://api.wonderwallai.com/v1/scan/inbound \
  -H "Authorization: Bearer ww_live_abc123..." \
  -H "Content-Type: application/json" \
  -d '{"message": "How do I track my order?"}'
```

Plans start at **$0/month** (1,000 scans). See [pricing](https://buddafest.github.io/wonderwallai/#pricing).

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and guidelines.

## License

MIT
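
## Sketch: Canary Tokens and Egress Redaction

The README names canary tokens and API key redaction as egress-filter features but does not show how such a check could work. The block below is a minimal, standard-library-only sketch of the general technique, not WonderwallAi's implementation. The function names `make_canary` and `check_outbound`, the two regex patterns, the block message, and the violation codes are all hypothetical. Only the `WONDERWALL-` prefix and the `allow` / `block` / `redact` action names come from the text above.

```python
import re
import secrets

CANARY_PREFIX = "WONDERWALL-"  # default prefix listed in the parameter table

# Hypothetical key-shaped patterns for illustration only;
# these are NOT the library's built-in pattern set.
API_KEY_PATTERNS = [
    re.compile(r"gsk_[A-Za-z0-9]{20,}"),
    re.compile(r"sk-[A-Za-z0-9]{20,}"),
]


def make_canary(prefix: str = CANARY_PREFIX) -> str:
    """Return a random, hard-to-guess marker to plant in the system prompt."""
    return prefix + secrets.token_hex(8)  # 16 hex characters


def check_outbound(text: str, canary: str) -> dict:
    """Block if the canary leaked; otherwise redact anything key-shaped."""
    if canary in text:
        # The model echoed part of its system prompt: treat as a leak.
        return {
            "allowed": False,
            "action": "block",
            "message": "Sorry, I can't share that.",
            "violations": ["canary_leak"],
        }

    cleaned = text
    violations = []
    for pattern in API_KEY_PATTERNS:
        cleaned, count = pattern.subn("[REDACTED]", cleaned)
        if count:
            violations.append("api_key")

    # Assumption: a redacted reply is still delivered to the user.
    action = "redact" if violations else "allow"
    return {
        "allowed": True,
        "action": action,
        "message": cleaned,
        "violations": violations,
    }
```

In this sketch, a reply that contains the session's canary is blocked outright, because the canary only ever appears in the system prompt. A reply that merely contains a key-shaped string is passed through with the match replaced by `[REDACTED]`.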

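Tags: AI firewall, Apex, API key protection, Canary Tokens, DLL hijacking, reverse DNS lookup, EXIF removal, Naabu, PII filtering, prompt injection detection, Python SDK, Sysdig, content safety, credential scanning, vector similarity, large language models, open source, data privacy, file sanitization, machine learning, cybersecurity, semantic routing, reverse engineering tools, privacy protection, zero-day vulnerability detection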