vladlen-codes/llm-security-toolkit

GitHub: siphalion/quisium

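A Python middleware library that intercepts every call between an application and its LLM provider, providing prompt-injection detection, output safety scanning, and tool-call validation.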

Stars: 4 | Forks: 2

# LLM Security Toolkit

### Architecture & Detailed Technical Specification

## Table of Contents

1. [What Is This Project?](#1-what-is-this-project)
2. [High-Level Architecture](#2-high-level-architecture)
3. [Repository Structure](#3-repository-structure)
4. [Core Types & Models](#4-core-types--models)
5. [Policy Engine](#5-policy-engine)
6. [The Guards Layer](#6-the-guards-layer)
7. [Providers Layer](#7-providers-layer)
8. [Middleware Layer](#8-middleware-layer)
9. [Logging & Exceptions](#9-logging--exceptions)
10. [Public API Surface](#10-public-api-surface)
11. [Tests, Examples & Docs](#11-tests-examples--docs)
12. [Extensibility & Design Principles](#12-extensibility--design-principles)
13. [Future Roadmap](#13-future-roadmap)

## 1. What Is This Project?

**LLM Security Toolkit** is a Python middleware library that sits between your application code and any LLM provider. It intercepts every model call to enforce security checks before and after the AI responds. Think of it as a security firewall built specifically for AI calls:

- Scans every prompt for **injection and jailbreak patterns** before it reaches the model
- Validates every response for **unsafe content, credential leaks, or dangerous commands**
- Enforces **schema rules** on every tool/function call the model attempts
- Applies **configurable policies** to decide whether to block, warn, or log

The library exposes a clean, importable API. Protecting an existing Python AI application takes only a few lines of code and no infrastructure changes.

| Attribute | Value |
|---|---|
| Type | Python library (importable package, installable via pip) |
| Purpose | Security middleware between application code and LLM providers |
| Primary interfaces | Decorator / context manager / provider wrapper |
| Guards | Prompt injection, unsafe output, dangerous tool calls |
| Policy engine | YAML or Python dict, with per-endpoint policies |
| Provider support | OpenAI (v1) and generic callables; extensible to Claude, Gemini |
| Framework support | FastAPI (native), Flask (planned) |
| Return type | `GuardDecision { allowed, score, reasons, safe_output }` |

## 2. High-Level Architecture

### 2.1 System Overview

The toolkit is organized into **five distinct layers**, each with a clearly defined responsibility:

```
┌─────────────────────────────────────────────────────────────┐
│                      Your Application                       │
└────────────────────────────┬────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────┐
│             Middleware Layer (FastAPI / Flask)              │
│          Dependency injection or global middleware          │
└────────────────────────────┬────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────┐
│                       Providers Layer                       │
│         OpenAIProvider / GenericProvider (adapters)         │
└──────┬────────────────────────────────────────┬─────────────┘
       │                                        │
┌──────▼──────┐                          ┌──────▼──────┐
│   Guards    │                          │   Guards    │
│   (Input)   │                          │  (Output)   │
│ prompts.py  │                          │ outputs.py  │
│  tools.py   │                          │  tools.py   │
└──────┬──────┘                          └──────┬──────┘
       │                                        │
┌──────▼────────────────────────────────────────▼─────────────┐
│                        Policy Engine                        │
│            Policy  |  config.py  |  policies.py             │
└─────────────────────────────────────────────────────────────┘
```

### 2.2 Request / Response Flow

Every guarded LLM call follows a **six-stage pipeline**:

| # | Stage | What happens |
|---|---|---|
| 1 | App calls the guarded provider | `guarded_openai_chat(prompt, tools, policy)` |
| 2 | Input scanner runs | `scan_prompt()` on the prompt + system instructions |
| 3 | Risk decision | Block / Warn / Allow based on policy thresholds |
| 4 | Forward to LLM | Real API call to OpenAI / a local model |
| 5 | Output scanner runs | `scan_output()` + `validate_tool_call()` |
| 6 | Return GuardDecision | `{ allowed, score, reasons, safe_output }` |

## 3. Repository Structure

The project follows the **src layout** convention to avoid import conflicts. Its directory tree mirrors the separation of concerns between the five internal layers:

```
llm-security-toolkit/
├── README.md                   # Project overview and quick-start
├── CONTRIBUTING.md             # Fork & contribution guide
├── CODE_OF_CONDUCT.md          # Community standards
├── LICENSE                     # MIT (encourages forks)
├── pyproject.toml              # Build config, deps, tool settings
├── .pre-commit-config.yaml     # ruff, black, mypy on every commit
├── .github/
│   └── workflows/ci.yml        # Tests + lint on push / PR
│
├── src/llm_security/           # Main package (src layout)
│   ├── __init__.py             # Public re-exports
│   ├── types.py                # ScanResult, GuardDecision, ToolCall
│   ├── policies.py             # Policy models + built-in presets
│   ├── config.py               # YAML / dict → Policy loaders
│   ├── exceptions.py           # BlockedByPolicyError, etc.
│   ├── logging.py              # log_decision() + hooks
│   ├── guards/
│   │   ├── prompts.py          # Input / injection guards
│   │   ├── outputs.py          # Output / content guards
│   │   └── tools.py            # Tool-call validation guards
│   ├── providers/
│   │   ├── base.py             # ProviderAdapter ABC
│   │   ├── openai.py           # OpenAI concrete adapter
│   │   └── generic.py          # Generic callable adapter
│   └── middleware/
│       ├── fastapi.py          # FastAPI dependency + middleware
│       └── flask.py            # Flask (planned)
│
├── tests/                      # Pytest test suite
├── examples/                   # Runnable minimal examples
└── docs/                       # MkDocs documentation
```

## 4. Core Types & Models

Every part of the library uses the same three data structures. They are the *lingua franca* of the whole package.

### ScanResult

The output of a single guard check. Every guard function returns one:

```
# types.py
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class ScanResult:
    allowed: bool                # True = safe to proceed
    score: float                 # 0.0 (safe) → 1.0 (critical risk)
    reasons: List[str]           # Human-readable explanations
    safe_output: Optional[str]   # Redacted text (output guards only)
```

### GuardDecision

The top-level result returned to your application. It aggregates the `ScanResult` of every active guard:

```
@dataclass
class GuardDecision:
    allowed: bool
    score: float
    reasons: List[str]
    safe_output: Optional[str]
    scan_results: List[ScanResult]   # Full audit trail
```

### ToolCall

Represents a structured tool/function call requested by the model:

```
@dataclass
class ToolCall:
    name: str                  # e.g. 'read_file'
    args: Dict[str, Any]       # e.g. {'path': '/etc/passwd'}
    schema: Dict[str, Any]     # JSON Schema the args must conform to
```

## 5. Policy Engine

`Policy` is the single configuration object that controls the entire security pipeline. Every guard, every provider, and every middleware reads its settings from it.

### 5.1 Policy Structure

```
from typing import List, Optional
from pydantic import BaseModel

class Policy(BaseModel):
    # Guard toggles
    prompt_guard_enabled: bool = True
    output_guard_enabled: bool = True
    tool_guard_enabled: bool = True

    # Thresholds (0.0 – 1.0)
    block_threshold: float = 0.75   # Score above this → block
    warn_threshold: float = 0.40    # Score above this → log warning

    # Allowed tool names (None = allow all)
    allowed_tools: Optional[List[str]] = None

    # On block: raise an exception OR return the GuardDecision
    raise_on_block: bool = True
```

### 5.2 Built-in Policy Presets

| Policy | Behavior | Best for |
|---|---|---|
| `StrictPolicy` | Blocks on any risk signal | Production, sensitive applications |
| `BalancedPolicy` | Blocks high risk, warns on medium risk | Standard applications (default) |
| `LoggingOnlyPolicy` | Never blocks; only logs | Development / testing |

### 5.3 Loading Policies

```
from llm_security import load_policy_from_dict, load_policy_from_yaml

# From a YAML file (recommended for production)
policy = load_policy_from_yaml('policies/production.yaml')

# From a dict (handy in tests)
policy = load_policy_from_dict({
    'block_threshold': 0.8,
    'allowed_tools': ['read_file', 'search_web'],
})
```

## 6. The Guards Layer

Guards are the **security brain** of the toolkit. Each guard module is small, focused, and independently testable, and is designed to be easy to fork and extend with new detection rules.

| Guard module | Detected pattern | Category | Default action |
|---|---|---|---|
| Prompt Guard | `"ignore previous instructions"` | Injection | Block or warn |
| Prompt Guard | `"pretend you are the system"` | Jailbreak | Block or warn |
| Prompt Guard | Requests to reveal hidden context | Data exfiltration | Block |
| Output Guard | API keys, tokens, passwords | Secret leakage | Redact + warn |
| Output Guard | Shell commands (`rm -rf`, `curl`, etc.) | OS command | Block |
| Output Guard | Self-harm or malware instructions | Content | Block |
| Tool Guard | Invalid tool name | Schema | Block |
| Tool Guard | `rm -rf /` or admin API calls | Dangerous operation | Block |
| Tool Guard | Arguments don't match the schema | Validation | Block |

### 6.1 Prompt Guard: `guards/prompts.py`

Runs **before** any API call. It scans the user prompt and system instructions for patterns that indicate an attempt to subvert the model's behavior.

```
def scan_prompt(prompt: str, policy: Policy) -> ScanResult:
    """
    Heuristic patterns checked:
      - 'ignore previous instructions' / 'disregard above'
      - 'you are now the system prompt'
      - 'repeat everything above' (context exfiltration)
      - 'DAN' jailbreak variants
      - Base64-encoded instructions

    Returns a ScanResult with score + reasons.
    """
```

### 6.2 Output Guard: `guards/outputs.py`

Runs before the response reaches your application. Instead of blocking outright, it can **redact** sensitive content.

```
def scan_output(text: str, policy: Policy) -> ScanResult:
    """
    Patterns checked:
      - Credential regexes (API keys, JWTs, SSH private keys)
      - Shell command patterns (rm, curl, wget, sudo)
      - Malware / ransomware indicators
      - Self-harm or violence instructions

    The safe_output field contains the redacted version if score < block_threshold.
    """
```

### 6.3 Tool Call Guard: `guards/tools.py`

Intercepts every function/tool call the model wants to make and validates it against the policy's allowlist and the tool's JSON Schema.

```
def validate_tool_call(call: ToolCall, policy: Policy) -> ScanResult:
    """
    Checks applied:
      - Tool name in policy.allowed_tools (if an allowlist is defined)
      - Args validate against call.schema (jsonschema)
      - Blocked operation patterns (file deletion, network scanning)
      - Internal admin API URL detection
    """
```

## 7. Providers Layer

Providers wrap the real LLM client. They orchestrate the full guard pipeline (input scan → forward → output scan) and return a `GuardDecision` to the caller.

| File | Responsibility |
|---|---|
| `providers/base.py` | Abstract base class `ProviderAdapter`. Defines the `chat()` interface every provider must implement. |
| `providers/openai.py` | Concrete adapter wrapping the OpenAI Python SDK. Automatically runs all guards around every `chat()` call. |
| `providers/generic.py` | Accepts any callable as the LLM. Users pass their own client function; the adapter runs the full guard flow around it. |

### 7.1 ProviderAdapter Interface

```
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

class ProviderAdapter(ABC):
    @abstractmethod
    def chat(
        self,
        *,
        messages: List[Dict],
        tools: Optional[List[Dict]] = None,
        policy: Optional[Policy] = None,
    ) -> GuardDecision:
        ...
```

### 7.2 OpenAI Adapter Flow

1. Run `scan_prompt()` on all user + system messages
2. If allowed → call `openai.chat.completions.create(...)`
3. Run `scan_output()` on the response content
4. Run `validate_tool_call()` on any `tool_calls` the model requests
5. Return a `GuardDecision` aggregating all results

## 8. Middleware Layer

The middleware layer lets you protect entire HTTP endpoints with almost no code changes.

### 8.1 FastAPI Integration

```
from typing import Dict, List

from fastapi import Depends, FastAPI, HTTPException
from pydantic import BaseModel
from llm_security import BalancedPolicy, OpenAIProvider

app = FastAPI()

class ChatRequest(BaseModel):
    messages: List[Dict[str, str]]

# Dependency injection: protects every call to /chat.
# The policy is built inside the function: FastAPI would parse a pydantic-model
# parameter such as `policy: Policy` as part of the request body.
def get_guarded_openai() -> OpenAIProvider:
    return OpenAIProvider(policy=BalancedPolicy())

@app.post('/chat')
def chat(req: ChatRequest, provider: OpenAIProvider = Depends(get_guarded_openai)):
    # provider.chat() is synchronous; a plain `def` route runs in FastAPI's threadpool
    decision = provider.chat(messages=req.messages)
    if not decision.allowed:
        raise HTTPException(status_code=400, detail=decision.reasons)
    return {'reply': decision.safe_output}
```

### 8.2 Middleware vs. Dependency

| Approach | Best for |
|---|---|
| Dependency (`Depends`) | Per-route policies. Each endpoint injects a different provider. Most flexible. |
| Middleware class | A global policy applied to every request. Suited to organization-wide defaults. |
| Flask middleware | Planned for v1.1. Applies the same pattern through Flask's `before_request` / `after_request` hooks. |

## 9. Logging & Exceptions

### 9.1 Structured Logging: `logging.py`

Every `GuardDecision` can be passed to `log_decision()`, which emits a structured JSON log entry compatible with any logging backend:

```
import json
import logging
from datetime import datetime, timezone

def log_decision(decision: GuardDecision, logger: logging.Logger) -> None:
    # json.dumps produces a real JSON string; passing a dict would log its repr
    logger.info(json.dumps({
        'allowed': decision.allowed,
        'score': decision.score,
        'reasons': decision.reasons,
        'timestamp': datetime.now(timezone.utc).isoformat(),  # utcnow() is deprecated
    }))
    # Future: OpenTelemetry spans, Datadog trace hooks
```

### 9.2 Exception Hierarchy: `exceptions.py`

| Exception | Raised when |
|---|---|
| `BlockedByPolicyError` | A prompt or output exceeds `block_threshold` and `raise_on_block=True` |
| `InvalidToolCallError` | A tool name is not on the allowlist, or its arguments fail schema validation |

## 10. Public API Surface

The top-level package re-exports everything users need and exposes nothing implementation-specific:

```
from .providers.openai import OpenAIProvider
from .providers.generic import GenericProvider
from .policies import StrictPolicy, BalancedPolicy, LoggingOnlyPolicy
from .config import load_policy_from_dict, load_policy_from_yaml
from .types import ScanResult, GuardDecision, ToolCall
from .exceptions import BlockedByPolicyError, InvalidToolCallError

__all__ = [
    'OpenAIProvider', 'GenericProvider',
    'StrictPolicy', 'BalancedPolicy', 'LoggingOnlyPolicy',
    'load_policy_from_dict', 'load_policy_from_yaml',
    'ScanResult', 'GuardDecision', 'ToolCall',
    'BlockedByPolicyError', 'InvalidToolCallError',
]
```

## 11. Tests, Examples & Docs

### 11.1 Test Suite: `tests/`

| File | Coverage |
|---|---|
| `test_policies.py` | Policy loading from dict and YAML, threshold logic, preset validation |
| `test_guards_prompts.py` | Every injection and jailbreak pattern, with passing and failing cases |
| `test_guards_outputs.py` | Credential regexes, OS command patterns, content categories |
| `test_guards_tools.py` | Schema validation, allowlist enforcement, blocked operations |
| `test_providers_openai.py` | OpenAI adapter against a mocked API: full-pipeline tests |
| `test_middleware_fastapi.py` | FastAPI TestClient integration: dependency injection |

### 11.2 Examples: `examples/`

- `basic_openai_guard.py`: a minimal 15-line OpenAI guard
- `fastapi_endpoint_guard.py`: a complete FastAPI endpoint with policy injection
- `custom_policy_example.py`: writing and loading a custom YAML policy

### 11.3 Documentation: `docs/`

- `getting-started.md`: installation, first call, first policy
- `configuration.md`: full Policy reference and YAML schema
- `providers.md`: how to add a new ProviderAdapter
- `middleware.md`: FastAPI and Flask integration guides
- `contributing.md`: adding new guard rules, running tests

## 12. Extensibility & Design Principles

The toolkit is deliberately designed to be **fork-friendly** and **easy to contribute to**. These principles guide every architectural decision:

### Small, Focused Guards

Each guard is a single-responsibility Python function. Adding a new detection rule means adding one function and one test, with no class hierarchy to navigate.

### Policy-First Design

All security decisions go through the `Policy` object. Operators can change the security posture (strict vs. logging-only) by editing a configuration file, without touching code.

### Provider Abstraction

The `ProviderAdapter` ABC means any LLM client can be wrapped. Adding Claude, Gemini, or a local Ollama model only requires implementing one method: `chat()`.

### Zero Infrastructure Requirements

The toolkit is a pure Python package: no sidecar, no agent, no proxy. It runs in the same process as your existing application.

## 13. Future Roadmap

| Version | Feature | Status |
|---|---|---|
| v1.0 | OpenAI adapter + prompt/output/tool guards + FastAPI | Planned |
| v1.1 | Anthropic (Claude) provider adapter | Planned |
| v1.1 | Flask middleware | Planned |
| v1.2 | OpenTelemetry tracing integration | Under consideration |
| v1.3 | Gemini + local model (Ollama) adapters | Under consideration |
| v2.0 | Optional hosted SaaS gateway companion | Future |

*LLM Security Toolkit: Architecture Document v1.0*
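Stage 3 of the pipeline in section 2.2 (Block / Warn / Allow based on policy thresholds) and the `GuardDecision` type from section 4 could fit together roughly as shown below. This is a minimal sketch, not the toolkit's code. The `Policy` here is a stdlib stand-in that holds only the two thresholds, and `risk_level()` and `aggregate()` are hypothetical helper names. Using the highest guard score as the overall score is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScanResult:
    allowed: bool
    score: float                       # 0.0 (safe) -> 1.0 (critical risk)
    reasons: List[str]
    safe_output: Optional[str] = None  # default added for brevity in this sketch


@dataclass
class GuardDecision:
    allowed: bool
    score: float
    reasons: List[str]
    safe_output: Optional[str]
    scan_results: List[ScanResult] = field(default_factory=list)


@dataclass
class Policy:
    # Stand-in for the pydantic Policy model: only the two thresholds are modeled.
    block_threshold: float = 0.75
    warn_threshold: float = 0.40


def risk_level(score: float, policy: Policy) -> str:
    """Hypothetical helper: map a score onto Block / Warn / Allow."""
    if score > policy.block_threshold:
        return "block"
    if score > policy.warn_threshold:
        return "warn"
    return "allow"


def aggregate(results: List[ScanResult], policy: Policy) -> GuardDecision:
    """Hypothetical helper: fold per-guard ScanResults into one GuardDecision."""
    score = max((r.score for r in results), default=0.0)  # worst finding wins
    reasons = [reason for r in results for reason in r.reasons]
    allowed = all(r.allowed for r in results) and risk_level(score, policy) != "block"
    safe_output = None
    if allowed:
        # Keep the most recent redacted text an output guard produced.
        for r in results:
            if r.safe_output is not None:
                safe_output = r.safe_output
    return GuardDecision(allowed, score, reasons, safe_output, list(results))
```

Taking the maximum instead of an average means a single high-risk finding cannot be diluted by several clean guards.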
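Section 5.3 loads policies from YAML or a dict. The sketch below shows one way `load_policy_from_dict()` might validate its input. It uses a stdlib dataclass in place of the pydantic model, with the same fields and defaults as section 5.1. Several details are assumptions: the range and ordering checks, the rejection of unknown keys, and the `PRESETS` threshold values. The document describes what the presets do, not their numbers. A YAML loader could parse the file (for example with PyYAML's `yaml.safe_load`) and hand the resulting dict to the same function.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class Policy:
    # Stdlib stand-in for the pydantic model in section 5.1 (same fields and defaults).
    prompt_guard_enabled: bool = True
    output_guard_enabled: bool = True
    tool_guard_enabled: bool = True
    block_threshold: float = 0.75
    warn_threshold: float = 0.40
    allowed_tools: Optional[List[str]] = None
    raise_on_block: bool = True

    def __post_init__(self) -> None:
        # Assumed validation rules: thresholds within [0, 1] and warn <= block.
        for name in ("block_threshold", "warn_threshold"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be within [0.0, 1.0], got {value}")
        if self.warn_threshold > self.block_threshold:
            raise ValueError("warn_threshold must not exceed block_threshold")


def load_policy_from_dict(data: Dict[str, Any]) -> Policy:
    """Build a Policy from a plain dict, rejecting unknown (e.g. misspelled) keys."""
    unknown = set(data) - {f.name for f in fields(Policy)}
    if unknown:
        raise ValueError(f"unknown policy keys: {sorted(unknown)}")
    return Policy(**data)


# Hypothetical preset values. With "score above block_threshold -> block",
# 0.0 blocks any non-zero risk and 1.0 never blocks.
PRESETS: Dict[str, Dict[str, Any]] = {
    "strict": {"block_threshold": 0.0, "warn_threshold": 0.0},
    "balanced": {},  # the defaults above
    "logging_only": {"block_threshold": 1.0, "raise_on_block": False},
}
```

Rejecting unknown keys matters for a security config. Otherwise a typo such as `block_treshold` is silently ignored and the default stays in force.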
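The heuristics listed for `scan_prompt()` in section 6.1 can be approximated with weighted regular expressions, as in the illustrative sketch below. The patterns, weights and reason strings are all assumptions, as is the rule that the highest matching weight becomes the score. The `Policy` class is a stand-in carrying only `block_threshold`. Base64-looking tokens are decoded and re-scanned, so an encoded "ignore previous instructions" is still caught.

```python
import base64
import binascii
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ScanResult:
    allowed: bool
    score: float
    reasons: List[str]
    safe_output: Optional[str] = None


@dataclass
class Policy:
    block_threshold: float = 0.75  # minimal stand-in: the only field this guard reads


# Hypothetical (pattern, weight, reason) rules for the heuristics listed in section 6.1.
RULES: List[Tuple[re.Pattern, float, str]] = [
    (re.compile(r"\b(ignore|disregard)\s+(all\s+)?(previous|prior|above)\b", re.I),
     0.9, "instruction override"),
    (re.compile(r"\b(you\s+are\s+now|pretend\s+(that\s+)?you\s+are)\s+the\s+system\b", re.I),
     0.85, "system role hijack"),
    (re.compile(r"\brepeat\s+(everything|all)\s+(written\s+)?above\b", re.I),
     0.8, "context exfiltration"),
    # "DAN" only in capitals, so the name "Dan" does not trigger it.
    (re.compile(r"\bDAN\b|(?i:\bdo\s+anything\s+now\b)"), 0.7, "DAN-style jailbreak"),
]
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def _match_rules(text: str) -> List[Tuple[float, str]]:
    return [(weight, reason) for pattern, weight, reason in RULES if pattern.search(text)]


def _decoded_segments(text: str) -> List[str]:
    """Decode Base64-looking tokens so encoded instructions get the same scrutiny."""
    segments = []
    for token in B64_TOKEN.findall(text):
        try:
            segments.append(base64.b64decode(token, validate=True).decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64, or not text once decoded
    return segments


def scan_prompt(prompt: str, policy: Policy) -> ScanResult:
    hits = _match_rules(prompt)
    for segment in _decoded_segments(prompt):
        hits += [(weight, f"base64-encoded {reason}") for weight, reason in _match_rules(segment)]
    score = max((weight for weight, _ in hits), default=0.0)
    return ScanResult(allowed=score <= policy.block_threshold, score=score,
                      reasons=[reason for _, reason in hits])
```

Keyword heuristics like these are cheap but easy to paraphrase around. They work best as one signal feeding the policy score rather than as the only check.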
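For section 6.2, the split in the guards table between "Redact + warn" for secrets and "Block" for shell commands could be sketched as follows. Secrets are replaced in `safe_output` and score in the warn band. Dangerous commands score above the default block threshold, so no text is returned for them. All patterns and scores are hypothetical. The secret formats cover only a few well-known shapes: AWS access key IDs, `sk-`-style API keys, JWTs, and PEM private-key blocks.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScanResult:
    allowed: bool
    score: float
    reasons: List[str]
    safe_output: Optional[str] = None


@dataclass
class Policy:
    block_threshold: float = 0.75  # minimal stand-in


# Hypothetical secret patterns: a few well-known shapes, far from exhaustive.
SECRET_PATTERNS = {
    "AWS access key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "API key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "JWT": re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"),
    "private key": re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----"),
}
# Hypothetical command patterns for the categories in section 6.2 (rm, curl, wget, sudo).
COMMAND_PATTERNS = {
    "recursive delete": re.compile(r"\brm\s+-(?:rf|fr)\b"),
    "pipe-to-shell download": re.compile(r"\b(?:curl|wget)\b[^\n|]*\|\s*(?:sudo\s+)?(?:ba)?sh\b"),
    "privilege escalation": re.compile(r"\bsudo\s+\S+"),
}
SECRET_SCORE = 0.5   # redact + warn: below the default block threshold
COMMAND_SCORE = 0.9  # above the default block threshold: block


def scan_output(text: str, policy: Policy) -> ScanResult:
    reasons: List[str] = []
    score = 0.0
    redacted = text
    for label, pattern in SECRET_PATTERNS.items():
        redacted, count = pattern.subn(f"[REDACTED {label}]", redacted)
        if count:
            reasons.append(f"{label} redacted ({count})")
            score = max(score, SECRET_SCORE)
    for label, pattern in COMMAND_PATTERNS.items():
        if pattern.search(text):
            reasons.append(f"dangerous shell command: {label}")
            score = max(score, COMMAND_SCORE)
    allowed = score <= policy.block_threshold
    # Redacted text is only handed back when the response is not blocked.
    return ScanResult(allowed, score, reasons, redacted if allowed else None)
```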
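Section 6.3's `validate_tool_call()` combines four checks: an allowlist, the tool's JSON Schema, blocked operations, and internal admin URLs. The sketch below stays within the standard library. It therefore swaps the `jsonschema` package for a deliberately tiny checker that handles only `required`, top-level property `type`, and `additionalProperties: false`. The dangerous-argument pattern, the `INTERNAL_HOSTS` set and the `/admin` path rule are hypothetical examples, not the toolkit's own lists.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse


@dataclass
class ScanResult:
    allowed: bool
    score: float
    reasons: List[str]
    safe_output: Optional[str] = None


@dataclass
class ToolCall:
    name: str
    args: Dict[str, Any]
    schema: Dict[str, Any]


@dataclass
class Policy:
    allowed_tools: Optional[List[str]] = None  # None = allow all


# Deliberately tiny subset of JSON Schema; a real guard would use the jsonschema package.
JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
              "boolean": bool, "object": dict, "array": list}
# Hypothetical blocked-operation pattern and internal-host list.
DANGEROUS_ARG = re.compile(r"\brm\s+-(?:rf|fr)\b|\bnmap\b|\bmasscan\b")
INTERNAL_HOSTS = {"localhost", "127.0.0.1", "169.254.169.254"}


def _schema_errors(args: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    errors = [f"missing required arg '{k}'" for k in schema.get("required", []) if k not in args]
    props = schema.get("properties", {})
    for key, value in args.items():
        if key not in props:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected arg '{key}'")
            continue
        type_name = props[key].get("type")
        expected = JSON_TYPES.get(type_name, object)
        # bool is a subclass of int in Python, so it must not pass as a JSON number
        bool_as_number = isinstance(value, bool) and type_name in ("integer", "number")
        if not isinstance(value, expected) or bool_as_number:
            errors.append(f"arg '{key}' should be {type_name}")
    return errors


def _dangerous_args(args: Dict[str, Any]) -> List[str]:
    reasons = []
    for key, value in args.items():
        if not isinstance(value, str):
            continue
        if DANGEROUS_ARG.search(value):
            reasons.append(f"blocked operation in arg '{key}'")
        try:
            url = urlparse(value)
            host = url.hostname
        except ValueError:  # e.g. an unbalanced IPv6 bracket
            reasons.append(f"unparsable URL in arg '{key}'")
            continue
        if url.scheme in ("http", "https") and (
                host in INTERNAL_HOSTS or url.path.startswith("/admin")):
            reasons.append(f"internal or admin URL in arg '{key}'")
    return reasons


def validate_tool_call(call: ToolCall, policy: Policy) -> ScanResult:
    reasons: List[str] = []
    if policy.allowed_tools is not None and call.name not in policy.allowed_tools:
        reasons.append(f"tool '{call.name}' is not on the allowlist")
    reasons += _schema_errors(call.args, call.schema)
    reasons += _dangerous_args(call.args)
    # Every tool-guard finding defaults to Block in the section 6 table.
    return ScanResult(allowed=not reasons, score=1.0 if reasons else 0.0, reasons=reasons)
```

With `jsonschema` installed, `jsonschema.validate(instance=call.args, schema=call.schema)` could take the place of `_schema_errors()`. It raises `jsonschema.ValidationError` on the first violation and covers the full specification.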
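Step 4 of the OpenAI adapter flow (section 7.2) runs `validate_tool_call()` on any `tool_calls` the model requests. In the Chat Completions API, each tool call carries its arguments as a JSON-encoded string, so they have to be parsed before any schema check. A sketch of that conversion follows. `to_tool_calls` and the `schemas` registry are hypothetical names, and the message is taken as a plain dict (for example from the SDK object's `.model_dump()`) rather than as an SDK object.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class ToolCall:
    name: str
    args: Dict[str, Any]
    schema: Dict[str, Any]


def to_tool_calls(message: Dict[str, Any],
                  schemas: Dict[str, Dict[str, Any]]) -> Tuple[List[ToolCall], List[str]]:
    """Turn an assistant message's tool_calls into ToolCall objects plus parse errors."""
    calls: List[ToolCall] = []
    errors: List[str] = []
    for raw in message.get("tool_calls") or []:
        fn = raw.get("function", {})
        name = fn.get("name", "")
        try:
            # Arguments arrive as a JSON string that the model may have malformed.
            args = json.loads(fn.get("arguments") or "{}")
        except json.JSONDecodeError:
            errors.append(f"tool '{name}': arguments are not valid JSON")
            continue
        if not isinstance(args, dict):
            errors.append(f"tool '{name}': arguments must be a JSON object")
            continue
        # An empty schema accepts any args, so unregistered tools rely on the allowlist check.
        calls.append(ToolCall(name, args, schemas.get(name, {})))
    return calls, errors
```

Treating unparsable arguments as an error, rather than skipping them silently, lets the adapter block the call the same way as a schema failure.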
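Section 7 describes providers that run input scan → forward → output scan and return a `GuardDecision`. Section 9.2 adds that `BlockedByPolicyError` is raised when `raise_on_block=True`. A hedged sketch of a `GenericProvider` wrapping any `llm(messages) -> str` callable follows. The constructor arguments (`llm`, `input_guard`, `output_guard`) and the `decision` attribute on the exception are assumptions. Tool-call validation is left out to keep the example short.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ScanResult:
    allowed: bool
    score: float
    reasons: List[str]
    safe_output: Optional[str] = None


@dataclass
class GuardDecision:
    allowed: bool
    score: float
    reasons: List[str]
    safe_output: Optional[str]
    scan_results: List[ScanResult] = field(default_factory=list)


@dataclass
class Policy:
    block_threshold: float = 0.75  # read by the guards
    raise_on_block: bool = True


class BlockedByPolicyError(Exception):
    def __init__(self, decision: GuardDecision):
        super().__init__("; ".join(decision.reasons) or "blocked by policy")
        self.decision = decision  # assumed attribute: keeps the audit trail on the exception


Guard = Callable[[str, Policy], ScanResult]


class ProviderAdapter(ABC):
    @abstractmethod
    def chat(self, *, messages: List[Dict], tools: Optional[List[Dict]] = None,
             policy: Optional[Policy] = None) -> GuardDecision: ...


class GenericProvider(ProviderAdapter):
    """Wraps any `llm(messages) -> str` callable with input and output guards."""

    def __init__(self, llm: Callable[[List[Dict]], str], input_guard: Guard,
                 output_guard: Guard, policy: Optional[Policy] = None):
        self.llm, self.input_guard, self.output_guard = llm, input_guard, output_guard
        self.policy = policy or Policy()

    def _finish(self, results: List[ScanResult], output: Optional[str],
                policy: Policy) -> GuardDecision:
        allowed = all(r.allowed for r in results)
        decision = GuardDecision(allowed, max(r.score for r in results),
                                 [x for r in results for x in r.reasons],
                                 output if allowed else None, results)
        if not allowed and policy.raise_on_block:
            raise BlockedByPolicyError(decision)
        return decision

    def chat(self, *, messages: List[Dict], tools: Optional[List[Dict]] = None,
             policy: Optional[Policy] = None) -> GuardDecision:
        policy = policy or self.policy
        # Input scan over user + system messages, before anything leaves the process.
        prompt = "\n".join(m.get("content") or "" for m in messages
                           if m.get("role") in ("user", "system"))
        pre = self.input_guard(prompt, policy)
        if not pre.allowed:
            return self._finish([pre], None, policy)  # the LLM is never called
        # Forward to the wrapped model, then scan what comes back.
        post = self.output_guard(self.llm(messages), policy)
        return self._finish([pre, post], post.safe_output, policy)
```

Because a blocked prompt returns (or raises) before `llm` is called, a rejected input never reaches the provider.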
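Section 9.1's `log_decision()` emits one structured JSON entry per decision. The variant below is a sketch that goes beyond the document in two assumed ways. Blocked decisions are logged at `WARNING` rather than `INFO`, and an extra `event` field (a hypothetical name) makes the entries easy to filter. Like the original, it leaves `safe_output` out of the record, which keeps response text out of log storage.

```python
import json
import logging
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class GuardDecision:
    allowed: bool
    score: float
    reasons: List[str]
    safe_output: Optional[str] = None
    scan_results: list = field(default_factory=list)


def log_decision(decision: GuardDecision, logger: logging.Logger) -> None:
    """Emit one JSON object per decision; blocked calls go out at WARNING (assumption)."""
    record = {
        "event": "llm_guard_decision",  # hypothetical field for filtering
        "allowed": decision.allowed,
        "score": round(decision.score, 3),
        "reasons": decision.reasons,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    level = logging.INFO if decision.allowed else logging.WARNING
    logger.log(level, json.dumps(record))
```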

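Tags: AI infrastructure, AI firewall, AppSec, CISA project, LLM guardrails, Naabu, OpenAI security, schema validation, AI security, content safety, function-call filtering, compliance, LLM security, security middleware, security toolkit, security rule engine, prompt-injection protection, sensitive data leakage, secrets management, policy engine, cybersecurity, cybersecurity challenges, jailbreak detection, reverse-engineering tools, defense evasion, privacy protection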