sudormrf-dev/llm-prompt-injection-defense

GitHub: sudormrf-dev/llm-prompt-injection-defense

针对LLM提示注入攻击提供四层纵深防御的生产级方案。

Stars: 0 | Forks: 0

# llm-prompt-injection-defense 防御 LLM 应用程序免受提示注入攻击的生产模式。四层防御覆盖完整的攻击面。 ## 攻击分类 | 攻击类型 | 防御 | |---|---| | 直接注入（"忽略之前的指令"） | `InputSanitizer` | | 间接注入（恶意网页内容） | `PromptFirewall` | | 输出泄露检测 | `OutputValidator` + `CanaryInjector` | | 人格劫持 | `OutputValidator` | ## 模式 ### `input_sanitizer.py` — 阻断已知注入模式基于正则表达式检测 16 种以上注入签名、Unicode 方向符剥离，以及使用 XML 分隔符构建结构化提示。 ``` from patterns.input_sanitizer import InputSanitizer, InjectionDetected sanitizer = InputSanitizer() try: clean = sanitizer.sanitize(user_input) except InjectionDetected as exc: return "Input rejected" # 安全变体（不引发异常） result = sanitizer.sanitize_safe(user_input) if result.warnings: log_suspicious(result.warnings) ``` ### `output_validator.py` — 检测输出中的注入成功检查模型输出是否存在系统提示泄露、人格劫持和安全绕过指示。支持格式验证（JSON）和长度边界。 ``` from patterns.output_validator import OutputValidator validator = OutputValidator( system_prompt_fragments=["confidential instructions"], expected_format="json", ) validated = validator.validate(llm_response) ``` ### `prompt_firewall.py` — 二级模型内容筛查在将外部内容（网页、文档、工具输出）包含到主提示之前进行评估。可插拔分类器（启发式或基于 LLM）。 ``` from patterns.prompt_firewall import PromptFirewall firewall = PromptFirewall(classifier_fn=my_llm_classifier) decision = await firewall.check(retrieved_web_content) if decision.is_safe: prompt = build_prompt(decision.content) ``` ### `canary_tokens.py` — 通过金丝雀字符串检测泄露在系统提示中嵌入不可见的令牌字符串。如果任何输出或工具调用参数中出现该令牌，则表示注入成功。 ``` from patterns.canary_tokens import CanaryInjector injector = CanaryInjector() system_prompt, token = injector.inject("You are a helpful assistant.") response = call_llm(system_prompt=system_prompt, user=user_input) if injector.is_leaked(response, token): alert_security_team(token.session_id) ``` ## 快速开始 ``` pip install -e ".[dev]" pytest ``` ## 纵深防御同时使用所有四层防御： 1. **Sanitize** 在用户输入到达模型前进行清理 2. **Screen** 使用防火墙筛查所有外部/检索内容 3. **Embed** 在每个会话的系统提示中嵌入金丝雀令牌 4. **Validate** 在将模型输出暴露给用户或工具执行前进行验证 ## 许可证 MIT

标签：HEURISTIC, JSON格式验证, LLM分类器, Unicode防护, XML分隔符, 内容筛查, 分层防御, 可插拔分类器, 提示注入防御, 攻防模式, 旁路检测, 检索增强生成安全, 源代码安全, 生产模式, 系统提示泄露, 结构化提示, 输入净化, 输出验证, 逆向工具, 长度边界, 防火墙