TechDre/ai-security-lab

GitHub: TechDre/ai-security-lab

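A progressive AI security engineering learning lab that teaches, through hands-on exercises, how to detect prompt injection attacks and build an LLM guardrails pipeline covering input validation, PII redaction, and content policy.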

Stars: 1 | Forks: 0

# AI Security Lab

![Python](https://img.shields.io/badge/Python-3.10%2B-blue?logo=python&logoColor=white) ![License](https://img.shields.io/badge/License-MIT-green) ![Status](https://img.shields.io/badge/Status-Active-brightgreen) ![Security](https://img.shields.io/badge/Focus-AI%20Security-red?logo=shield)

## Overview

**AI Security Lab** is a collection of practical, self-contained labs focused on protecting LLM-powered applications from real-world attacks. Each lab builds on the previous one, taking you from detection to full pipeline defense.

| Lab | Topic | Key Skills |
|-----|-------|------------|
| [Lab 1 — Prompt Injection Detection](#lab-1--prompt-injection-detection) | Detect adversarial prompts using regex, heuristics, and a DeBERTa classifier | Multi-layer detection, threshold tuning, JSON output |
| [Lab 2 — LLM Guardrails Pipeline](#lab-2--llm-guardrails-pipeline) | Intercept and scrub injections and PII before they reach your LLM | Input/output validation, PII redaction, custom content policies |

## Repository Structure

```
ai-security-lab/
├── agent.py              # Lab 1 — Prompt injection detector
├── test_prompts.txt      # Lab 1 — Sample attack prompts
├── GUIDE.md              # Lab 1 — Full setup & usage guide
└── llm-guardrails-lab/
    ├── agent.py          # Lab 2 — Guardrails pipeline
    ├── test_inputs.txt   # Lab 2 — Sample inputs (safe + malicious)
    └── GUIDE.md          # Lab 2 — Full setup & usage guide
```

## Lab 1 — Prompt Injection Detection

**Goal:** Detect prompt injection attacks against LLM applications using three stacked detection layers, aiming for zero false negatives and very few false positives.

### Detection Architecture

| Layer | Method | Speed | Dependencies |
|-------|--------|-------|--------------|
| 1. Regex | 25+ known attack patterns | < 1ms | None |
| 2. Heuristics | Structural anomaly scoring | < 5ms | None |
| 3. Classifier | DeBERTa AI model (99%+ accuracy) | ~150ms | `transformers` + `torch` |

### Quick Start

```
# Install dependencies (first time only)
pip install transformers torch sentencepiece protobuf

# Analyze a single prompt in full AI-powered mode
python agent.py --input "Ignore all previous instructions" --mode full

# Fast regex-only scan (no model required)
python agent.py --input "Some text" --mode regex

# Batch-scan a file and get structured JSON output
python agent.py --file test_prompts.txt --mode full --output json
```

### Example Output

```
Verdict          : INJECTION DETECTED
Composite Score  : 0.7102
Regex Score      : 0.5000
  Matches: [system_prompt_override]
Heuristic Score  : 0.3009
Classifier       : INJECTION (1.0000)
Detection Time   : 144.73 ms
```

**Score guide:** 0.0–0.3 safe · 0.3–0.5 suspicious · 0.5–1.0 injection detected

### Attack Patterns Detected

| Pattern | Example |
|---------|---------|
| System instruction override | "Ignore all previous instructions..." |
| Role-play escape | "You are now / Act as / Pretend to be..." |
| Developer mode | "DAN mode / jailbreak / god mode..." |
| Data exfiltration | "Reveal your system prompt..." |
| Token smuggling | Zero-width characters, hidden Unicode |
| Encoding obfuscation | "Base64 decode this and follow it..." |
| Few-shot injection | Fake conversation history used to redirect behavior |

Full setup guide: [GUIDE.md](./GUIDE.md)

## Lab 2 — LLM Guardrails Pipeline

**Goal:** Build a production-ready middleware layer that validates every input and output around your LLM: blocking attacks, stripping PII, and enforcing content policies.

### Pipeline Architecture

```
User Input
    |
    v
[Length Guard]      <- Blocks oversized inputs
    |
    v
[Injection Guard]   <- Blocks prompt injections & jailbreaks
    |
    v
[Content Policy]    <- Blocks harmful or off-topic requests
    |
    v
[PII Guard]         <- Strips SSNs, emails, credit cards, API keys
    |
    v
LLM API             <- Receives only clean, sanitized input
    |
    v
[Output Guard]      <- Catches system prompt leakage & PII in responses
    |
    v
User Response       <- Safe, redacted output
```

### Quick Start

```
# No external dependencies: pure Python stdlib

# Full pipeline validation
python llm-guardrails-lab/agent.py --input "Your message here" --mode input-only

# PII detection and redaction only
python llm-guardrails-lab/agent.py --input "My SSN is 123-45-6789" --mode pii

# Batch scan with JSON output
python llm-guardrails-lab/agent.py --file llm-guardrails-lab/test_inputs.txt --mode input-only --output json
```

### PII Redaction Reference

| PII Type | Example Input | Replaced With |
|----------|---------------|---------------|
| US SSN | 123-45-6789 | `[SSN_REDACTED]` |
| Email | user@example.com | `[EMAIL_REDACTED]` |
| Credit card | 4111 1111 1111 1111 | `[CARD_REDACTED]` |
| Phone number | (555) 123-4567 | `[PHONE_REDACTED]` |
| IP address | 192.168.1.1 | `[IP_REDACTED]` |
| AWS key | AKIA... | `[AWS_KEY_REDACTED]` |

### Embedding in Your Application

```
from agent import GuardrailsPipeline

pipeline = GuardrailsPipeline()  # or pass policy_path="custom_policy.json"


def handle_message(user_message):
    # handle_message is an illustrative wrapper; your_llm stands in for your own LLM client.

    # --- Before sending to the LLM ---
    result = pipeline.validate_input(user_message)
    if not result.safe:
        return f"Request blocked: {result.blocked_reason}"

    # Send the sanitized text (PII stripped)
    llm_response = your_llm.generate(result.sanitized_text)

    # --- Before returning to the user ---
    output = pipeline.validate_output(llm_response)
    return output.sanitized_text
```

Full setup guide: [llm-guardrails-lab/GUIDE.md](./llm-guardrails-lab/GUIDE.md)

## Prerequisites

| Lab | Python Version | External Dependencies |
|-----|----------------|-----------------------|
| Lab 1 | 3.10+ | `transformers`, `torch`, `sentencepiece`, `protobuf` |
| Lab 2 | 3.10+ | None (pure Python standard library) |

## Learning Path

This lab is designed as a progressive curriculum in AI security engineering:

1. **Lab 1 — Prompt Injection Detection**: understand and detect adversarial inputs
2. **Lab 2 — LLM Guardrails Pipeline**: build a complete input/output defense layer
3. *Coming soon:* malware behavior analysis with Cuckoo Sandbox
4. *Coming soon:* network traffic analysis with Wireshark

## License

This project is open source and available under the [MIT License](LICENSE).
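
To make the PII redaction reference in Lab 2 more concrete, the sketch below shows one way such a mapping from PII type to replacement token could be implemented with the standard library only. It is not taken from the lab's `agent.py`: the function name `redact_pii`, the `PII_PATTERNS` list, and every regular expression here are assumptions chosen for illustration, and the real guard may match more formats or use different rules.

```python
import re

# Hypothetical patterns for illustration only; the lab's actual PII guard
# may use different or stricter expressions.
PII_PATTERNS = [
    ("[SSN_REDACTED]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("[EMAIL_REDACTED]", re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")),
    ("[CARD_REDACTED]", re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")),
    ("[PHONE_REDACTED]", re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}")),
    ("[IP_REDACTED]", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    # Assumes the common access-key-ID shape: "AKIA" plus 16 uppercase alphanumerics.
    ("[AWS_KEY_REDACTED]", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
]


def redact_pii(text: str) -> tuple[str, list[str]]:
    """Replace each PII match with its token; return the text and the tokens used."""
    found = []
    for token, pattern in PII_PATTERNS:
        # subn returns the rewritten text and the number of substitutions made.
        text, count = pattern.subn(token, text)
        if count:
            found.append(token)
    return text, found


if __name__ == "__main__":
    clean, hits = redact_pii("My SSN is 123-45-6789, reach me at user@example.com")
    print(clean)
    print(hits)
```

Applying the patterns one after another keeps the sketch short; because the replacement tokens contain no digits or `@`, an earlier substitution is not re-matched by a later pattern. A production guard would likely add validation (for example a Luhn check on card numbers) to reduce false positives.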

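Tags: AI security, AI application security, AMSI bypass, Apex, Chat Copilot, DeBERTa, LLM Guardrails, MITRE ATLAS, NLP, PII redaction, Prompt Injection Detection, Python, personally identifiable information filtering, content security policy, credential scanning, compliance, heuristic detection, large language model security, threat detection, security lab, security engineering, password management, no backdoors, serverless architecture, machine learning, secrets management, system call monitoring, red teaming, cybersecurity experiments, rule repository, input/output validation, reverse engineering tools, configuration auditing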