GregLevenhagen/hacking-and-securing-llms

GitHub: GregLevenhagen/hacking-and-securing-llms

An open-source Python demo suite for LLM attack, defense, and security validation, covering complete attack and defense scenarios from local models to Azure.


# Hacking and Securing LLMs: Live Attacks, Broken Defenses, and Safer AI Systems

[![CI](https://static.pigsec.cn/wp-content/uploads/repos/2026/04/a5c8ffa924021806.svg)](https://github.com/GregLevenhagen/hacking-and-securing-llms/actions/workflows/ci.yml) [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE) [![Python 3.11](https://img.shields.io/badge/Python-3.11-3776ab.svg)](https://www.python.org/) [![Ollama](https://img.shields.io/badge/Local%20LLM-Ollama-111111.svg)](https://ollama.ai)

An open-source Python demo suite for exploring how LLM systems fail, how those failures are exploited, and how layered defenses harden AI applications. The repository is meant to be practically useful to builders, educators, and security practitioners. It contains 30 demos covering prompt injection, RAG attacks, agent exploitation, Azure AI security controls, and architectural security patterns.

Slide deck: [hacking-and-securing-llms.pptx](hacking-and-securing-llms.pptx)

## What This Repository Demonstrates

- Direct and indirect prompt injection against chat, RAG, and tool-using systems
- Failure modes such as hallucination abuse, output handling issues, supply chain poisoning, and privilege escalation
- Defense patterns including input sanitization, output validation, approval gates, secure RAG, and architectural isolation
- Azure AI security capabilities such as Content Safety, Prompt Shields, groundedness checks, protected material detection, and content filters
- A unified Flask-based Demo Hub with live streaming, a shared user experience, and side-by-side comparison of vulnerable and protected versions
- Shared infrastructure for configuration, Ollama access, Azure integration, testing, and OpenTelemetry instrumentation

## Architecture

```
graph TD
    U[User or Presenter] --> H[Demo Hub]
    U --> S[Standalone Demo]
    H --> B[Blueprints for Demo 01-30]
    B --> L[Shared Python Library]
    S --> L
    L --> O[Local Ollama Models]
    L --> A[Azure AI Services]
    L --> T[OpenTelemetry Exporters]
```

### Runtime Model

| Layer | Responsibility |
| --- | --- |
| `demo-hub/` | Unified Flask application that exposes all 30 demos behind a single UI |
| `demo-01-*` to `demo-30-*` | Self-contained demo implementations, prompts, attack payloads, fixtures, and tests |
| `shared/python/` | Shared configuration loading, Ollama client, Azure clients, UI helpers, web helpers, and telemetry |
| `tests/` | Integration tests and Playwright end-to-end coverage for the web demos |
| `scripts/azure/` | Interactive provisioning, validation, and teardown helpers for the Azure-backed demos |

## Demo Catalog

### 01-05 Attack Demos

| Demo | Topic | Path |
| --- | --- | --- |
| 01 | Direct prompt injection | [demo-01-direct-prompt-injection](demo-01-direct-prompt-injection/README.md) |
| 02 | Indirect prompt injection | [demo-02-indirect-prompt-injection](demo-02-indirect-prompt-injection/README.md) |
| 03 | RAG poisoning | [demo-03-rag-poisoning](demo-03-rag-poisoning/README.md) |
| 04 | System prompt extraction | [demo-04-system-prompt-extraction](demo-04-system-prompt-extraction/README.md) |
| 05 | Agent exploitation | [demo-05-agent-exploitation](demo-05-agent-exploitation/python/) |

### 06-10 Defense Demos

| Demo | Topic | Path |
| --- | --- | --- |
| 06 | Input sanitization | [demo-06-input-sanitization](demo-06-input-sanitization/README.md) |
| 07 | RAG defense | [demo-07-rag-defense](demo-07-rag-defense/README.md) |
| 08 | Output validation | [demo-08-output-validation](demo-08-output-validation/README.md) |
| 09 | Approval gates | [demo-09-approval-gates](demo-09-approval-gates/README.md) |
| 10 | Secure architecture | [demo-10-secure-architecture](demo-10-secure-architecture/README.md) |

### 11-18 Advanced Attack Demos

| Demo | Topic | Path |
| --- | --- | --- |
| 11 | Model denial of service | [demo-11-model-dos](demo-11-model-dos/python/) |
| 12 | Jailbreaking | [demo-12-jailbreaking](demo-12-jailbreaking/python/) |
| 13 | Hallucination exploitation | [demo-13-hallucination-exploitation](demo-13-hallucination-exploitation/python/) |
| 14 | Supply chain poisoning | [demo-14-supply-chain-poisoning](demo-14-supply-chain-poisoning/python/) |
| 15 | Insecure output handling | [demo-15-insecure-output](demo-15-insecure-output/python/) |
| 16 | Privilege escalation | [demo-16-privilege-escalation](demo-16-privilege-escalation/python/) |
| 17 | Multi-agent manipulation | [demo-17-multi-agent-manipulation](demo-17-multi-agent-manipulation/python/) |
| 18 | Model probing | [demo-18-model-probing](demo-18-model-probing/python/) |

### 19-30 Azure Security Defense Demos

| Demo | Topic | Path |
| --- | --- | --- |
| 19 | Content Safety | [demo-19-content-safety](demo-19-content-safety/README.md) |
| 20 | Prompt Shields | [demo-20-prompt-shields](demo-20-prompt-shields/README.md) |
| 21 | Groundedness detection | [demo-21-groundedness](demo-21-groundedness/README.md) |
| 22 | Protected material detection | [demo-22-protected-material](demo-22-protected-material/README.md) |
| 23 | Custom categories | [demo-23-custom-categories](demo-23-custom-categories/README.md) |
| 24 | Task adherence | [demo-24-task-adherence](demo-24-task-adherence/README.md) |
| 25 | Azure OpenAI content filters | [demo-25-aoai-content-filters](demo-25-aoai-content-filters/README.md) |
| 26 | Secure RAG | [demo-26-secure-rag](demo-26-secure-rag/README.md) |
| 27 | Red teaming agent | [demo-27-red-teaming](demo-27-red-teaming/README.md) |
| 28 | APIM AI gateway | [demo-28-apim-ai-gateway](demo-28-apim-ai-gateway/README.md) |
| 29 | Identity and Key Vault | [demo-29-identity-keyvault](demo-29-identity-keyvault/README.md) |
| 30 | Foundry agents | [demo-30-foundry-agents](demo-30-foundry-agents/README.md) |

The authoritative ordered demo list lives in [demo-hub/blueprints/__init__.py](demo-hub/blueprints/__init__.py).

## Repository Contents

```
.
|-- demo-hub/                 Unified Flask UI for the full suite
|   |-- app.py                App entrypoint
|   |-- blueprints/           Demo route registration and manifest
|   |-- templates/            Shared hub templates and demo views
|   `-- tests/                Hub smoke and telemetry tests
|-- demo-01-*/                Attack and defense demos with local assets
|-- demo-19-*/                Azure-backed demos and supporting assets
|-- shared/
|   |-- python/               Shared config, clients, helpers, telemetry, tests
|   `-- templates/            Shared Jinja templates and styling
|-- scripts/azure/            Provision, check, and teardown helpers
|-- tests/
|   |-- e2e/                  Playwright browser tests
|   `-- integration/          Integration coverage for Azure and telemetry flows
|-- .env.example              Local and Azure configuration template
|-- Makefile                  Setup, run, test, lint, status, and utility targets
|-- SETUP.md                  Full installation and setup guide
|-- CONTRIBUTING.md           Contribution guidance
|-- SECURITY.md               Vulnerability reporting policy
`-- hacking-and-securing-llms.pptx
```

## Prerequisites

- macOS or Linux
- Ollama for local model execution
- Conda or Miniforge for environment management
- Python 3.11, provided through the Conda environment
- `make`
- Optional (demos 19-30): an Azure subscription, the Azure CLI, and service credentials

## Quick Start

### 1. Clone and configure

```
git clone https://github.com/greglevenhagen/hacking-and-securing-llms-public.git
cd hacking-and-securing-llms-public
cp .env.example .env
```

### 2. Pull the local models

```
ollama pull llama3.1:8b
ollama pull mistral:7b
ollama pull nomic-embed-text
```

Or let the repository do it for you:

```
make pull-models
```

### 3. Create the environments

```
make setup
```

`make setup` creates the shared environment plus a separate environment for each of demos `01` through `18`. The Azure demos `19` through `30` run in the shared environment when Azure configuration is present.

### 4. Start the Demo Hub

```
make demo-hub
```

Open `http://localhost:2600`.

## Demo Hub

The Demo Hub is the recommended way to explore this repository. It runs every demo inside one Flask application with a consistent interface and presentation flow.

Key capabilities:

- Sidebar navigation to all 30 demos
- LLM responses streamed with Server-Sent Events
- Side-by-side views of the vulnerable and protected versions of key defense demos
- Shared visual design and reusable templates
- A telemetry panel and Ollama health status
- Session-oriented navigation for workshops and live presentations

## Running Demos Directly

### Standalone demos

```
make demo-01
make demo-02
make demo-03
make demo-04
make demo-05
make demo-06
make demo-07
make demo-08
make demo-09
make demo-10
make demo-11
make demo-12
make demo-13
make demo-14
make demo-15
make demo-16
make demo-17
make demo-18
make demo-19
make demo-20
make demo-21
make demo-22
make demo-23
make demo-24
make demo-25
make demo-26
make demo-27
make demo-28
make demo-29
make demo-30
```

### Web-specific helpers

```
make demo-09-web
make demo-10-web
```

### Run multiple demos in sequence

```
make demo-all
```

## Configuration

The root [`.env.example`](.env.example) covers three main areas:

- Local model settings compatible with Ollama
- Azure AI service endpoints and credentials for demos 19-30
- OpenTelemetry settings for local or hosted tracing backends

Common variables:

| Variable | Purpose |
| --- | --- |
| `OLLAMA_BASE_URL` | OpenAI-compatible endpoint for Ollama or another local server |
| `PRIMARY_MODEL` | Primary model used for general chat across the suite |
| `SECONDARY_MODEL` | Secondary model used in judge or comparison flows |
| `EMBEDDING_MODEL` | Embedding model used by the RAG demos |
| `AZURE_CONTENT_SAFETY_*` | Azure Content Safety demos |
| `AZURE_OPENAI_*` | Azure OpenAI demos |
| `AZURE_AI_SEARCH_*` | Secure RAG demo |
| `OTEL_*` | Telemetry export and service naming |

## Testing and Quality

```
make test
make test-hub
make test-demo-03
make test-shared
make test-e2e
make lint
make status
```

What these targets cover:

- `make test`: runs the suite's unit tests without requiring Ollama
- `make test-hub`: smoke tests for the Demo Hub
- `make test-demo-NN`: runs the tests for an individual demo
- `make test-e2e`: browser coverage of the web demos using Playwright
- `make lint`: `mypy` type checking of the shared library
- `make status`: shows the implemented-versus-stub status and test file count for each demo

GitHub Actions CI runs the unit tests, hub tests, and lint checks on pushes to `main` and on pull requests.

## Azure-Backed Demos

Demos `19` through `30` are designed around Azure AI security and platform services. They are optional and do not block the demos that run on local Ollama.

Typical local workflow:

```
make azure-setup
make azure-check
make demo-hub
```

Available Azure utility targets:

- `make azure-setup`: interactive provisioning helper
- `make azure-check`: connectivity validation for configured services
- `make azure-teardown`: guided teardown helper

For the full walkthrough, see [SETUP.md](SETUP.md).

## Telemetry and Observability

The repository provides OpenTelemetry support in the shared library and the Demo Hub. With telemetry enabled, the suite can send traces of model calls and demo flows to any OTLP-compatible backend.

Useful commands:

```
make telemetry-check
make telemetry-check-json
```

Implementation notes and backend examples are in [shared/python/telemetry/README.md](shared/python/telemetry/README.md).

## Development Workflow

- Use `make check` to verify prerequisites and models
- Use `make setup-quick` when you only need the shared environment, the tests, or the Demo Hub
- Prefer the Demo Hub for exploring the full suite, and use the standalone `make demo-NN` targets for targeted debugging
- Run `make test` and `make lint` before opening a pull request
- See [CONTRIBUTING.md](CONTRIBUTING.md) for contribution expectations

## Project Status

This repository is an active open-source demo suite, not a packaged framework. The focus is on clarity, reproducibility, and usability in live presentations.

Areas that are well covered:

- An end-to-end demo catalog for attacks, defenses, and Azure security controls
- Shared tooling for configuration, testing, and telemetry
- A unified presentation interface through the Demo Hub

Areas that continue to evolve:

- Depth and polish of individual demos
- Broader Azure integration coverage
- Additional workshop assets and hardening guidance

## Security

If you find a vulnerability, follow [SECURITY.md](SECURITY.md). Do not file sensitive reports publicly.

## Contributing

Issues and pull requests are welcome as long as they fit the repository's educational and practical scope. See [CONTRIBUTING.md](CONTRIBUTING.md), keep changes focused, and include tests when behavior changes.

## License

This project is licensed under the [MIT License](LICENSE).
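
As a supplement to the Configuration section, the following is a minimal sketch of how the local model variables from `.env.example` might be read in Python. It is not the repository's actual loader in `shared/python/`. The `LocalModelSettings` class and `load_local_settings` function are hypothetical names, and the fallback values are assumptions: the URL is Ollama's usual local OpenAI-compatible address, and the model names are the ones pulled in Quick Start, mapped to primary and secondary roles only for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LocalModelSettings:
    """Hypothetical container for the local model variables in .env.example."""

    base_url: str
    primary_model: str
    secondary_model: str
    embedding_model: str


def load_local_settings(env: Optional[Mapping[str, str]] = None) -> LocalModelSettings:
    """Read settings from a mapping (defaults to os.environ).

    The fallback values below are assumptions made for this sketch,
    not defaults confirmed by the repository.
    """
    source = os.environ if env is None else env
    return LocalModelSettings(
        base_url=source.get("OLLAMA_BASE_URL", "http://localhost:11434/v1"),
        primary_model=source.get("PRIMARY_MODEL", "llama3.1:8b"),
        secondary_model=source.get("SECONDARY_MODEL", "mistral:7b"),
        embedding_model=source.get("EMBEDDING_MODEL", "nomic-embed-text"),
    )


if __name__ == "__main__":
    # Passing an explicit mapping keeps the example independent of the shell environment.
    settings = load_local_settings({"PRIMARY_MODEL": "mistral:7b"})
    print(settings)
```

Accepting an optional mapping instead of reading `os.environ` directly keeps the function easy to unit test without touching the real environment, which fits the suite's goal of running tests without Ollama.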
