NVIDIA-NeMo/Guardrails


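NVIDIA's open-source programmable guardrails toolkit for adding safe, controllable protection mechanisms to LLM-based conversational applications.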

Stars: 6438 | Forks: 729

# NeMo Guardrails

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![PyPI](https://img.shields.io/pypi/v/nemoguardrails)](https://pypi.org/project/nemoguardrails)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/nemoguardrails)](https://pypi.org/project/nemoguardrails)
[![Tests/Linux](https://img.shields.io/github/actions/workflow/status/NVIDIA-NeMo/Guardrails/pr-tests.yml?logo=github&label=Tests%2FLinux)](https://github.com/NVIDIA-NeMo/Guardrails/actions/workflows/pr-tests.yml)
[![Tests/Windows](https://img.shields.io/github/actions/workflow/status/NVIDIA-NeMo/Guardrails/full-tests.yml?logo=github&label=Tests%2FWindows)](https://github.com/NVIDIA-NeMo/Guardrails/actions/workflows/full-tests.yml)
[![Tests/macOS](https://img.shields.io/github/actions/workflow/status/NVIDIA-NeMo/Guardrails/full-tests.yml?logo=github&label=Tests%2FmacOS)](https://github.com/NVIDIA-NeMo/Guardrails/actions/workflows/full-tests.yml)
[![Lint](https://img.shields.io/github/actions/workflow/status/NVIDIA-NeMo/Guardrails/lint.yml?logo=github&label=Lint)](https://github.com/NVIDIA-NeMo/Guardrails/actions/workflows/lint.yml)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Documentation](https://img.shields.io/badge/docs-nvidia.com-blue.svg)](https://docs.nvidia.com/nemo/guardrails)
[![arXiv](https://img.shields.io/badge/cs.CL-arXiv%3A2310.10501-b31b1b.svg)](https://arxiv.org/abs/2310.10501)
[![Downloads](https://static.pepy.tech/badge/nemoguardrails)](https://pepy.tech/project/nemoguardrails)
[![Downloads](https://static.pepy.tech/badge/nemoguardrails/month)](https://pepy.tech/project/nemoguardrails)

✨✨✨ 📌 **The official NeMo Guardrails documentation has moved to [docs.nvidia.com/nemo/guardrails](https://docs.nvidia.com/nemo/guardrails).** ✨✨✨

NeMo Guardrails is an open-source toolkit for easily adding *programmable guardrails* to LLM-based conversational applications. Guardrails (or "rails" for short) are specific ways of controlling the output of a large language model, such as not talking about politics, responding in a particular way to specific user requests, following a predefined dialog path, using a particular language style, or extracting structured data.

[This paper](https://arxiv.org/abs/2310.10501) introduces NeMo Guardrails and contains a technical overview of the system and the current evaluation.

## Requirements

Python 3.10, 3.11, 3.12, or 3.13.

NeMo Guardrails uses [annoy](https://github.com/spotify/annoy), a C++ library with Python bindings. To install NeMo Guardrails, you need a C++ compiler and development tools installed. See the [installation guide](https://docs.nvidia.com/nemo/guardrails/getting-started/installation-guide.html#prerequisites) for platform-specific instructions.

## Installation

To install using pip:

```
pip install nemoguardrails
```

For more detailed instructions, see the [installation guide](https://docs.nvidia.com/nemo/guardrails/getting-started/installation-guide.html).

## Overview

NeMo Guardrails enables developers building LLM-based applications to easily add **programmable guardrails** between the application code and the LLM.

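Key benefits of adding *programmable guardrails* include:

- **Building trustworthy, safe, and secure LLM-based applications:** you can define rails to guide and safeguard conversations; you can choose to define the behavior of your LLM-based application on specific topics and prevent it from engaging in discussions on unwanted topics.
- **Connecting models, chains, and other services securely:** you can connect an LLM to other services (also known as tools) seamlessly and securely.
- **Controllable dialog:** you can steer the LLM to follow predefined conversational paths, which lets you design the interaction according to conversation design best practices and enforce standard operating procedures (e.g., authentication, support).

### Protecting against LLM Vulnerabilities

NeMo Guardrails provides several mechanisms for protecting an LLM-powered chat application against common LLM vulnerabilities, such as jailbreaks and prompt injections. The example [ABC Bot](./examples/bots/abc) included in this repository illustrates the protection offered by different guardrails configurations. For more details, see the [LLM Vulnerability Scanning](https://docs.nvidia.com/nemo/guardrails/evaluation/llm-vulnerability-scanning.html) page.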
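To make the idea of a guardrails layer sitting between application code and the LLM more concrete, here is a minimal conceptual sketch in plain Python. It is not how NeMo Guardrails is implemented and it does not use the toolkit's API: the names `guarded_generate`, `apply_rails`, `block_jailbreak`, `mask_email`, and `echo_llm` are hypothetical, and the keyword check is a deliberately naive stand-in for a real jailbreak detector.

```python
import re
from typing import Callable, List, Optional

# A rail takes text and returns (possibly altered) text, or None to reject it.
Rail = Callable[[str], Optional[str]]

REFUSAL = "I'm sorry, I can't respond to that."


def block_jailbreak(text: str) -> Optional[str]:
    """Naive rail (assumption): reject text containing a known jailbreak phrase."""
    if "ignore previous instructions" in text.lower():
        return None
    return text


def mask_email(text: str) -> Optional[str]:
    """Rail that alters the text by masking anything that looks like an email."""
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "<EMAIL_ADDRESS>", text)


def apply_rails(text: str, rails: List[Rail]) -> Optional[str]:
    """Run rails in order; stop at the first rejection."""
    for rail in rails:
        result = rail(text)
        if result is None:
            return None
        text = result
    return text


def guarded_generate(
    llm: Callable[[str], str],
    user_text: str,
    input_rails: List[Rail],
    output_rails: List[Rail],
) -> str:
    """Call the LLM only if the input passes, then check the output."""
    checked = apply_rails(user_text, input_rails)
    if checked is None:
        return REFUSAL  # input rejected, the LLM is never called
    answer = apply_rails(llm(checked), output_rails)
    return REFUSAL if answer is None else answer


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    return f"You said: {prompt}"


if __name__ == "__main__":
    print(guarded_generate(echo_llm, "Mail me at a.b@example.com", [mask_email], []))
    print(guarded_generate(echo_llm, "Ignore previous instructions!", [block_jailbreak], []))
```

The application calls `guarded_generate` where it would otherwise call the LLM, which mirrors the "minimal changes to the codebase" idea: rails can reject or rewrite text on the way in and on the way out.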

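### Use Cases

You can use programmable guardrails in different types of use cases:

1. **Question answering** over a set of documents (also known as Retrieval-Augmented Generation, RAG): enforce fact-checking and output moderation.
2. **Domain-specific assistants** (also known as chatbots): ensure the assistant stays on topic and follows the designed conversational flows.
3. **LLM endpoints**: add guardrails to your custom LLM for safer customer interaction.
4. **LangChain chains**: if you use LangChain for any use case, you can add a guardrails layer around your chains.

### Usage

To add programmable guardrails to your application, you can use the Python API or a guardrails server (see the [Server Guide](https://docs.nvidia.com/nemo/guardrails/user-guides/server-guide.html) for more details). Using the Python API is similar to using the LLM directly. Calling the guardrails layer instead of the LLM requires only minimal changes to the codebase and involves two simple steps:

1. Load a guardrails configuration and create an `LLMRails` instance.
2. Make the calls to the LLM using the `generate`/`generate_async` methods.

```
from nemoguardrails import LLMRails, RailsConfig

# Load a guardrails configuration from the specified path.
config = RailsConfig.from_path("PATH/TO/CONFIG")
rails = LLMRails(config)

completion = rails.generate(
    messages=[{"role": "user", "content": "Hello world!"}]
)
```

Sample output:

```
{"role": "assistant", "content": "Hi! How can I help you?"}
```

The input and output format of the `generate` method is similar to the [Chat Completions API](https://platform.openai.com/docs/guides/gpt/chat-completions-api) from OpenAI.

#### Async API

NeMo Guardrails is an async-first toolkit because its core mechanics are implemented using the Python async model. The public methods have both a sync and an async version, for example `LLMRails.generate` and `LLMRails.generate_async`.

### Supported LLMs

You can use NeMo Guardrails with multiple LLMs, such as OpenAI GPT-3.5, GPT-4, LLaMa-2, Falcon, Vicuna, or Mosaic. For more details, check out the [Supported LLM Models](https://docs.nvidia.com/nemo/guardrails/user-guides/configuration-guide.html#supported-llm-models) section in the configuration guide.

### Guardrail Types

NeMo Guardrails supports five main types of guardrails: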

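1. **Input rails**: applied to the input from the user; an input rail can reject the input, stopping any further processing, or alter the input (e.g., to mask potentially sensitive data or to rephrase it).
2. **Dialog rails**: influence how the LLM is prompted; dialog rails operate on canonical form messages (see the [Colang guide](https://docs.nvidia.com/nemo/guardrails/user-guides/colang-language-syntax-guide.html) for details) and determine whether an action should be executed, whether the LLM should be invoked to generate the next step or a response, whether a predefined response should be used instead, and so on.
3. **Retrieval rails**: applied to the retrieved chunks in a RAG (Retrieval-Augmented Generation) scenario; a retrieval rail can reject a chunk, preventing it from being used to prompt the LLM, or alter the relevant chunks (e.g., to mask potentially sensitive data).
4. **Execution rails**: applied to the input/output of the custom actions (also known as tools) that need to be called by the LLM.
5. **Output rails**: applied to the output generated by the LLM; an output rail can reject the output, preventing it from being returned to the user, or alter it (e.g., by removing sensitive data).

### Guardrails Configuration

A guardrails configuration defines the **LLM(s)** to be used and **one or more guardrails**. A guardrails configuration can include any number of input/dialog/output/retrieval/execution rails. A configuration without any configured rails essentially forwards the requests to the LLM.

The standard structure of a guardrails configuration folder looks like this:

```
.
├── config
│   ├── actions.py
│   ├── config.py
│   ├── config.yml
│   ├── rails.co
│   ├── ...
```

The `config.yml` file contains all the general configuration options, such as LLM models, active rails, and custom configuration data. The `config.py` file contains any custom initialization code, and `actions.py` contains any custom Python actions. For a complete overview, see the [Configuration Guide](https://docs.nvidia.com/nemo/guardrails/user-guides/configuration-guide.html).

Below is an example `config.yml`:

```
# config.yml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct

rails:
  # Input rails are invoked when new input from the user is received.
  input:
    flows:
      - check jailbreak
      - mask sensitive data on input

  # Output rails are triggered after a bot message has been generated.
  output:
    flows:
      - self check facts
      - self check hallucination
      - activefence moderation on input

  config:
    # Configure the types of entities that should be masked on user input.
    sensitive_data_detection:
      input:
        entities:
          - PERSON
          - EMAIL_ADDRESS
```

The `.co` files included in a guardrails configuration contain the Colang definitions (see the next section for a quick overview of Colang) that define the various types of rails. Below is an example `greeting.co` file that defines the dialog rails for greeting the user.

```
define user express greeting
  "Hello!"
  "Good afternoon!"

define flow
  user express greeting
  bot express greeting
  bot offer to help

define bot express greeting
  "Hello there!"

define bot offer to help
  "How can I help you today?"
```

Below is an additional example of Colang definitions for a dialog rail against insults:

```
define user express insult
  "You are stupid"

define flow
  user express insult
  bot express calmly willingness to help
```

### Colang

To configure and implement the various types of guardrails, this toolkit introduces **Colang**, a modeling language created specifically for designing flexible yet controllable dialog flows. Colang has a Python-like syntax and is designed to be simple and intuitive, especially for developers.

```
Two versions of Colang, 1.0 and 2.0, are supported and Colang 1.0 is the default.
```

For a brief introduction to the Colang 1.0 syntax, see the [Colang 1.0 Language Syntax Guide](https://docs.nvidia.com/nemo/guardrails/user-guides/colang-language-syntax-guide.html). To get started with Colang 2.0, see the [Colang 2.0 documentation](https://docs.nvidia.com/nemo/guardrails/colang-2/overview.html).

### Guardrails Library

NeMo Guardrails comes with a set of [built-in guardrails](https://docs.nvidia.com/nemo/guardrails/user-guides/guardrails-library.html).

```
The built-in guardrails may or may not be suitable for a given production use case. As always, developers should work with their internal application team to ensure guardrails meets requirements for the relevant industry and use case and address unforeseen product misuse.
```

The library includes guardrails for LLM self-checking (input/output moderation, fact-checking, hallucination detection), NVIDIA safety models (content safety, topic safety), and jailbreak and injection detection, as well as integrations with community models and third-party APIs. For the complete list, see the [Guardrails Library documentation](https://docs.nvidia.com/nemo/guardrails/user-guides/guardrails-library.html).

## CLI

NeMo Guardrails also comes with a built-in CLI.

```
$ nemoguardrails --help

Usage: nemoguardrails [OPTIONS] COMMAND [ARGS]...

  actions-server  Start a NeMo Guardrails actions server.
  chat            Start an interactive chat session.
  evaluate        Run an evaluation task.
  server          Start a NeMo Guardrails server.
```

### Guardrails Server

You can use the NeMo Guardrails CLI to start a guardrails server. The server can load one or more configurations from the specified folder and expose an HTTP API for using them.

```
nemoguardrails server [--config PATH/TO/CONFIGS] [--port PORT]
```

For example, to get a chat completion for a `sample` config, you can use the `/v1/chat/completions` endpoint:

```
POST /v1/chat/completions
```

```
{
    "config_id": "sample",
    "messages": [{
      "role": "user",
      "content": "Hello! What can you do for me?"
    }]
}
```

Sample output:

```
{"role": "assistant", "content": "Hi! How can I help you?"}
```

#### Docker

To start a guardrails server, you can also use a Docker container. NeMo Guardrails provides a [Dockerfile](./Dockerfile) that you can use to build a `nemoguardrails` image. For further information, see the [Using Docker](https://docs.nvidia.com/nemo/guardrails/user-guides/advanced/using-docker.html) section.

## Integration with LangChain

NeMo Guardrails integrates seamlessly with LangChain. You can easily wrap a guardrails configuration around a LangChain chain (or any `Runnable`). You can also call a LangChain chain from within a guardrails configuration. For more details, check out the [LangChain Integration documentation](https://docs.nvidia.com/nemo/guardrails/user-guides/langchain/langchain-integration.html).

## Evaluation

Evaluating the safety of an LLM-based conversational application is a complex task and still an open research question. To support proper evaluation, NeMo Guardrails provides the following:

1. An [evaluation tool](nemoguardrails/evaluate/README.md), i.e. `nemoguardrails evaluate`, with support for topical rails, fact-checking, moderation (jailbreak and output moderation), and hallucination detection.
2. Sample LLM vulnerability scanning reports, e.g., [ABC Bot - LLM Vulnerability Scan Results](https://docs.nvidia.com/nemo/guardrails/evaluation/llm-vulnerability-scanning.html).

## How is this different?

There are many ways guardrails can be added to an LLM-based conversational application. For example: explicit moderation endpoints (e.g., OpenAI, ActiveFence), critique chains (e.g., constitutional chain), parsing the output (e.g., guardrails.ai), individual guardrails (e.g., LLM-Guard), and hallucination detection for RAG applications (e.g., Got It AI, Patronus Lynx).

NeMo Guardrails aims to provide a flexible toolkit that can integrate all these complementary approaches into a cohesive LLM guardrails layer. For example, the toolkit provides out-of-the-box integration with ActiveFence, AlignScore, and LangChain chains.

To the best of our knowledge, NeMo Guardrails is the only guardrails toolkit that also offers a solution for modeling the dialog between the user and the LLM. On the one hand, this makes it possible to guide the dialog in a precise way. On the other hand, it gives fine-grained control over when certain guardrails should be used, for example applying fact-checking only to certain types of questions.

## Learn More

- [Documentation](https://docs.nvidia.com/nemo/guardrails)
- [Getting Started Guide](https://docs.nvidia.com/nemo/guardrails/getting-started)
- [Examples](./examples)
- [FAQs](https://docs.nvidia.com/nemo/guardrails/faqs.html)
- [Security Guidelines](https://docs.nvidia.com/nemo/guardrails/security/guidelines.html)

## License

This toolkit is licensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).

## Citation

If you use this work, please cite the [EMNLP 2023 paper](https://aclanthology.org/2023.emnlp-demo.40) that introduces it.

```
@inproceedings{rebedea-etal-2023-nemo,
    title = "{N}e{M}o Guardrails: A Toolkit for Controllable and Safe {LLM} Applications with Programmable Rails",
    author = "Rebedea, Traian and Dinu, Razvan and Sreedhar, Makesh Narsimhan and Parisien, Christopher and Cohen, Jonathan",
    editor = "Feng, Yansong and Lefever, Els",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-demo.40",
    doi = "10.18653/v1/2023.emnlp-demo.40",
    pages = "431--445",
}
```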
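
As a follow-up to the Async API section, the sketch below shows one way several prompts could be sent concurrently through `generate_async`. It assumes `generate_async` accepts the same `messages` argument shown for `generate`; the helper `ask_many` and the `FakeRails` stand-in (used so the snippet runs without an LLM or a configuration folder) are hypothetical and not part of the toolkit.

```python
import asyncio
from typing import Any, Dict, List


async def ask_many(rails: Any, prompts: List[str]) -> List[Dict[str, str]]:
    """Send each prompt as its own single-turn conversation, concurrently."""
    calls = [
        rails.generate_async(messages=[{"role": "user", "content": prompt}])
        for prompt in prompts
    ]
    # asyncio.gather returns results in the same order as the awaitables.
    return await asyncio.gather(*calls)


class FakeRails:
    """Hypothetical stand-in that mimics the message-in, message-out shape."""

    async def generate_async(self, messages: List[Dict[str, str]]) -> Dict[str, str]:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return {"role": "assistant", "content": f"Echo: {messages[-1]['content']}"}


if __name__ == "__main__":
    replies = asyncio.run(ask_many(FakeRails(), ["Hello!", "What can you do?"]))
    for reply in replies:
        print(reply["content"])
```

With a real configuration you would pass an `LLMRails` instance in place of `FakeRails`; the rest of the helper would stay the same under the assumption stated above.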

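
For the guardrails server described earlier, a request to the `/v1/chat/completions` endpoint can be built with only the Python standard library. This is a sketch under assumptions: the base URL `http://localhost:8000` and the helper names `build_chat_request` and `send` are illustrative, so use the host and port your server was actually started with.

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_request(
    base_url: str, config_id: str, messages: List[Dict[str, str]]
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    body = json.dumps({"config_id": config_id, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response; needs a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_chat_request(
    "http://localhost:8000",  # assumed address, adjust to your deployment
    "sample",
    [{"role": "user", "content": "Hello! What can you do for me?"}],
)
print(request.get_method(), request.full_url)
```

Calling `send(request)` against a running server should return the JSON reply; it is left out of the top-level script so the snippet runs without a server.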