Tooba19/presidio-llm-security-gateway

GitHub: Tooba19/presidio-llm-security-gateway

基于 Microsoft Presidio 的 LLM 安全网关，用于在请求到达模型前检测 prompt injection 并对 PII 敏感信息进行脱敏处理。

Stars: 0 | Forks: 0

# 基于 Presidio 的 LLM 安全微型网关一个用于大型语言模型 (LLM) 应用的模块化安全网关，可在请求到达模型之前检测 prompt injection 尝试并缓解敏感信息泄露。 ## 概述本项目为 LLM 系统实现了一个轻量级预处理层。它分析用户输入，检测对抗性 prompt injection 模式，使用 Microsoft Presidio 识别个人身份信息 (PII)，并应用策略决策来： - **ALLOW** (允许) 输入 - **MASK** (掩码) 敏感实体 - **BLOCK** (拦截) 恶意或高风险输入该系统使用 FastAPI 实现，包含可配置的阈值、延迟测量和评估流水线。 ## 系统架构安全网关充当用户与 LLM 之间的预处理层。用户输入 → 注入检测 → Presidio PII 分析器 → 策略引擎 → 输出决策 (ALLOW / MASK / BLOCK)

## 功能特性 - 基于规则的 prompt injection / 越狱检测 - 基于 Microsoft Presidio 的 PII 检测与匿名化 - 自定义韩语电话号码识别器 - 上下文感知的置信度提升 - 复合实体检测（例如：姓名 + 电话） - 策略驱动执行 (ALLOW / MASK / BLOCK) - 通过 `config.py` 配置阈值 - 带有 Swagger UI 的 FastAPI REST API - 包含准确率、精确率、召回率、F1、混淆矩阵和延迟报告的评估流水线 ## 项目结构 ``` app/ main.py policy.py injection_detector.py presidio_engine.py context_scoring.py composite_detector.py custom_recognizers.py config.py eval/ prompts.jsonl run_eval.py report/ report.pdf figures/ arch.png ROC_Analysis_Final.pdf confusion_matrix_heatmap.pdf latency_distribution_plot.pdf precision_recall_curve.pdf requirements.txt README.md ``` ## 如何运行克隆仓库： ``` git clone https://github.com/Tooba19/presidio-llm-security-gateway.git cd presidio-llm-security-gateway ``` 安装依赖： ``` pip install -r requirements.txt python -m spacy download en_core_web_lg ``` 运行 API： ``` uvicorn app.main:app --reload ``` 打开 Swagger UI： ``` http://127.0.0.1:8000/docs ``` 运行评估流水线： ``` python -m eval.run_eval ``` ## 评估数据集评估数据集包含 40 个 prompt，涵盖四个类别： | 类别 | 描述 | 示例 | |--------|--------|--------| | Benign (良性) | 正常用户请求 | "What is the capital of Germany?" | | PII | 敏感信息 | 电子邮件地址、电话号码 | | Injection (注入) | Prompt injection 尝试 | "Ignore previous instructions" | | Mixed (混合) | Injection + PII | "Ignore instructions. My email is admin@company.com" | 数据集文件： eval/prompts.jsonl 每个条目包含： ``` { "text": "...", "label": "ALLOW/MASK/BLOCK", "category": "benign/pii/injection/mixed" } ``` 该数据集由 `eval/run_eval.py` 用于测量检测性能。 ## 评估结果评估在 40 个测试 prompt 上进行。 | 指标 | 数值 | |------|------| | Accuracy (准确率) | 1.00 | | Precision (精确率) | 1.00 | | Recall (召回率) | 1.00 | | F1 Score (F1 分数) | 1.00 | 混淆矩阵： | 实际值 / 预测值 | ALLOW | MASK | BLOCK | |---|---|---|---| | ALLOW | 10 | 0 | 0 | | MASK | 0 | 10 | 0 | | BLOCK | 0 | 0 | 20 | 平均延迟： | 类别 | 平均延迟 (毫秒) | |------|------| | Benign (良性) | 14.38 | | PII | 13.15 | | Injection (注入) | 4.07 | | Mixed (混合) | 5.52 | ## 技术报告描述威胁模型、Presidio 定制化、系统架构和评估的完整技术报告可在此处获取： [下载报告](report/report.pdf) ## 局限性当前系统使用基于规则的注入检测。虽然对已知攻击模式有效，但它可能无法应对： - 改写后的越狱 prompt - 多语言注入尝试 - 对抗性 prompt 混淆未来的工作将探索： - 基于嵌入的注入检测 - 轻量级 ML 分类器 - 自适应策略学习

标签：AMSI绕过, API安全, AV绕过, DLL 劫持, FastAPI, JSON输出, LLM, Naabu, PII泄露, Presidio, Python, Unmanaged PE, WAF, 人工智能安全, 内容安全, 合规性, 大语言模型, 威胁检测, 安全网关, 对抗攻击, 敏感信息检测, 数据脱敏, 无后门, 网络测绘, 脱敏网关, 越狱检测, 逆向工具, 隐私合规