Hamna-ShahJehan/llm-security-gateway-final-

GitHub: Hamna-ShahJehan/llm-security-gateway-final-

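A modular LLM security gateway that combines rules with machine learning to intercept prompt injection, jailbreaks, sensitive data leakage, and other threats at the application layer.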

Stars: 0 | Forks: 0

A robust, modular pre-model security gateway that protects large language model (LLM) applications **before user input reaches the model**. It detects and blocks prompt injection, jailbreak attempts, system prompt extraction, sensitive data exfiltration, paraphrase attacks, multilingual attacks (English, Urdu, Korean), and obfuscated input.

## 🚀 Core Features

- **Hybrid detection engine**: a fast keyword rule-based filter combined with a TF-IDF + Logistic Regression semantic ML classifier
- **Multilingual coverage**: keyword lists for English, Urdu (Arabic script), and Korean (Hangul), with mixed-language detection
- **Custom Presidio PII detection**: four custom pattern recognisers for Pakistani CNIC, university student ID, API keys, and date of birth
- **Three-outcome policy engine**: makes an `ALLOW`, `MASK`, or `BLOCK` decision based on the combined risk signals
- **Obfuscation normalisation**: leetspeak mapping and character-spacing collapse before detection (`i g n o r e` → `ignore`)
- **Full audit logging**: every request is recorded in `results/audit_log.json` with scores, reason codes, decision, masked output, and per-stage latency

## 🔁 Processing Pipeline

```
[User Input]
     │
     ▼
Stage 0: Language Detection & Leetspeak Normalisation (EN / UR / KO / MIXED)
     │
     ▼
Stage 1: Rule-Based Keyword Scanning (EN + UR + KO keyword lists)
     │
     ▼
Stage 2: Semantic ML Classifier (TF-IDF + Logistic Regression)
     │
     ▼
Stage 3: Presidio PII Analyser & Anonymiser (custom entity recognisers)
     │
     ▼
Stage 4: Policy Engine (ALLOW / MASK / BLOCK resolution)
     │
     ▼
Stage 5: Audit Log Write (results/audit_log.json)
     │
     ▼
[Safe Output] → Anonymised text, original text, or rejection message
```

## 📂 Repository Structure

```
llm-security-gateway-final/
├── app/
│   ├── main.py                  # Flask API entry point (GET / and POST /check)
│   ├── pipeline.py              # Full 7-stage pipeline logic
│   ├── detectors/
│   │   ├── rule_detector.py     # Multilingual keyword detection (EN/UR/KO)
│   │   └── semantic_detector.py # TF-IDF + Logistic Regression classifier
│   ├── pii/
│   │   └── presidio_custom.py   # Custom CNIC, Student ID, API Key recognisers
│   ├── policy/
│   │   └── policy_engine.py     # Risk scoring and ALLOW/MASK/BLOCK logic
│   └── utils/
│       ├── language.py          # Language detection + leetspeak normalisation
│       └── logging.py           # JSON audit log writer
├── config/
│   └── gateway_config.yaml      # Configurable thresholds and weights
├── data/
│   └── final_eval.csv           # 150-row labeled evaluation dataset
├── results/
│   ├── evaluation_results.csv   # Output of run_evaluation.py
│   └── audit_log.json           # Runtime audit trace log
├── run_evaluation.py            # Full dataset evaluation script
├── requirements.txt
└── README.md
```
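The obfuscation normalisation step (leetspeak mapping and character-spacing collapse, e.g. `i g n o r e` → `ignore`) could be sketched roughly as follows. This is a minimal illustration, not the code in `app/utils/language.py`: the names `LEET_MAP`, `SPACED_RUN`, and `normalise`, as well as the specific character mapping, are assumptions made for the example.

```python
import re

# Hypothetical leetspeak mapping; the project's actual table is not shown in the README.
LEET_MAP = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})

# Three or more single characters separated by single spaces, e.g. "i g n o r e".
SPACED_RUN = re.compile(r"\b(?:\w ){2,}\w\b")


def normalise(text: str) -> str:
    """Return a lower-cased copy with spaced-out runs joined and leetspeak mapped."""
    collapsed = SPACED_RUN.sub(lambda m: m.group(0).replace(" ", ""), text)
    return collapsed.lower().translate(LEET_MAP)


print(normalise("i g n o r e previous instructions"))  # ignore previous instructions
print(normalise("1gn0r3 4ll rul35"))                   # ignore all rules
```

Because the mapping rewrites every digit, a normalised copy like this is only suitable for keyword matching; PII recognisers that depend on digits (such as a CNIC pattern) would need to run on the original text.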
## 🛠️ Installation and Setup

**1. Clone the repository**

```
git clone https://github.com/yourusername/llm-security-gateway-final.git
cd llm-security-gateway-final
```

**2. Create a virtual environment**

```
python -m venv env
```

**3. Activate the virtual environment**

```
# Windows
.\env\Scripts\activate
# macOS / Linux
source env/bin/activate
```

**4. Install dependencies**

```
pip install -r requirements.txt
```

**5. Download the spaCy language model (required by Presidio)**

```
python -m spacy download en_core_web_lg
```

**6. Run the application**

```
python app/main.py
```

**7. Open in your browser**

Visit `http://127.0.0.1:5000`.

## 📊 Running the Evaluation

To run the full 150-row evaluation dataset and regenerate the results:

```
python run_evaluation.py
```

Output is saved to `results/evaluation_results.csv`.

## 📈 Sample API Request and Response

**POST** `/check` with JSON:

```
{ "text": "my cnic is 03032-7976543-6" }
```

**Response:**

```
{
  "input_text": "my cnic is 03032-7976543-6",
  "language": "EN",
  "rule_score": 0.0,
  "semantic_score": 0.23,
  "pii_score": 1,
  "composite": "NO",
  "final_risk": 0.3,
  "decision": "MASK",
  "final_output": "my cnic is ",
  "reason_codes": "PII_DETECTED",
  "rule_latency": 4,
  "semantic_latency": 336,
  "presidio_latency": 818,
  "total_latency": 1178
}
```

## ⚠️ Limitations

- Paraphrase recall is only 28%: TF-IDF lacks the semantic generalisation needed for novel phrasings
- Short API keys that do not match the `sk-[32+]` regex pattern are not detected
- Thresholds are defined in `gateway_config.yaml` but are currently hard-coded in the policy engine
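To make the API key limitation concrete, here is a small sketch of how a length-bounded pattern misses shorter keys. The README only describes the pattern as `sk-[32+]`; the exact regex below (`sk-` followed by at least 32 alphanumeric characters) and the names `API_KEY_RE` and `contains_api_key` are assumptions for illustration, not the recogniser defined in `app/pii/presidio_custom.py`.

```python
import re

# Assumed reading of "sk-[32+]": the prefix "sk-" plus 32 or more alphanumerics.
API_KEY_RE = re.compile(r"\bsk-[A-Za-z0-9]{32,}\b")


def contains_api_key(text: str) -> bool:
    """Return True if the text contains a key matching the assumed pattern."""
    return API_KEY_RE.search(text) is not None


long_key = "sk-" + "a" * 32   # synthetic key, long enough to match
short_key = "sk-" + "a" * 16  # synthetic key, too short to match

print(contains_api_key(f"my token is {long_key}"))   # True
print(contains_api_key(f"my token is {short_key}"))  # False
```

Lowering the minimum length would catch shorter keys at the cost of more false positives on ordinary strings that happen to start with `sk-`.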
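Tags: AI security, AI security tools, AI application protection, API key detection, Chat Copilot, Flask framework, IPv6 support, LLM security solutions, PII masking, Presidio, ProjectDiscovery, TF-IDF, multilingual keyword detection, multilingual processing, threat mitigation, security gateway, audit logging system, prompt security gateway, prompt injection defence, data protection, data leakage prevention, hybrid detection engine, obfuscation handling, source code security, policy engine, policy enforcement engine, cybersecurity, cybersecurity challenges, network probing, custom PII recognisers, NLP security, rule-based detection, semantic machine learning, input validation, reverse engineering tools, logistic regression model, privacy protection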