Hamna-ShahJehan/llm-security-gateway-final-
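A modular LLM security gateway that combines rules with machine learning to block prompt injection, jailbreaks, sensitive data leakage, and other threats at the application layer.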
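A robust, modular pre-model security gateway that protects large language model (LLM) applications **before user input reaches the model**. It detects and blocks prompt injection, jailbreak attempts, system prompt extraction, sensitive data exfiltration, paraphrase attacks, multilingual attacks (English, Urdu, Korean), and obfuscated input.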
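## 🚀 Key Features
- **Hybrid detection engine**: a fast keyword-based rule filter combined with a TF-IDF + Logistic Regression semantic ML classifier
- **Multilingual coverage**: keyword lists for English, Urdu (Arabic script), and Korean (Hangul), with mixed-language detection
- **Custom Presidio PII detection**: four custom pattern recognisers for Pakistani CNIC numbers, university student IDs, API keys, and dates of birth
- **Three-outcome policy engine**: makes an `ALLOW`, `MASK`, or `BLOCK` decision from the combined risk signals
- **Obfuscation normalisation**: leetspeak mapping and collapsing of spaced-out characters (`i g n o r e` → `ignore`) before detection
- **Full audit logging**: every request is logged to `results/audit_log.json` with scores, reason codes, the decision, the masked output, and per-stage latency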
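
The obfuscation normalisation feature could look roughly like the sketch below. It is not the project's code: `LEET_MAP`, `SPACED_RUN`, and `normalise` are hypothetical names, and the character table is an assumed subset of common leetspeak substitutions.

```python
import re

# Hypothetical leetspeak table; the project's real mapping may differ.
LEET_MAP = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)

# Three or more single characters separated by single spaces, e.g. "i g n o r e".
SPACED_RUN = re.compile(r"\b(?:\w ){2,}\w\b")


def normalise(text: str) -> str:
    """Lower-case, collapse spaced-out characters, then undo common leetspeak."""
    text = text.lower()
    # Join runs such as "i g n o r e" into "ignore".
    text = SPACED_RUN.sub(lambda m: m.group(0).replace(" ", ""), text)
    # Map digits and symbols back to the letters they imitate.
    return text.translate(LEET_MAP)


if __name__ == "__main__":
    print(normalise("I g n o r e previous instructions"))
    print(normalise("1gn0r3 all rules"))
```

Because the leetspeak mapping also rewrites genuine digits, a gateway built this way would probably feed the normalised text only to the keyword and ML detectors and run PII analysis on the original input.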
## 🔁 Processing Pipeline
```
[User Input]
│
▼
Stage 0 — Language Detection & Leetspeak Normalisation (EN / UR / KO / MIXED)
│
▼
Stage 1 — Rule-Based Keyword Scanning (EN + UR + KO keyword lists)
│
▼
Stage 2 — Semantic ML Classifier (TF-IDF + Logistic Regression)
│
▼
Stage 3 — Presidio PII Analyser & Anonymiser (custom entity recognisers)
│
▼
Stage 4 — Policy Engine (ALLOW / MASK / BLOCK resolution)
│
▼
Stage 5 — Audit Log Write (results/audit_log.json)
│
▼
[Safe Output] → Anonymised text, original text, or rejection message
```
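
To make Stage 4 more concrete, here is a minimal sketch of how three signals might be resolved into one of the three outcomes. The weights, thresholds, and the names `Signals` and `decide` are assumptions for illustration; the project's real scoring lives in `policy_engine.py` and may work differently.

```python
from dataclasses import dataclass

# Hypothetical values, chosen only for illustration.
WEIGHTS = {"rule": 0.5, "semantic": 0.5}
BLOCK_THRESHOLD = 0.7         # combined injection risk that triggers BLOCK
SINGLE_SIGNAL_BLOCK = 0.9     # one very confident detector is enough to BLOCK


@dataclass
class Signals:
    rule_score: float       # 0.0-1.0 from keyword scanning
    semantic_score: float   # 0.0-1.0 from the ML classifier
    pii_found: bool         # True if the PII stage detected any entity


def decide(s: Signals) -> str:
    """Resolve ALLOW / MASK / BLOCK from the combined signals."""
    injection_risk = (
        WEIGHTS["rule"] * s.rule_score + WEIGHTS["semantic"] * s.semantic_score
    )
    strongest = max(s.rule_score, s.semantic_score)
    if injection_risk >= BLOCK_THRESHOLD or strongest >= SINGLE_SIGNAL_BLOCK:
        return "BLOCK"   # likely attack: reject the request outright
    if s.pii_found:
        return "MASK"    # benign intent, but sensitive data must be anonymised
    return "ALLOW"       # nothing suspicious: pass the text through


if __name__ == "__main__":
    print(decide(Signals(rule_score=0.0, semantic_score=0.23, pii_found=True)))
```

Checking the attack signals before the PII signal means an input that is both malicious and contains PII is blocked rather than merely masked.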
## 📂 Repository Structure
```
llm-security-gateway-final/
├── app/
│   ├── main.py                  # Flask API entry point (GET / and POST /check)
│   ├── pipeline.py              # Full 7-stage pipeline logic
│   ├── detectors/
│   │   ├── rule_detector.py     # Multilingual keyword detection (EN/UR/KO)
│   │   └── semantic_detector.py # TF-IDF + Logistic Regression classifier
│   ├── pii/
│   │   └── presidio_custom.py   # Custom CNIC, Student ID, API Key recognisers
│   ├── policy/
│   │   └── policy_engine.py     # Risk scoring and ALLOW/MASK/BLOCK logic
│   └── utils/
│       ├── language.py          # Language detection + leetspeak normalisation
│       └── logging.py           # JSON audit log writer
├── config/
│   └── gateway_config.yaml      # Configurable thresholds and weights
├── data/
│   └── final_eval.csv           # 150-row labeled evaluation dataset
├── results/
│   ├── evaluation_results.csv   # Output of run_evaluation.py
│   └── audit_log.json           # Runtime audit trace log
├── run_evaluation.py            # Full dataset evaluation script
├── requirements.txt
└── README.md
```
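
As an illustration of what a JSON audit log writer such as `app/utils/logging.py` might do, the sketch below appends one record per request to a JSON array on disk. The function name `append_audit_record`, the added `timestamp` field, and the array-in-one-file layout are assumptions; the actual file format is not documented here.

```python
import json
import time
from pathlib import Path


def append_audit_record(
    record: dict, path: Path = Path("results/audit_log.json")
) -> None:
    """Append one request record to a JSON array stored at `path`."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Load the existing entries, or start a new list on first use.
    if path.exists():
        entries = json.loads(path.read_text(encoding="utf-8"))
    else:
        entries = []
    entries.append({"timestamp": time.time(), **record})
    # ensure_ascii=False keeps Urdu and Korean input readable in the log.
    path.write_text(
        json.dumps(entries, ensure_ascii=False, indent=2), encoding="utf-8"
    )
```

Rewriting the whole file on every request is simple but grows linearly with log size; a JSON Lines file opened in append mode would be a common alternative for heavier traffic.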
## 🛠️ Installation & Setup
**1. Clone the repository**
```
git clone [repo link here]
cd llm-security-gateway-final
```
**2. Create and activate a virtual environment**
```
python -m venv env
# Windows
.\env\Scripts\activate
```
**3. Install dependencies**
```
pip install -r requirements.txt
```
**4. Run the application**
```
python app/main.py
```
Open a browser at `http://127.0.0.1:5000`
## 📊 Running the Evaluation
To run the full 150-row evaluation dataset and regenerate the results:
```
python run_evaluation.py
```
Output is saved to `results/evaluation_results.csv`.
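
Once the results file exists, a per-category recall figure can be computed with a few lines of standard-library Python. This is a sketch only: the column names `category`, `expected`, and `decision` are hypothetical and would need to be adjusted to the actual CSV header.

```python
import csv
from collections import defaultdict


def recall_by_category(rows, positive="BLOCK"):
    """Per category: of the rows that should be `positive`, the share predicted so."""
    hits, totals = defaultdict(int), defaultdict(int)
    for row in rows:
        if row["expected"] != positive:
            continue  # recall only considers rows whose true label is positive
        totals[row["category"]] += 1
        hits[row["category"]] += row["decision"] == positive
    return {cat: hits[cat] / totals[cat] for cat in totals}


def load_rows(path="results/evaluation_results.csv"):
    """Read the evaluation output as a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Calling `recall_by_category(load_rows())` would then return a dictionary mapping each attack category to its recall.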
## 📈 Example API Request and Response
**POST** `/check` with a JSON body:
```
{ "text": "my cnic is 03032-7976543-6" }
```
**Response:**
```
{
  "input_text": "my cnic is 03032-7976543-6",
  "language": "EN",
  "rule_score": 0.0,
  "semantic_score": 0.23,
  "pii_score": 1,
  "composite": "NO",
  "final_risk": 0.3,
  "decision": "MASK",
  "final_output": "my cnic is ",
  "reason_codes": "PII_DETECTED",
  "rule_latency": 4,
  "semantic_latency": 336,
  "presidio_latency": 818,
  "total_latency": 1178
}
```
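
A client application could call the endpoint along the following lines, using only the standard library. The helper names (`build_check_request`, `check`, `safe_text`) are hypothetical, and treating `BLOCK` as "forward nothing to the model" is an assumption about how a caller would want to use the response fields shown above.

```python
import json
import urllib.request
from typing import Optional

GATEWAY_URL = "http://127.0.0.1:5000/check"  # local development address


def build_check_request(text: str, url: str = GATEWAY_URL) -> urllib.request.Request:
    """Build the POST /check request with a JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check(text: str, url: str = GATEWAY_URL, timeout: float = 10.0) -> dict:
    """Send the text to the gateway and return the parsed JSON response."""
    with urllib.request.urlopen(build_check_request(text, url), timeout=timeout) as resp:
        return json.load(resp)


def safe_text(result: dict) -> Optional[str]:
    """Return the text that may be forwarded to the model, or None if blocked."""
    if result["decision"] == "BLOCK":
        return None
    return result["final_output"]
```

With the gateway running locally, `safe_text(check("my cnic is 03032-7976543-6"))` would return the masked text from `final_output`.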
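## ⚠️ Limitations
- Paraphrase recall is only 28%: TF-IDF does not generalise semantically to novel phrasings
- Short API keys that do not match the `sk-[32+]` regex pattern are not detected
- Thresholds are defined in `gateway_config.yaml`, but are currently hard-coded in the policy engine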
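
The API key limitation can be reproduced with a small regex experiment. The sketch assumes `sk-[32+]` means the literal prefix `sk-` followed by at least 32 alphanumeric characters; both patterns and the `find_api_keys` helper are illustrative and are not taken from the project's recogniser.

```python
import re

# Assumed reading of `sk-[32+]`: "sk-" followed by 32 or more key characters.
STRICT_KEY = re.compile(r"\bsk-[A-Za-z0-9]{32,}\b")
# Hypothetical looser variant: catches shorter keys but risks more false positives.
LOOSE_KEY = re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")


def find_api_keys(text: str, pattern: re.Pattern = STRICT_KEY) -> list:
    """Return every substring of `text` that matches the given key pattern."""
    return pattern.findall(text)


if __name__ == "__main__":
    long_key = "sk-" + "a1B2" * 8    # 32 characters after the prefix
    short_key = "sk-" + "a1B2" * 5   # 20 characters after the prefix
    print(find_api_keys(f"key: {long_key}"))              # matched by the strict pattern
    print(find_api_keys(f"key: {short_key}"))             # missed by the strict pattern
    print(find_api_keys(f"key: {short_key}", LOOSE_KEY))  # caught by the looser one
```

Lowering the minimum length trades missed keys for false positives on ordinary strings that happen to start with `sk-`, so any change would be worth validating against the evaluation dataset.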
## 🛠️ Installation & Setup
**1. Clone the repository**
```
git clone https://github.com/yourusername/llm-security-gateway-final.git
cd llm-security-gateway-final
```
**2. Create a virtual environment**
```
python -m venv env
```
**3. Activate the virtual environment**
```
# Windows
.\env\Scripts\activate
# macOS / Linux
source env/bin/activate
```
**4. Install dependencies**
```
pip install -r requirements.txt
```
**5. Download the spaCy language model (required by Presidio)**
```
python -m spacy download en_core_web_lg
```
**6. Run the application**
```
python app/main.py
```
**7. Open in a browser**
`http://127.0.0.1:5000`