thien-ng97/prompt-injection-detection-framework

GitHub: thien-ng97/prompt-injection-detection-framework

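An end-to-end machine learning framework for detecting direct prompt injection attacks in LLM applications. It identifies malicious inputs by training and benchmarking multiple NLP text classification models.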


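# Direct Prompt Injection Defense Framework

## Project Description

This repository contains an end-to-end local machine learning pipeline that implements and benchmarks different NLP text classification strategies for direct prompt injection detection. The main goal is to evaluate a custom fine-tuned DistilBERT model against off-the-shelf baselines (e.g., ProtectAI's DeBERTa) and alternative paradigms (Vector Space Embeddings, local LLM-as-a-Judge), using a strictly frozen, stratified test dataset.

## Folder Structure

The workspace is decoupled to separate raw data, processing scripts, and analysis outputs:

* `data/`: Contains the strictly frozen, stratified dataset splits (Train, Validation, Test) in CSV format. (Ignored in remote version control to prevent data leakage.)
* `notebooks/`: Contains Jupyter Notebooks for exploratory data analysis (EDA) and heuristic attack-type labeling.
* `results/`: Contains output artifacts, including generated distribution plots and future baseline JSON metrics.
* `src/`: Contains the core Python scripts for data pipeline ingestion and model evaluation.

## Dataset Sources

This project aggregates a standalone corpus of 5,517 rows from the following open-source Hugging Face datasets:

1. **[Deepset Prompt Injections](https://huggingface.co/datasets/deepset/prompt-injections):** High-quality, structurally diverse prompt injections and standard user queries (requires Hugging Face authentication).
2. **[Rogue Security Prompt Injections Benchmark](https://huggingface.co/datasets/rogue-security/prompt-injections-benchmark):** A dense adversarial dataset spanning multiple attack taxonomies (requires Hugging Face authentication).

## Environment Setup

This framework is built for local execution (compatible with Apple Silicon). To set up the environment, clone the repository and initialize a virtual environment:

```bash
# Initialize and activate the virtual environment
python3 -m venv env
source env/bin/activate

# Install dependencies
pip install -r requirements.txt

# Authenticate with Hugging Face (required for both datasets; needs an HF access token)
hf auth login
```

## Milestone 3: Production Model and Test Metrics

The primary DistilBERT model has been trained, fine-tuned, and validated. Due to file size constraints, the final trained weights and tokenizer configuration are hosted on the Hugging Face Hub.

📦 **Hugging Face Model Repository:** [thienyu/prompt-injection-guardrail](https://huggingface.co/thienyu/prompt-injection-guardrail)

### Final Production Test Results (Blind 20% Dataset)

- **Accuracy:** 87.95%
- **Precision:** 92.48%
- **Recall:** 75.80%
- **F1-Score:** 83.31%
- **False Positive Rate (FPR):** 4.05% *(Satisfies the strict < 5% constraint)*

### Local Inference Verification

To verify the basic inference loop locally, ensure your environment is activated and run:

```bash
python src/evaluate_custom_model.py
```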
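
The test metrics reported for the production model (accuracy, precision, recall, F1, FPR) all follow from the four confusion-matrix counts. The block below is a minimal sketch of those relationships, not the repository's evaluation code: the names `ConfusionCounts`, `f1_score`, and `classification_metrics` are hypothetical, the counts in the usage example are made up for illustration, and it assumes that "injection" is treated as the positive class.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts (assumption: injection = positive class)."""
    tp: int  # injections correctly flagged
    fp: int  # benign prompts wrongly flagged
    tn: int  # benign prompts correctly passed
    fn: int  # injections missed


def _safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return _safe_div(2 * precision * recall, precision + recall)


def classification_metrics(c: ConfusionCounts) -> dict:
    """Derive the five reported metrics from raw counts."""
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)
    return {
        "accuracy": _safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
        # FPR: share of benign prompts that were wrongly flagged
        "fpr": _safe_div(c.fp, c.fp + c.tn),
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only (not the project's test set).
    demo = ConfusionCounts(tp=8, fp=2, tn=18, fn=2)
    for name, value in classification_metrics(demo).items():
        print(f"{name}: {value:.2%}")
```

As a sanity check, the reported F1 is consistent with the reported precision and recall: `f1_score(0.9248, 0.7580)` is about 0.8331, which matches the listed 83.31%.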
