lp465/fp-analyzer-prompt-injection
GitHub: lp465/fp-analyzer-prompt-injection
Stars: 0 | Forks: 0
# False Positive Analyzer for Prompt Injection Detection
A research-oriented framework for analyzing false positives, threshold sensitivity, and over-defense behavior in prompt injection detection systems for Large Language Models (LLMs).
## Overview
Prompt injection attacks are one of the most significant security challenges affecting LLM-powered applications. While many existing defenses prioritize maximizing attack detection rates, overly aggressive detection systems can introduce high false positive rates that negatively impact usability, reliability, and real-world deployment.
This project provides a comparative experimental framework for evaluating different prompt injection detection approaches with a strong focus on:
* False positive behavior
* Security vs usability tradeoffs
* Threshold sensitivity
* Lexical vs semantic detection
* Ensemble behavior
* Cross-dataset evaluation
* Over-defense analysis
## Detection Techniques
| Technique | Description |
| -------------------------------------- | -------------------------------------------------------- |
| Rules Only | Deterministic rule-based detection baseline |
| ML Only (TF-IDF + Logistic Regression) | Lexical probabilistic classifier with threshold tuning |
| DeBERTa (ProtectAI) | Semantic contextual prompt injection detector |
| Hybrid OR | Aggressive ensemble combining ML OR DeBERTa |
| Hybrid AND | Conservative ensemble requiring ML AND DeBERTa agreement |
## Threshold Strategy
| Model | Threshold Behavior |
| ---------- | ----------------------------------------- |
| Rules Only | Deterministic logic (no threshold) |
| ML Only | Variable threshold with tradeoff analysis |
| DeBERTa | Fixed threshold of 0.50 |
| Hybrid OR | Fixed threshold of 0.50 |
| Hybrid AND | Fixed threshold of 0.50 |
Threshold tradeoff analysis is intentionally restricted to the ML-only model for analytical clarity and computational efficiency.
## Datasets
| Dataset | Size |
| -------------------------------------- | ---------- |
| deepset/prompt-injections | 662 rows |
| neuralchemy/Prompt-injection-dataset | 6,274 rows |
| prodnull/prompt-injection-repo-dataset | 5,671 rows |
## Features
* Comparative evaluation across multiple detection paradigms
* False Positive / False Negative analysis
* Threshold tradeoff visualization
* ROC and confusion matrix analysis
* Hybrid ensemble evaluation
* Interactive Streamlit dashboard
* Persistent experiment comparison logging
* Batch-optimized transformer inference
## Technology Stack
* Python
* Streamlit
* Scikit-learn
* HuggingFace Transformers
* DeBERTa
* Pandas
* NumPy
* Matplotlib
## Running the Application
Install dependencies:
pip install -r requirements.txt
Launch the Streamlit application:
streamlit run app4.py
## Research Focus
This framework is designed for experimental analysis of:
* Prompt injection false positives
* Over-defense tendencies in LLM security systems
* Lexical vs semantic detection behavior
* Ensemble amplification and suppression effects
* Security-usability tradeoffs
* Cross-domain generalization behavior
## References and Context
This project draws inspiration from current research and security guidance related to:
* Prompt injection attacks
* LLM security and guardrails
* Adversarial NLP
* Secure AI deployment
* OWASP Top 10 for LLM Applications
* Transformer-based security classifiers
## Disclaimer
This project is intended for research, educational, and experimental purposes related to prompt injection detection and LLM security evaluation.