lp465/fp-analyzer-prompt-injection



# False Positive Analyzer for Prompt Injection Detection

A research-oriented framework for analyzing false positives, threshold sensitivity, and over-defense behavior in prompt injection detection systems for Large Language Models (LLMs).

## Overview

Prompt injection attacks are one of the most significant security challenges affecting LLM-powered applications. While many existing defenses prioritize maximizing attack detection rates, overly aggressive detection systems can introduce high false positive rates that negatively impact usability, reliability, and real-world deployment.

This project provides a comparative experimental framework for evaluating different prompt injection detection approaches with a strong focus on:

* False positive behavior
* Security vs usability tradeoffs
* Threshold sensitivity
* Lexical vs semantic detection
* Ensemble behavior
* Cross-dataset evaluation
* Over-defense analysis

## Detection Techniques

| Technique                              | Description                                              |
| -------------------------------------- | -------------------------------------------------------- |
| Rules Only                             | Deterministic rule-based detection baseline              |
| ML Only (TF-IDF + Logistic Regression) | Lexical probabilistic classifier with threshold tuning   |
| DeBERTa (ProtectAI)                    | Semantic contextual prompt injection detector            |
| Hybrid OR                              | Aggressive ensemble combining ML OR DeBERTa              |
| Hybrid AND                             | Conservative ensemble requiring ML AND DeBERTa agreement |

## Threshold Strategy

| Model      | Threshold Behavior                        |
| ---------- | ----------------------------------------- |
| Rules Only | Deterministic logic (no threshold)        |
| ML Only    | Variable threshold with tradeoff analysis |
| DeBERTa    | Fixed threshold of 0.50                   |
| Hybrid OR  | Fixed threshold of 0.50                   |
| Hybrid AND | Fixed threshold of 0.50                   |

Threshold tradeoff analysis is intentionally restricted to the ML-only model for analytical clarity and computational efficiency.

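
The threshold and ensemble behavior described above can be illustrated with a small, dependency-free sketch. It is not taken from the project's code: the function names (`flag`, `hybrid_or`, `hybrid_and`, `rates`, `sweep`), the `>=` comparison, the toy scores, and the assumption that each hybrid thresholds both component scores at 0.50 before combining them are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Rates:
    threshold: float
    false_positive_rate: float  # share of benign prompts flagged as injections
    recall: float               # share of injections correctly flagged


def flag(scores: Sequence[float], threshold: float = 0.50) -> list:
    """Flag a prompt when its score reaches the threshold (assumed >=)."""
    return [s >= threshold for s in scores]


def hybrid_or(ml_flags: Sequence[bool], deberta_flags: Sequence[bool]) -> list:
    """Aggressive ensemble: flag if either detector flags."""
    return [a or b for a, b in zip(ml_flags, deberta_flags)]


def hybrid_and(ml_flags: Sequence[bool], deberta_flags: Sequence[bool]) -> list:
    """Conservative ensemble: flag only if both detectors agree."""
    return [a and b for a, b in zip(ml_flags, deberta_flags)]


def rates(flags: Sequence[bool], labels: Sequence[bool], threshold: float) -> Rates:
    """Compute false positive rate and recall for one set of decisions."""
    fp = sum(1 for f, y in zip(flags, labels) if f and not y)
    tp = sum(1 for f, y in zip(flags, labels) if f and y)
    negatives = sum(1 for y in labels if not y)
    positives = sum(1 for y in labels if y)
    fpr = fp / negatives if negatives else 0.0
    recall = tp / positives if positives else 0.0
    return Rates(threshold, fpr, recall)


def sweep(scores: Sequence[float], labels: Sequence[bool],
          thresholds: Sequence[float]) -> list:
    """Threshold tradeoff analysis for a single probabilistic detector."""
    return [rates(flag(scores, t), labels, t) for t in thresholds]


# Hypothetical toy data: True = injection, False = benign prompt.
labels = [True, True, True, False, False, False]
ml_scores = [0.90, 0.40, 0.70, 0.60, 0.20, 0.10]       # made-up lexical scores
deberta_scores = [0.95, 0.80, 0.30, 0.10, 0.55, 0.05]  # made-up semantic scores

# Variable threshold for the ML-only detector.
ml_sweep = sweep(ml_scores, labels, [0.30, 0.50, 0.70])

# Fixed 0.50 threshold for both components of the hybrids.
ml_flags = flag(ml_scores, 0.50)
deberta_flags = flag(deberta_scores, 0.50)
or_rates = rates(hybrid_or(ml_flags, deberta_flags), labels, 0.50)
and_rates = rates(hybrid_and(ml_flags, deberta_flags), labels, 0.50)

for r in ml_sweep + [or_rates, and_rates]:
    print(r)
```

On this toy data, the OR ensemble catches every injection but flags two of the three benign prompts, while the AND ensemble produces no false positives but misses two of the three injections. This is the amplification and suppression effect the framework is meant to measure on real datasets.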
## Datasets

| Dataset                                | Size       |
| -------------------------------------- | ---------- |
| deepset/prompt-injections              | 662 rows   |
| neuralchemy/Prompt-injection-dataset   | 6,274 rows |
| prodnull/prompt-injection-repo-dataset | 5,671 rows |

## Features

* Comparative evaluation across multiple detection paradigms
* False Positive / False Negative analysis
* Threshold tradeoff visualization
* ROC and confusion matrix analysis
* Hybrid ensemble evaluation
* Interactive Streamlit dashboard
* Persistent experiment comparison logging
* Batch-optimized transformer inference

## Technology Stack

* Python
* Streamlit
* Scikit-learn
* HuggingFace Transformers
* DeBERTa
* Pandas
* NumPy
* Matplotlib

## Running the Application

Install dependencies:

    pip install -r requirements.txt

Launch the Streamlit application:

    streamlit run app4.py

## Research Focus

This framework is designed for experimental analysis of:

* Prompt injection false positives
* Over-defense tendencies in LLM security systems
* Lexical vs semantic detection behavior
* Ensemble amplification and suppression effects
* Security-usability tradeoffs
* Cross-domain generalization behavior

## References and Context

This project draws inspiration from current research and security guidance related to:

* Prompt injection attacks
* LLM security and guardrails
* Adversarial NLP
* Secure AI deployment
* OWASP Top 10 for LLM Applications
* Transformer-based security classifiers

## Disclaimer

This project is intended for research, educational, and experimental purposes related to prompt injection detection and LLM security evaluation.