Denis-Krueger-labs/llm-prompt-injection-paper

GitHub: Denis-Krueger-labs/llm-prompt-injection-paper

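This paper systematically analyzes defenses against prompt injection attacks on large language model systems. It evaluates the effectiveness and limitations of structured queries, detection-based defenses, and system-level approaches under adaptive attacks.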


# LLM Prompt Injection Defenses

Seminar paper for the course **Professional Skills and Communication**
THWS Würzburg-Schweinfurt · Summer Semester 2026

**Authors:** Denis Krüger · Sebastian Mautner

## Topic

This paper analyzes **prompt injection attacks** on Large Language Model (LLM) systems and evaluates current defense mechanisms. LLMs are increasingly deployed in applications that process untrusted external data such as emails, documents, web pages, and API responses. This integration introduces a critical vulnerability: instructions and data cannot be structurally separated.

The paper focuses on:

- The underlying vulnerability caused by the instruction/data boundary problem
- Structured query approaches such as **StruQ** and **SecAlign**
- Detection-based and system-level defenses
- Evaluation on recent benchmarks (AgentDojo, OpenPromptInjection, AlpacaFarm)
- Limitations of current protection mechanisms under adaptive attacks

## Research Question

## Sources

### Core Papers

| Key | Title | Authors | Venue |
|-----|-------|---------|-------|
| StruQ | *StruQ: Defending Against Prompt Injection with Structured Queries* | Chen et al. | USENIX Security 2025 |
| SecAlign | *SecAlign: Defending Against Prompt Injection with Preference Optimization* | Chen et al. | ACM CCS 2025 |
| AgentDojo | *AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents* | Debenedetti et al. | NeurIPS 2024 |
| Critical Evaluation | *A Critical Evaluation of Defenses against Prompt Injection Attacks* | Jia et al. | arXiv 2025 |
| Checkpoint-GCG | *Checkpoint-GCG: Auditing and Attacking Fine-Tuning-Based Prompt Injection Defenses* | Yang et al. | arXiv 2025 |
| ASTRA | *May I Have Your Attention? Breaking Fine-Tuning Based Prompt Injection Defenses Using Architecture-Aware Attacks* | Pandya et al. | arXiv 2025 |

### Foundational Literature

| Key | Title | Authors | Venue |
|-----|-------|---------|-------|
| Perez & Ribeiro | *Ignore Previous Prompt: Attack Techniques for Language Models* | Perez & Ribeiro | arXiv 2022 |
| Greshake et al. | *Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection* | Greshake et al. | AISec 2023 |
| Zou et al. | *Universal and Transferable Adversarial Attacks on Aligned Language Models* | Zou et al. | arXiv 2023 |
| Wallace et al. | *The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions* | Wallace et al. | arXiv 2024 |
| Liu et al. | *Prompt Injection Attack against LLM-Integrated Applications* | Liu et al. | arXiv 2023 |

### Defense Mechanisms

| Key | Title | Authors | Venue |
|-----|-------|---------|-------|
| PromptArmor | *PromptArmor: Simple yet Effective Prompt Injection Defenses* | Shi et al. | arXiv 2025 |
| RTBAS | *RTBAS: Defending LLM Agents Against Prompt Injection and Privacy Leakage* | Zhong et al. | arXiv 2025 |
| f-secure / IFC | *System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective* | Wu et al. | arXiv 2024 |
| Spotlighting | *Defending Against Indirect Prompt Injection Attacks With Spotlighting* | Hines et al. | arXiv 2024 |
| Defeating by Design | *Defeating Prompt Injections by Design* | Debenedetti et al. | arXiv 2025 |
| A-MemGuard | *A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory* | Wei et al. | arXiv 2025 |

### Benchmarks and Evaluation

| Key | Title | Authors | Venue |
|-----|-------|---------|-------|
| InjecAgent | *InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated LLM Agents* | Zhan et al. | arXiv 2024 |
| InjectBench | *InjectBench: An Indirect Prompt Injection Benchmarking Framework* | Kong | Master's Thesis, Virginia Tech 2024 |
| FSPIB | *Prompt Injection Benchmark for Foundation Model Integrated Systems* | Anonymous | Under review, ICLR 2025 |
| MPIB | *MPIB: A Benchmark for Medical Prompt Injection Attacks and Clinical Safety in LLMs* | Lee et al. | arXiv 2025 |

### Real-World Cases and Surveys

| Key | Title | Authors | Venue |
|-----|-------|---------|-------|
| P2SQL | *Prompt-to-SQL Injections in LLM-Integrated Web Applications: Risks and Defenses* | Pedro et al. | ICSE 2025 |
| Survey | *Prompting for LLM Security and RAG: A Survey* | Ruparel et al. | IEEE ICAIC 2026 |
| Mathew | *Enhancing Security in LLMs: A Comprehensive Review of Prompt Injection Attacks and Defenses* | Mathew | Preprint 2024 |

## Project Structure

```
paper/
├── main.tex
├── sections/
│   ├── 00_titlepage.tex
│   ├── 01_einleitung.tex
│   ├── 02_llms_und_prompting.tex
│   ├── 03_prompt_injection.tex
│   ├── 04_bedrohungsmodell.tex
│   ├── 05_struq.tex
│   ├── 06_weitere_defences.tex
│   ├── 07_benchmarks_und_metriken.tex
│   ├── 08_grenzen_aktueller_defences.tex
│   ├── 09_praxis_und_management.tex
│   └── 10_fazit.tex
├── figures/
└── references.bib
```

## Building the Paper

Requires a LaTeX distribution that includes `biber` and `pdflatex`.

```
pdflatex main.tex
biber main
pdflatex main.tex
pdflatex main.tex
```
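
The four build commands above can also be wrapped in a small helper script. The following is a minimal sketch, not part of the repository: the function names `build_commands`, `missing_tools`, and `build` are hypothetical, and it assumes the commands are run from inside the `paper/` directory shown in the project structure.

```python
import shutil
import subprocess


def build_commands(main: str = "main") -> list[list[str]]:
    """Return the pdflatex/biber sequence from the README, in order."""
    return [
        ["pdflatex", f"{main}.tex"],  # first pass: writes the .bcf file for biber
        ["biber", main],              # resolve the bibliography
        ["pdflatex", f"{main}.tex"],  # second pass: pull in citations
        ["pdflatex", f"{main}.tex"],  # third pass: settle cross-references
    ]


def missing_tools(commands: list[list[str]]) -> list[str]:
    """Return the required executables that are not found on PATH."""
    tools = dict.fromkeys(cmd[0] for cmd in commands)  # unique, order kept
    return [tool for tool in tools if shutil.which(tool) is None]


def build(paper_dir: str = "paper", main: str = "main") -> None:
    """Run the full sequence; stops at the first failing command."""
    commands = build_commands(main)
    missing = missing_tools(commands)
    if missing:
        raise RuntimeError(f"missing tools: {', '.join(missing)}")
    for cmd in commands:
        # Assumption: main.tex lives directly in paper_dir.
        subprocess.run(cmd, cwd=paper_dir, check=True)


if __name__ == "__main__":
    build()
```

Checking for the tools up front gives a clearer error than a failed `pdflatex` call halfway through, and `check=True` keeps a failed pass from being silently followed by the next one.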