Ytang520/prompt_injection_research_daily_arxiv

GitHub: Ytang520/prompt_injection_research_daily_arxiv

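An academic resource aggregation repository that automatically tracks, on a daily basis, arXiv research papers on prompt injection in large language models.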

## Updated on 2026.06.23
Table of Contents
  1. LLM Prompt Injection
  2. MultiModal Prompt Injection
## LLM Prompt Injection |Publish Date|Title|Authors|PDF|Code| |---|---|---|---|---| |**2026-06-22**|**Detecting Malicious Agent Skills in the Wild using Attention**|Bacem Etteib et.al.|[2606.23416](http://arxiv.org/abs/2606.23416)|null| |**2026-06-22**|**GIF: Locally Sound Geometric Information Flow Control for LLMs**|Adam Storek et.al.|[2606.23277](http://arxiv.org/abs/2606.23277)|null| |**2026-06-19**|**Safe to Check, Unsafe to Use: Relinking at the Compression Boundary of LLM Agents**|Zesen Liu et.al.|[2606.21732](http://arxiv.org/abs/2606.21732)|null| |**2026-06-19**|**AgenticOS: An Intent-Oriented Secure Operating System Architecture for Autonomous AI Agents**|Zhen Zhao et.al.|[2606.21129](http://arxiv.org/abs/2606.21129)|null| |**2026-06-19**|**Local LLM Agents as Vulnerable Runtimes:A Source-Code Audit of the Agent Runtime Layer**|Zhengsong Zhang et.al.|[2606.21071](http://arxiv.org/abs/2606.21071)|null| |**2026-06-18**|**Think Twice Before You Act: Protecting LLM Agents Against Tool Description Poisoning via Isolated Planning**|Shanghao Shi et.al.|[2606.20922](http://arxiv.org/abs/2606.20922)|null| |**2026-06-18**|**Analyzing Defensive Misdirection Against Model-Guided Automated Attacks on Agentic AI Systems**|Reza Soosahabi et.al.|[2606.20470](http://arxiv.org/abs/2606.20470)|null| |**2026-06-17**|**A Layered Security Framework Against Prompt Injection in RAG-Based Chatbots**|Gulshan Saleem et.al.|[2606.19660](http://arxiv.org/abs/2606.19660)|null| |**2026-06-17**|**Analyzing the Narration Gap in LLM-Solver Loops**|Zunchen Huang et.al.|[2606.19588](http://arxiv.org/abs/2606.19588)|null| |**2026-06-17**|**CodeSentinel: A Three-Layer Defense Against Indirect Prompt Injection in Code Contexts**|Po-Han Cheng et.al.|[2606.19235](http://arxiv.org/abs/2606.19235)|null| |**2026-06-17**|**The Gate Is Only as Honest as Its Contracts: ContractGuard for the Contract Layer of Risk-Aware Causal Gating**|Laxmipriya Ganesh Iyer 
et.al.|[2606.18550](http://arxiv.org/abs/2606.18550)|null| |**2026-06-16**|**SafeClawBench: Separating Semantic, Audit-Evidence, and Sandbox Harm in Tool-Using LLM Agents**|Yuchuan Tian et.al.|[2606.18356](http://arxiv.org/abs/2606.18356)|null| |**2026-06-16**|**PARSE: Provenance-Aware Retrieval Sanitization for Professional Domain LLM Agents**|Aaditya Pai et.al.|[2606.17467](http://arxiv.org/abs/2606.17467)|null| |**2026-06-15**|**An Evaluation of Data Leakage Risks in Tool-Using LLM Agents in Realistic Scenarios**|Hankyul Baek et.al.|[2606.17114](http://arxiv.org/abs/2606.17114)|null| |**2026-06-15**|**KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing**|Mufei Li et.al.|[2606.17034](http://arxiv.org/abs/2606.17034)|null| |**2026-06-14**|**SkillVetBench: LLM-as-Judge for Multi-Dimensional Security Risk Evaluation in Open-Source LLM Agent Skills**|Ismail Hossain et.al.|[2606.15899](http://arxiv.org/abs/2606.15899)|null| |**2026-06-14**|**GAS-Leak-LLM: Genetic Algorithm-Based Suffix Optimization for Black-Box LLM Jailbreaking**|Aman Anifer et.al.|[2606.15788](http://arxiv.org/abs/2606.15788)|null| |**2026-06-14**|**FragFuse: Bypassing Access Control of Large Language Model Agents via Memory-Based Query Fragmentation and Fusion**|Zixin Rao et.al.|[2606.15609](http://arxiv.org/abs/2606.15609)|null| |**2026-06-13**|**Defending against Adaptive Prompt Injection Attacks via Reasoning-enabled Task Alignment**|Lipeng He et.al.|[2606.15441](http://arxiv.org/abs/2606.15441)|null| |**2026-06-13**|**AutoDojo: Adaptive Attacks Expose Superficial Defenses and User-Underspecification Limits in LLM Agents**|Xinhang Ma et.al.|[2606.15057](http://arxiv.org/abs/2606.15057)|null| |**2026-06-12**|**Security Engineering of OpenClaw: Analyzing Attack Surface Expansion and Trust-Boundary Violations**|Saeid Jamshidi et.al.|[2606.15008](http://arxiv.org/abs/2606.15008)|null| |**2026-06-16**|**From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent 
Guardrails**|Yuguang Zhou et.al.|[2606.14517](http://arxiv.org/abs/2606.14517)|null| |**2026-06-12**|**SkillMutator: Benchmarking and Defending Language-and-Code Cross-modal Attacks on LLM Agent Skills**|Youngduk Kim et.al.|[2606.14154](http://arxiv.org/abs/2606.14154)|null| |**2026-06-11**|**Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents**|Zihao Wang et.al.|[2606.13385](http://arxiv.org/abs/2606.13385)|null| |**2026-06-11**|**Nous: An Attempt to Extract and Inject the Cognition Behind Prediction-Market Behavior**|Haowei Qian et.al.|[2606.13038](http://arxiv.org/abs/2606.13038)|null| |**2026-06-10**|**PI-Hunter: Automated Red-Teaming for Exposing and Localizing Prompt Injections**|Pengfei He et.al.|[2606.12737](http://arxiv.org/abs/2606.12737)|null| |**2026-06-10**|**Smarter Saboteurs, Better Fixers: Scaling & Security in Linear Multi-Agent Workflows**|Timothy McAllister et.al.|[2606.12709](http://arxiv.org/abs/2606.12709)|null| |**2026-06-10**|**External Experience Serving in Production LLM Systems: A Deployment-Oriented Study of Quality-Cost Trade-offs**|Lin Sun et.al.|[2606.11806](http://arxiv.org/abs/2606.11806)|null| |**2026-06-09**|**Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization**|Lena S. 
Bolliger et.al.|[2606.10860](http://arxiv.org/abs/2606.10860)|null| |**2026-06-09**|**Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation**|Yuchen Ling et.al.|[2606.10749](http://arxiv.org/abs/2606.10749)|null| |**2026-06-09**|**Assessing Automated Prompt Injection Attacks in Agentic Environments**|David Hofer et.al.|[2606.10525](http://arxiv.org/abs/2606.10525)|null| |**2026-06-09**|**Game-Theoretic Multi-Agent Control for Robust Contextual Reasoning in LLMs**|Saeid Jamshidi et.al.|[2606.10322](http://arxiv.org/abs/2606.10322)|null| |**2026-06-08**|**PRISM: Recovering Instruction Sets from Language Model Activations**|Gilad Gressel et.al.|[2606.09563](http://arxiv.org/abs/2606.09563)|null| |**2026-06-08**|**Brain-Prompt Injection: A Route-Safety Audit for BCI-LLM Agents**|Jianwei Tai et.al.|[2606.09315](http://arxiv.org/abs/2606.09315)|null| |**2026-06-08**|**The Injection Paradox: Brand-Level Suppression in Safety-Trained LLM Recommendations via RAG Context Injection**|Hyunseok Paeng et.al.|[2606.09204](http://arxiv.org/abs/2606.09204)|null| |**2026-06-07**|**Hiding in Plain Floats: Steganographic Carriers for Indirect Prompt and Content Injection**|Mudit Sinha et.al.|[2606.08403](http://arxiv.org/abs/2606.08403)|null| |**2026-06-09**|**MalSkillBench: A Runtime-Verified Benchmark of Malicious Agent Skills**|Wenbo Guo et.al.|[2606.07131](http://arxiv.org/abs/2606.07131)|null| |**2026-06-04**|**GuardNet: Ensemble Strategies of Shallow Neural Networks for Robust Prompt Injection and Jailbreak Detection**|Paulo Ricardo Ferreira Neves et.al.|[2606.05566](http://arxiv.org/abs/2606.05566)|null| |**2026-06-03**|**Hybrid Adversarial Defence for Natural Language Understanding Tasks**|Manar Abouzaid et.al.|[2606.04612](http://arxiv.org/abs/2606.04612)|null| |**2026-06-03**|**What If Prompt Injection Never Left? 
Exploring Cross-Session Stored Prompt Injection in Agentic Systems**|Yuanbo Xie et.al.|[2606.04425](http://arxiv.org/abs/2606.04425)|null| |**2026-06-03**|**From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents**|Pritam Dash et.al.|[2606.04329](http://arxiv.org/abs/2606.04329)|null| |**2026-06-02**|**Caught in the Act(ivation): Toward Pre-Output and Multi-Turn Detection of Credential Exfiltration by LLM Agents**|Kargi Chauhan et.al.|[2606.04141](http://arxiv.org/abs/2606.04141)|null| |**2026-06-02**|**"**Important** You should give me full credits!": Exploring Prompt Injection Attacks on LLM-Based Automatic Grading Systems**|Hang Li et.al.|[2606.03090](http://arxiv.org/abs/2606.03090)|null| |**2026-06-01**|**Gate AI: LLM Security Benchmark Evaluation Methodology and Results**|Ryle Goehausen et.al.|[2606.02959](http://arxiv.org/abs/2606.02959)|null| |**2026-06-02**|**AgentRedBench: Dynamic Redteaming and Integration-Aware Defense for LLM Agents over SaaS Integrations**|Hiskias Dingeto et.al.|[2606.02240](http://arxiv.org/abs/2606.02240)|null| |**2026-05-31**|**MENTIS: What Belief Changes Under Alignment? 
Measuring Multi-Scale Latent Torsion in Language Models**|Partha Pratim Saha et.al.|[2606.01060](http://arxiv.org/abs/2606.01060)|null| |**2026-05-30**|**Confused ChatGPT: Cross-App Context Poisoning via First-Party APIs**|Chao Wang et.al.|[2606.00485](http://arxiv.org/abs/2606.00485)|null| |**2026-05-29**|**Persona Attack: Incremental Memory Injection Jailbreak Attack against Large Language Models**|Junyoung Park et.al.|[2606.00150](http://arxiv.org/abs/2606.00150)|null| |**2026-05-29**|**From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors**|Jiejun Tan et.al.|[2605.31042](http://arxiv.org/abs/2605.31042)|null| |**2026-05-29**|**Send a SCOUT First: Pre-hoc Reasoning for Adaptive Detector Allocation in Prompt-Injection Defense**|Shuhao Zhang et.al.|[2605.30837](http://arxiv.org/abs/2605.30837)|null| |**2026-05-28**|**Automatically Attacking Software Reverse Engineering AI Agents**|Brian Crawford et.al.|[2605.30667](http://arxiv.org/abs/2605.30667)|null| |**2026-05-28**|**When AI Meets Wall Street: A Survey on Trustworthy AI in Fintech**|Qingwen Zeng et.al.|[2605.30650](http://arxiv.org/abs/2605.30650)|null| |**2026-05-28**|**Strengthening Polymorphic Prompt Assembling: Dynamic Separator Generation Against Emerging Prompt Injection Attacks**|Nima Dorzhiev et.al.|[2605.30534](http://arxiv.org/abs/2605.30534)|null| |**2026-05-28**|**The Surface You Test Is Not the Surface That Breaks**|Shifat E Arman et.al.|[2605.30454](http://arxiv.org/abs/2605.30454)|null| |**2026-05-28**|**Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection**|Travis Lelle et.al.|[2605.30189](http://arxiv.org/abs/2605.30189)|null| |**2026-05-28**|**Minimal Prompt Perturbations Lead to Code Vulnerabilities: Prompt Fragility and Hidden-State Signals in Coding LLMs**|Alexander Sternfeld et.al.|[2605.29737](http://arxiv.org/abs/2605.29737)|null| |**2026-05-27**|**Measuring Real-World Prompt Injection Attacks in 
LLM-based Resume Screening**|Mohan Zhang et.al.|[2605.28999](http://arxiv.org/abs/2605.28999)|null| |**2026-05-27**|**Towards Demystifying and Repairing LLM-in-the-Loop Vulnerabilities**|Yujie Ma et.al.|[2605.28893](http://arxiv.org/abs/2605.28893)|null| |**2026-05-27**|**LACUNA: Safe Agents as Recursive Program Holes**|Yaoyu Zhao et.al.|[2605.28617](http://arxiv.org/abs/2605.28617)|null| |**2026-05-27**|**Mitigating Adaptive Attacks against Reasoning Models with Activation Consistency Training**|Avidan Shah et.al.|[2605.28467](http://arxiv.org/abs/2605.28467)|null| |**2026-05-28**|**Can It Reach the Generator? Investigating the Survival of Prompt-Injection Attacks in Realistic RAG Settings**|Yu Yin et.al.|[2605.28017](http://arxiv.org/abs/2605.28017)|null| |**2026-05-26**|**Prompt Injection Detection is Regime-Dependent: A Deployment-Aware Evaluation with Interpretable Structural Signals**|Akindoyin Akinrele et.al.|[2605.26999](http://arxiv.org/abs/2605.26999)|null| |**2026-05-26**|**Cordyceps: Covert Control Attacks on LLMs via Data Poisoning**|Zedian Shao et.al.|[2605.26595](http://arxiv.org/abs/2605.26595)|null| |**2026-05-26**|**Aligning Provenance with Authorization: A Dual-Graph Defense for LLM Agents**|Peiran Wang et.al.|[2605.26497](http://arxiv.org/abs/2605.26497)|null| |**2026-05-25**|**AgentSecBench: Measuring Prompt Injection, Privacy Leakage, and Tool-Use Integrity in LLM Agents**|Faruk Alpay et.al.|[2605.26269](http://arxiv.org/abs/2605.26269)|null| |**2026-05-24**|**Device Context Protocol: A Compact, Safety-First Architecture for LLM-Driven Control of Constrained Devices**|Dongxu Yang et.al.|[2605.26159](http://arxiv.org/abs/2605.26159)|null| |**2026-05-25**|**LLM-as-a-Reviewer: Benchmarking Their Ability, Divergence, and Prompt Injection Resistance as Paper Reviewers**|Lingyao Li et.al.|[2605.25415](http://arxiv.org/abs/2605.25415)|null| |**2026-05-23**|**IterInject: Indirect Prompt Injection Against LLM Agents via Feedback-Guided Iterative 
Optimization**|Zixuan Chen et.al.|[2605.24659](http://arxiv.org/abs/2605.24659)|null| |**2026-05-23**|**Poisoning the Watchtower: Prompt Injection Attacks Against LLM-Augmented Security Operations Through Adversarial Log Content**|Rohan Pandey et.al.|[2605.24421](http://arxiv.org/abs/2605.24421)|null| |**2026-05-22**|**Prompt Overflow: What the Guardrail Inspects Is Not What the Model Infers**|Yuanbo Zhou et.al.|[2605.23196](http://arxiv.org/abs/2605.23196)|null| |**2026-05-20**|**Heartbeat-Bound Hierarchical Credentials: Cryptographic Revocation for AI Agent Swarms**|Saurabh Deochake et.al.|[2605.20704](http://arxiv.org/abs/2605.20704)|null| |**2026-05-18**|**On the Geometric Limits of Transformer Defenses against Obfuscation Attacks: Latent Embedding Collapse & Performance Robustness Gap**|Becky Mashaido et.al.|[2605.19159](http://arxiv.org/abs/2605.19159)|null| |**2026-05-18**|**ESLD (External Surrogate Latent Defense): A Latent-Space Architecture for Faster, Stronger Prompt-Injection Defense**|Yash Narendra et.al.|[2605.18918](http://arxiv.org/abs/2605.18918)|null| |**2026-05-18**|**An Empirical Study of Privacy Leakage Chains via Prompt Injection in Black-Box Chatbot Environments**|Hongjang Yang et.al.|[2605.18133](http://arxiv.org/abs/2605.18133)|null| |**2026-05-18**|**Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents**|Ahmad Al-Tawaha et.al.|[2605.17830](http://arxiv.org/abs/2605.17830)|null| |**2026-05-17**|**ADR: An Agentic Detection System for Enterprise Agentic AI Security**|Chenning Li et.al.|[2605.17380](http://arxiv.org/abs/2605.17380)|null| |**2026-05-17**|**ASPI: Seeking Ambiguity Clarification Amplifies Prompt Injection Vulnerability in LLM Agents**|Udari Madhushani Sehwag et.al.|[2605.17324](http://arxiv.org/abs/2605.17324)|null| |**2026-05-16**|**STRIDE-AI: A Threat Modeling Framework for Generative AI Security Assessment**|Tsafac Nkombong Regine Cyrille 
et.al.|[2605.17163](http://arxiv.org/abs/2605.17163)|null| |**2026-05-13**|**Proof-Carrying Certificates for LLM Pipelines: A Trust-Boundary Architecture**|George Koomullil et.al.|[2605.16407](http://arxiv.org/abs/2605.16407)|null| |**2026-05-15**|**FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast**|Igor Bogdanov et.al.|[2605.16233](http://arxiv.org/abs/2605.16233)|null| |**2026-05-18**|**Hidden in Memory: Sleeper Memory Poisoning in LLM Agents**|Sidharth Pulipaka et.al.|[2605.15338](http://arxiv.org/abs/2605.15338)|null| |**2026-05-14**|**Web Agents Should Adopt the Plan-Then-Execute Paradigm**|Julien Piet et.al.|[2605.14290](http://arxiv.org/abs/2605.14290)|null| |**2026-05-13**|**No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills**|Ying Li et.al.|[2605.13044](http://arxiv.org/abs/2605.13044)|null| |**2026-05-12**|**IPI-proxy: An Intercepting Proxy for Red-Teaming Web-Browsing AI Agents Against Indirect Prompt Injection**|Chia-Pei et.al.|[2605.11868](http://arxiv.org/abs/2605.11868)|null| |**2026-05-11**|**The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck**|Linfeng Fan et.al.|[2605.11039](http://arxiv.org/abs/2605.11039)|null| |**2026-05-10**|**AgentShield: Deception-based Compromise Detection for Tool-using LLM Agents**|Yassin H. 
Rassul et.al.|[2605.11026](http://arxiv.org/abs/2605.11026)|null| |**2026-05-11**|**RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems**|Joel Rorseth et.al.|[2605.10862](http://arxiv.org/abs/2605.10862)|null| |**2026-05-11**|**When Prompts Become Payloads: A Framework for Mitigating SQL Injection Attacks in Large Language Model-Driven Applications**|Farzad Nourmohammadzadeh Motlagh et.al.|[2605.10176](http://arxiv.org/abs/2605.10176)|null| |**2026-05-10**|**CALYREX: Cross-Attention LaYeR EXtended Transformers for System Prompt Anchoring**|Li Lixing et.al.|[2605.09737](http://arxiv.org/abs/2605.09737)|null| |**2026-05-12**|**When Agents Overtrust Environmental Evidence: An Extensible Agentic Framework for Benchmarking Evidence-Grounding Defects in LLM Agents**|Strick Sheng et.al.|[2605.08828](http://arxiv.org/abs/2605.08828)|null| |**2026-05-08**|**When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks**|Ziwen Cai et.al.|[2605.08460](http://arxiv.org/abs/2605.08460)|null| |**2026-05-08**|**LLM Advertisement based on Neuron Auctions**|Peiran Yun et.al.|[2605.08326](http://arxiv.org/abs/2605.08326)|null| |**2026-05-07**|**Research on Security Enhancement Methods for Adversarial Robust Large Language Model Intelligent Agents for Medical Decision-Making Tasks**|Saisai Hu et.al.|[2605.08257](http://arxiv.org/abs/2605.08257)|null| |**2026-05-08**|**MIPIAD: Multilingual Indirect Prompt Injection Attack Defense with Qwen -- TF-IDF Hybrid and Meta-Ensemble Learning**|Al Muhit Muhtadi et.al.|[2605.07269](http://arxiv.org/abs/2605.07269)|null| |**2026-05-21**|**Evaluating Prompt Injection Defenses for Educational LLM Tutors: Security-Usability-Latency Trade-offs**|Alexandre Cristovão Maiorano et.al.|[2605.06669](http://arxiv.org/abs/2605.06669)|null| |**2026-05-06**|**WAAA! 
Web Adversaries Against Agentic Browsers**|Sohom Datta et.al.|[2605.05509](http://arxiv.org/abs/2605.05509)|null| |**2026-05-06**|**SecureMCP: A Policy-Enforced LLM Data Access Framework for AIoT Systems via Model Context Protocol**|Wonbae Kim et.al.|[2605.05260](http://arxiv.org/abs/2605.05260)|null| |**2026-05-05**|**ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection**|Shihao Weng et.al.|[2605.03378](http://arxiv.org/abs/2605.03378)|null| |**2026-05-07**|**When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI**|Javad Forough et.al.|[2605.03213](http://arxiv.org/abs/2605.03213)|null| |**2026-05-04**|**PIIGuard: Mitigating PII Harvesting under Adversarial Sanitization**|Mingshuo Liu et.al.|[2605.03129](http://arxiv.org/abs/2605.03129)|null| |**2026-05-04**|**When Alignment Isn't Enough: Response-Path Attacks on LLM Agents**|Mingyu Luo et.al.|[2605.02187](http://arxiv.org/abs/2605.02187)|null| |**2026-05-02**|**LocalAlign: Enabling Generalizable Prompt Injection Defense via Generation of Near-Target Adversarial Examples for Alignment Training**|Yuyang Gong et.al.|[2605.01462](http://arxiv.org/abs/2605.01462)|null| |**2026-05-01**|**A Low-Latency Fraud Detection Layer for Detecting Adversarial Interaction Patterns in LLM-Powered Agents**|Sheldon Yu et.al.|[2605.01143](http://arxiv.org/abs/2605.01143)|null| |**2026-04-30**|**FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption**|Yanting Wang et.al.|[2604.28157](http://arxiv.org/abs/2604.28157)|null| |**2026-04-30**|**Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection**|Prashant Kulkarni et.al.|[2604.28129](http://arxiv.org/abs/2604.28129)|null| |**2026-04-29**|**Indirect Prompt Injection in the Wild: An Empirical Study of Prevalence, Techniques, and Objectives**|Soheil Khodayari et.al.|[2604.27202](http://arxiv.org/abs/2604.27202)|null| |**2026-04-28**|**BatteryPass-12K: The 
First Dataset for the Novel Digital Battery Passport Conformance Task**|Tosin Adewumi et.al.|[2604.26986](http://arxiv.org/abs/2604.26986)|null| |**2026-04-28**|**Making AI-Assisted Grant Evaluation Auditable without Exposing the Model**|Kemal Bicakci et.al.|[2604.25200](http://arxiv.org/abs/2604.25200)|null| |**2026-05-22**|**SUDP: Secret-Use Delegation Protocol for Agentic Systems**|Xiaohang Yu et.al.|[2604.24920](http://arxiv.org/abs/2604.24920)|null| |**2026-04-27**|**Layerwise Convergence Fingerprints for Runtime Misbehavior Detection in Large Language Models**|Nay Myat Min et.al.|[2604.24542](http://arxiv.org/abs/2604.24542)|null| |**2026-04-27**|**AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization**|Zonghao Ying et.al.|[2604.24118](http://arxiv.org/abs/2604.24118)|null| |**2026-05-12**|**Evaluation of Prompt Injection Defenses in Large Language Models**|Priyal Deep et.al.|[2604.23887](http://arxiv.org/abs/2604.23887)|null| |**2026-04-26**|**When AI reviews science: Can we trust the referee?**|Jialiang Wang et.al.|[2604.23593](http://arxiv.org/abs/2604.23593)|null| |**2026-04-25**|**Ghost in the Agent: Redefining Information Flow Tracking for LLM Agents**|Yuandao Cai et.al.|[2604.23374](http://arxiv.org/abs/2604.23374)|null| |**2026-05-06**|**A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework**|Kexin Chu et.al.|[2604.23338](http://arxiv.org/abs/2604.23338)|null| |**2026-04-25**|**CAP-CoT: Cycle Adversarial Prompt for Improving Chain of Thoughts in LLM Reasoning**|Shuxu Chen et.al.|[2604.23270](http://arxiv.org/abs/2604.23270)|null| |**2026-04-24**|**RouteGuard: Internal-Signal Detection of Skill Poisoning in LLM Agents**|Wenjie Xiao et.al.|[2604.22888](http://arxiv.org/abs/2604.22888)|null| |**2026-04-22**|**Anchor-and-Resume Concession Under Dynamic Pricing for LLM-Augmented Freight Negotiation**|Hoang Nguyen 
et.al.|[2604.20732](http://arxiv.org/abs/2604.20732)|null| |**2026-04-20**|**Owner-Harm: A Missing Threat Model for AI Agent Safety**|Dongcheng Zhang et.al.|[2604.18658](http://arxiv.org/abs/2604.18658)|null| |**2026-05-18**|**Beyond Pattern Matching: Seven Cross-Domain Techniques for Prompt Injection Detection**|Thamilvendhan Munirathinam et.al.|[2604.18248](http://arxiv.org/abs/2604.18248)|null| |**2026-04-19**|**SafeAgent: A Runtime Protection Architecture for Agentic Systems**|Hailin Liu et.al.|[2604.17562](http://arxiv.org/abs/2604.17562)|null| |**2026-04-18**|**The Consensus Trap: Rescuing Multi-Agent LLMs from Adversarial Majorities via Token-Level Collaboration**|Jiayuan Liu et.al.|[2604.17139](http://arxiv.org/abs/2604.17139)|null| |**2026-04-18**|**CASCADE: A Cascaded Hybrid Defense Architecture for Prompt Injection Detection in MCP-Based Systems**|İpek Abasıkeleş Turgut et.al.|[2604.17125](http://arxiv.org/abs/2604.17125)|null| |**2026-04-16**|**HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?**|Yukun Jiang et.al.|[2604.15415](http://arxiv.org/abs/2604.15415)|null| |**2026-04-15**|**LogJack: Indirect Prompt Injection Through Cloud Logs Against LLM Debugging Agents**|Harsh Shah et.al.|[2604.15368](http://arxiv.org/abs/2604.15368)|null| |**2026-04-14**|**DeepSeek Robustness Against Semantic-Character Dual-Space Mutated Prompt Injection**|Junyu Ren et.al.|[2604.12548](http://arxiv.org/abs/2604.12548)|null| |**2026-04-14**|**TEMPLATEFUZZ: Fine-Grained Chat Template Fuzzing for Jailbreaking and Red Teaming LLMs**|Qingchao Shen et.al.|[2604.12232](http://arxiv.org/abs/2604.12232)|null| |**2026-04-14**|**Fully Homomorphic Encryption on Llama 3 model for privacy preserving LLM inference**|Anes Abdennebi et.al.|[2604.12168](http://arxiv.org/abs/2604.12168)|null| |**2026-05-11**|**ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection**|Wei Zhao 
et.al.|[2604.11790](http://arxiv.org/abs/2604.11790)|null| |**2026-04-11**|**STARS: Skill-Triggered Audit for Request-Conditioned Invocation Safety in Agent Systems**|Guijia Zhang et.al.|[2604.10286](http://arxiv.org/abs/2604.10286)|null| |**2026-04-11**|**PlanGuard: Defending Agents against Indirect Prompt Injection via Planning-based Consistency Verification**|Guangyu Gong et.al.|[2604.10134](http://arxiv.org/abs/2604.10134)|null| |**2026-04-08**|**TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation**|Hengkai Ye et.al.|[2604.07536](http://arxiv.org/abs/2604.07536)|null| |**2026-04-08**|**TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories**|Yen-Shan Chen et.al.|[2604.07223](http://arxiv.org/abs/2604.07223)|null| |**2026-04-08**|**SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills**|Yinghan Hou et.al.|[2604.06550](http://arxiv.org/abs/2604.06550)|null| |**2026-04-11**|**The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?**|Manish Bhatt et.al.|[2604.06436](http://arxiv.org/abs/2604.06436)|null| |**2026-04-06**|**Gradient-Controlled Decoding: A Safety Guardrail for LLMs with Dual-Anchor Steering**|Purva Chiniya et.al.|[2604.05179](http://arxiv.org/abs/2604.05179)|null| |**2026-04-06**|**Compiled AI: Deterministic Code Generation for LLM-Based Workflow Automation**|Geert Trooskens et.al.|[2604.05150](http://arxiv.org/abs/2604.05150)|null| |**2026-04-06**|**ShieldNet: Network-Level Guardrails against Emerging Supply-Chain Injections in Agentic Systems**|Zhuowen Yuan et.al.|[2604.04426](http://arxiv.org/abs/2604.04426)|null| |**2026-04-15**|**LLM-Enabled Open-Source Systems in the Wild: An Empirical Study of Vulnerabilities in GitHub Security Advisories**|Fariha Tanjim Shifat et.al.|[2604.04288](http://arxiv.org/abs/2604.04288)|null| |**2026-04-05**|**Automating Cloud Security and Forensics Through a Secure-by-Design Generative AI 
Framework**|Dalal Alharthi et.al.|[2604.03912](http://arxiv.org/abs/2604.03912)|null| |**2026-04-04**|**Your Agent is More Brittle Than You Think: Uncovering Indirect Injection Vulnerabilities in Agentic LLMs**|Wenhui Zhu et.al.|[2604.03870](http://arxiv.org/abs/2604.03870)|null| |**2026-04-04**|**AttackEval: A Systematic Empirical Study of Prompt Injection Attack Effectiveness Against Large Language Models**|Jackson Wang et.al.|[2604.03598](http://arxiv.org/abs/2604.03598)|null| |**2026-04-03**|**Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study**|Zhihao Chen et.al.|[2604.03070](http://arxiv.org/abs/2604.03070)|null| |**2026-04-03**|**LogicPoison: Logical Attacks on Graph Retrieval-Augmented Generation**|Yilin Xiao et.al.|[2604.02954](http://arxiv.org/abs/2604.02954)|null| |**2026-03-31**|**KAIJU: An Executive Kernel for Intent-Gated Execution of LLM Agents**|Cormac Guerin et.al.|[2604.02375](http://arxiv.org/abs/2604.02375)|null| |**2026-04-04**|**ClawSafety: "Safe" LLMs, Unsafe Agents**|Bowen Wei et.al.|[2604.01438](http://arxiv.org/abs/2604.01438)|null| |**2026-04-01**|**AgentWatcher: A Rule-based Prompt Injection Monitor**|Yanting Wang et.al.|[2604.01194](http://arxiv.org/abs/2604.01194)|null| |**2026-03-28**|**SafeClaw-R: Towards Safe and Secure Multi-Agent Personal Assistants**|Haoyu Wang et.al.|[2603.28807](http://arxiv.org/abs/2603.28807)|null| |**2026-03-30**|**Crossing the NL/PL Divide: Information Flow Analysis Across the NL/PL Boundary in LLM-Integrated Code**|Zihao Xu et.al.|[2603.28345](http://arxiv.org/abs/2603.28345)|null| |**2026-04-20**|**Evaluating Privilege Usage of Agents with Real-World Tools**|Quan Zhang et.al.|[2603.28166](http://arxiv.org/abs/2603.28166)|null| |**2026-04-09**|**Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers**|Haochuan Kevin Wang et.al.|[2603.28013](http://arxiv.org/abs/2603.28013)|null| |**2026-05-13**|**A Security Analysis of the OpenClaw AI 
Agent Framework**|Surada Suwansathit et.al.|[2603.27517](http://arxiv.org/abs/2603.27517)|null| |**2026-03-26**|**Prompt Attack Detection with LLM-as-a-Judge and Mixture-of-Models**|Hieu Xuan Le et.al.|[2603.25176](http://arxiv.org/abs/2603.25176)|null| |**2026-03-26**|**PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems**|Haozhen Wang et.al.|[2603.25164](http://arxiv.org/abs/2603.25164)|null| |**2026-03-25**|**Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs**|Alexander Panfilov et.al.|[2603.24511](http://arxiv.org/abs/2603.24511)|null| |**2026-03-25**|**Invisible Threats from Model Context Protocol: Generating Stealthy Injection Payload via Tree-based Adaptive Search**|Yulin Shen et.al.|[2603.24203](http://arxiv.org/abs/2603.24203)|null| |**2026-03-24**|**The Cognitive Firewall:Securing Browser Based AI Agents Against Indirect Prompt Injection Via Hybrid Edge Cloud Defense**|Qianlong Lan et.al.|[2603.23791](http://arxiv.org/abs/2603.23791)|null| |**2026-03-24**|**SoK: The Attack Surface of Agentic AI -- Tools, and Autonomy**|Ali Dehghantanha et.al.|[2603.22928](http://arxiv.org/abs/2603.22928)|null| |**2026-03-30**|**LLMON: An LLM-native Markup Language to Leverage Structure and Semantics at the LLM Interface**|Michael Hind et.al.|[2603.22519](http://arxiv.org/abs/2603.22519)|null| |**2026-03-23**|**Model Context Protocol Threat Modeling and Analyzing Vulnerabilities to Prompt Injection with Tool Poisoning**|Charoes Huang et.al.|[2603.22489](http://arxiv.org/abs/2603.22489)|null| |**2026-03-23**|**SecureBreak -- A dataset towards safe and secure models**|Marco Arazzi et.al.|[2603.21975](http://arxiv.org/abs/2603.21975)|null| |**2026-03-23**|**Are AI-assisted Development Tools Immune to Prompt Injection?**|Charoes Huang et.al.|[2603.21642](http://arxiv.org/abs/2603.21642)|null| |**2026-03-21**|**Detection of adversarial intent in Human-AI teams using LLMs**|Abed 
K. Musaffar et.al.|[2603.20976](http://arxiv.org/abs/2603.20976)|null| |**2026-03-20**|**The production of meaning in the processing of natural language**|Christopher J. Agostino et.al.|[2603.20381](http://arxiv.org/abs/2603.20381)|null| |**2026-03-20**|**Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance**|Fazhong Liu et.al.|[2603.19974](http://arxiv.org/abs/2603.19974)|null| |**2026-03-19**|**A Framework for Formalizing LLM Agent Security**|Vincent Siu et.al.|[2603.19469](http://arxiv.org/abs/2603.19469)|null| |**2026-03-19**|**The Autonomy Tax: Defense Training Breaks LLM Agents**|Shawn Li et.al.|[2603.19423](http://arxiv.org/abs/2603.19423)|null| |**2026-03-19**|**Prompt Control-Flow Integrity: A Priority-Aware Runtime Defense Against Prompt Injection in LLM Systems**|Md Takrim Ul Alam et.al.|[2603.18433](http://arxiv.org/abs/2603.18433)|null| |**2026-03-18**|**MCP-38: A Comprehensive Threat Taxonomy for Model Context Protocol Systems (v1.0)**|Yi Ting Shen et.al.|[2603.18063](http://arxiv.org/abs/2603.18063)|null| |**2026-03-18**|**VeriGrey: Greybox Agent Validation**|Yuntong Zhang et.al.|[2603.17639](http://arxiv.org/abs/2603.17639)|null| |**2026-03-17**|**CoMAI: A Collaborative Multi-Agent Framework for Robust and Equitable Interview Evaluation**|Gengxin Sun et.al.|[2603.16215](http://arxiv.org/abs/2603.16215)|null| |**2026-03-16**|**How Vulnerable Are AI Agents to Indirect Prompt Injections? 
Insights from a Large-Scale Public Competition**|Mateusz Dziemian et.al.|[2603.15714](http://arxiv.org/abs/2603.15714)|null| |**2026-03-16**|**Amplification Effects in Test-Time Reinforcement Learning: Safety and Reasoning Vulnerabilities**|Vanshaj Khattar et.al.|[2603.15417](http://arxiv.org/abs/2603.15417)|null| |**2026-03-13**|**Agent Privilege Separation in OpenClaw: A Structural Defense Against Prompt Injection**|Darren Cheng et.al.|[2603.13424](http://arxiv.org/abs/2603.13424)|null| |**2026-03-13**|**PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses**|Chenlong Yin et.al.|[2603.13026](http://arxiv.org/abs/2603.13026)|null| |**2026-03-13**|**Uncovering Security Threats and Architecting Defenses in Autonomous Agents: A Case Study of OpenClaw**|Zonghao Ying et.al.|[2603.12644](http://arxiv.org/abs/2603.12644)|null| |**2026-04-15**|**Prompt Injection as Role Confusion**|Charles Ye et.al.|[2603.12277](http://arxiv.org/abs/2603.12277)|null| |**2026-03-12**|**OpenClaw PRISM: A Zero-Fork, Defense-in-Depth Runtime Security Layer for Tool-Augmented LLM Agents**|Frank Li et.al.|[2603.11853](http://arxiv.org/abs/2603.11853)|null| |**2026-03-12**|**Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats**|Xinhao Deng et.al.|[2603.11619](http://arxiv.org/abs/2603.11619)|null| |**2026-04-17**|**Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover**|Indranil Halder et.al.|[2603.11331](http://arxiv.org/abs/2603.11331)|null| |**2026-03-11**|**AttriGuard: Defeating Indirect Prompt Injection in LLM Agents via Causal Attribution of Tool Invocations**|Yu He et.al.|[2603.10749](http://arxiv.org/abs/2603.10749)|null| |**2026-03-11**|**IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs**|Chuan Guo et.al.|[2603.10521](http://arxiv.org/abs/2603.10521)|null| |**2026-03-10**|**Compatibility at a Cost: Systematic Discovery and Exploitation of MCP Clause-Compliance 
Vulnerabilities**|Nanzi Yang et.al.|[2603.10163](http://arxiv.org/abs/2603.10163)|null| |**2026-03-10**|**Governance Architecture for Autonomous Agent Systems: Threats, Framework, and Engineering Practice**|Yuxu Ge et.al.|[2603.07191](http://arxiv.org/abs/2603.07191)|null| |**2026-02-02**|**vLLM Hook v0: A Plug-in for Programming Model Internals on vLLM**|Ching-Yun Ko et.al.|[2603.06588](http://arxiv.org/abs/2603.06588)|null| |**2026-03-04**|**Beyond Input Guardrails: Reconstructing Cross-Agent Semantic Flows for Execution-Aware Attack Detection**|Yangyang Wei et.al.|[2603.04469](http://arxiv.org/abs/2603.04469)|null| |**2026-05-14**|**Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks**|Junjie Chu et.al.|[2603.04459](http://arxiv.org/abs/2603.04459)|null| |**2026-02-22**|**AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems**|Emmanuel Bamidele et.al.|[2603.04443](http://arxiv.org/abs/2603.04443)|null| |**2026-03-04**|**Goal-Driven Risk Assessment for LLM-Powered Systems: A Healthcare Case Study**|Neha Nagaraja et.al.|[2603.03633](http://arxiv.org/abs/2603.03633)|null| |**2026-03-03**|**Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use**|Aradhye Agarwal et.al.|[2603.03205](http://arxiv.org/abs/2603.03205)|null| |**2026-03-02**|**DualSentinel: A Lightweight Framework for Detecting Targeted Attacks in Black-box LLM via Dual Entropy Lull Pattern**|Xiaoyi Pang et.al.|[2603.01574](http://arxiv.org/abs/2603.01574)|null| |**2026-02-26**|**Reverse CAPTCHA: Evaluating LLM Susceptibility to Invisible Unicode Instruction Injection**|Marcus Graves et.al.|[2603.00164](http://arxiv.org/abs/2603.00164)|null| |**2026-02-26**|**AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification**|Tian Zhang et.al.|[2602.22724](http://arxiv.org/abs/2602.22724)|null| |**2026-02-25**|**Silent Egress: 
When Implicit Prompt Injection Makes LLM Agents Leak Without a Trace**|Qianlong Lan et.al.|[2602.22450](http://arxiv.org/abs/2602.22450)|null|
|**2026-02-24**|**Analysis of LLMs Against Prompt Injection and Jailbreak Attacks**|Piyush Jaiswal et.al.|[2602.22242](http://arxiv.org/abs/2602.22242)|null|
|**2026-02-24**|**SoK: Agentic Skills -- Beyond Tool Use in LLM Agents**|Yanna Jiang et.al.|[2602.20867](http://arxiv.org/abs/2602.20867)|null|
|**2026-02-24**|**AdapTools: Adaptive Tool-based Indirect Prompt Injection Attacks on Agentic LLMs**|Che Wang et.al.|[2602.20720](http://arxiv.org/abs/2602.20720)|null|
|**2026-02-25**|**Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks**|David Schmotz et.al.|[2602.20156](http://arxiv.org/abs/2602.20156)|null|
|**2026-02-23**|**The LLMbda Calculus: AI Agents, Conversations, and Information Flow**|Zac Garby et.al.|[2602.20064](http://arxiv.org/abs/2602.20064)|null|
|**2026-02-23**|**CIBER: A Comprehensive Benchmark for Security Evaluation of Code Interpreter Agents**|Lei Ba et.al.|[2602.19547](http://arxiv.org/abs/2602.19547)|null|
|**2026-02-19**|**Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models**|Manuel Wirth et.al.|[2602.18514](http://arxiv.org/abs/2602.18514)|null|
|**2026-02-18**|**The Vulnerability of LLM Rankers to Prompt Injection Attacks**|Yu Yin et.al.|[2602.16752](http://arxiv.org/abs/2602.16752)|null|
|**2026-02-19**|**Policy Compiler for Secure Agentic Systems**|Nils Palumbo et.al.|[2602.16708](http://arxiv.org/abs/2602.16708)|null|
|**2026-01-26**|**From Transcripts to AI Agents: Knowledge Extraction, RAG Integration, and Robust Evaluation of Conversational AI Assistants**|Krittin Pachtrachai et.al.|[2602.15859](http://arxiv.org/abs/2602.15859)|null|
|**2026-05-16**|**SkillJect: Effectively Automating Skill-Based Prompt Injection for Skill-Enabled Agents**|Xiaojun Jia et.al.|[2602.14211](http://arxiv.org/abs/2602.14211)|null|
|**2026-02-15**|**When Benchmarks Lie: Evaluating Malicious Prompt Classifiers Under True Distribution Shift**|Max Fomin et.al.|[2602.14161](http://arxiv.org/abs/2602.14161)|null|
|**2026-02-21**|**AlignSentinel: Alignment-Aware Detection of Prompt Injection Attacks**|Yuqi Jia et.al.|[2602.13597](http://arxiv.org/abs/2602.13597)|null|
|**2026-02-25**|**OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage**|Akshat Naik et.al.|[2602.13477](http://arxiv.org/abs/2602.13477)|null|
|**2026-01-21**|**From Biased Chatbots to Biased Agents: Examining Role Assignment Effects on LLM Agent Robustness**|Linbo Cao et.al.|[2602.12285](http://arxiv.org/abs/2602.12285)|null|
|**2026-03-05**|**Peak + Accumulation: A Proxy-Level Scoring Formula for Multi-Turn LLM Attack Detection**|J Alex Corll et.al.|[2602.11247](http://arxiv.org/abs/2602.11247)|null|
|**2026-02-13**|**Blind Gods and Broken Screens: Architecting a Secure, Intent-Centric Mobile Agent Operating System**|Zhenhua Zou et.al.|[2602.10915](http://arxiv.org/abs/2602.10915)|null|
|**2026-02-11**|**When Skills Lie: Hidden-Comment Injection in LLM Agents**|Qianli Wang et.al.|[2602.10498](http://arxiv.org/abs/2602.10498)|null|
|**2026-02-11**|**Protecting 
Context and Prompts: Deterministic Security for Non-Deterministic AI**|Mohan Rajagopalan et.al.|[2602.10481](http://arxiv.org/abs/2602.10481)|null| |**2026-02-11**|**The Landscape of Prompt Injection Threats in LLM Agents: From Taxonomy to Analysis**|Peiran Wang et.al.|[2602.10453](http://arxiv.org/abs/2602.10453)|null| |**2026-02-09**|**MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks**|Georgios Syros et.al.|[2602.09222](http://arxiv.org/abs/2602.09222)|null| |**2026-02-08**|**Efficient and Adaptable Detection of Malicious LLM Prompts via Bootstrap Aggregation**|Shayan Ali Hassan et.al.|[2602.08062](http://arxiv.org/abs/2602.08062)|null| |**2026-02-07**|**AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management**|Ruoyao Wen et.al.|[2602.07398](http://arxiv.org/abs/2602.07398)|**[link](https://github.com/ruoyaow/agentsys-memory)**| |**2026-02-07**|**When the Model Said 'No Comment', We Knew Helpfulness Was Dead, Honesty Was Alive, and Safety Was Terrified**|Gautam Siddharth Kashyap et.al.|[2602.07381](http://arxiv.org/abs/2602.07381)|null| |**2026-02-06**|**MPIB: A Benchmark for Medical Prompt Injection Attacks and Clinical Safety in LLMs**|Junhyeok Lee et.al.|[2602.06268](http://arxiv.org/abs/2602.06268)|null| |**2026-02-05**|**Learning to Inject: Automated Prompt Injection via Reinforcement Learning**|Xin Chen et.al.|[2602.05746](http://arxiv.org/abs/2602.05746)|**[link](https://github.com/zj-jayzhang/AutoInject)**| |**2026-02-25**|**Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks**|Jafar Isbarov et.al.|[2602.05066](http://arxiv.org/abs/2602.05066)|null| |**2026-02-02**|**Human Society-Inspired Approaches to Agentic AI Security: The 4C Framework**|Alsharif Abuadbba et.al.|[2602.01942](http://arxiv.org/abs/2602.01942)|null| |**2026-02-02**|**RedVisor: Reasoning-Aware Prompt Injection Defense via Zero-Copy KV Cache Reuse**|Mingrui Liu 
et.al.|[2602.01795](http://arxiv.org/abs/2602.01795)|null| |**2026-02-01**|**Context Dependence and Reliability in Autoregressive Language Models**|Poushali Sengupta et.al.|[2602.01378](http://arxiv.org/abs/2602.01378)|null| |**2026-02-01**|**SMCP: Secure Model Context Protocol**|Xinyi Hou et.al.|[2602.01129](http://arxiv.org/abs/2602.01129)|null| |**2026-04-01**|**Bypassing Prompt Injection Detectors through Evasive Injections**|Md Jahedur Rahman et.al.|[2602.00750](http://arxiv.org/abs/2602.00750)|null| |**2026-01-30**|**Whispers of Wealth: Red-Teaming Google's Agent Payments Protocol via Prompt Injection**|Tanusree Debi et.al.|[2601.22569](http://arxiv.org/abs/2601.22569)|null| |**2026-01-29**|**A Systematic Literature Review on LLM Defenses Against Prompt Injection and Jailbreaking: Expanding NIST Taxonomy**|Pedro H. Barcha Correia et.al.|[2601.22240](http://arxiv.org/abs/2601.22240)|null| |**2026-02-06**|**OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence**|Jarrod Barnes et.al.|[2601.21083](http://arxiv.org/abs/2601.21083)|null| |**2026-01-27**|**Proactive Hardening of LLM Defenses with HASTE**|Henry Chen et.al.|[2601.19051](http://arxiv.org/abs/2601.19051)|null| |**2026-01-25**|**Prompt Injection Evaluations: Refusal Boundary Instability and Artifact-Dependent Compliance in GPT-4-Series Models**|Thomas Heverin et.al.|[2601.17911](http://arxiv.org/abs/2601.17911)|null| |**2026-01-24**|**Breaking the Protocol: Security Analysis of the Model Context Protocol Specification and Prompt Injection Vulnerabilities in Tool-Integrated LLM Agents**|Narek Maloyan et.al.|[2601.17549](http://arxiv.org/abs/2601.17549)|null| |**2026-01-22**|**Machine-Assisted Grading of Nationwide School-Leaving Essay Exams with LLMs and Statistical NLP**|Andres Karjus et.al.|[2601.16314](http://arxiv.org/abs/2601.16314)|null| |**2026-01-21**|**Securing LLM-as-a-Service for Small Businesses: An Industry Case Study of a Distributed Chatbot Deployment 
Platform**|Jiazhu Xie et.al.|[2601.15528](http://arxiv.org/abs/2601.15528)|null| |**2026-01-20**|**PINA: Prompt Injection Attack against Navigation Agents**|Jiani Liu et.al.|[2601.13612](http://arxiv.org/abs/2601.13612)|null| |**2026-01-19**|**Prompt Injection Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching**|Diego Gosmar et.al.|[2601.13186](http://arxiv.org/abs/2601.13186)|null| |**2026-01-18**|**Agentic Artificial Intelligence (AI): Architectures, Taxonomies, and Evaluation of Large Language Model Agents**|Arunkumar V et.al.|[2601.12560](http://arxiv.org/abs/2601.12560)|null| |**2026-01-18**|**AgenTRIM: Tool Risk Mitigation for Agentic AI**|Roy Betser et.al.|[2601.12449](http://arxiv.org/abs/2601.12449)|null| |**2026-01-18**|**Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injections in LLMs**|Anirudh Sekar et.al.|[2601.12359](http://arxiv.org/abs/2601.12359)|null| |**2026-01-16**|**SD-RAG: A Prompt-Injection-Resilient Framework for Selective Disclosure in Retrieval-Augmented Generation**|Aiman Al Masoud et.al.|[2601.11199](http://arxiv.org/abs/2601.11199)|null| |**2026-01-15**|**Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale**|Yi Liu et.al.|[2601.10338](http://arxiv.org/abs/2601.10338)|null| |**2026-01-15**|**ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack**|Hao Li et.al.|[2601.10173](http://arxiv.org/abs/2601.10173)|null| |**2026-01-15**|**ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback**|Yutao Mou et.al.|[2601.10156](http://arxiv.org/abs/2601.10156)|null| |**2026-02-10**|**The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multistep Malware Delivery Mechanism**|Oleg Brodt et.al.|[2601.09625](http://arxiv.org/abs/2601.09625)|null| |**2025-12-21**|**Rubric-Conditioned LLM Grading: Alignment, Uncertainty, and Robustness**|Haotian Deng 
et.al.|[2601.08843](http://arxiv.org/abs/2601.08843)|null| |**2026-01-13**|**BenchOverflow: Measuring Overflow in Large Language Models via Plain-Text Prompts**|Erin Feiglin et.al.|[2601.08490](http://arxiv.org/abs/2601.08490)|null| |**2026-01-09**|**FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments**|Zhi Yang et.al.|[2601.07853](http://arxiv.org/abs/2601.07853)|null| |**2026-01-12**|**When Bots Take the Bait: Exposing and Mitigating the Emerging Social Engineering Attack in Web Automation Agent**|Xinyi Wu et.al.|[2601.07263](http://arxiv.org/abs/2601.07263)|null| |**2026-01-12**|**Defenses Against Prompt Attacks Learn Surface Heuristics**|Shawn Li et.al.|[2601.07185](http://arxiv.org/abs/2601.07185)|null| |**2026-01-11**|**Overcoming the Retrieval Barrier: Indirect Prompt Injection in the Wild for LLM Systems**|Hongyan Chang et.al.|[2601.07072](http://arxiv.org/abs/2601.07072)|null| |**2026-01-11**|**Paraphrasing Adversarial Attack on LLM-as-a-Reviewer**|Masahiro Kaneko et.al.|[2601.06884](http://arxiv.org/abs/2601.06884)|null| |**2026-01-14**|**VIGIL: Defending LLM Agents Against Tool Stream Injection via Verify-Before-Commit**|Junda Lin et.al.|[2601.05755](http://arxiv.org/abs/2601.05755)|null| |**2026-01-08**|**Defense Against Indirect Prompt Injection via Tool Result Parsing**|Qiang Yu et.al.|[2601.04795](http://arxiv.org/abs/2601.04795)|null| |**2026-01-08**|**Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning**|Zhiyuan Chang et.al.|[2601.04666](http://arxiv.org/abs/2601.04666)|null| |**2026-01-08**|**Autonomous Agents on Blockchains: Standards, Execution Models, and Trust Boundaries**|Saad Alqithami et.al.|[2601.04583](http://arxiv.org/abs/2601.04583)|null| |**2026-02-24**|**What Matters For Safety Alignment?**|Xing Li et.al.|[2601.03868](http://arxiv.org/abs/2601.03868)|null| |**2026-03-20**|**Hidden State Poisoning Attacks against Mamba-based 
Language Models**|Alexandre Le Mercier et.al.|[2601.01972](http://arxiv.org/abs/2601.01972)|null|
|**2026-01-03**|**MCP-SandboxScan: WASM-based Secure Execution and Runtime Analysis for MCP Tools**|Zhuoran Tan et.al.|[2601.01241](http://arxiv.org/abs/2601.01241)|null|
|**2025-12-30**|**The Silicon Psyche: Anthropomorphic Vulnerabilities in Large Language Models**|Giuseppe Canale et.al.|[2601.00867](http://arxiv.org/abs/2601.00867)|null|
|**2025-12-30**|**Language Model Agents Under Attack: A Cross Model-Benchmark of Profit-Seeking Behaviors in Customer Service**|Jingyu Zhang et.al.|[2512.24415](http://arxiv.org/abs/2512.24415)|null|
|**2025-12-29**|**Multilingual Hidden Prompt Injection Attacks on LLM-Based Academic Reviewing**|Panagiotis Theocharopoulos et.al.|[2512.23684](http://arxiv.org/abs/2512.23684)|null|
|**2025-12-29**|**It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents**|Karolina Korgul et.al.|[2512.23128](http://arxiv.org/abs/2512.23128)|null|
|**2025-12-24**|**AegisAgent: An Autonomous Defense Agent Against Prompt Injection Attacks in LLM-HARs**|Yihan Wang et.al.|[2512.20986](http://arxiv.org/abs/2512.20986)|null|
|**2025-12-25**|**ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected**|Kanchon Gharami et.al.|[2512.20405](http://arxiv.org/abs/2512.20405)|null|
|**2026-01-05**|**AprielGuard**|Jaykumar Kasundra et.al.|[2512.20293](http://arxiv.org/abs/2512.20293)|null|
|**2026-01-09**|**PromptScreen: Efficient Jailbreak Mitigation Using Semantic Linear Classification in a Multi-Staged Pipeline**|Akshaj Prashanth Rao et.al.|[2512.19011](http://arxiv.org/abs/2512.19011)|null|
|**2025-12-18**|**MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval**|Saksham Sahai Srivastava et.al.|[2512.16962](http://arxiv.org/abs/2512.16962)|null|
|**2025-12-18**|**Beyond the Benchmark: Innovative Defenses Against Prompt Injection Attacks**|Safwan Shaheer et.al.|[2512.16307](http://arxiv.org/abs/2512.16307)|null|
|**2025-12-17**|**Quantifying Return on Security Controls in LLM Systems**|Richard Helder Moulton et.al.|[2512.15081](http://arxiv.org/abs/2512.15081)|null|
|**2025-12-16**|**Penetration Testing of Agentic AI: A Comparative Security Analysis Across Models and Frameworks**|Viet K. Nguyen et.al.|[2512.14860](http://arxiv.org/abs/2512.14860)|null|
|**2025-12-14**|**Detecting Prompt Injection Attacks Against Application Using Classifiers**|Safwan Shaheer et.al.|[2512.12583](http://arxiv.org/abs/2512.12583)|null|
|**2026-01-06**|**When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection**|Devanshu Sahoo et.al.|[2512.10449](http://arxiv.org/abs/2512.10449)|null|
|**2025-12-14**|**Phishing Email Detection Using Large Language Models**|Najmul Hasan et.al.|[2512.10104](http://arxiv.org/abs/2512.10104)|null|
|**2025-12-15**|**ObliInjection: Order-Oblivious Prompt Injection Attack to LLM Agents with Multi-source Data**|Reachal Wang et.al.|[2512.09321](http://arxiv.org/abs/2512.09321)|null|
|**2025-12-09**|**Insured Agents: A Decentralized Trust Insurance Mechanism for Agentic Economy**|Botao 'Amber' Hu et.al.|[2512.08737](http://arxiv.org/abs/2512.08737)|null|
|**2025-12-11**|**Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs**|Yinan Zhong et.al.|[2512.08417](http://arxiv.org/abs/2512.08417)|null|
|**2025-12-13**|**Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem**|Shiva Gaire et.al.|[2512.08290](http://arxiv.org/abs/2512.08290)|null|
|**2026-02-09**|**SoK: Trust-Authorization Mismatch in LLM Agent Interactions**|Guanquan Shi et.al.|[2512.06914](http://arxiv.org/abs/2512.06914)|null|
|**2026-01-23**|**Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents**|Zhibo Liang et.al.|[2512.06716](http://arxiv.org/abs/2512.06716)|null|
|**2025-12-06**|**Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks**|Saeid Jamshidi et.al.|[2512.06556](http://arxiv.org/abs/2512.06556)|null|
|**2025-12-01**|**Securing Large Language Models (LLMs) from Prompt Injection Attacks**|Omar Farooq Khan Suri 
et.al.|[2512.01326](http://arxiv.org/abs/2512.01326)|null| |**2025-11-30**|**Mitigating Indirect Prompt Injection via Instruction-Following Intent Analysis**|Mintong Kang et.al.|[2512.00966](http://arxiv.org/abs/2512.00966)|null| |**2025-11-28**|**Are LLMs Good Safety Agents or a Propaganda Engine?**|Neemesh Yadav et.al.|[2511.23174](http://arxiv.org/abs/2511.23174)|null| |**2026-02-03**|**Semantics as a Shield: Label Disguise Defense (LDD) against Prompt Injection in LLM Sentiment Classification**|Yanxi Li et.al.|[2511.21752](http://arxiv.org/abs/2511.21752)|null| |**2025-11-24**|**Prompt Fencing: A Cryptographic Approach to Establishing Security Boundaries in Large Language Model Prompts**|Steven Peh et.al.|[2511.19727](http://arxiv.org/abs/2511.19727)|null| |**2025-11-23**|**Z-Space: A Multi-Agent Tool Orchestration Framework for Enterprise-Grade LLM Automation**|Qingsong He et.al.|[2511.19483](http://arxiv.org/abs/2511.19483)|null| |**2025-11-19**|**Securing AI Agents Against Prompt Injection Attacks**|Badrinath Ramakrishnan et.al.|[2511.15759](http://arxiv.org/abs/2511.15759)|null| |**2025-11-19**|**Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks**|Zimo Ji et.al.|[2511.15203](http://arxiv.org/abs/2511.15203)|null| |**2025-11-15**|**Privacy-Preserving Prompt Injection Detection for LLMs Using Federated Learning and Embedding-Based NLP Classification**|Hasini Jayathilaka et.al.|[2511.12295](http://arxiv.org/abs/2511.12295)|null| |**2025-11-13**|**PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization**|Runpeng Geng et.al.|[2511.10720](http://arxiv.org/abs/2511.10720)|null| |**2025-11-09**|**RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework**|Seif Ikbarieh et.al.|[2511.06212](http://arxiv.org/abs/2511.06212)|null| |**2026-03-24**|**Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs**|Alina Fastowski 
et.al.|[2511.05919](http://arxiv.org/abs/2511.05919)|null| |**2026-01-22**|**Can LLM Infer Risk Information From MCP Server System Logs?**|Jiayi Fu et.al.|[2511.05867](http://arxiv.org/abs/2511.05867)|null| |**2025-11-08**|**When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins**|Yigitcan Kaya et.al.|[2511.05797](http://arxiv.org/abs/2511.05797)|null| |**2026-02-18**|**Reasoning Up the Instruction Ladder for Controllable Language Models**|Zishuo Zheng et.al.|[2511.04694](http://arxiv.org/abs/2511.04694)|null| |**2025-11-06**|**Large Language Models for Cyber Security**|Raunak Somani et.al.|[2511.04508](http://arxiv.org/abs/2511.04508)|null| |**2025-11-05**|**Inter-Agent Trust Models: A Comparative Study of Brief, Claim, Proof, Stake, Reputation and Constraint in Agentic Web Protocol Design-A2A, AP2, ERC-8004, and Beyond**|Botao 'Amber' Hu et.al.|[2511.03434](http://arxiv.org/abs/2511.03434)|null| |**2025-11-05**|**Death by a Thousand Prompts: Open Model Vulnerability Analysis**|Amy Chang et.al.|[2511.03247](http://arxiv.org/abs/2511.03247)|null| |**2025-11-11**|**Prompt Injection as an Emerging Threat: Evaluating the Resilience of Large Language Models**|Daniyal Ganiuly et.al.|[2511.01634](http://arxiv.org/abs/2511.01634)|null| |**2025-11-18**|**DRIP: Defending Prompt Injection via Token-wise Representation Editing and Residual Instruction Fusion**|Ruofan Liu et.al.|[2511.00447](http://arxiv.org/abs/2511.00447)|null| |**2025-10-30**|**Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections**|David Schmotz et.al.|[2510.26328](http://arxiv.org/abs/2510.26328)|null| |**2025-10-26**|**Sentra-Guard: A Multilingual Human-AI Framework for Real-Time Defense Against Adversarial LLM Jailbreaks**|Md. 
Mehedi Hasan et.al.|[2510.22628](http://arxiv.org/abs/2510.22628)|null| |**2026-01-17**|**Soft Instruction De-escalation Defense**|Nils Philipp Walter et.al.|[2510.21057](http://arxiv.org/abs/2510.21057)|null| |**2025-10-20**|**CourtGuard: A Local, Multiagent Prompt Injection Classifier**|Isaac Wu et.al.|[2510.19844](http://arxiv.org/abs/2510.19844)|null| |**2026-02-04**|**Defending Against Prompt Injection with DataFilter**|Yizhu Wang et.al.|[2510.19207](http://arxiv.org/abs/2510.19207)|null| |**2025-10-29**|**OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models**|Thomas Wang et.al.|[2510.19169](http://arxiv.org/abs/2510.19169)|null| |**2025-10-18**|**ATA: A Neuro-Symbolic Approach to Implement Autonomous and Trustworthy Agents**|David Peer et.al.|[2510.16381](http://arxiv.org/abs/2510.16381)|null| |**2026-03-24**|**MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents**|Dongsen Zhang et.al.|[2510.15994](http://arxiv.org/abs/2510.15994)|null| |**2026-01-24**|**PIShield: Detecting Prompt Injection Attacks via Intrinsic LLM Features**|Wei Zou et.al.|[2510.14005](http://arxiv.org/abs/2510.14005)|null| |**2025-10-15**|**In-Browser LLM-Guided Fuzzing for Real-Time Prompt Injection Testing in Agentic AI Browsers**|Avihay Cohen et.al.|[2510.13543](http://arxiv.org/abs/2510.13543)|null| |**2025-10-17**|**PromptLocate: Localizing Prompt Injection Attacks**|Yuqi Jia et.al.|[2510.12252](http://arxiv.org/abs/2510.12252)|null| |**2026-03-02**|**Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols**|Mikhail Terekhov et.al.|[2510.09462](http://arxiv.org/abs/2510.09462)|null| |**2025-10-10**|**Exploiting Web Search Tools of AI Agents for Data Exfiltration**|Dennis Rall et.al.|[2510.09093](http://arxiv.org/abs/2510.09093)|null| |**2025-10-10**|**The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against Llm Jailbreaks and Prompt Injections**|Milad Nasr 
et.al.|[2510.09023](http://arxiv.org/abs/2510.09023)|null| |**2025-10-09**|**CommandSans: Securing AI Agents with Surgical Precision Prompt Sanitization**|Debeshee Das et.al.|[2510.08829](http://arxiv.org/abs/2510.08829)|null| |**2025-10-07**|**Towards Reliable and Practical LLM Security Evaluations via Bayesian Modelling**|Mary Llewellyn et.al.|[2510.05709](http://arxiv.org/abs/2510.05709)|null| |**2025-10-06**|**Adversarial Reinforcement Learning for Large Language Model Agent Safety**|Zizhao Wang et.al.|[2510.05442](http://arxiv.org/abs/2510.05442)|null| |**2025-10-09**|**Rule Encoding and Compliance in Large Language Models: An Information-Theoretic Analysis**|Joachim Diederich et.al.|[2510.05106](http://arxiv.org/abs/2510.05106)|null| |**2025-10-06**|**RL Is a Hammer and LLMs Are Nails: A Simple Reinforcement Learning Recipe for Strong Prompt Injection**|Yuxin Wen et.al.|[2510.04885](http://arxiv.org/abs/2510.04885)|null| |**2025-10-06**|**Unified Threat Detection and Mitigation Framework (UTDMF): Combating Prompt Injection, Deception, and Bias in Enterprise-Scale Transformers**|Santhosh KumarRavindran et.al.|[2510.04528](http://arxiv.org/abs/2510.04528)|null| |**2025-10-05**|**VortexPIA: Indirect Prompt Injection Attack against LLMs for Efficient Extraction of User Privacy**|Yu Cui et.al.|[2510.04261](http://arxiv.org/abs/2510.04261)|null| |**2025-10-04**|**Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods**|Yulin Chen et.al.|[2510.03705](http://arxiv.org/abs/2510.03705)|null| |**2025-10-03**|**FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents**|Imene Kerboua et.al.|[2510.03204](http://arxiv.org/abs/2510.03204)|null| |**2025-10-02**|**AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning**|Zhenyu Pan et.al.|[2510.01586](http://arxiv.org/abs/2510.01586)|null| |**2025-10-01**|**A Call to Action for a Secure-by-Design Generative AI Paradigm**|Dalal Alharthi 
et.al.|[2510.00451](http://arxiv.org/abs/2510.00451)|null| |**2025-09-30**|**Fairness Testing in Retrieval-Augmented Generation: How Small Perturbations Reveal Bias in Small Language Models**|Matheus Vinicius da Silva de Oliveira et.al.|[2509.26584](http://arxiv.org/abs/2509.26584)|null| |**2025-09-30**|**Better Privilege Separation for Agents by Restricting Data Types**|Dennis Jacob et.al.|[2509.25926](http://arxiv.org/abs/2509.25926)|null| |**2025-10-01**|**Fingerprinting LLMs via Prompt Injection**|Yuepeng Hu et.al.|[2509.25448](http://arxiv.org/abs/2509.25448)|null| |**2025-11-14**|**SecInfer: Preventing Prompt Injection via Inference-time Scaling**|Yupei Liu et.al.|[2509.24967](http://arxiv.org/abs/2509.24967)|null| |**2026-01-29**|**SafeSearch: Automated Red-Teaming of LLM-Based Search Agents**|Jianshuo Dong et.al.|[2509.23694](http://arxiv.org/abs/2509.23694)|null| |**2026-02-15**|**ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search**|Zeyu Shen et.al.|[2509.23519](http://arxiv.org/abs/2509.23519)|null| |**2026-01-30**|**ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents**|Hwan Chang et.al.|[2509.22830](http://arxiv.org/abs/2509.22830)|null| |**2025-09-26**|**"Your AI, My Shell": Demystifying Prompt Injection Attacks on Agentic AI Coding Editors**|Yue Liu et.al.|[2509.22040](http://arxiv.org/abs/2509.22040)|null| |**2025-09-23**|**LLMZ+: Contextual Prompt Whitelist Principles for Agentic LLMs**|Tom Pawelek et.al.|[2509.18557](http://arxiv.org/abs/2509.18557)|null| |**2025-09-22**|**D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models**|Satyapriya Krishna et.al.|[2509.17938](http://arxiv.org/abs/2509.17938)|null| |**2025-09-23**|**SilentStriker:Toward Stealthy Bit-Flip Attacks on Large Language Models**|Haotian Xu et.al.|[2509.17371](http://arxiv.org/abs/2509.17371)|null| |**2025-09-18**|**Sentinel Agents for Secure and Trustworthy Agentic AI in Multi-Agent Systems**|Diego Gosmar 
et.al.|[2509.14956](http://arxiv.org/abs/2509.14956)|null| |**2025-12-17**|**A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks**|S M Asif Hossain et.al.|[2509.14285](http://arxiv.org/abs/2509.14285)|null| |**2025-09-15**|**Early Approaches to Adversarial Fine-Tuning for Prompt Injection Defense: A 2022 Study of GPT-3 and Contemporary Models**|Gustavo Sandoval et.al.|[2509.14271](http://arxiv.org/abs/2509.14271)|null| |**2025-09-16**|**Agentic JWT: A Secure Delegation Protocol for Autonomous AI Agents**|Abhishek Goswami et.al.|[2509.13597](http://arxiv.org/abs/2509.13597)|null| |**2025-09-14**|**Securing AI Agents: Implementing Role-Based Access Control for Industrial Applications**|Aadil Gani Ganie et.al.|[2509.11431](http://arxiv.org/abs/2509.11431)|null| |**2025-09-25**|**Prompt Injection Attacks on LLM Generated Reviews of Scientific Publications**|Janis Keuper et.al.|[2509.10248](http://arxiv.org/abs/2509.10248)|null| |**2025-09-12**|**When Your Reviewer is an LLM: Biases, Divergence, and Prompt Injection Risks in Peer Review**|Changjia Zhu et.al.|[2509.09912](http://arxiv.org/abs/2509.09912)|null| |**2025-09-10**|**Architecting Resilient LLM Agents: A Guide to Secure Plan-then-Execute Implementations**|Ron F. 
Del Rosario et.al.|[2509.08646](http://arxiv.org/abs/2509.08646)|null| |**2025-09-09**|**Transferable Direct Prompt Injection via Activation-Guided MCMC Sampling**|Minghui Li et.al.|[2509.07617](http://arxiv.org/abs/2509.07617)|null| |**2025-11-11**|**Decoding Latent Attack Surfaces in LLMs: Prompt Injection via HTML in Web Summarization**|Ishaan Verma et.al.|[2509.05831](http://arxiv.org/abs/2509.05831)|null| |**2026-01-04**|**Red-Teaming Coding Agents from a Tool-Invocation Perspective: An Empirical Security Assessment**|Yuchong Xie et.al.|[2509.05755](http://arxiv.org/abs/2509.05755)|null| |**2026-02-28**|**BinaryShield: Cross-Service Threat Intelligence in LLM Services using Privacy-Preserving Fingerprints**|Waris Gill et.al.|[2509.05608](http://arxiv.org/abs/2509.05608)|null| |**2025-09-02**|**Enhancing Reliability in LLM-Integrated Robotic Systems: A Unified Approach to Security and Safety**|Wenxiao Zhang et.al.|[2509.02163](http://arxiv.org/abs/2509.02163)|null| |**2025-08-29**|**A Whole New World: Creating a Parallel-Poisoned Web Only AI-Agents Can See**|Shaked Zychlinski et.al.|[2509.00124](http://arxiv.org/abs/2509.00124)|null| |**2025-10-09**|**AEGIS : Automated Co-Evolutionary Framework for Guarding Prompt Injections Schema**|Ting-Chun Liu et.al.|[2509.00088](http://arxiv.org/abs/2509.00088)|null| |**2025-11-15**|**Cybersecurity AI: Hacking the AI Hackers via Prompt Injection**|Víctor Mayoral-Vilches et.al.|[2508.21669](http://arxiv.org/abs/2508.21669)|null| |**2025-09-16**|**PromptSleuth: Detecting Prompt Injection via Semantic Intent Invariance**|Mengxiao Wang et.al.|[2508.20890](http://arxiv.org/abs/2508.20890)|null| |**2026-03-30**|**Misleading Large Language Models used (or misused) in Scientific Peer-Reviewing via Hidden Prompt-Injection Attacks**|Matteo Gioele Collu et.al.|[2508.20863](http://arxiv.org/abs/2508.20863)|null| |**2025-08-26**|**Reliable Weak-to-Strong Monitoring of LLM Agents**|Neil Kale 
et.al.|[2508.19461](http://arxiv.org/abs/2508.19461)|null| |**2025-08-25**|**Tricking LLM-Based NPCs into Spilling Secrets**|Kyohei Shiomi et.al.|[2508.19288](http://arxiv.org/abs/2508.19288)|null| |**2025-10-23**|**PhantomLint: Principled Detection of Hidden LLM Prompts in Structured Documents**|Toby Murray et.al.|[2508.17884](http://arxiv.org/abs/2508.17884)|null| |**2025-08-23**|**Mind the Gap: Time-of-Check to Time-of-Use Vulnerabilities in LLM-Enabled Agents**|Derek Lilienthal et.al.|[2508.17155](http://arxiv.org/abs/2508.17155)|null| |**2025-08-21**|**IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents**|Hengyu An et.al.|[2508.15310](http://arxiv.org/abs/2508.15310)|**[link](https://github.com/Greysahy/ipiguard)**| |**2025-11-10**|**EMNLP: Educator-role Moral and Normative Large Language Models Profiling**|Yilin Jiang et.al.|[2508.15250](http://arxiv.org/abs/2508.15250)|null| |**2025-08-19**|**CCFC: Core & Core-Full-Core Dual-Track Defense for LLM Jailbreak Protection**|Jiaming Hu et.al.|[2508.14128](http://arxiv.org/abs/2508.14128)|null| |**2025-08-16**|**Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions**|Xuyang Guo et.al.|[2508.13214](http://arxiv.org/abs/2508.13214)|null| |**2025-08-16**|**Invitation Is All You Need! Promptware Attacks Against LLM-Powered Assistants in Production Are Practical and Dangerous**|Ben Nassi et.al.|[2508.12175](http://arxiv.org/abs/2508.12175)|null| |**2026-03-25**|**SafeSieve: From Heuristics to Experience in Progressive Pruning for LLM-based Multi-Agent Communication**|Ruijia Zhang et.al.|[2508.11733](http://arxiv.org/abs/2508.11733)|null| |**2026-01-08**|**MCP-Guard: A Multi-Stage Defense-in-Depth Framework for Securing Model Context Protocol in Agentic AI**|Wenpeng Xing et.al.|[2508.10991](http://arxiv.org/abs/2508.10991)|null| |**2025-08-18**|**Can AI Keep a Secret? 
Contextual Integrity Verification: A Provable Security Architecture for LLMs**|Aayush Gupta et.al.|[2508.09288](http://arxiv.org/abs/2508.09288)|null| |**2025-08-11**|**BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks**|Rui Miao et.al.|[2508.08127](http://arxiv.org/abs/2508.08127)|null| |**2025-08-08**|**Quantifying Conversation Drift in MCP via Latent Polytope**|Haoran Shi et.al.|[2508.06418](http://arxiv.org/abs/2508.06418)|null| |**2026-02-28**|**Prompt Injection Vulnerability of Consensus Generating Applications in Digital Democracy**|Jairo Gudiño-Rosero et.al.|[2508.04281](http://arxiv.org/abs/2508.04281)|null| |**2025-08-05**|**AttnTrace: Attention-based Context Traceback for Long-Context LLMs**|Yanting Wang et.al.|[2508.03793](http://arxiv.org/abs/2508.03793)|null| |**2025-10-01**|**Defend LLMs Through Self-Consciousness**|Boshi Huang et.al.|[2508.02961](http://arxiv.org/abs/2508.02961)|null| |**2025-08-15**|**AgentSight: System-Level Observability for AI Agents Using eBPF**|Yusheng Zheng et.al.|[2508.02736](http://arxiv.org/abs/2508.02736)|null| |**2025-08-04**|**A Survey on Data Security in Large Language Models**|Kang Chen et.al.|[2508.02312](http://arxiv.org/abs/2508.02312)|null| |**2026-01-07**|**Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools**|Kanghua Mo et.al.|[2508.02110](http://arxiv.org/abs/2508.02110)|null| |**2025-11-18**|**AgentArmor: Enforcing Program Analysis on Agent Runtime Trace to Defend Against Prompt Injection**|Peiran Wang et.al.|[2508.01249](http://arxiv.org/abs/2508.01249)|null| |**2025-08-01**|**LeakSealer: A Semisupervised Defense for LLMs Against Prompt Injection and Leakage Attacks**|Francesco Panebianco et.al.|[2508.00602](http://arxiv.org/abs/2508.00602)|null| |**2026-02-23**|**Role-Aware Language Models for Secure and Contextualized Access Control in Organizations**|Saeed Almheiri et.al.|[2507.23465](http://arxiv.org/abs/2507.23465)|null| |**2025-12-12**|**Counterfactual 
Evaluation for Blind Attack Detection in LLM-based Evaluation Systems**|Lijia Liu et.al.|[2507.23453](http://arxiv.org/abs/2507.23453)|null| |**2025-07-28**|**Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition**|Andy Zou et.al.|[2507.20526](http://arxiv.org/abs/2507.20526)|null| |**2025-07-21**|**Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems**|Andrii Balashov et.al.|[2507.15613](http://arxiv.org/abs/2507.15613)|null| |**2025-07-21**|**PromptArmor: Simple yet Effective Prompt Injection Defenses**|Tianneng Shi et.al.|[2507.15219](http://arxiv.org/abs/2507.15219)|null| |**2025-07-20**|**Manipulating LLM Web Agents with Indirect Prompt Injection Attack via HTML Accessibility Tree**|Sam Johnson et.al.|[2507.14799](http://arxiv.org/abs/2507.14799)|null| |**2025-10-04**|**TopicAttack: An Indirect Prompt Injection Attack via Topic Transition**|Yulin Chen et.al.|[2507.13686](http://arxiv.org/abs/2507.13686)|null| |**2025-07-17**|**Prompt Injection 2.0: Hybrid AI Threats**|Jeremy McHugh et.al.|[2507.13169](http://arxiv.org/abs/2507.13169)|null| |**2025-07-17**|**MAD-Spear: A Conformity-Driven Prompt Injection Attack on Multi-Agent Debate Systems**|Yu Cui et.al.|[2507.13038](http://arxiv.org/abs/2507.13038)|null| |**2026-01-21**|**Large Language Models Encode Semantics and Alignment in Linearly Separable Representations**|Baturay Saglam et.al.|[2507.09709](http://arxiv.org/abs/2507.09709)|null| |**2025-08-25**|**Defending Against Prompt Injection With a Few DefensiveTokens**|Sizhe Chen et.al.|[2507.07974](http://arxiv.org/abs/2507.07974)|null| |**2025-12-17**|**May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks**|Nishit V. 
Pandya et.al.|[2507.07417](http://arxiv.org/abs/2507.07417)|null| |**2025-11-04**|**The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover**|Matteo Lupinacci et.al.|[2507.06850](http://arxiv.org/abs/2507.06850)|null| |**2025-07-08**|**Bridging AI and Software Security: A Comparative Vulnerability Assessment of LLM Agent Deployment Paradigms**|Tarek Gasmi et.al.|[2507.06323](http://arxiv.org/abs/2507.06323)|null| |**2025-07-08**|**Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review**|Zhicheng Lin et.al.|[2507.06185](http://arxiv.org/abs/2507.06185)|null| |**2025-12-06**|**How Not to Detect Prompt Injections with an LLM**|Sarthak Choudhary et.al.|[2507.05630](http://arxiv.org/abs/2507.05630)|null| |**2026-02-01**|**DP-Fusion: Token-Level Differentially Private Inference for Large Language Models**|Rushil Thareja et.al.|[2507.04531](http://arxiv.org/abs/2507.04531)|**[link](https://github.com/MBZUAI-Trustworthy-ML/DP-Fusion-DPI)**| |**2025-07-03**|**Dynamic Long Short-Term Memory Based Memory Storage For Long Horizon LLM Interaction**|Yuyang Lou et.al.|[2507.03042](http://arxiv.org/abs/2507.03042)|null| |**2025-07-07**|**LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users**|Almog Hilel et.al.|[2507.02850](http://arxiv.org/abs/2507.02850)|null| |**2026-02-06**|**Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks**|Sizhe Chen et.al.|[2507.02735](http://arxiv.org/abs/2507.02735)|**[link](https://github.com/facebookresearch/Meta_SecAlign)**| |**2025-07-02**|**Transferable Modeling Strategies for Low-Resource LLM Tasks: A Prompt and Alignment-Based Approach**|Shuangquan Lyu et.al.|[2507.00601](http://arxiv.org/abs/2507.00601)|null| |**2025-12-14**|**From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows**|Mohamed Amine Ferrag et.al.|[2506.23260](http://arxiv.org/abs/2506.23260)|null| |**2025-06-23**|**Enhancing Security in LLM Applications: A Performance 
Evaluation of Early Detection Systems**|Valerii Gakh et.al.|[2506.19109](http://arxiv.org/abs/2506.19109)|null| |**2025-06-18**|**Context manipulation attacks : Web agents are susceptible to corrupted memory**|Atharv Singh Patlan et.al.|[2506.17318](http://arxiv.org/abs/2506.17318)|null| |**2025-10-29**|**OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents**|Thomas Kuntz et.al.|[2506.14866](http://arxiv.org/abs/2506.14866)|null| |**2025-06-17**|**AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models**|Ads Dawson et.al.|[2506.14682](http://arxiv.org/abs/2506.14682)|null| |**2026-03-26**|**DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents**|Hao Li et.al.|[2506.12104](http://arxiv.org/abs/2506.12104)|null| |**2025-06-11**|**LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge**|Sahar Abdelnabi et.al.|[2506.09956](http://arxiv.org/abs/2506.09956)|null| |**2025-06-27**|**Design Patterns for Securing LLM Agents against Prompt Injections**|Luca Beurer-Kellner et.al.|[2506.08837](http://arxiv.org/abs/2506.08837)|null| |**2025-06-09**|**TokenBreak: Bypassing Text Classification Models Through Token Manipulation**|Kasimir Schulz et.al.|[2506.07948](http://arxiv.org/abs/2506.07948)|null| |**2025-06-05**|**Detection Method for Prompt Injection by Integrating Pre-trained Model and Heuristic Feature Engineering**|Yi Ji et.al.|[2506.06384](http://arxiv.org/abs/2506.06384)|null| |**2025-06-06**|**To Protect the LLM Agent Against the Prompt Injection Attack with Polymorphic Prompt**|Zhilong Wang et.al.|[2506.05739](http://arxiv.org/abs/2506.05739)|null| |**2025-06-05**|**Sentinel: SOTA model to protect against prompt injections**|Dror Ivry et.al.|[2506.05446](http://arxiv.org/abs/2506.05446)|null| |**2025-06-26**|**TracLLM: A Generic Framework for Attributing Long Context LLMs**|Yanting Wang et.al.|[2506.04202](http://arxiv.org/abs/2506.04202)|null| |**2025-06-03**|**ATAG: AI-Agent 
Application Threat Assessment with Attack Graphs**|Parth Atulbhai Gandhi et.al.|[2506.02859](http://arxiv.org/abs/2506.02859)|null| |**2025-06-01**|**Simple Prompt Injection Attacks Can Leak Personal Data Observed by LLM Agents During Task Execution**|Meysam Alizadeh et.al.|[2506.01055](http://arxiv.org/abs/2506.01055)|null| |**2025-05-30**|**Adversarial Threat Vectors and Risk Mitigation for Retrieval-Augmented Generation Systems**|Chris M. Ward et.al.|[2506.00281](http://arxiv.org/abs/2506.00281)|null| |**2025-05-30**|**SentinelAgent: Graph-based Anomaly Detection in Multi-Agent Systems**|Xu He et.al.|[2505.24201](http://arxiv.org/abs/2505.24201)|null| |**2025-05-29**|**LLM Agents Should Employ Security Principles**|Kaiyuan Zhang et.al.|[2505.24019](http://arxiv.org/abs/2505.24019)|null| |**2025-05-28**|**Operationalizing CaMeL: Strengthening LLM Defenses for Enterprise Deployment**|Krti Tallam et.al.|[2505.22852](http://arxiv.org/abs/2505.22852)|null| |**2025-10-09**|**The Shape of Adversarial Influence: Characterizing LLM Latent Spaces with Persistent Homology**|Aideen Fay et.al.|[2505.20435](http://arxiv.org/abs/2505.20435)|null| |**2026-03-09**|**Stronger Enforcement of Instruction Hierarchy via Augmented Intermediate Representations**|Sanjay Kariyappa et.al.|[2505.18907](http://arxiv.org/abs/2505.18907)|null| |**2025-05-23**|**A Critical Evaluation of Defenses against Prompt Injection Attacks**|Yuqi Jia et.al.|[2505.18333](http://arxiv.org/abs/2505.18333)|null| |**2025-08-11**|**Improving LLM Outputs Against Jailbreak Attacks with Expert Model Integration**|Tatia Tsmindashvili et.al.|[2505.17066](http://arxiv.org/abs/2505.17066)|null| |**2025-05-22**|**In-Context Watermarks for Large Language Models**|Yepeng Liu et.al.|[2505.16934](http://arxiv.org/abs/2505.16934)|null| |**2025-10-16**|**Checkpoint-GCG: Auditing and Attacking Fine-Tuning-Based Prompt Injection Defenses**|Xiaoxue Yang et.al.|[2505.15738](http://arxiv.org/abs/2505.15738)|null| 
|**2025-09-30**|**Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries**|Yuhao Wang et.al.|[2505.15420](http://arxiv.org/abs/2505.15420)|null| |**2026-01-26**|**Can Large Language Models Really Recognize Your Name?**|Dzung Pham et.al.|[2505.14549](http://arxiv.org/abs/2505.14549)|null| |**2025-05-20**|**Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs**|Jiawen Wang et.al.|[2505.14368](http://arxiv.org/abs/2505.14368)|null| |**2025-05-19**|**Investigating the Vulnerability of LLM-as-a-Judge Architectures to Prompt-Injection Attacks**|Narek Maloyan et.al.|[2505.13348](http://arxiv.org/abs/2505.13348)|null| |**2025-05-19**|**The Hidden Dangers of Browsing AI Agents**|Mykyta Mudryi et.al.|[2505.13076](http://arxiv.org/abs/2505.13076)|null| |**2025-06-17**|**CAPTURE: Context-Aware Prompt Injection Testing and Robustness Enhancement**|Gauri Kholkar et.al.|[2505.12368](http://arxiv.org/abs/2505.12368)|null| |**2025-05-18**|**The Tower of Babel Revisited: Multilingual Jailbreak Prompts on Closed-Source Large Language Models**|Linghan Huang et.al.|[2505.12287](http://arxiv.org/abs/2505.12287)|null| |**2025-09-17**|**Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data**|Adel ElZemity et.al.|[2505.09974](http://arxiv.org/abs/2505.09974)|null| |**2025-05-10**|**Practical Reasoning Interruption Attacks on Reasoning Large Language Models**|Yu Cui et.al.|[2505.06643](http://arxiv.org/abs/2505.06643)|null| |**2025-10-19**|**System Prompt Poisoning: Persistent Attacks on Large Language Models Beyond User Injection**|Zongze Li et.al.|[2505.06493](http://arxiv.org/abs/2505.06493)|null| |**2025-09-17**|**Defending against Indirect Prompt Injection by Instruction Detection**|Tongyu Wen et.al.|[2505.06311](http://arxiv.org/abs/2505.06311)|null| |**2025-06-14**|**AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents**|Zhun Wang 
et.al.|[2505.05849](http://arxiv.org/abs/2505.05849)|null| |**2025-05-13**|**Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs**|Chetan Pathade et.al.|[2505.04806](http://arxiv.org/abs/2505.04806)|null| |**2026-03-02**|**Maris: A Formally Verifiable Privacy Policy Enforcement Paradigm for Multi-Agent Collaboration Systems**|Jian Cui et.al.|[2505.04799](http://arxiv.org/abs/2505.04799)|null| |**2025-05-07**|**A Proposal for Evaluating the Operational Risk for ChatBots based on Large Language Models**|Pedro Pinacho-Davidson et.al.|[2505.04784](http://arxiv.org/abs/2505.04784)|null| |**2025-05-06**|**LlamaFirewall: An open source guardrail system for building secure AI agents**|Sahana Chennabasappa et.al.|[2505.03574](http://arxiv.org/abs/2505.03574)|null| |**2025-05-01**|**OET: Optimization-based prompt injection Evaluation Toolkit**|Jinsheng Pan et.al.|[2505.00843](http://arxiv.org/abs/2505.00843)|null| |**2025-05-05**|**The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them)**|Zihao Wang et.al.|[2505.00626](http://arxiv.org/abs/2505.00626)|null| |**2025-12-12**|**CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks**|Rui Wang et.al.|[2504.21228](http://arxiv.org/abs/2504.21228)|null| |**2025-09-10**|**ACE: A Security Architecture for LLM-Integrated App Systems**|Evan Li et.al.|[2504.20984](http://arxiv.org/abs/2504.20984)|null| |**2025-04-29**|**Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption**|Wenxiao Wang et.al.|[2504.20769](http://arxiv.org/abs/2504.20769)|null| |**2025-04-29**|**Token-Efficient Prompt Injection Attack: Provoking Cessation in LLM Reasoning via Adaptive Token Compression**|Yu Cui et.al.|[2504.20493](http://arxiv.org/abs/2504.20493)|null| |**2025-04-29**|**Robustness via Referencing: Defending against Prompt Injection Attacks by 
Referencing the Executed Instruction**|Yulin Chen et.al.|[2504.20472](http://arxiv.org/abs/2504.20472)|null| |**2025-08-24**|**Prompt Injection Attack to Tool Selection in LLM Agents**|Jiawen Shi et.al.|[2504.19793](http://arxiv.org/abs/2504.19793)|null| |**2025-08-21**|**Security Steerability is All You Need**|Itay Hazan et.al.|[2504.19521](http://arxiv.org/abs/2504.19521)|null| |**2025-04-27**|**Small Models, Big Tasks: An Exploratory Empirical Study on Small Language Models for Function Calling**|Ishan Kavathekar et.al.|[2504.19277](http://arxiv.org/abs/2504.19277)|null| |**2025-04-25**|**Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections**|Narek Maloyan et.al.|[2504.18333](http://arxiv.org/abs/2504.18333)|null| |**2025-04-20**|**Breaking the Prompt Wall (I): A Real-World Case Study of Attacking ChatGPT via Lightweight Prompt Injection**|Xiangyu Chang et.al.|[2504.16125](http://arxiv.org/abs/2504.16125)|null| |**2025-04-14**|**You've Changed: Detecting Modification of Black-Box Large Language Models**|Alden Dima et.al.|[2504.12335](http://arxiv.org/abs/2504.12335)|null| |**2025-08-30**|**Progent: Programmable Privilege Control for LLM Agents**|Tianneng Shi et.al.|[2504.11703](http://arxiv.org/abs/2504.11703)|null| |**2025-11-12**|**DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks**|Yupei Liu et.al.|[2504.11358](http://arxiv.org/abs/2504.11358)|null| |**2025-07-14**|**Bypassing LLM Guardrails: An Empirical Analysis of Evasion Attacks against Prompt Injection and Jailbreak Detection Systems**|William Hackett et.al.|[2504.11168](http://arxiv.org/abs/2504.11168)|null| |**2025-04-14**|**StruPhantom: Evolutionary Injection Attacks on Black-Box Tabular Agents Powered by Large Language Models**|Yang Feng et.al.|[2504.09841](http://arxiv.org/abs/2504.09841)|null| |**2025-04-17**|**ControlNET: A Firewall for RAG-based LLM System**|Hongwei Yao et.al.|[2504.09593](http://arxiv.org/abs/2504.09593)|null| 
|**2025-04-10**|**Defense against Prompt Injection Attacks via Mixture of Encodings**|Ruiyi Zhang et.al.|[2504.07467](http://arxiv.org/abs/2504.07467)|null| |**2025-04-08**|**Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators**|Xitao Li et.al.|[2504.05689](http://arxiv.org/abs/2504.05689)|null| |**2025-03-29**|**Encrypted Prompt: Securing LLM Applications Against Unauthorized Actions**|Shih-Han Chan et.al.|[2503.23250](http://arxiv.org/abs/2503.23250)|null| |**2025-03-27**|**Harnessing Chain-of-Thought Metadata for Task Routing and Adversarial Prompt Detection**|Ryan Marinelli et.al.|[2503.21464](http://arxiv.org/abs/2503.21464)|null| |**2025-06-24**|**Defeating Prompt Injections by Design**|Edoardo Debenedetti et.al.|[2503.18813](http://arxiv.org/abs/2503.18813)|null| |**2025-05-19**|**Detecting LLM-Generated Peer Reviews**|Vishisht Rao et.al.|[2503.15772](http://arxiv.org/abs/2503.15772)|null| |**2025-09-12**|**Multi-Agent Systems Execute Arbitrary Malicious Code**|Harold Triedman et.al.|[2503.12188](http://arxiv.org/abs/2503.12188)|null| |**2026-02-09**|**ASIDE: Architectural Separation of Instructions and Data in Language Models**|Egor Zverev et.al.|[2503.10566](http://arxiv.org/abs/2503.10566)|null| |**2025-09-17**|**CyberLLMInstruct: A Pseudo-malicious Dataset Revealing Safety-performance Trade-offs in Cyber Security LLM Fine-tuning**|Adel ElZemity et.al.|[2503.09334](http://arxiv.org/abs/2503.09334)|null| |**2025-07-04**|**Probing Latent Subspaces in LLM for AI Security: Identifying and Manipulating Adversarial States**|Xin Wei Chia et.al.|[2503.09066](http://arxiv.org/abs/2503.09066)|null| |**2025-03-04**|**Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents**|Qiusi Zhan et.al.|[2503.00061](http://arxiv.org/abs/2503.00061)|null| |**2025-10-04**|**Can Indirect Prompt Injection Attacks Be Detected and Removed?**|Yulin Chen 
et.al.|[2502.16580](http://arxiv.org/abs/2502.16580)|null| |**2025-04-07**|**A Cautionary Tale About "Neutrally" Informative AI Tools Ahead of the 2025 Federal Elections in Germany**|Ina Dormuth et.al.|[2502.15568](http://arxiv.org/abs/2502.15568)|null| |**2025-02-19**|**What are Models Thinking about? Understanding Large Language Model Hallucinations "Psychology" through Model Inner State Analysis**|Peiran Wang et.al.|[2502.13490](http://arxiv.org/abs/2502.13490)|null| |**2025-02-18**|**UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models**|Huawei Lin et.al.|[2502.13141](http://arxiv.org/abs/2502.13141)|null| |**2025-11-17**|**The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1**|Kaiwen Zhou et.al.|[2502.12659](http://arxiv.org/abs/2502.12659)|null| |**2025-02-16**|**G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems**|Shilong Wang et.al.|[2502.11127](http://arxiv.org/abs/2502.11127)|null| |**2025-02-16**|**Prompt Inject Detection with Generative Explanation as an Investigative Tool**|Jonathan Pan et.al.|[2502.11006](http://arxiv.org/abs/2502.11006)|null| |**2025-02-14**|**RTBAS: Defending LLM Agents Against Prompt Injection and Privacy Leakage**|Peter Yong Zhong et.al.|[2502.08966](http://arxiv.org/abs/2502.08966)|null| |**2025-06-10**|**MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents**|Kaijie Zhu et.al.|[2502.05174](http://arxiv.org/abs/2502.05174)|null| |**2025-11-23**|**Exploring Potential Prompt Injection Attacks in Federated Military LLMs and Their Mitigation**|Youngjoon Lee et.al.|[2501.18416](http://arxiv.org/abs/2501.18416)|null| |**2025-04-12**|**PromptShield: Deployable Detection for Prompt Injection Attacks**|Dennis Jacob et.al.|[2501.15145](http://arxiv.org/abs/2501.15145)|null| |**2025-01-21**|**An Empirically-grounded tool for Automatic Prompt Linting and Repair: A Case Study on Bias, 
Vulnerability, and Optimization in Developer Prompts**|Dhia Elhaq Rzig et.al.|[2501.12521](http://arxiv.org/abs/2501.12521)|null| |**2025-05-10**|**Fun-tuning: Characterizing the Vulnerability of Proprietary LLMs to Optimization-based Prompt Injection Attacks via the Fine-Tuning Interface**|Andrey Labunets et.al.|[2501.09798](http://arxiv.org/abs/2501.09798)|null| |**2024-12-24**|**Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning**|Alex Beutel et.al.|[2412.18693](http://arxiv.org/abs/2412.18693)|null| |**2024-12-21**|**The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents**|Feiran Jia et.al.|[2412.16682](http://arxiv.org/abs/2412.16682)|null| |**2024-12-18**|**Lightweight Safety Classification Using Pruned Language Models**|Mason Sawtell et.al.|[2412.13435](http://arxiv.org/abs/2412.13435)|null| |**2025-06-12**|**Towards Action Hijacking of Large Language Model-based Agent**|Yuyang Zhang et.al.|[2412.10807](http://arxiv.org/abs/2412.10807)|null| |**2024-12-08**|**Trust No AI: Prompt Injection Along The CIA Security Triad**|Johann Rehberger et.al.|[2412.06090](http://arxiv.org/abs/2412.06090)|null| |**2025-06-30**|**Trust & Safety of LLMs and LLMs in Trust & Safety**|Doohee You et.al.|[2412.02113](http://arxiv.org/abs/2412.02113)|null| |**2024-12-02**|**Improved Large Language Model Jailbreak Detection via Pretrained Embeddings**|Erick Galinkin et.al.|[2412.01547](http://arxiv.org/abs/2412.01547)|null| |**2024-11-25**|**Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective**|Jean Marie Tshimula et.al.|[2411.16642](http://arxiv.org/abs/2411.16642)|null| |**2024-11-22**|**Universal and Context-Independent Triggers for Precise Control of LLM Outputs**|Jiashuo Liang et.al.|[2411.14738](http://arxiv.org/abs/2411.14738)|null| |**2025-03-30**|**Data Extraction Attacks in Retrieval-Augmented Generation via Backdoors**|Yuefeng Peng 
et.al.|[2411.01705](http://arxiv.org/abs/2411.01705)|null| |**2025-08-02**|**Defense Against Prompt Injection Attack by Leveraging Attack Techniques**|Yulin Chen et.al.|[2411.00459](http://arxiv.org/abs/2411.00459)|**[link](https://github.com/LukeChen-go/pia-defense-by-attack)**| |**2025-04-23**|**Attention Tracker: Detecting Prompt Injection Attacks in LLMs**|Kuo-Han Hung et.al.|[2411.00348](http://arxiv.org/abs/2411.00348)|**[link](https://github.com/khhung-906/Attention-Tracker)**| |**2024-10-28**|**Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures**|Victoria Benjamin et.al.|[2410.23308](http://arxiv.org/abs/2410.23308)|null| |**2025-03-30**|**InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models**|Hao Li et.al.|[2410.22770](http://arxiv.org/abs/2410.22770)|null| |**2024-10-29**|**Embedding-based classifiers can detect prompt injection attacks**|Md. Ahsan Ayub et.al.|[2410.22284](http://arxiv.org/abs/2410.22284)|null| |**2024-11-25**|**FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks**|Jiongxiao Wang et.al.|[2410.21492](http://arxiv.org/abs/2410.21492)|null| |**2024-11-07**|**Fine-tuned Large Language Models (LLMs): Improved Prompt Injection Attacks Detection**|Md Abdur Rahman et.al.|[2410.21337](http://arxiv.org/abs/2410.21337)|null| |**2024-10-27**|**LLM Robustness Against Misinformation in Biomedical Question Answering**|Alexander Bondarenko et.al.|[2410.21330](http://arxiv.org/abs/2410.21330)|null| |**2024-10-28**|**Palisade -- Prompt Injection Detection Framework**|Sahasra Kokkula et.al.|[2410.21146](http://arxiv.org/abs/2410.21146)|null| |**2024-11-18**|**Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks**|Dario Pasquini et.al.|[2410.20911](http://arxiv.org/abs/2410.20911)|null| |**2024-10-23**|**Countering Autonomous Cyber Threats**|Kade M. 
Heckel et.al.|[2410.18312](http://arxiv.org/abs/2410.18312)|null| |**2026-01-27**|**MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control**|Juyong Lee et.al.|[2410.17520](http://arxiv.org/abs/2410.17520)|null| |**2024-10-22**|**Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In**|Itay Nakash et.al.|[2410.16950](http://arxiv.org/abs/2410.16950)|null| |**2024-10-21**|**SMILES-Prompting: A Novel Approach to LLM Jailbreak Attacks in Chemical Synthesis**|Aidan Wong et.al.|[2410.15641](http://arxiv.org/abs/2410.15641)|null| |**2025-09-15**|**Enhancing Prompt Injection Attacks to LLMs via Poisoning Alignment**|Zedian Shao et.al.|[2410.14827](http://arxiv.org/abs/2410.14827)|null| |**2024-10-18**|**Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models**|Cody Clop et.al.|[2410.14479](http://arxiv.org/abs/2410.14479)|null| |**2025-02-10**|**LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild**|Reworr et.al.|[2410.13919](http://arxiv.org/abs/2410.13919)|null| |**2024-10-17**|**SPIN: Self-Supervised Prompt INjection**|Leon Zhou et.al.|[2410.13236](http://arxiv.org/abs/2410.13236)|null| |**2024-10-17**|**Data Defenses Against Large Language Models**|William Agnew et.al.|[2410.13138](http://arxiv.org/abs/2410.13138)|**[link](https://github.com/Noykarde/NoykardeRepository)**| |**2025-07-02**|**Large Language Models, and LLM-Based Agents, Should Be Used to Enhance the Digital Public Sphere**|Seth Lazar et.al.|[2410.12123](http://arxiv.org/abs/2410.12123)|null| |**2025-03-01**|**Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy**|Tong Wu et.al.|[2410.09102](http://arxiv.org/abs/2410.09102)|null| |**2024-10-14**|**F2A: An Innovative Approach for Prompt Injection by Utilizing Feign Security Detection Agents**|Yupeng Ren et.al.|[2410.08776](http://arxiv.org/abs/2410.08776)|null| |**2024-10-09**|**Prompt Infection: LLM-to-LLM Prompt Injection within 
Multi-Agent Systems**|Donghyun Lee et.al.|[2410.07283](http://arxiv.org/abs/2410.07283)|null| |**2025-07-03**|**SecAlign: Defending Against Prompt Injection with Preference Optimization**|Sizhe Chen et.al.|[2410.05451](http://arxiv.org/abs/2410.05451)|**[link](https://github.com/facebookresearch/SecAlign)**| |**2024-10-07**|**A test suite of prompt injection attacks for LLM-based machine translation**|Antonio Valerio Miceli-Barone et.al.|[2410.05047](http://arxiv.org/abs/2410.05047)|null| |**2025-04-09**|**LLM Safeguard is a Double-Edged Sword: Exploiting False Positives for Denial-of-Service Attacks**|Qingzhao Zhang et.al.|[2410.02916](http://arxiv.org/abs/2410.02916)|null| |**2025-05-30**|**Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents**|Hanrong Zhang et.al.|[2410.02644](http://arxiv.org/abs/2410.02644)|**[link](https://github.com/TongWang121/agent-security-bench)**| |**2024-09-29**|**GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks**|Rongchang Li et.al.|[2409.19521](http://arxiv.org/abs/2409.19521)|null| |**2024-10-10**|**System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective**|Fangzhou Wu et.al.|[2409.19091](http://arxiv.org/abs/2409.19091)|null| |**2025-04-03**|**PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs**|Jiahao Yu et.al.|[2409.14729](http://arxiv.org/abs/2409.14729)|null| |**2024-09-20**|**Applying Pre-trained Multilingual BERT in Embeddings for Improved Malicious Prompt Injection Attacks Detection**|Md Abdur Rahman et.al.|[2409.13331](http://arxiv.org/abs/2409.13331)|null| |**2025-11-25**|**Securing Large Language Models: Addressing Bias, Misinformation, and Prompt Attacks**|Benji Peng et.al.|[2409.08087](http://arxiv.org/abs/2409.08087)|null| |**2025-02-07**|**Hide Your Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Carrier 
Articles**|Zhilong Wang et.al.|[2408.11182](http://arxiv.org/abs/2408.11182)|null| |**2025-01-31**|**Evaluating LLM-based Personal Information Extraction and Countermeasures**|Yupei Liu et.al.|[2408.07291](http://arxiv.org/abs/2408.07291)|null| |**2024-08-12**|**Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks**|Gianluca De Stefano et.al.|[2408.05025](http://arxiv.org/abs/2408.05025)|null| |**2024-08-19**|**WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models**|Prannaya Gupta et.al.|[2408.03837](http://arxiv.org/abs/2408.03837)|null| |**2025-03-31**|**TestART: Improving LLM-based Unit Testing via Co-evolution of Automated Generation and Repair Iteration**|Siqi Gu et.al.|[2408.03095](http://arxiv.org/abs/2408.03095)|null| |**2024-08-01**|**WHITE PAPER: A Brief Exploration of Data Exfiltration using GCG Suffixes**|Victor Valbuena et.al.|[2408.00925](http://arxiv.org/abs/2408.00925)|null| |**2024-11-17**|**Blockchain for Large Language Model Security and Safety: A Holistic Survey**|Caleb Geren et.al.|[2407.20181](http://arxiv.org/abs/2407.20181)|null| |**2025-06-05**|**Scaling Trends in Language Model Robustness**|Nikolaus Howe et.al.|[2407.18213](http://arxiv.org/abs/2407.18213)|null| |**2024-07-25**|**Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context**|Nilanjana Das et.al.|[2407.14644](http://arxiv.org/abs/2407.14644)|null| |**2024-07-18**|**LLMs as Function Approximators: Terminology, Taxonomy, and Questions for Evaluation**|David Schlangen et.al.|[2407.13744](http://arxiv.org/abs/2407.13744)|null| |**2025-07-22**|**ShadowCode: Towards (Automatic) External Prompt Injection Attack against Code LLMs**|Yuchen Yang et.al.|[2407.09164](http://arxiv.org/abs/2407.09164)|null| |**2024-07-03**|**Soft Begging: Modular and Efficient Shielding of LLMs against Prompt Injection and Jailbreaking based on Prompt Tuning**|Simon Ostermann 
et.al.|[2407.03391](http://arxiv.org/abs/2407.03391)|null|
|**2024-12-06**|**Monitoring Latent World States in Language Models with Propositional Probes**|Jiahai Feng et.al.|[2406.19501](http://arxiv.org/abs/2406.19501)|null|
|**2024-06-20**|**Prompt Injection Attacks in Defended Systems**|Daniil Khomsky et.al.|[2406.14048](http://arxiv.org/abs/2406.14048)|null|
|**2024-11-24**|**AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents**|Edoardo Debenedetti et.al.|[2406.13352](http://arxiv.org/abs/2406.13352)|null|
|**2024-06-11**|**Knowledge Return Oriented Prompting (KROP)**|Jason Martin et.al.|[2406.11880](http://arxiv.org/abs/2406.11880)|null|
|**2024-06-16**|**Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications**|Stephen Burabari Tete et.al.|[2406.11007](http://arxiv.org/abs/2406.11007)|null|
|**2025-02-05**|**SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner**|Xunguang Wang et.al.|[2406.05498](http://arxiv.org/abs/2406.05498)|null|
|**2024-09-25**|**Ranking Manipulation for Conversational Search Engines**|Samuel Pfrommer et.al.|[2406.03589](http://arxiv.org/abs/2406.03589)|null|
|**2025-03-06**|**Get my drift? Catching LLM Task Drift with Activation Deltas**|Sahar Abdelnabi et.al.|[2406.00799](http://arxiv.org/abs/2406.00799)|null|
|**2024-06-01**|**Exploring Vulnerabilities and Protections in Large Language Models: A Survey**|Frank Weizhen Liu et.al.|[2406.00240](http://arxiv.org/abs/2406.00240)|null|
|**2024-05-31**|**Preemptive Answer "Attacks" on Chain-of-Thought Reasoning**|Rongwu Xu et.al.|[2405.20902](http://arxiv.org/abs/2405.20902)|null|
|**2024-05-23**|**Impact of Non-Standard Unicode Characters on Security and Comprehension in Large Language Models**|Johan S Daniel et.al.|[2405.14490](http://arxiv.org/abs/2405.14490)|null|
|**2025-01-17**|**Generative AI in Cybersecurity: A Comprehensive Review of LLM Applications and Vulnerabilities**|Mohamed Amine Ferrag et.al.|[2405.12750](http://arxiv.org/abs/2405.12750)|null|
|**2024-04-19**|**The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions**|Eric Wallace et.al.|[2404.13208](http://arxiv.org/abs/2404.13208)|null|
|**2024-04-19**|**CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models**|Manish Bhatt et.al.|[2404.13161](http://arxiv.org/abs/2404.13161)|null|
|**2024-11-09**|**Goal-guided Generative Prompt Injection Attack on Large Language Models**|Chong Zhang et.al.|[2404.07234](http://arxiv.org/abs/2404.07234)|null|
|**2024-09-09**|**Fine-Tuning, Quantization, and LLMs: Navigating Unintended Outcomes**|Divyanshu Kumar et.al.|[2404.04392](http://arxiv.org/abs/2404.04392)|null|
|**2025-08-24**|**Optimization-based Prompt Injection Attack to LLM-as-a-Judge**|Jiawen Shi et.al.|[2403.17710](http://arxiv.org/abs/2403.17710)|**[link](https://github.com/ShiJiawenwen/JudgeDeceiver)**|
|**2024-03-20**|**Defending Against Indirect Prompt Injection Attacks With Spotlighting**|Keegan Hines et.al.|[2403.14720](http://arxiv.org/abs/2403.14720)|**[link](https://github.com/realArcherL/spotlighting-datamarking)**|
|**2024-03-26**|**SelfIE: Self-Interpretation of Large Language Model Embeddings**|Haozhe Chen et.al.|[2403.10949](http://arxiv.org/abs/2403.10949)|**[link](https://github.com/tonychenxyz/selfie)**|
|**2024-03-14**|**Scaling Behavior of Machine Translation with Large Language Models under Prompt Injection Attacks**|Zhifan Sun et.al.|[2403.09832](http://arxiv.org/abs/2403.09832)|**[link](https://github.com/Avmb/MT_Scaling_Prompt_Injection)**|
|**2024-03-19**|**Review of Generative AI Methods in Cybersecurity**|Yagmur Yigit et.al.|[2403.08701](http://arxiv.org/abs/2403.08701)|null|
|**2024-03-12**|**Analyzing Adversarial Attacks on Sequence-to-Sequence Relevance Models**|Andrew Parry et.al.|[2403.07654](http://arxiv.org/abs/2403.07654)|null|
|**2025-01-31**|**Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?**|Egor Zverev et.al.|[2403.06833](http://arxiv.org/abs/2403.06833)|**[link](https://github.com/egozverev/Should-It-Be-Executed-Or-Processed)**|
|**2024-03-07**|**Automatic and Universal Prompt Injection Attacks against Large Language Models**|Xiaogeng Liu et.al.|[2403.04957](http://arxiv.org/abs/2403.04957)|null|
|**2024-08-04**|**InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents**|Qiusi Zhan et.al.|[2403.02691](http://arxiv.org/abs/2403.02691)|null|
|**2024-10-06**|**Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems**|Zhenting Qi et.al.|[2402.17840](http://arxiv.org/abs/2402.17840)|null|
|**2024-02-15**|**AbuseGPT: Abuse of Generative AI ChatBots to Create Smishing Campaigns**|Ashfak Md Shibli et.al.|[2402.09728](http://arxiv.org/abs/2402.09728)|null|
|**2024-09-25**|**StruQ: Defending Against Prompt Injection with Structured Queries**|Sizhe Chen et.al.|[2402.06363](http://arxiv.org/abs/2402.06363)|null|
|**2024-02-08**|**In-Context Learning Can Re-learn Forbidden Tasks**|Sophie Xhonneux et.al.|[2402.05723](http://arxiv.org/abs/2402.05723)|null|
|**2024-01-31**|**An Early Categorization of Prompt Injection Attacks on Large Language Models**|Sippo Rossi et.al.|[2402.00898](http://arxiv.org/abs/2402.00898)|null|
|**2024-10-15**|**Mitigating the Influence of Distractor Tasks in LMs with Prior-Aware Decoding**|Raymond Douglas et.al.|[2401.17692](http://arxiv.org/abs/2401.17692)|null|
|**2024-07-10**|**The Ethics of Interaction: Mitigating Security Threats in LLMs**|Ashutosh Kumar et.al.|[2401.12273](http://arxiv.org/abs/2401.12273)|null|
|**2025-03-18**|**AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models**|Dong Shu et.al.|[2401.09002](http://arxiv.org/abs/2401.09002)|null|
|**2024-01-15**|**Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications**|Xuchen Suo et.al.|[2401.07612](http://arxiv.org/abs/2401.07612)|null|
|**2024-01-02**|**A Novel Evaluation Framework for Assessing Resilience Against Prompt Injection Attacks in Large Language Models**|Daniel Wankit Yip et.al.|[2401.00991](http://arxiv.org/abs/2401.00991)|null|
|**2024-01-08**|**Jatmo: Prompt Injection Defense by Task-Specific Finetuning**|Julien Piet et.al.|[2312.17673](http://arxiv.org/abs/2312.17673)|null|
|**2025-01-27**|**Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models**|Jingwei Yi et.al.|[2312.14197](http://arxiv.org/abs/2312.14197)|null|
|**2023-12-12**|**Maatphor: Automated Variant Analysis for Prompt Injection Attacks**|Ahmed Salem et.al.|[2312.11513](http://arxiv.org/abs/2312.11513)|null|
|**2023-12-16**|**Comprehensive Evaluation of ChatGPT Reliability Through Multilingual Inquiries**|Poorna Chander Reddy Puttaparthi et.al.|[2312.10524](http://arxiv.org/abs/2312.10524)|null|
|**2023-12-13**|**Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models**|Alexandre Variengien et.al.|[2312.10091](http://arxiv.org/abs/2312.10091)|null|
|**2023-11-30**|**ArthModel: Enhance Arithmetic Skills to Large Language Model**|Yingdi Guo et.al.|[2311.18609](http://arxiv.org/abs/2311.18609)|null|
|**2024-03-03**|**Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition**|Sander Schulhoff et.al.|[2311.16119](http://arxiv.org/abs/2311.16119)|null|
|**2024-05-25**|**Assessing Prompt Injection Risks in 200+ Custom GPTs**|Jiahao Yu et.al.|[2311.11538](http://arxiv.org/abs/2311.11538)|null|
|**2025-05-29**|**Hijacking Large Language Models via Adversarial In-Context Learning**|Xiangyu Zhou et.al.|[2311.09948](http://arxiv.org/abs/2311.09948)|null|
|**2023-11-02**|**Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game**|Sam Toyer et.al.|[2311.01011](http://arxiv.org/abs/2311.01011)|null|
|**2025-11-12**|**Formalizing and Benchmarking Prompt Injection Attacks and Defenses**|Yupei Liu et.al.|[2310.12815](http://arxiv.org/abs/2310.12815)|null|
|**2025-02-27**|**Demystifying RCE Vulnerabilities in LLM-Integrated Apps**|Tong Liu et.al.|[2309.02926](http://arxiv.org/abs/2309.02926)|null|
|**2023-11-25**|**Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection**|Zekun Li et.al.|[2308.10819](http://arxiv.org/abs/2308.10819)|null|
|**2024-05-15**|**"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models**|Xinyue Shen et.al.|[2308.03825](http://arxiv.org/abs/2308.03825)|null|
|**2025-01-27**|**From Prompt Injections to SQL Injection Attacks: How Protected is Your LLM-Integrated Web Application?**|Rodrigo Pedro et.al.|[2308.01990](http://arxiv.org/abs/2308.01990)|null|
|**2024-04-03**|**Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection**|Jun Yan et.al.|[2307.16888](http://arxiv.org/abs/2307.16888)|null|
|**2023-06-15**|**Safeguarding Crowdsourcing Surveys from ChatGPT with Prompt Injection**|Chaofan Wang et.al.|[2306.08833](http://arxiv.org/abs/2306.08833)|null|
|**2025-12-29**|**Prompt Injection attack against LLM-integrated Applications**|Yi Liu et.al.|[2306.05499](http://arxiv.org/abs/2306.05499)|null|
|**2023-05-05**|**Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection**|Kai Greshake et.al.|[2302.12173](http://arxiv.org/abs/2302.12173)|null|
|**2022-10-10**|**Knowledge Prompts: Injecting World Knowledge into Language Models through Soft Prompts**|Cicero Nogueira dos Santos et.al.|[2210.04726](http://arxiv.org/abs/2210.04726)|null|
|**2022-07-15**|**Prompt Injection: Parameterization of Fixed Inputs**|Eunbi Choi et.al.|[2206.11349](http://arxiv.org/abs/2206.11349)|null|


## MultiModal Prompt Injection

|Publish Date|Title|Authors|PDF|Code|
|---|---|---|---|---|
|**2026-06-22**|**When AUC 0.998 Is Not Enough: A Candidate Evaluation Protocol for Hidden-State Probes of Indirect Prompt Injection in Multimodal Computer-Use Agents**|Yanhang Li et.al.|[2606.22864](http://arxiv.org/abs/2606.22864)|null|
|**2026-06-22**|**DE-FIVE: Detecting Malicious Image Prompts via Fourier Features and Image Vector Embeddings**|Xingwei Zhong et.al.|[2606.22779](http://arxiv.org/abs/2606.22779)|null|
|**2026-06-16**|**MIRAGE: Stealthy Visual Prompt Injection for Vulnerability Detection in Web Agents**|Xuelong Dai et.al.|[2606.20717](http://arxiv.org/abs/2606.20717)|null|
|**2026-06-13**|**Forced Deferral: Manipulating Routing Decisions in Multimodal LLM Cascades**|Zhongye Liu et.al.|[2606.15308](http://arxiv.org/abs/2606.15308)|null|
|**2026-06-10**|**Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review**|Xinyu Zhao et.al.|[2606.12716](http://arxiv.org/abs/2606.12716)|null|
|**2026-05-27**|**MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content**|Ruoqi Guo et.al.|[2605.28116](http://arxiv.org/abs/2605.28116)|null|
|**2026-05-27**|**Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security**|Xiang Fang et.al.|[2605.27823](http://arxiv.org/abs/2605.27823)|null|
|**2026-05-24**|**Localization then Neutralization: Gradient-guided Token Suppression against Visual Prompt Injection Attack**|Dongpeng Zhang et.al.|[2605.25194](http://arxiv.org/abs/2605.25194)|null|
|**2026-05-22**|**MixFake: Benchmarking and Enhancing Audio Deepfake Detection in Diverse Real-world Mixed Audio**|Qingcao Li et.al.|[2605.23201](http://arxiv.org/abs/2605.23201)|null|
|**2026-05-15**|**A Cross-Modal Prompt Injection Attack against Large Vision-Language Models with Image-Only Perturbation**|Hao Yang et.al.|[2605.16090](http://arxiv.org/abs/2605.16090)|null|
|**2026-06-10**|**ProjGuard: Safety Monitoring for Computer-Use Agents via Low-Dimensional Projections**|Kebin Contreras et.al.|[2605.13631](http://arxiv.org/abs/2605.13631)|null|
|**2026-05-12**|**Agents Should Replace Narrow Predictive AI as the Orchestrator in 6G AI-RAN**|Pranshav Gajjar et.al.|[2605.11516](http://arxiv.org/abs/2605.11516)|null|
|**2026-05-05**|**Laundering AI Authority with Adversarial Examples**|Jie Zhang et.al.|[2605.04261](http://arxiv.org/abs/2605.04261)|null|
|**2026-05-02**|**VisInject: Disruption != Injection -- A Dual-Dimension Evaluation of Universal Adversarial Attacks on Vision-Language Models**|Pang Liu et.al.|[2605.01449](http://arxiv.org/abs/2605.01449)|null|
|**2026-04-28**|**SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents**|Mengyao Du et.al.|[2604.25562](http://arxiv.org/abs/2604.25562)|null|
|**2026-04-28**|**One Perturbation, Two Failure Modes: Probing VLM Safety via Embedding-Guided Typographic Perturbations**|Ravikumar Balakrishnan et.al.|[2604.25102](http://arxiv.org/abs/2604.25102)|null|
|**2026-04-21**|**If you're waiting for a sign... that might not be it! Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems**|Jiamin Chang et.al.|[2604.19844](http://arxiv.org/abs/2604.19844)|null|
|**2026-04-16**|**Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection**|Meng Chen et.al.|[2604.14604](http://arxiv.org/abs/2604.14604)|null|
|**2026-04-15**|**Reading Between the Pixels: Linking Text-Image Embedding Alignment to Typographic Attack Success on Vision-Language Models**|Ravikumar Balakrishnan et.al.|[2604.12371](http://arxiv.org/abs/2604.12371)|null|
|**2026-04-14**|**WebAgentGuard: A Reasoning-Driven Guard Model for Detecting Prompt Injection Attacks in Web Agents**|Yulin Chen et.al.|[2604.12284](http://arxiv.org/abs/2604.12284)|null|
|**2026-04-10**|**Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection**|Zedian Shao et.al.|[2604.09024](http://arxiv.org/abs/2604.09024)|null|
|**2026-04-06**|**SALLIE: Safeguarding Against Latent Language & Image Exploits**|Guy Azov et.al.|[2604.06247](http://arxiv.org/abs/2604.06247)|null|
|**2026-03-31**|**Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks**|Chong Xiang et.al.|[2603.30016](http://arxiv.org/abs/2603.30016)|null|
|**2026-03-31**|**Adversarial Prompt Injection Attack on Multimodal Large Language Models**|Meiwen Ding et.al.|[2603.29418](http://arxiv.org/abs/2603.29418)|null|
|**2026-03-29**|**Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models**|Duanyi Yao et.al.|[2603.27522](http://arxiv.org/abs/2603.27522)|null|
|**2026-03-18**|**Parameter-Efficient Modality-Balanced Symmetric Fusion for Multimodal Remote Sensing Semantic Segmentation**|Haocheng Li et.al.|[2603.17705](http://arxiv.org/abs/2603.17705)|null|
|**2026-03-18**|**Caging the Agents: A Zero Trust Security Architecture for Autonomous AI in Healthcare**|Saikat Maiti et.al.|[2603.17419](http://arxiv.org/abs/2603.17419)|null|
|**2026-03-14**|**Sirens' Whisper: Inaudible Near-Ultrasonic Jailbreaks of Speech-Driven LLMs**|Zijian Ling et.al.|[2603.13847](http://arxiv.org/abs/2603.13847)|null|
|**2026-03-04**|**Image-based Prompt Injection: Hijacking Multimodal LLMs through Visually Embedded Adversarial Instructions**|Neha Nagaraja et.al.|[2603.03637](http://arxiv.org/abs/2603.03637)|null|
|**2026-02-24**|**ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction**|Che Wang et.al.|[2602.20708](http://arxiv.org/abs/2602.20708)|null|
|**2026-02-06**|**Extended to Reality: Prompt Injection in 3D Environments**|Zhuoheng Li et.al.|[2602.07104](http://arxiv.org/abs/2602.07104)|null|
|**2026-02-05**|**Clouding the Mirror: Stealthy Prompt Injection Attacks Targeting LLM-based Phishing Detection**|Takashi Koide et.al.|[2602.05484](http://arxiv.org/abs/2602.05484)|null|
|**2026-01-24**|**Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems**|Narek Maloyan et.al.|[2601.17548](http://arxiv.org/abs/2601.17548)|null|
|**2026-01-24**|**Physical Prompt Injection Attacks on Large Vision-Language Models**|Chen Ling et.al.|[2601.17383](http://arxiv.org/abs/2601.17383)|null|
|**2026-01-12**|**SecureCAI: Injection-Resilient LLM Assistants for Cybersecurity Operations**|Mohammed Himayath Ali et.al.|[2601.07835](http://arxiv.org/abs/2601.07835)|null|
|**2026-01-08**|**From Understanding to Engagement: Personalized pharmacy Video Clips via Vision Language Models (VLMs)**|Suyash Mishra et.al.|[2601.05059](http://arxiv.org/abs/2601.05059)|null|
|**2025-12-17**|**Trust in LLM-controlled Robotics: a Survey of Security Threats, Defenses and Challenges**|Xinyu Huang et.al.|[2601.02377](http://arxiv.org/abs/2601.02377)|null|
|**2025-12-29**|**Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks**|Toqeer Ali Syed et.al.|[2512.23557](http://arxiv.org/abs/2512.23557)|null|
|**2025-12-26**|**Look Closer! An Adversarial Parametric Editing Framework for Hallucination Mitigation in VLMs**|Jiayu Hu et.al.|[2512.21999](http://arxiv.org/abs/2512.21999)|null|
|**2025-12-15**|**Cisco Integrated AI Security and Safety Framework Report**|Amy Chang et.al.|[2512.12921](http://arxiv.org/abs/2512.12921)|null|
|**2025-12-05**|**ARGUS: Defending Against Multimodal Indirect Prompt Injection via Steering Instruction-Following Behavior**|Weikai Lu et.al.|[2512.05745](http://arxiv.org/abs/2512.05745)|null|
|**2025-12-04**|**Chameleon: Adaptive Adversarial Agents for Scaling-Based Visual Prompt Injection in Multimodal AI Systems**|M Zeeshan et.al.|[2512.04895](http://arxiv.org/abs/2512.04895)|null|
|**2025-12-04**|**ASTRIDE: A Security Threat Modeling Platform for Agentic-AI Applications**|Eranga Bandara et.al.|[2512.04785](http://arxiv.org/abs/2512.04785)|null|
|**2025-11-22**|**Building Browser Agents: Architecture, Security, and Practical Solutions**|Aram Vardanyan et.al.|[2511.19477](http://arxiv.org/abs/2511.19477)|null|
|**2025-11-20**|**The Shawshank Redemption of Embodied AI: Understanding and Benchmarking Indirect Environmental Jailbreaks**|Chunyang Li et.al.|[2511.16347](http://arxiv.org/abs/2511.16347)|null|
|**2025-11-16**|**GRAPHTEXTACK: A Realistic Black-Box Node Injection Attack on LLM-Enhanced GNNs**|Jiaji Ma et.al.|[2511.12423](http://arxiv.org/abs/2511.12423)|null|
|**2025-10-19**|**Black-box Optimization of LLM Outputs by Asking for Directions**|Jie Zhang et.al.|[2510.16794](http://arxiv.org/abs/2510.16794)|null|
|**2025-10-15**|**Protect: Towards Robust Guardrailing Stack for Trustworthy Enterprise LLM Systems**|Karthik Avinash et.al.|[2510.13351](http://arxiv.org/abs/2510.13351)|null|
|**2025-10-13**|**Countermind: A Multi-Layered Security Architecture for Large Language Models**|Dominik Schwarz et.al.|[2510.11837](http://arxiv.org/abs/2510.11837)|null|
|**2025-10-10**|**Text Prompt Injection of Vision Language Models**|Ruizhe Zhu et.al.|[2510.09849](http://arxiv.org/abs/2510.09849)|null|
|**2026-04-09**|**Invisible to Humans, Triggered by Agents: Stealthy Jailbreak Attacks on Mobile Vision-Language Agents**|Renhua Ding et.al.|[2510.07809](http://arxiv.org/abs/2510.07809)|null|
|**2025-10-06**|**Imperceptible Jailbreaking against Large Language Models**|Kuofeng Gao et.al.|[2510.05025](http://arxiv.org/abs/2510.05025)|null|
|**2025-10-05**|**AgentTypo: Adaptive Typographic Prompt Injection Attacks against Black-box Multimodal Agents**|Yanjie Li et.al.|[2510.04257](http://arxiv.org/abs/2510.04257)|null|
|**2025-10-01**|**WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents**|Yinuo Liu et.al.|[2510.01354](http://arxiv.org/abs/2510.01354)|null|
|**2025-09-19**|**EmoQ: Speech Emotion Recognition via Speech-Aware Q-Former and Large Language Model**|Yiqing Yang et.al.|[2509.15775](http://arxiv.org/abs/2509.15775)|null|
|**2025-09-06**|**EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System**|Pavan Reddy et.al.|[2509.10540](http://arxiv.org/abs/2509.10540)|null|
|**2025-09-07**|**Multimodal Prompt Injection Attacks: Risks and Defenses for Modern LLMs**|Andrew Yeo et.al.|[2509.05883](http://arxiv.org/abs/2509.05883)|null|
|**2025-07-30**|**Invisible Injections: Exploiting Vision-Language Models Through Steganographic Prompt Embedding**|Chetan Pathade et.al.|[2507.22304](http://arxiv.org/abs/2507.22304)|null|
|**2026-01-05**|**Text2VLM: Adapting Text-Only Datasets to Evaluate Alignment Training in Visual Language Models**|Gabriel Downer et.al.|[2507.20704](http://arxiv.org/abs/2507.20704)|null|
|**2025-08-14**|**Hierarchical Cross-modal Prompt Learning for Vision-Language Models**|Hao Zheng et.al.|[2507.14976](http://arxiv.org/abs/2507.14976)|null|
|**2025-06-25**|**VSF-Med:A Vulnerability Scoring Framework for Medical Vision-Language Models**|Binesh Sadanandan et.al.|[2507.00052](http://arxiv.org/abs/2507.00052)|null|
|**2025-06-10**|**Evaluation empirique de la sécurisation et de l'alignement de ChatGPT et Gemini: analyse comparative des vulnérabilités par expérimentations de jailbreaks**|Rafaël Nouailles et.al.|[2506.10029](http://arxiv.org/abs/2506.10029)|null|
|**2025-08-24**|**Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs**|Hao Wang et.al.|[2505.15075](http://arxiv.org/abs/2505.15075)|null|
|**2025-10-17**|**WebInject: Prompt Injection Attack to Web Agents**|Xilong Wang et.al.|[2505.11717](http://arxiv.org/abs/2505.11717)|null|
|**2025-07-27**|**Manipulating Multimodal Agents via Cross-Modal Prompt Injection**|Le Wang et.al.|[2504.14348](http://arxiv.org/abs/2504.14348)|null|
|**2025-04-14**|**Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding**|Tao Zhang et.al.|[2504.10465](http://arxiv.org/abs/2504.10465)|null|
|**2025-11-05**|**Exploring Typographic Visual Prompts Injection Threats in Cross-Modality Generation Models**|Hao Cheng et.al.|[2503.11519](http://arxiv.org/abs/2503.11519)|null|
|**2026-05-12**|**LLMs and Childhood Safety: Identifying Risks and Proposing a Protection Framework for Safe Child-LLM Interaction**|Junfeng Jiao et.al.|[2502.11242](http://arxiv.org/abs/2502.11242)|null|
|**2026-04-14**|**Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety**|Xingjun Ma et.al.|[2502.05206](http://arxiv.org/abs/2502.05206)|null|
|**2025-05-17**|**Adversarial Attacks of Vision Tasks in the Past 10 Years: A Survey**|Chiyu Zhang et.al.|[2410.23687](http://arxiv.org/abs/2410.23687)|null|
|**2025-11-25**|**Jailbreaking and Mitigation of Vulnerabilities in Large Language Models**|Benji Peng et.al.|[2410.15236](http://arxiv.org/abs/2410.15236)|**[link](https://github.com/Glor1us/llm-jailbreak-vulnerability-analysis)**|
|**2025-12-03**|**IW-Bench: Evaluating Large Multimodal Models for Converting Image-to-Web**|Hongcheng Guo et.al.|[2409.18980](http://arxiv.org/abs/2409.18980)|**[link](https://github.com/HC-Guo/IWBench)**|
|**2024-08-07**|**Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection**|Subaru Kimura et.al.|[2408.03554](http://arxiv.org/abs/2408.03554)|null|
|**2024-09-09**|**A Study on Prompt Injection Attack Against LLM-Integrated Mobile Robotic Systems**|Wenxiao Zhang et.al.|[2408.03515](http://arxiv.org/abs/2408.03515)|**[link](https://github.com/MoeBuTa/LLMEyesim)**|
|**2024-07-23**|**Prompt Injection Attacks on Large Language Models in Oncology**|Jan Clusmann et.al.|[2407.18981](http://arxiv.org/abs/2407.18981)|**[link](https://github.com/JanClusmann/Prompt_Injection_Attacks)**|
|**2025-06-13**|**Self-interpreting Adversarial Images**|Tingwei Zhang et.al.|[2407.08970](http://arxiv.org/abs/2407.08970)|**[link](https://github.com/Tingwei-Zhang/Soft-Prompts-Go-Hard)**|
|**2024-07-12**|**A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends**|Daizong Liu et.al.|[2407.07403](http://arxiv.org/abs/2407.07403)|null|
|**2024-08-24**|**Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors**|Jiachen Sun et.al.|[2405.10529](http://arxiv.org/abs/2405.10529)|null|


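Tags: AI security, Chat Copilot, DLL hijacking, web report viewer, large language models, academic tracking, intelligence gathering, vulnerability research, defensive hardening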