Robin-WZQ/Awesome-Backdoor-on-LMMs

GitHub: Robin-WZQ/Awesome-Backdoor-on-LMMs

系统收集整理了大型多模态模型（包括视觉语言预训练模型、文生图扩散模型、大型视觉语言模型和具身智能体）领域后门攻击与防御的最新研究论文与代码资源。

Stars: 133 | Forks: 0

🤗 Awesome-Backdoor-on-LMMs 🤗

关于大型多模态模型（LMMs）的后门攻击与防御的精选列表，与我们的工作保持一致：
大型多模odal模型上的后门攻击与防御：一项综述

欢迎提供任何关于后门的补充内容、PR 或 issue。如有任何问题，请联系 wangzhongqi23s@ict.ac.cn。如果您发现此仓库对您的研究或工作有帮助，非常欢迎您 star 本仓库并引用我们的论文 [此处](#Reference)。 :sparkles: ## 📜 目录 - [视觉语言预训练模型](@LINK_URL_1/>) - [文本条件扩散模型](@LINK_URL_2/>) - [大型视觉语言模型 (LVLMs)](#Large-Vision-Language-Models) - [基于 VLM 的具身智能 (Embodied AI)](#VLM-based-Embodied-AI) ## 👑 优秀论文 ### 视觉语言预训练模型 #### 后门攻击 | 时间 | 标题 | 出版物 | 论文 | 代码 | | ----------- | ----- | :---: | :------: | :------: | | 2021.06 | POISONING AND BACKDOORING CONTRASTIVE LEARNING | arXiv | [链接](https://arxiv.org/abs/2106.09667v2) | - | | 2021.07 | BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning | SSP'22 | [链接](https://ieeexplore.ieee.org/document/9833644) | [代码](https://github.com/jinyuan-jia/BadEncoder) | | 2022.09 | Data Poisoning Attacks Against Multimodal Encoders | ICML'23 | [链接](https://proceedings.mlr.press/v202/yang23f.html) | [代码](https://github.com/zqypku/mm_poison/) | | 2023.10 | GhostEncoder: Stealthy backdoor attacks with dynamic triggers to pre-trained encoders in self-supervised learning | CS'24 | [链接](https://www.sciencedirect.com/science/article/abs/pii/S0167404824001561) | - | | 2023.11 | BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning | CVPR'24 | [链接](https://openaccess.thecvf.com/content/CVPR2024/html/Liang_BadCLIP_Dual-Embedding_Guided_Backdoor_Attack_on_Multimodal_Contrastive_Learning_CVPR_2024_paper.html) | [代码](https://github.com/LiangSiyuan21/BadCLIP) | | 2024.05 | Distribution Preserving Backdoor Attack in Self-supervised Learning | SSP'24 |[链接](https://ieeexplore.ieee.org/abstract/document/10646825) |-| | 2024.08 | BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning | MICCAI'24 | [链接](https://papers.miccai.org/miccai-2024/094-Paper3117.html) | [代码](https://github.com/asif-hanif/baple) | | 2025.03 | MP-Nav: Enhancing Data Poisoning Attacks against Multimodal Learning | ICML'25 | [链接](https://openreview.net/forum?id=zy7VeNtSLM) | - | | 2025.03 | Backdooring CLIP through Concept Confusion | arXiv |[链接](https://arxiv.org/abs/2503.09095) | - | | 2025.10 | Invisible Backdoor Attack against Self-supervised Learning | CVPR'25 | [链接](https://openaccess.thecvf.com/content/CVPR2025/papers/Zhang_Invisible_Backdoor_Attack_against_Self-supervised_Learning_CVPR_2025_paper.pdf) | [代码](https://github.com/Zhang-Henry/INACTIVE) | | 2025.11 | Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing | CIKM'25 | [链接](https://dl.acm.org/doi/10.1145/3746252.3761408) | [代码](https://github.com/donglgcn/Editing/) | | 2025.11 | ToxicTextCLIP: Text-Based Poisoning and Backdoor Attacks on CLIP Pre-training | NeurIPS'25 | [链接](https://arxiv.org/pdf/2511.00446) | [代码](https://github.com/xinyaocse/ToxicTextCLIP/) | | 2026.01 |Backdoor Attacks on Multi-modal Contrastive Learning |arXiv| [链接](https://www.arxiv.org/abs/2601.11006) | - | | 2026.01 | Stealthy Backdoor Carriers: The Threat of Visual Prompts to CLIP | IOTJ | [链接](https://ieeexplore.ieee.org/abstract/document/11328089) | [代码](https://github.com/Maozhen-Zhang/sbc) | | 2026.02 | BadCLIP++: Stealthy and Persistent Backdoors in Multimodal Contrastive Learning | arXiv | [链接](https://arxiv.org/pdf/2602.17168) | - | | 2026.03 | Dormant Backdoor: Weaponizing Model Finetuning for Feasible Backdoor Attacks against Pretrained Models | AAAI'26 | [链接](https://ojs.aaai.org/index.php/AAAI/article/view/39480) | [代码](https://github.com/Blury233/FinetuningBackdoor/) | #### 后门防御 | 时间 | 标题 | 出版物 | 论文 | 代码 | | ---- | ----- | :---: | :------: | :------: | | 2023.02 |ASSET: Robust Backdoor Data Detection Across a Multiplicity of Deep Learning Paradigms |USENIX'23| [链接](https://www.usenix.org/conference/usenixsecurity23/presentation/pan) | [代码](https://github.com/ruoxi-jia-group/ASSET) | | 2023.03 | CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning | ICCV'23 | [链接](https://openaccess.thecvf.com/content/ICCV2023/html/Bansal_CleanCLIP_Mitigating_Data_Poisoning_Attacks_in_Multimodal_Contrastive_Learning_ICCV_2023_paper.html) | [代码](https://github.com/nishadsinghi/CleanCLIP) | | 2023.03 | Robust Contrastive Language-Image Pre-training against Data Poisoning and Backdoor Attacks | NeurIPS'23 | [链接](https://proceedings.neurips.cc/paper_files/paper/2023/hash/2232e8fee69b150005ac420bfa83d705-Abstract-Conference.html) | - | | 2023.03 | Detecting Backdoors in Pre-trained Encoders | CVPR'23 | [链接](https://openaccess.thecvf.com/content/CVPR2023/html/Feng_Detecting_Backdoors_in_Pre-Trained_Encoders_CVPR_2023_paper.html) | [代码](https://github.com/GiantSeaweed/DECREE) | | 2023.10 | Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks | ICML'24 | [链接](https://arxiv.org/abs/2310.05862) | [代码](https://github.com/BigML-CS-UCLA/SafeCLIP)| | 2024.03 | Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning | CVPRW'24 | [链接](https://arxiv.org/abs/2403.16257) | - | | 2024.09 | Adversarial Backdoor Defense in CLIP | arXiv | [链接](https://arxiv.org/abs/2409.15968) | - | | 2024.09 | CleanerCLIP: Fine-grained Counterfactual Semantic Augmentation for Backdoor| arXiv | [链接](https://arxiv.org/abs/2409.17601) | - | |2024.11 |Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment| CVPR'24| [链接](https://openaccess.thecvf.com/content/CVPR2024/html/Ishmam_Semantic_Shield_Defending_Vision-Language_Models_Against_Backdooring_and_Poisoning_via_CVPR_2024_paper.html) | [代码](https://github.com/IshmamAlvi/Semantic-Shield) | |2024.11 | DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders| CVPR'25| [链接](https://openaccess.thecvf.com/content/CVPR2025/html/Hou_DeDe_Detecting_Backdoor_Samples_for_SSL_Encoders_via_Decoders_CVPR_2025_paper.html) | [代码](https://github.com/tardisblue9/DeDe)| |2024.12 |Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning| arXiv| [链接](https://arxiv.org/abs/2412.20392)| - | |2024.12 | DETECTING BACKDOOR SAMPLES IN CONTRASTIVE LANGUAGE IMAGE PRETRAINING| ICLR'25|[链接](https://iclr.cc/virtual/2025/poster/30032) | [代码](https://github.com/HanxunH/Detect-CLIP-Backdoor-Samples) | |2024.12 |Perturb and Recover: Fine-tuning for Effective Backdoor Removal from CLIP|arXiv| [链接](https://arxiv.org/abs/2412.00727) | [代码](https://github.com/nmndeep/PerturbAndRecover) | | 2025.02 | A Closer Look at Backdoor Attacks on CLIP | ICML'25 | [链接](https://personal.ntu.edu.sg/boan/papers/ICML25_CLIP.pdf) | - | | 2025.02 |Neural Antidote: Class-Wise Prompt Tuning for Purifying Backdoors in CLIP|arXiv| [链接](https://arxiv.org/abs/2502.19269) |-| | 2025.12 |Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models|arXiv| [链接](https://www.arxiv.org/abs/2512.00343) |[代码](https://github.com/Robin-WZQ/AMDET)| | 2026.01 |Robust defense strategies for multimodal contrastive learning: efficient fine-tuning against backdoor attacks | Multimedia Tools and Applications | [链接](https://link.springer.com/article/10.1007/s11042-026-21339-x) | - | | 2026.02 |InverTune: A Backdoor Defense Method for Multimodal Contrastive Learning via Backdoor-Adversarial Correlation Analysis | NDSS'26 | [链接](https://www.ndss-symposium.org/wp-content/uploads/2026-f1666-paper.pdf) | - | | 2026.03 | DIFT: Protecting Contrastive Learning Against Data Poisoning Backdoor Attacks | AAAI'26 | [链接](https://ojs.aaai.org/index.php/AAAI/article/view/37141) | - | | 2026.03 | BackdoorIDS: Zero-shot Backdoor Detection for Pretrained Vision Encoder | arXiv | [链接](https://arxiv.org/pdf/2603.11664) | [代码](https://github.com/siquanhuang/BackdoorIDS) | ### 文本条件扩散模型 #### 后门攻击 | 时间 | 标题 | 出版物 | 论文 | 代码 | | ---- | ----- | :---: | :------: | :------: | | 2022.11 | Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis | ICCV'23 | [链接](https://arxiv.org/abs/2211.02408) | [代码](https://github.com/LukasStruppek/Rickrolling-the-Artist) | | 2023.05 | Personalization as a Shortcut for Few-Shot Backdoor Attack against Text-to-Image Diffusion Models | AAAI'24 | [链接](https://arxiv.org/abs/2305.10701) | [代码](https://github.com/Huang-yihao/Personalization-based_backdoor) | | 2023.05 | Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning | ACM MM'23 | [链接](https://arxiv.org/abs/2305.04175) | [代码](https://github.com/zhaisf/BadT2I) | | 2023.06 | VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models | NeurIPS'23 | [链接](https://arxiv.org/abs/2306.06874) | [代码](https://github.com/IBM/VillanDiffusion) | | 2023.07 | BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models | TIFS'24 | [链接](https://arxiv.org/abs/2307.16489) | [代码](https://github.com/JJ-Vice/BAGM) | | 2023.08 | Backdooring Textual Inversion for Concept Censorship | arXiv | [链接](https://arxiv.org/abs/2308.10718) | [代码](https://github.com/concept-censorship/concept-censorship.github.io) | | 2023.10 | Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models | S&P'24 | [链接](https://arxiv.org/abs/2310.13828) | [代码](https://github.com/Shawn-Shan/nightshade-release) | | 2024.01 | The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline | ICML'24 | [链接](https://arxiv.org/abs/2401.04136) | [代码](https://github.com/haonan3/SilentBadDiffusion) | | 2024.06 | Injecting Bias in Text-To-Image Models via Composite-Trigger Backdoors | arXiv | [链接](https://arxiv.org/abs/2406.15213) | - | | 2024.07 | Control ControlNet: Multidimensional Backdoor Attack Based on ControlNet | ICONIP'24 | [链接](https://easychair.org/publications/preprint/L1B4) | [代码](https://github.com/paoche11/ControlNetBackdoor) | | 2024.10 | EvilEdit: Backdooring Text-to-Image Diffusion Models in One Second | ACM MM'24 | [链接](https://dl.acm.org/doi/10.1145/3664647.3680689) | [代码](https://github.com/haowang-cqu/EvilEdit) | | 2024.11 | Combinational Backdoor Attack against Customized Text-to-Image Models | arXiv | [链接](https://arxiv.org/abs/2411.12389) | - | | 2024.11 | TrojanEdit: Backdooring Text-Based Image Editing Models | arXiv | [链接](https://arxiv.org/abs/2411.14681) | - | | 2025.02 | Imperceptible Backdoor Attacks on Text-Guided 3D Scene Grounding | TMM'25 | [链接](https://dblp.org/rec/journals/tmm/LiuH25) | - | | 2025.03 | Towards Invisible Backdoor Attack on Text-to-Image Diffusion Model | arXiv | [链接](https://arxiv.org/abs/2503.17724) | [代码](https://github.com/Robin-WZQ/IBA) | | 2025.03 | Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models | CVPR'25 | [链接](https://arxiv.org/abs/2503.09669) | [代码](https://github.com/agwmon/silent-branding-attack) | | 2025.04 | BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation | ICCV'25 | [链接](https://arxiv.org/abs/2504.16907) | [代码](_URL_89/>) | | 2025.04 | Erased but Not Forgotten: How Backdoors Compromise Concept Erasure | arXiv | [链接](https://arxiv.org/abs/2504.21072) | [代码](https://github.com/jonasgrebe/erased-but-not-forgotten) | | 2025.04 | REDEditing: Relationship-Driven Precise Backdoor Poisoning on Text-to-Image Diffusion Models | arXiv | [链接](https://arxiv.org/abs/2504.14554) | - | | 2025.06 | TWIST: Text-encoder Weight-editing for Inserting Secret Trojans in Text-to-Image Models | ACL'25 | [链接](https://aclanthology.org/2025.acl-long.541/) | - | | 2025.08 | Practical, Generalizable and Robust Backdoor Attacks on Text-to-Image Diffusion Models | arXiv | [链接](https://arxiv.org/abs/2508.01605) | - | | 2025.08 | BadBlocks: Low-Cost and Stealthy Backdoor Attacks Tailored for Text-to-Image Diffusion Models | arXiv | [链接](https://arxiv.org/abs/2508.03221) | - | | 2026.01 | Key-Value Mapping-Based Text-to-Image Diffusion Model Backdoor Attacks | Algorithms | [链接](https://www.mdpi.com/1999-4893/19/1/74) | [代码](https://github.com/wenkfjsf/key_to_value) | | 2026.02 | Bad-PoseDiff: Pose-Guided Backdoor Triggering in Diffusion Models | TrustCom'25 | [链接](https://ieeexplore.ieee.org/abstract/document/11354876) | - | | 2026.02 | Semantic-level Backdoor Attack against Text-to-Image Diffusion Models | arXiv'26 | [链接](https://arxiv.org/pdf/2602.04898) | - | | 2026.02 | When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks | arXiv'26 | [链接](https://arxiv.org/pdf/2602.20193) | - | | 2026.02 | When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters | CVPR'26 | [链接](https://arxiv.org/pdf/2602.21977) | [代码](https://github.com/spectre-init/MasqLora) | | 2026.03 | Tuning Just Enough: Lightweight Backdoor Attacks on Multi-Encoder Diffusion Models | ICLRW'26 | [链接](https://arxiv.org/pdf/2603.04064) | - | #### 后门防御 | 时间 | 标题 | 出版物 | 论文 | 代码 | | ---- | ----- | :---: | :------: | :------: | | 2024.04 | UFID: A Unified Framework for Black-box Input-level Backdoor Detection on Diffusion Models | AAAI'25 | [链接](https://arxiv.org/abs/2404.01101) | [代码](https://github.com/GuanZihan/official_UFID) | | 2024.07 | T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models | ECCV'24 | [链接](https://arxiv.org/abs/2407.04215) | [代码](https://github.com/Robin-WZQ/T2IShield) | | 2024.08 | Defending Text-to-image Diffusion Models: Surprising Efficacy of Textual Perturbations Against Backdoor Attacks | ECCVW'24 | [链接](https://arxiv.org/abs/2408.15721) | [代码](https://github.com/oscarchew/t2i-backdoor-defense) | | 2024.11 | Fine-grained Prompt Screening: Defending Against Backdoor Attack on Text-to-Image Diffusion Models | IJCAI'25 | [链接](https://www.ijcai.org/proceedings/2025/0068.pdf) | - | | 2025.01 | Backdoor Defense for Text Encoders in Text-to-Image Generative Models | IEEE TDSC'25 | [链接](https://www.computer.org/csdl/journal/tq/2025/06/11112743/28UtsBBMwZa) | [代码](https://github.com/Wu-sm/Defense-against-backdoor-attacks-in-text-to-image) | | 2025.02 | BackdoorDM: A Comprehensive Benchmark for Backdoor Learning in Diffusion Model | NeurIPS'25 | [链接](https://arxiv.org/abs/2502.11798) | [代码](https://github.com/linweiii/BackdoorDM) | | 2025.03 | Efficient Input-level Backdoor Detection on Text-to-Image Synthesis via Neuron Activation Variation | arXiv | [链接](https://arxiv.org/abs/2503.06453) | - | | 2025.04 | Backdoor Defense in Diffusion Models via Spatial Attention Unlearning | arXiv | [链接](https://arxiv.org/abs/2504.18563) | - | | 2025.04 | Dynamic Attention Analysis for Backdoor Detection in Text-to-Image Diffusion Models | TPAMI'25 | [链接](https://arxiv.org/abs/2504.20518) | [代码](https://github.com/Robin-WZQ/DAA) | | 2026.01 | On the Fairness, Diversity and Reliability of Text-to-Image Generative Models | Artificial Intelligence Review | [链接](https://link.springer.com/article/10.1007/s10462-025-11424-2) | [代码](https://github.com/JJ-Vice/T2I_Fairness_Diversity_Reliability) | | 2026.02 | Backdoor Sentinel: Detecting and Detoxifying Backdoors in Diffusion Models via Temporal Noise Consistency | arXiv | [链接](https://arxiv.org/pdf/2602.01765) | - | | 2026.03 | BlackMirror: Black-Box Backdoor Detection for Text-to-Image Models via Instruction-Response Deviation | CVPR'26 | [链接](https://arxiv.org/html/2603.05921v1) | [代码](https://github.com/Ferry-Li/BlackMirror) | ### 大型视觉语言模型 #### 后门攻击 | 时间 | 标题 | 出版物 | 论文 | 代码 | | ------- | ------------------------------------------------------------ | :--------: | :----------------------------------------------------------: | :------------------------------------------------------: | | 2024.02 | Shadowcast: Stealthy Data Poisoning Attacks against Vision-Language Models | NeurIPS'24 | [链接](https://openreview.net/forum?id=JhqyeppMiD) | [代码](https://vlm-poison.github.io/) | | 2024.02 | VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models | IJCV'25 | [链接](https://link.springer.com/article/10.1007/s11263-025-02368-9) | [代码](https://github.com/JWLiang007/VL-Trojan) | | 2024.02 | Test-Time BACKDOOR ATTACKS ON MULTIMODAL LARGE LANGUAGE MODELS | arXiv | [链接](https://github.com/sail-sg/AnyDoor) | [代码](https://github.com/sail-sg/AnyDoor) | | 2024.03 | ImgTrojan: Jailbreaking Vision-Language Models with ONE Image | NAACL'25 | [链接](https://aclanthology.org/2025.naacl-long.360/) | [代码](https://github.com/xijia-tao/ImgTrojan) | | 2024.03 | TrojVLM: Backdoor Attack Against Vision Language Models | ECCV'24 | [链接](https://link.springer.com/chapter/10.1007/978-3-031-73650-6_27) | - | | 2024.04 | PHYSICAL BACKDOOR ATTACK CAN JEOPARDIZE DRIVING WITH VISION-LARGE-LANGUAGE MODELS | ICMLW'24 | [链接](https://icml.cc/virtual/2024/38112) | - | | 2024.06 | Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift | CVPR'25 | [链接](https://www.computer.org/csdl/proceedings-article/cvpr/2025/436400j477/2999SblAZlC) | [代码](https://github.com/LiangSiyuan21/MABA) | | 2024.10 | BACKDOORING VISION-LANGUAGE MODELS WITH OUT-OF-DISTRIBUTION DATA | ICLR'25 | [链接](https://openreview.net/forum?id=tZozeR3VV7) | - | | 2025.02 | Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models | CVPR'25 | [链接](https://ieeexplore.ieee.org/document/11092582) | [代码](https://github.com/6zHAOyi/BadVision) | | 2025.03 | BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models | CVPR'25 | [链接](https://www.computer.org/csdl/proceedings-article/cvpr/2025/436400ae927/299buyfnYBy) | - | | 2025.05 | Natural Reflection Backdoor Attack on Vision Language Model for Autonomous Driving | arXiv | [链接](https://arxiv.org/abs/2505.06413) | - | | 2025.06 | Backdoor Attack on Vision Language Models with Stealthy Semantic Manipulation | arXiv | [链接](https://arxiv.org/abs/2506.07214) | | | 2025.07 | Shadow-Activated Backdoor Attacks on Multimodal Large Language Models | ACL'25 | [链接](https://aclanthology.org/2025.findings-acl.248/) | [代码](https://github.com/ericyinyzy/BadMLLM) | | 2025.08 | IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding | arXiv | [链接](https://arxiv.org/abs/2508.09456) | - | | 2025.09 | TokenSwap: Backdoor Attack on the Compositional Understanding of Large Vision-Language Models | arXiv | [链接](https://arxiv.org/abs/2509.24566) | [代码](https://anonymous.4open.science/r/tokenswap-341F) | | 2025.11 | MTAttack: Multi-Target Backdoor Attacks against Large Vision-LanguageModels | arXiv | [链接](https://arxiv.org/abs/2511.10098) | [代码](https://github.com/mala-lab/MTAttack) | | 2025.11 | BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models | arXiv | [链接](https://arxiv.org/abs/2511.18921) | [代码](https://github.com/bin015/BackdoorVLM)| #### 后门防御 | 时间 | 标题 | 出版物 | 论文 | 代码 | | ------- | ------------------------------------------------------------ | :--------: | :------------------------------------------------: | :----------------------------------------: | | 2025.05 | Backdoor Cleaning without External Guidance in MLLM Fine-tuning | NeurIPS'25 | [链接](https://openreview.net/forum?id=os4QYDf3Ms) | [代码](https://github.com/XuankunRong/BYE) | | 2025.06 | ROBUST ANTI-BACKDOOR INSTRUCTION TUNING IN LVLMS | arXiv | [链接](https://arxiv.org/abs/2506.05401) | - | | 2025.06 | SRD: Reinforcement-Learned Semantic Perturbation | AAAI'26 | [链接](https://www.arxiv.org/abs/2506.04743) | [代码](https://github.com/Ciconey/SRD) | | 2026.01 | From Internal Diagnosis to External Auditing: A VLM-Driven Paradigm for Online Test-Time Backdoor Defense | arXiv | [链接](https://arxiv.org/pdf/2601.19448) | - | | 2026.01 | TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning | arXiv | [链接](https://arxiv.org/pdf/2601.21692) | [代码](https://github.com/m1ng2u/TCAP) | | 2026.03 | Probing Semantic Insensitivity for Inference-Time Backdoor Defense in Multimodal Large Language Model | AAAI'26 | [链接](https://ojs.aaai.org/index.php/AAAI/article/view/40891) |- | | 2026.03 | PurMM: Attention-Guided Test-Time Backdoor Purification in Multimodal Large Language Models | AAAI'26 | [链接](https://ojs.aaai.org/index.php/AAAI/article/view/40867) | - | | 2026.03 | Test-Time Attention Purification for Backdoored Large Vision Language Models | arXiv | [链接](https://arxiv.org/pdf/2603.12989) | - | | 2026.03 | Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models | arXiv | [链接](https://arxiv.org/pdf/2602.22246) | [代码](https://arxiv.org/pdf/2602.22246) | ### 基于 VLM 的具身智能 (Embodied AI) #### 后门攻击 - VLA | 时间 | 标题 | 出版物 | 论文 | 代码 | | ------- | ------------------------------------------------------------ | :------: | :----------------------------------------------------------: | :-------------------------------------------------: | | 2025.05 | BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization | arXiv | [链接](https://arxiv.org/abs/2505.16640) | [代码](https://github.com/Zxy-MLlab/BadVLA) | | 2025.10 | TabVLA: Targeted Backdoor Attacks on Vision-Language-Action Models | arXiv | [链接](https://arxiv.org/abs/2510.10932) | [代码](https://github.com/megaknight114/TabVLA) | | 2025.11 | AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models | arXiv | [链接](https://arxiv.org/abs/2511.12149) | - | | 2026.01 | State Backdoor: Towards Stealthy Real-world Poisoning Attack on Vision-Language-Action Model in State Space | arXiv | [链接](https://www.arxiv.org/abs/2601.04266) | - | | 2026.02 | Inject Once Survive Later: Backdooring Vision-Language-Action Models to Persist Through Downstream Fine-tuning | arXiv |[链接](https://arxiv.org/pdf/2602.00500) | [代码](https://jianyi2004.github.io/infuse-vla-backdoor/) | - GUI | 时间 | 标题 | 出版物 | 论文 | 代码 | | ------- | ------------------------------------------------------------ | :------: | :----------------------------------------------------------: | :-------------------------------------------------: | | 2025.05 | Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents | EMNLP’25 | [链接](https://aclanthology.org/2025.findings-emnlp.411/) | [代码](https://github.com/CTZhou-byte/AgentGhost) | | 2025.06 | Poison Once, Control Anywhere: Clean-Text Visual Backdoors in VLM-based Mobile Agents | arXiv | [链接](https://arxiv.org/abs/2506.13205) | - | | 2025.07 | VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation | COLM‘25 | [链接](https://openreview.net/forum?id=7HPuAkgdVm#discussion) | [代码](https://github.com/whi497/VisualTrap) | | 2025.09 | Realistic Environmental Injection Attacks on GUI Agents | arXiv | [链接](https://arxiv.org/abs/2509.11250) | [代码](https://github.com/zhangyitonggg/attack2gui) | | 2026.03 | SlowBA: An efficiency backdoor attack towards VLM-based GUI agents | arXiv | [链接](https://arxiv.org/pdf/2603.08316) | [代码](https://github.com/tu-tuing/SlowBA) | #### 后门防御 | 时间 | 标题 | 出版物 | 论文 | 代码 | | ------- | ------------------------------------------------------------ | :------: | :----------------------------------------------------------: | :-------------------------------------------------: | | 2026.02 | When Attention Betrays: Erasing Backdoor Attacks in Robotic Policies by Reconstructing Visual Tokens | ICRA'26 | [链接](https://arxiv.org/abs/2602.03153) | - | ### 其他 | 时间 | 标题 | 出版物 | 论文 | 代码 | | ------- | ------------------------------------------------------------ | :------: | :----------------------------------------------------------: | :-------------------------------------------------: | | 2026.03 | Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models | arXiv | [链接](https://arxiv.org/pdf/2602.22246) | [代码](https://github.com/bigglesworthnotacat/Diffusion_Self_Purification) | ## 其他相关的优秀仓库 - [Awesome Data Poisoning and Backdoor Attacks](https://github.com/penghui-yang/awesome-data-poisoning-and-backdoor-attacks) - [Awesome-Backdoor-in-Deep-Learning](https://github.com/zihao-ai/Awesome-Backdoor-in-Deep-Learning) - [Backdoor Learning Resources](https://github.com/THUYimingLi/backdoor-learning-resources) - [Awesome-LVLM-Attack](https://github.com/liudaizong/Awesome-LVLM-Attack/tree/main) - [Awesome-Large-Model-Safety](https://github.com/xingjunm/Awesome-Large-Model-Safety) ## 🥳 参考文献如果您发现此仓库对您的研究有帮助，如果您能引用我们的论文，我们将不胜感激。 :sparkles: ``` @article{Wang_2025, title={Backdoor Attacks and Defenses on Large Multimodal Models: A Survey}, DOI={10.36227/techrxiv.176618816.64264497/v1}, publisher={Institute of Electrical and Electronics Engineers (IEEE)}, author={Wang, Zhongqi and Zhang, Jie and Bao, Kexin and Liang, Yifei and Shan, Shiguang and Chen, Xilin}, year={2025}, month=dec } @article{wang2025amdet, title={Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models}, author={Zhongqi Wang and Jie Zhang and Shiguang Shan and Xilin Chen}, journal={arXiv preprint arXiv:2512.00343}, year={2025}, } @article{wang2025dynamicattentionanalysisbackdoor, title={Dynamic Attention Analysis for Backdoor Detection in Text-to-Image Diffusion Models}, author={Zhongqi Wang and Jie Zhang and Shiguang Shan and Xilin Chen}, journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)}, year={2025}, } @article{zhang2025twt, title={Trigger without Trace: Towards Stealthy Backdoor Attack on Text-to-Image Diffusion Models}, author={Jie Zhang and Zhongqi Wang and Shiguang Shan and Xilin Chen}, journal={arXiv preprint arXiv:2503.17724}, year={2025}, } @InProceedings{10.1007/978-3-031-73013-9_7, author="Wang, Zhongqi and Zhang, Jie and Shan, Shiguang and Chen, Xilin", title="T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models", booktitle="Computer Vision -- ECCV 2024", year="2025", publisher="Springer Nature Switzerland", address="Cham", pages="107--124", isbn="978-3-031-73013-9" } ```

标签：AI安全, AI治理, Chat Copilot, CISA项目, Embodied AI, Linux系统监控, LMM, TDM, VLM, VLP, 人工智能, 具身智能, 后门攻击, 后门防御, 多模态大模型, 大模型安全, 文本条件扩散模型, 文献列表, 模型安全, 用户模式Hook绕过, 网络安全, 视觉语言模型, 视觉语言预训练模型, 论文集, 隐私保护