ItsNishi/AI-Agent-Security

GitHub: ItsNishi/AI-Agent-Security

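An educational collection of security research on the AI coding agent ecosystem, systematically mapping attack surfaces and defense patterns.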

Stars: 3 | Forks: 0

# 🛡️ AI Agent Security Research

[![Educational](https://img.shields.io/badge/Purpose-Educational%20%26%20Defensive-blue)]()
[![AI Agents](https://img.shields.io/badge/Scope-AI%20Coding%20Agents-purple)]()
[![No Live Exploits](https://img.shields.io/badge/Exploits-None%20Executable-green)]()

## 🔍 What Is This

A growing collection of research, annotated attack examples, and defense strategies focused on the security of **AI coding agents**: Claude Code, Cursor, GitHub Copilot, Windsurf, and any other LLM-powered development tool.

Every attack is paired with a defense. Every payload is annotated, defanged, and educational.

## 📝 Research Notes

| | Topic | What You'll Learn |
|---|------|----------------|
| 🗂️ | [Tools and Frameworks Index](notes/00_Tools_And_Frameworks_Index.md) | Quick reference for every tool, framework, benchmark, and standard mentioned |
| 💉 | [Prompt Injection and Skill Injection](notes/01_Skill_Injection_Analysis.md) | Injection fundamentals, agent attack surface, poisoned-skill teardown, supply chain comparison |
| 🧱 | [Defense Patterns](notes/02_Defense_Patterns.md) | Sanitization, sandboxing, and mitigation strategies, with runnable code |
| ⚙️ | [Claude Code Skill Architecture](notes/03_Claude_Code_Skill_Architecture.md) | How Claude Code's extensibility (skills, hooks, MCP) creates attack surface |
| 👻 | [LLM Hallucination Prevention](notes/04_LLM_Hallucination_Prevention.md) | Why models fabricate, and how to detect and stop it |
| 🌐 | [AI Coding Language Performance](notes/05_AI_Coding_Language_Performance.md) | Multi-language benchmarks, token efficiency, and language-steering attacks |
| 🔓 | [LLM Jailbreaking Deep Dive](notes/06_LLM_Jailbreaking_Deep_Dive.md) | Full taxonomy: DAN to GCG to Crescendo, defenses and benchmarks, agent implications |
| 🔍 | [Skill Scanning and Detection Landscape](notes/07_Skill_Scanning_And_Detection_Landscape.md) | Cisco skill scanner, VirusTotal, ToxicSkills audit, gap analysis, and what to build next |
| 📋 | [AI GRC and Policy Landscape](notes/08_AI_GRC_And_Policy_Landscape.md) | NIST AI RMF, EU AI Act, ISO 42001, national laws, agent governance, OWASP Agentic Top 10 |
| 🧠 | [AI Memory and Corruption](notes/09_AI_Memory_And_Corruption.md) | Memory architectures, RAG poisoning, MINJA, persistence risks, real-world case studies, defenses |
| 📄 | [Agent Configuration Files](notes/10_Agent_MD_Configuration_Files.md) | Cross-tool instruction-file attack surface: CLAUDE.md, AGENTS.md, Cursor, Copilot, Unicode obfuscation, hardening recommendations |
| 🧠 | [Chatbots and AI Psychosis](notes/11_Chatbot_And_AI_Psychosis.md) | AI-induced psychosis, sycophancy mechanisms, documented deaths, folie à deux, weaponization, RAND national security analysis |
| 🦞 | [OpenClaw and ClawHub Security](notes/12_OpenClaw_And_ClawHub_Security.md) | OpenClaw architecture, ClawHub supply chain, CVE-2026-25253, the ClawHavoc campaign, AMOS stealer, memory poisoning, 42K exposed instances |
| 🏪 | [AI Application Ecosystem Security](notes/13_AI_Application_Ecosystem_Security.md) | GPT Store, MCP tool poisoning, LangChain, HuggingFace, AutoGPT, CrewAI, Devin, IDEsaster, GlassWorm, OWASP Agentic Top 10, MITRE ATLAS |
| ⚔️ | [AI Hacking Frameworks](notes/14_AI_Hacking_Frameworks.md) | XBOW, Shannon, Strix, PanetAGI, CAI, Reaper, Nebula, CHECKMATE, Garak, Promptfoo, PyRIT, benchmarks and architecture patterns |
| 💩 | [Bullshit Benchmark and LLM Honesty](notes/15_Bullshit_Benchmark_And_LLM_Honesty.md) | BullshitBench, TruthfulQA, SimpleQA, sycophancy benchmarks, Bullshit Index, abstention, slopsquatting, the tension between RLHF and safety |
| 🛡️ | [AI Blue Teaming and Defensive AI](notes/16_AI_Blue_Teaming_And_Defensive_AI.md) | AI SOC agents, CrowdStrike Charlotte, Microsoft Security Copilot, malware reverse engineering, DARPA AIxCC, NIST AI 100-2, defender's advantage analysis |
| 🔤 | [Unicode Variation Selector Attacks](notes/17_Unicode_Variation_Selector_Attacks.md) | Invisible jailbreaks, guardrail bypass, Sneaky Bits encoding, GlassWorm malware, token inflation DoS, defense: explicit stripping |
| 🪙 | [Token Optimization and LLM Efficiency](notes/18_Token_Optimization_And_LLM_Efficiency.md) | Context engineering, prompt structure, model routing, caching, batching, agent loop optimization, Claude Code cost management |
| 💸 | [Token-Based Attacks and Resource Exploitation](notes/19_Token_Based_Attacks_And_Resource_Exploitation.md) | Denial of wallet, sponge examples, reasoning bombs (ThinkTrap/ReasoningBomb), context window poisoning, token smuggling, tokenizer security, model extraction, LLMjacking |
| 📊 | [LLM Landscape and Token Economics](notes/20_LLM_Landscape_Tokens_And_Pricing.md) | Security-framed model reference: tokenization attack surface, cost economics for threat modeling, model selection for security work, open-model supply chain risks, Chinese censorship implications |
| 🤝 | [Multi-Agent Security](notes/21_Multi_Agent_Security.md) | Agent-to-agent attacks, A2A protocol spoofing, 82.4% peer bypass, delegation chain injection, memory contagion, cross-agent configuration attacks, offensive swarms, defense patterns |

## 🧪 Attack and Defense Examples

Hands-on, annotated scenarios. Each example shows both the attack **and** the fix.

| | Technique | Summary |
|---|------|------|
| 🕵️ | [Hidden Comment Injection](examples/01_Hidden_Comment_Injection/) | HTML comments are invisible in a Markdown preview, but the LLM reads every word |
| 🌊 | [Indirect Prompt Injection](examples/02_Indirect_Prompt_Injection/) | Poison a web page, API response, or file, and the agent follows it |
| 📤 | [Data Exfiltration via Agent](examples/03_Data_Exfiltration_Via_Agent/) | The agent becomes an unwitting courier for secrets, keys, and credentials |
| 📦 | [Hallucinated Package Injection](examples/04_Hallucinated_Package_Skill_Injection/) | The LLM invents a package name and an attacker registers it: an instant supply chain attack |
| 🔧 | [MCP Tool Poisoning](examples/05_MCP_Tool_Poisoning/) | Malicious instructions hidden in tool descriptions silently hijack agent behavior |
| 💸 | [Resource Exhaustion and Denial of Wallet](examples/06_Resource_Exhaustion_And_Denial_Of_Wallet/) | Reasoning bombs, context saturation, and cost math: draining an API budget without crashing the service |
| 🔤 | [Unicode Invisible Injection](examples/07_Unicode_Invisible_Injection/) | Variation selectors and tag-block encoding hide instructions that survive diff review and Unicode normalization |

## 🗂️ Attack Taxonomy

```
┌─────────────────────────────────────────────────────────────────────────────────────┐
│                                  AI Agent Attacks                                   │
├──────────────┬──────────────────┬───────────────────┬───────────────────────────────┤
│ 🎯 Injection │ 🔗 Supply Chain  │ 📤 Exfiltration   │ 🧠 Memory & Persistence       │
│              │                  │                   │                               │
│ Direct       │ Trojan skills    │ Secrets & keys    │ RAG poisoning                 │
│ Indirect     │ Hallucinated     │ Source code       │ Memory injection (MINJA)      │
│ Hidden       │  packages        │ Environment       │ Context window manipulation   │
│  comments    │ Poisoned docs    │  variables        │ Persistent backdoors          │
│ MCP tool     │ Rules file       │ Credentials       │ Config file persistence       │
│  poisoning   │  backdoor        │ Agent tokens      │ Instruction drift             │
│ Language-    │ Namespace        │ Chat history      │ SOUL.md/MEMORY.md poisoning   │
│  steering    │  squatting       │ IDE telemetry     │                               │
│ Sampling     │ GlassWorm        │                   │                               │
│  injection   │  extension worm  │                   │                               │
├──────────────┬──────────────────┴───────────────────┴───────────────────────────────┤
│ 💸 Resource  │ 🏗️ Framework & Platform                                              │
│    Attacks   │                                                                      │
│              │ MCP server compromise (CVE-2025-6514)                                │
│ Denial of    │ OpenClaw gateway exposure (42K+ instances)                           │
│  wallet      │ GPT Store plugin OAuth flaws                                         │
│ Sponge       │ HuggingFace pickle deserialization                                   │
│  examples    │ IDE Chromium CVEs (94+ in Cursor/Windsurf)                           │
│ Reasoning    │ ClawHub malicious skills (1184+)                                     │
│  exhaustion  ├──────────────────────────────────────────────────────────────────────┤
│ Model routing│ 🛡️ Bypass & Escalation                                               │
│  manipulation│                                                                      │
│ Token        │ Sandbox escape (numpy allowlist)                                     │
│  smuggling   │ Cross-agent privilege escalation                                     │
│ LLMjacking   │ Tool confusion / confused deputy                                     │
│              │ Rug pull / bait-and-switch                                           │
│              │ IDEsaster (30+ CVEs across AI IDEs)                                  │
│              │ Agent-to-agent prompt injection                                      │
└──────────────┴──────────────────────────────────────────────────────────────────────┘
```

## 🔧 Security Skills Suite

Practical defensive tools built on Claude Code's skills + hooks architecture. They turn the research above into actionable detection.

### Install from ClawHub

The fastest way to install. Each link goes to the corresponding ClawHub listing:

| Skill | ClawHub | What It Does |
|------|---------|----------|
| **vet-repo** | [clawhub.ai/ItsNishi/vet-repo](https://clawhub.ai/ItsNishi/vet-repo) | Scans `.claude/`, `.mcp.json`, `CLAUDE.md`, and VS Code/Cursor configs for hook abuse, injection, and MCP poisoning |
| **scan-skill** | [clawhub.ai/ItsNishi/scan-skill](https://clawhub.ai/ItsNishi/scan-skill) | Deep analysis of a single skill before installation: frontmatter, HTML comments, persistence triggers, supporting scripts |
| **audit-code** | [clawhub.ai/ItsNishi/audit-code](https://clawhub.ai/ItsNishi/audit-code) | Code security review: hardcoded secrets, dangerous calls, SQL injection, `.env` files, file permissions |

### Install from Source

If you prefer to install manually from this repository:

```
# Clone the repository
git clone git@github.com:ItsNishi/AI-Agent-Security.git

# Copy the skills you want into your project or personal skills directory
# Project-level (scoped to one repo):
cp -r AI-Agent-Security/.claude/skills/vet-repo /path/to/your/project/.claude/skills/
cp -r AI-Agent-Security/.claude/skills/scan-skill /path/to/your/project/.claude/skills/
cp -r AI-Agent-Security/.claude/skills/audit-code /path/to/your/project/.claude/skills/

# Personal (available in all projects):
cp -r AI-Agent-Security/.claude/skills/vet-repo ~/.claude/skills/
cp -r AI-Agent-Security/.claude/skills/scan-skill ~/.claude/skills/
cp -r AI-Agent-Security/.claude/skills/audit-code ~/.claude/skills/
```

### Usage

Once installed, invoke the skills from any Claude Code session:

```
/vet-repo              # Scan current repo's agent configs
/scan-skill            # Analyze a skill before installing it
/audit-code [path]     # Security review of project code (defaults to project root)
```

### Prerequisites

- **Python 3.10+**: the scanner scripts use only the standard library, with no third-party dependencies
- **Claude Code**: skills are invoked via `/skill-name` in a Claude Code session

### Hooks

The advisory `PreTool` hooks in `.claude/settings.json` warn on (but do not block) the following:

- **Bash**: pipe to shell, `rm -rf /`, `chmod 777`, eval with variables, base64 to exec
- **Write**: writes to `~/.ssh/`, `~/.aws/`, `.claude/settings.json`, and shell profile files

To install the hooks, copy `.claude/settings.json` into your project's `.claude/` directory.

### Shared Pattern Database

151 detection patterns across 15 categories. Each skill ships with its own copy of `patterns.py`, so it works standalone:

```
skill_injection | hook_abuse | mcp_config | secrets | dangerous_calls
exfiltration | encoding_obfuscation | instruction_override | supply_chain | file_permissions
code_before_review | config_backdoor | memory_corruption | confused_delegation | persistence
```

All patterns are derived from the research notes and examples in this repository.

## 📁 Project Structure

```
AI-Agent-Security/
├── 📄 README.md
├── 📝 notes/                    # Research writeups and analysis
├── 🧪 examples/                 # Annotated attack/defense pairs
└── 🔧 .claude/
    ├── settings.json            # Hook configurations
    └── skills/
        ├── vet-repo/            # Repository agent config scanner
        │   ├── SKILL.md
        │   └── scripts/
        │       ├── patterns.py  # Pattern database
        │       └── vet_repo.py
        ├── scan-skill/          # Individual skill analyzer
        │   ├── SKILL.md
        │   └── scripts/
        │       ├── patterns.py  # Pattern database
        │       └── scan_skill.py
        └── audit-code/          # Code security auditor
            ├── SKILL.md
            └── scripts/
                ├── patterns.py  # Pattern database
                └── audit_code.py
```

## ⚠️ Disclaimer

This research is for **educational and defensive purposes only**. All examples use defanged URLs (`hxxps://`, `[.]`), annotated payloads (`[MALICIOUS]`), and non-executable demonstrations. Every attack technique includes a corresponding defense.

## 📜 License

MIT
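
### Appendix: Sketch of a Warn-Only Bash Check

The Bash checks listed in the Hooks section can be illustrated with a minimal Python sketch of a warn-only command checker. The regexes, the `BASH_WARNINGS` table, and the `warn_on_bash` function are hypothetical and written for illustration. They are not taken from this repository's `patterns.py` or `settings.json`, which may define these checks quite differently.

```python
import re

# Hypothetical patterns for the five Bash behaviors named in the Hooks section.
# Illustrative only: NOT the repository's actual pattern database.
BASH_WARNINGS: list[tuple[str, re.Pattern[str]]] = [
    # Output piped straight into a shell, e.g. "curl ... | sh"
    ("pipe to shell", re.compile(r"\|\s*(?:sudo\s+)?(?:ba|z|da)?sh\b")),
    # Recursive forced delete of the filesystem root (not "/tmp/..." paths)
    ("rm -rf /", re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f[a-zA-Z]*\s+/(?:\s|$)")),
    # World-writable permissions
    ("chmod 777", re.compile(r"\bchmod\s+(?:-R\s+)?0?777\b")),
    # eval applied to something containing a shell variable
    ("eval with variable", re.compile(r"\beval\s+[^#\n]*\$")),
    # base64-decoded data piped into a shell
    ("base64 to exec",
     re.compile(r"\bbase64\s+(?:-d|--decode)\b[^|\n]*\|\s*(?:ba|z|da)?sh\b")),
]


def warn_on_bash(command: str) -> list[str]:
    """Return the label of every risky pattern found in a shell command.

    The caller is expected to print these as warnings and let the command
    proceed, mirroring the warn-only (non-blocking) behavior described above.
    """
    return [label for label, pattern in BASH_WARNINGS if pattern.search(command)]
```

With these assumed patterns, `warn_on_bash("echo aGk= | base64 -d | bash")` reports both the pipe-to-shell and base64-to-exec labels, while `warn_on_bash("ls -la")` returns an empty list. Regex checks like these are easy to evade, which is one reason to treat them as advisory warnings and not as a blocking control.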

Tags: C2, defensive hardening