aparnaa19/AgentForensics

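A lightweight security framework that provides real-time five-stage detection, attack fingerprinting, and forensic analysis for indirect prompt injection attacks against LLM agents.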

# AgentForensics

**Real-time prompt injection detection and forensic analysis for LLM agents.**

[![CI](https://static.pigsec.cn/wp-content/uploads/repos/2026/04/b3ce6eadf2120741.svg)](https://github.com/aparnaa19/agentforensics/actions) [![Python](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/) [![License: MIT](https://img.shields.io/badge/license-MIT-green)](LICENSE) [![ARPIbench](https://img.shields.io/badge/ARPIbench-100%25%20detection-brightgreen)](benchtest_results.json)

AgentForensics is an open-source security framework that monitors LLM agent sessions in real time, detects prompt injection attacks arriving from multiple sources, fingerprints attack campaigns, and presents forensic evidence in a dashboard, all **without changing your existing agent code**.

📖 **[Full User Guide](USERGUIDE.md)** - installation instructions, all use cases, dashboard walkthrough, and troubleshooting

## Why AgentForensics?

As LLM agents increasingly read email, browse websites, process documents, and execute code, they are exposed to a new attack surface: **indirect prompt injection**, that is, malicious instructions hidden in external content that hijack the agent's behavior.

Existing tools focus mainly on filtering input at the user level. AgentForensics monitors the **entire agent session**, including tool outputs, retrieved documents, and API responses, which is where these attacks actually occur.

## Benchmark Results

Evaluated against **ARPIbench** ([alexcbecker/ARPIbench](https://huggingface.co/datasets/alexcbecker/ARPIbench)), an academic benchmark of real indirect prompt injection payloads across web, email, and document scenarios:

| Metric | Result |
|--------|--------|
| Detection rate | **100%** (2000/2000 payloads) |
| False positive rate | **0%** (7 benign samples) |
| Attack types covered | naive, completion, ignore, urgent_request, helpful_assistant, multi-turn |
| Scenarios | web, email, local documents |

```
python benchtest.py --ml   # reproduce results
```

## Features

- **Five-stage detection pipeline** - heuristic analysis, ML classifier, instruction boundary analysis, semantic drift, and sliding-window multi-turn detection
- **Real-time forensics dashboard** - session timeline, per-turn evidence, live SSE alerts
- **Attack fingerprinting** - clusters recurring injection patterns across sessions into named attack campaigns
- **False positive management** - mark content as safe once and it is never flagged again
- **VS Code extension** - inline squiggles, Problems panel entries, automatic scanning on open/paste
- **Claude Desktop MCP** - plugs directly into Claude Desktop with zero configuration
- **Code wrapper** - works with OpenAI / Anthropic SDK applications with a single import
- **LangChain and AutoGen** - native callback/tracer integrations

## How It Works

```mermaid
flowchart TD
    A([External Content\nweb page · email · document · tool result]) --> B
    B[Stage 1 - Heuristics · less than 1ms\n8 regex rule groups · H01 to H08]
    B --> C[Stage 2 - ML Classifier · ~50ms\nFine-tuned DistilBERT on injection patterns]
    C --> D[Stage 3 - Instruction Boundary · less than 1ms\n10 boundary pattern groups · IB01 to IB10]
    D --> E[Stage 4 - Semantic Drift · ~30ms\nLLM response topic vs original query]
    E --> F[Stage 5 - Sliding Window · less than 1ms\nScore aggregation across last N turns]
    F --> G{Verdict}
    G -->|score below 0.25| H([CLEAN])
    G -->|score 0.25 to 0.75| I([SUSPICIOUS])
    G -->|score above 0.75| J([COMPROMISED])
    I --> K[Store · Fingerprint · Alert · Dashboard]
    J --> K
    style H fill:#2d6a2d,color:#ffffff,stroke:#2d6a2d
    style I fill:#a67c00,color:#ffffff,stroke:#a67c00
    style J fill:#8b1a1a,color:#ffffff,stroke:#8b1a1a
```

### Detection Rules

| Rule | Pattern | Weight |
|------|---------|--------|
| H01 | `ignore previous/all/prior instructions` | 0.35 |
| H02 | `you are now / your new role/persona` | 0.30 |
| H03 | `do not tell/inform the user` | 0.25 |
| H04 | Token injection markers `[[SYSTEM]]`, `[INST]`, `<\|im_start\|>` | 0.15 |
| H05 | Abrupt language/script switch (detection evasion) | 0.20 |
| H06 | Imperative commands in tool output (send, forward, upload, exfiltrate) | 0.20 |
| H07 | References to the system prompt / context window | 0.25 |
| H08 | Data exfiltration setup: send/forward near secret/key/password | 0.40 |
| IB01–IB10 | Instruction boundary patterns (context clearing, persona hijacking, task redefinition, etc.) | 0.20–0.35 |

## Installation

### One-command install (recommended)

```
git clone https://github.com/YOUR_USERNAME/agentforensics
cd agentforensics
python install.py
```

This installs the Python dependencies, builds the VS Code extension, creates the desktop application, and configures the Claude Desktop MCP integration automatically.

### Manual install

```
pip install "agentforensics[dashboard]"

# Start the dashboard
python -m agentforensics.cli dashboard
# → http://localhost:7890
```

**Requirements:** Python 3.10+, Node.js 18+ (for building the VS Code extension)

## Usage

### 1. Claude Desktop (MCP)

Run `python install.py` once. AgentForensics is configured as an MCP server automatically. Restart Claude Desktop and monitoring starts immediately, with no code changes required.

### 2. Code wrapper - OpenAI / Anthropic

A drop-in replacement. Import it once; nothing else changes:

```
from agentforensics import trace
from openai import OpenAI

# Wrap the client so every request and response passes through AgentForensics
client = trace(OpenAI(api_key="sk-..."))

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarise this webpage: ..."}],
)
# Injection signals are captured automatically and appear in the dashboard
```

### 3. VS Code extension

Install the `.vsix` file from `agentforensics-vscode/`:

```
Extensions → ⋯ → Install from VSIX
```

Every file you open, edit, or paste into is scanned automatically. Injection threats are shown as red squiggles and listed in the Problems panel (`Ctrl+Shift+M`).

### 4. LangChain

```
from agentforensics.integrations.langchain_handler import AgentForensicsHandler

# agent_executor and user_input are your existing LangChain executor and input
agent_executor.invoke(
    {"input": user_input},
    config={"callbacks": [AgentForensicsHandler(session_id="my-session")]},
)
```

### 5. AutoGen

```
from agentforensics.integrations.autogen_tracer import patch_autogen

patch_autogen()
# All subsequent AutoGen conversations are monitored
```

## Dashboard

Launch the desktop app and click **Go to dashboard**, or run `python -m agentforensics.cli dashboard`.

| View | Description |
|------|-------------|
| **Sessions** | All monitored agent sessions and their verdicts (Clean / Suspicious / Compromised) |
| **Live Monitor** | Injection alerts pushed in real time over Server-Sent Events |
| **Campaigns** | Fingerprinted attack patterns grouped across sessions by similarity |
| **False Alarms** | Content approved as safe - permanently excluded from future alerts |

## Benchmarks

### ARPIbench (recommended)

Tests against 7,560 real indirect injection payloads:

```
pip install datasets
python benchtest.py         # first 2000 rows
python benchtest.py --all   # full dataset
python benchtest.py --ml    # with ML stage enabled
```

### Built-in benchmark

Tests per-rule recall and false positive rate without network access:

```
python benchmark.py
python benchmark.py --verbose
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `AF_INJECT_THRESHOLD` | `0.25` | Minimum combined score that triggers an alert |
| `AF_FP_THRESHOLD` | `0.70` | Cosine similarity threshold for fingerprint grouping |
| `AF_DISABLE_ML` | `false` | Set to `true` to skip the ML stage (faster startup, no GPU required) |
| `AF_MODEL_PATH` | `agentforensics/model/` | Path to the fine-tuned DistilBERT weights |
| `AF_FP_DB` | `~/.agentforensics/fingerprints.db` | Path to the SQLite fingerprint database |
| `AF_WINDOW_SIZE` | `3` | Number of turns used for sliding-window multi-turn detection |

## Project Structure

```
agentforensics/
├── classifier.py        # Five-stage detection pipeline
├── fingerprinter.py     # Attack fingerprinting & campaign clustering
├── alerting.py          # Alert routing and SSE broadcast
├── reporter.py          # HTML forensic report generation
├── semantic.py          # Instruction boundary + semantic similarity
├── tracer.py            # OpenAI/Anthropic SDK wrapper (trace())
├── store.py             # SQLite session/signal storage
├── mcp_server.py        # Claude Desktop MCP integration
├── model/               # Fine-tuned DistilBERT weights (gitignored)
├── dashboard/
│   ├── server.py        # FastAPI backend + SSE
│   └── ui.html          # Single-file dashboard UI
└── integrations/
    ├── langchain_handler.py
    └── autogen_tracer.py
agentforensics-vscode/   # VS Code extension (TypeScript)
benchmark.py             # Built-in per-rule benchmark (no internet)
benchtest.py             # ARPIbench external benchmark
install.py               # One-command installer
launcher.py              # Desktop app launcher (tkinter + pywebview)
```

## Citation

If you use AgentForensics in your research, please cite:

```
@software{agentforensics2026,
  author  = {Aparnaa},
  title   = {AgentForensics: Real-Time Prompt Injection Detection and Forensics for LLM Agents},
  year    = {2026},
  url     = {https://github.com/aparnaa19/agentforensics},
  license = {MIT}
}
```

## License

MIT - see [LICENSE](LICENSE).
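
## Example: verdict bands and sliding window (illustrative)

The verdict bands from the pipeline diagram (below 0.25 is clean, 0.25 to 0.75 is suspicious, above 0.75 is compromised) and the default window of 3 turns can be sketched in a few lines. This is only a minimal sketch, not the project's actual code: the names `verdict` and `SlidingWindow` are hypothetical, and because the README does not say how scores are aggregated across turns, the sketch assumes the maximum over the window.

```python
from collections import deque


def verdict(score: float, suspicious_at: float = 0.25,
            compromised_above: float = 0.75) -> str:
    """Map a combined score to a verdict using the bands from the diagram.

    Hypothetical helper: the defaults mirror the documented thresholds,
    but the real classifier may compute its verdict differently.
    """
    if score < suspicious_at:
        return "CLEAN"
    if score <= compromised_above:
        return "SUSPICIOUS"
    return "COMPROMISED"


class SlidingWindow:
    """Keeps the most recent `size` turn scores (documented default: 3)."""

    def __init__(self, size: int = 3) -> None:
        # deque(maxlen=...) drops the oldest score once the window is full
        self.scores = deque(maxlen=size)

    def push(self, turn_score: float) -> float:
        # Assumption: aggregate by taking the maximum over the window,
        # so one bad turn keeps the session flagged until it scrolls out.
        self.scores.append(turn_score)
        return max(self.scores)


if __name__ == "__main__":
    window = SlidingWindow(size=3)
    for turn_score in (0.10, 0.80, 0.10, 0.10, 0.10):
        aggregated = window.push(turn_score)
        print(f"turn={turn_score:.2f} window={aggregated:.2f} "
              f"verdict={verdict(aggregated)}")
```

Under the max-aggregation assumption, a single high-scoring turn keeps the session flagged for the next two turns and then ages out of the window. A mean or weighted sum would behave differently, so treat the aggregation choice here as a placeholder.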