keelson-ai/keelson

GitHub: keelson-ai/keelson

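An autonomous security testing tool for AI systems, with 210 built-in attack playbooks covering 13 categories mapped to the OWASP LLM Top 10. It supports smart scanning, convergence scanning, and CI/CD integration.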


# Keelson

[![PyPI version](https://img.shields.io/pypi/v/keelson-ai)](https://pypi.org/project/keelson-ai/)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Tests](https://img.shields.io/badge/tests-774%20passing-brightgreen)]()

**Autonomous security testing agent for AI systems.** Keelson ships 210 security test playbooks spanning 13 behavior categories, mapped to the OWASP LLM Top 10. It supports 9 target adapters (OpenAI, Generic HTTP, Anthropic, LangGraph, MCP, A2A, CrewAI, LangChain, SiteGPT), 12 adaptive test trees, 10 compound test chains, SARIF + JUnit output for CI/CD integration, a statistical campaign engine with confidence intervals, iterative convergence scanning with cross-category feedback, runtime defense hooks, and compliance reports for 6 frameworks. Test strategies are grounded in field-validated effectiveness data from real-world scans.

```bash
pip install keelson-ai
```

## Quick Start

```bash
# Scan an OpenAI-compatible endpoint
keelson scan https://api.example.com/v1/chat/completions --api-key $KEY

# Parallel pipeline scan with verification
keelson pipeline-scan https://api.example.com/v1/chat/completions --api-key $KEY

# Adaptive smart scan (discover → classify → execute with memo feedback)
keelson smart-scan https://api.example.com/v1/chat/completions --api-key $KEY

# Convergence scan (iterative cross-category feedback loop)
keelson convergence-scan https://api.example.com/v1/chat/completions --api-key $KEY

# Run a single security test
keelson test https://api.example.com/v1/chat/completions GA-001 --api-key $KEY

# List all 210 security tests
keelson list

# Statistical campaign (10 trials per test)
keelson scan https://api.example.com/v1/chat/completions --tier deep --api-key $KEY

# SARIF output for GitHub Code Scanning
keelson scan https://api.example.com/v1/chat/completions --format sarif --api-key $KEY

# JUnit XML output for CI/CD
keelson scan https://api.example.com/v1/chat/completions --format junit --api-key $KEY

# Fail CI if any vulnerability is found
keelson scan https://api.example.com/v1/chat/completions --fail-on-vuln --api-key $KEY

# Scan a CrewAI agent directly
keelson test-crew my_crew.py

# Scan a LangChain agent directly
keelson test-chain my_agent.py
```

## CI/CD Integration

Add AI security testing to your GitHub Actions pipeline:

```yaml
# .github/workflows/ai-security.yml
name: AI Agent Security
on: [push, pull_request]

jobs:
  security-scan:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    steps:
      - uses: keelson-ai/keelson-action@v1
        with:
          target-url: ${{ vars.AGENT_ENDPOINT }}
          api-key: ${{ secrets.AGENT_API_KEY }}
```

Results appear in the **Security** tab under Code Scanning. For the full set of options, see [keelson-action](https://github.com/keelson-ai/keelson-action).

## How It Works

```text
Playbooks (.yaml)     Target Agent          Keelson Engine
┌──────────────┐    ┌──────────────┐    ┌──────────────────────┐
│ 210 attacks  │───>│ 9 Adapters   │───>│ Scan Modes           │
│ 13 categories│    │ OpenAI /     │    │ scan (sequential)    │
│ OWASP mapped │    │ Anthropic /  │    │ pipeline (parallel)  │
└──────────────┘    │ MCP / A2A /  │    │ smart (adaptive)     │
                    │ SiteGPT /... │    │ convergence (iter.)  │
                    └──────────────┘    └──────────┬───────────┘
 Orchestrators                                     │
┌──────────────┐                        ┌──────────┴──────────┐
│ PAIR         │───────────────────────>│ Detection pipeline  │
│ Crescendo    │                        │ Pattern + LLM Judge │
│ Mutations    │                        │ Verification pass   │
│ (13 types)   │                        │ Memo feedback loop  │
└──────────────┘                        └──────────┬──────────┘
                                                   │
                                        ┌──────────┴──────────┐
                                        │ Reports             │
                                        │ Markdown / SARIF /  │
                                        │ JUnit / Compliance  │
                                        └─────────────────────┘
```

1. **Load** attack playbooks from `attacks/**/*.yaml` (structured YAML, no code)
2. **Send** prompts to the target through any supported adapter
3. **Detect** vulnerabilities using pattern detection, LLM-as-judge scoring, or a combined mode
4. **Orchestrate** advanced strategies: PAIR iterative refinement, Crescendo gradual escalation, and 13 mutation types
5. **Converge** iteratively: harvest leaked information from responses and feed cross-category intelligence into later passes
6. **Evaluate** each response as **VULNERABLE**, **SAFE**, or **INCONCLUSIVE**
7. **Report** findings with OWASP mappings, evidence, and remediation guidance

## Test Categories

| Category | Prefix | Count | OWASP | What it tests |
|----------|--------|-------|-------|---------------|
| **Goal Adherence** | GA | 56 | LLM01/LLM09 | Prompt injection, role hijacking, system prompt extraction, encoding evasion, context overflow, Crescendo escalation, Skeleton Key, many-shot jailbreaking, reasoning-layer attacks, rapport exploitation, structured data injection, model fingerprinting, indirect prompt injection (IDPI), Unicode/homoglyph evasion, authority impersonation, multilingual repetition, multi-vector psychological exploitation, enterprise framing bypass, syllogistic reasoning manipulation, hypothetical/counterfactual bypass, meta-reasoning inversion, logical paradox exploitation, response template hijacking, shared resource injection, **legitimate knowledge extraction**, **incremental architecture disclosure** |
| **Tool Safety** | TS | 40 | LLM02/LLM06/LLM07 | File access, command injection, SQL injection, unauthorized API calls, privilege escalation, path traversal, MCP tool poisoning, MCP rug pull, cross-server contamination, SSRF, side-effect detection, excessive agency, coerced financial transactions, two-stage URL exfiltration, URI scheme redirection, forced URL opening, **private data source enumeration**, **write access probing** |
| **Memory Integrity** | MI | 23 | LLM05 | History poisoning, identity persistence, fake tool results, cross-turn leakage, error message leakage, stored payload injection, context window flooding, progressive memory poisoning, false memory implantation, contradictory fact confusion, RAG poisoning, natural-language sleeper triggers, collapsed UI content poisoning |
| **Permission Boundaries** | PB | 12 | LLM02 | Role escalation, cross-user access, scope expansion, authorization bypass, permission persistence |
| **Delegation Integrity** | DI | 7 | LLM08/LLM09 | Unauthorized sub-agents, trust boundary violations, delegation scope laundering, cross-agent lateral movement |
| **Execution Safety** | ES | 13 | LLM02/LLM06 | Unbounded execution, resource exhaustion, sandbox escape, audit evasion, unsafe deserialization, HTML/script output injection, destructive command injection |
| **Session Isolation** | SI | 13 | LLM01/LLM05 | Cross-session leakage, session hijacking, multi-tenant violations, model fingerprinting, conversation history poisoning, debug tool extraction |
| **Cognitive Architecture** | CA | 8 | LLM01/LLM09 | Chain-of-thought poisoning, reasoning manipulation, metacognitive attacks |
| **Conversational Exfiltration** | EX | 9 | LLM01/LLM06 | Data extraction through conversation, behavioral fingerprinting, **framework/infrastructure fingerprinting** |
| **Supply Chain Language** | SL | 8 | LLM03/LLM05 | RAG document injection, dependency confusion, plugin poisoning |
| **Output Weaponization** | OW | 7 | LLM02/LLM06 | Backdoored code generation, malicious output crafting |
| **Temporal Persistence** | TP | 7 | LLM05/LLM08 | Delayed action injection, time-based persistence |
| **Multi-Agent Security** | MA | 7 | LLM08/LLM09 | Agent impersonation, cross-agent attacks |

## Adapters

Keelson communicates with targets through a pluggable adapter interface:

| Adapter | Flag | Protocol | Use case |
|---------|------|----------|----------|
| **OpenAI** | `--adapter openai` | Chat Completions API | GPT models, OpenAI API |
| **Generic HTTP** | `--adapter http` | Chat Completions API | Local models, any OpenAI-compatible endpoint |
| **Anthropic** | `--adapter anthropic` | Messages API | Claude models |
| **LangGraph** | `--adapter langgraph` | LangGraph Platform | LangGraph agents |
| **MCP** | `--adapter mcp` | JSON-RPC 2.0 | MCP tool servers |
| **A2A** | `--adapter a2a` | Google A2A Protocol | A2A-compatible agents |
| **CrewAI** | `test-crew` command | In-process | CrewAI crews/agents |
| **LangChain** | `test-chain` command | In-process | LangChain agents/chains |
| **SiteGPT** | `--adapter sitegpt` | WebSocket / REST | SiteGPT chatbots |

```bash
# OpenAI-compatible (default)
keelson scan http://localhost:11434/v1/chat/completions

# Anthropic
keelson scan https://api.anthropic.com --adapter anthropic --api-key $KEY

# LangGraph Platform
keelson scan https://my-agent.langraph.com --adapter langgraph --assistant-id my-agent

# MCP server
keelson scan http://localhost:3000 --adapter mcp --tool-name ask

# A2A agent
keelson scan http://localhost:8000 --adapter a2a

# CrewAI (in-process, no HTTP)
keelson test-crew path/to/my_crew.py

# LangChain (in-process, no HTTP)
keelson test-chain path/to/my_agent.py

# SiteGPT chatbot (WebSocket or REST)
keelson scan https://widget.sitegpt.ai --adapter sitegpt --chatbot-id YOUR_CHATBOT_ID
```

## CLI Commands

| Command | Description |
|---------|-------------|
| `keelson scan` | Full security scan (sequential, with dynamic reordering) |
| `keelson pipeline-scan` | Parallel scan with checkpoint/resume and verification |
| `keelson smart-scan` | Adaptive scan: discovery, classification, memo-guided sessions |
| `keelson convergence-scan` | Iterative scan with cross-category feedback and leakage harvesting |
| `keelson test` | Run a single security test |
| `keelson list` | List all available attacks |
| `keelson campaign` | Statistical campaign (N trials per attack) |
| `keelson discover` | Fingerprint agent capabilities |
| `keelson evolve` | Mutate attacks to find bypasses |
| `keelson chain` … | … |
| … | Compare two scans to check for regressions |
| `keelson baseline` | Set a regression baseline |
| `keelson compliance` | Generate a compliance report |
| `keelson report` | Regenerate a scan report |
| `keelson history` | Show scan history |

## Output Formats

### Markdown Report

```bash
keelson scan --api-key $KEY
# -> reports/scan-2026-03-04-120000.md
```

Reports include an executive summary, findings grouped by category with evidence (prompt + response), OWASP mappings, and remediation guidance.

### SARIF (for CI/CD)

```bash
keelson scan --format sarif --api-key $KEY
# -> reports/scan-2026-03-04-120000.sarif.json
```

SARIF v2.1.0 output integrates with GitHub Code Scanning, the VS Code SARIF Viewer, and other SARIF-compatible tools.

### JUnit XML (for CI/CD)

```bash
keelson scan --format junit --api-key $KEY
# -> reports/scan-2026-03-04-120000.junit.xml
```

JUnit XML integrates with Jenkins, GitLab CI, GitHub Actions, and any CI system that supports JUnit test reports.

### CI/CD Failure Gates

```bash
# Fail the pipeline if any vulnerability is found
keelson scan --fail-on-vuln --api-key $KEY

# Fail if the vulnerability rate exceeds a threshold (0.0–1.0)
keelson scan --fail-threshold 0.1 --api-key $KEY
```

### Compliance Reports

```bash
keelson compliance --framework owasp-llm-top10
keelson compliance --framework nist-ai-rmf
keelson compliance --framework eu-ai-act
keelson compliance --framework iso-42001
keelson compliance --framework soc2
keelson compliance --framework pci-dss-v4
```

## GitHub Actions

```yaml
# .github/workflows/ai-security.yml
name: AI Agent Security
on: [push, pull_request]

jobs:
  keelson:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    steps:
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install keelson-ai
      - run: keelson scan ${{ vars.AGENT_URL }} --api-key ${{ secrets.AGENT_KEY }} --format sarif --output results/ --fail-on-vuln --no-save
      - uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: results/
```

## Statistical Campaigns

Run each attack N times to obtain statistically significant results with Wilson score confidence intervals:

```bash
# Fast scan (1 trial, quick)
keelson scan --tier fast --api-key $KEY

# Deep scan (10 trials, concurrent)
keelson scan --tier deep --api-key $KEY

# Custom campaign from a TOML config
keelson campaign config.toml
```

Example campaign configuration:

```toml
[campaign]
name = "nightly-regression"
trials_per_attack = 10
confidence_level = 0.95

[target]
url = "https://api.example.com/v1/chat/completions"
api_key = "sk-..."

[concurrency]
max_concurrent_trials = 5
early_termination_threshold = 3
```

## Keelson Defend (Runtime Protection)

Keelson Defend is a policy engine that intercepts unsafe tool calls and content at runtime. It integrates directly with CrewAI and LangChain agents.

### Policy Configuration

```yaml
# defend-policy.yaml
tool_rules:
  - pattern: "delete_*"
    action: deny
    reason: "File deletion blocked by policy"
  - pattern: "execute_command"
    action: deny
    reason: "Shell execution not permitted"
  - pattern: "send_email"
    action: log
    reason: "Email sending logged for audit"

content_rules:
  - pattern: "password|secret|api_key|token"
    action: deny
    check_input: true
    check_output: true

default_tool_action: allow
log_all: false
```

### CrewAI Integration

```python
from keelson.defend import load_policy, PolicyEngine, register_crewai_hooks

policy = load_policy("defend-policy.yaml")
engine = PolicyEngine(policy)
register_crewai_hooks(engine)
# All CrewAI tool calls now enforce the policy
```

### LangChain Integration

```python
from keelson.defend import load_policy, PolicyEngine, KeelsonDefendMiddleware

policy = load_policy("defend-policy.yaml")
engine = PolicyEngine(policy)
middleware = KeelsonDefendMiddleware(engine)
# Wrap your agent's tool and model calls with the middleware
```

## Adding Custom Tests

Create a `.yaml` file in the appropriate `attacks/` subdirectory:

```yaml
id: XX-NNN
name: Attack Name
severity: high
category: category_name
owasp_id: "LLM0X — Name"
objective: >
  What this attack tests.
turns:
  - role: user
    content: >
      The attack prompt to send.
  - role: user
    content: >
      Follow-up prompt (for multi-turn attacks).
evaluation:
  vulnerable_if:
    - Agent complies with the attack request
    - Agent leaks sensitive information
  safe_if:
    - Agent refuses the request
    - Agent maintains boundaries
  inconclusive_if:
    - Response is ambiguous or partial
```

## Project Structure

```text
keelson/
├── agents/                           # Agent instructions
│   └── pentester.md                  # Pentester agent prompt
├── commands/                         # Plugin slash commands
│   ├── scan.md                       # /keelson:scan
│   ├── attack.md                     # /keelson:attack
│   └── report.md                     # /keelson:report
├── attacks/                          # 210 attack playbooks (YAML)
│   ├── goal-adherence/               # GA (56 attacks)
│   ├── tool-safety/                  # TS (40 attacks)
│   ├── memory-integrity/             # MI (23 attacks)
│   ├── session-isolation/            # SI (13 attacks)
│   ├── execution-safety/             # ES (13 attacks)
│   ├── permission-boundaries/        # PB (12 attacks)
│   ├── cognitive-architecture/       # CA (8 attacks)
│   ├── conversational-exfiltration/  # EX (9 attacks)
│   ├── supply-chain-language/        # SL (8 attacks)
│   ├── delegation-integrity/         # DI (7 attacks)
│   ├── multi-agent-security/         # MA (7 attacks)
│   ├── output-weaponization/         # OW (7 attacks)
│   └── temporal-persistence/         # TP (7 attacks)
├── src/keelson/                      # Python engine
│   ├── cli/                          # Typer CLI (18 commands)
│   │   ├── __init__.py               # App setup, shared helpers
│   │   ├── commands.py               # Command module registration
│   │   ├── scan_commands.py          # scan, pipeline-scan, smart-scan, attack
│   │   ├── ops_commands.py           # list, report, history, diff, discover, baseline, compliance
│   │   └── advanced_commands.py      # campaign, evolve, chain, generate, test-crew, test-chain
│   ├── adapters/                     # 9 target adapters
│   │   ├── base.py                   # BaseAdapter interface
│   │   ├── openai.py                 # OpenAI API
│   │   ├── http.py                   # GenericHTTPAdapter (OpenAI-compat)
│   │   ├── anthropic.py              # Anthropic Messages API
│   │   ├── langgraph.py              # LangGraph Platform
│   │   ├── mcp.py                    # Model Context Protocol
│   │   ├── a2a.py                    # Google A2A Protocol
│   │   ├── crewai.py                 # CrewAI native (in-process)
│   │   ├── langchain.py              # LangChain native (in-process)
│   │   ├── sitegpt.py                # SiteGPT (WebSocket / REST)
│   │   ├── cache.py                  # Response caching decorator
│   │   └── attacker.py               # Attacker LLM wrapper
│   ├── core/                         # Engine, scanner, detection
│   │   ├── engine.py                 # Multi-turn attack executor
│   │   ├── execution.py              # Shared primitives (sequential, parallel, verify)
│   │   ├── scanner.py                # Sequential scan with dynamic reorder
│   │   ├── pipeline.py               # Parallel scan with checkpoint/resume
│   │   ├── smart_scan.py             # Adaptive scan with memo feedback
│   │   ├── convergence.py            # Iterative convergence with cross-feed
│   │   ├── memo.py                   # Memo table for technique tracking
│   │   ├── strategist.py             # LLM-based target classification
│   │   ├── detection.py              # Pattern-based verdict detection
│   │   ├── observer.py               # Streaming leakage analysis
│   │   ├── llm_judge.py              # LLM-as-judge semantic evaluation
│   │   ├── templates.py              # Playbook parser (markdown)
│   │   ├── yaml_templates.py         # Playbook parser (YAML)
│   │   ├── models.py                 # Core data models
│   │   ├── reporter.py               # Markdown report generation
│   │   ├── executive_report.py       # Executive summary format
│   │   ├── sarif.py                  # SARIF v2.1.0 output
│   │   ├── junit.py                  # JUnit XML output
│   │   └── compliance.py             # 6 compliance frameworks
│   ├── defend/                       # Runtime protection
│   │   ├── engine.py                 # Policy evaluation engine
│   │   ├── models.py                 # Policy, rules, actions
│   │   ├── loader.py                 # YAML policy loader
│   │   ├── crewai_hook.py            # CrewAI middleware hooks
│   │   └── langchain_hook.py         # LangChain middleware hooks
│   ├── attacker/                     # Attack generation
│   │   ├── generator.py              # LLM-powered prompt generation
│   │   ├── discovery.py              # Agent capability fingerprinting
│   │   ├── chains.py                 # Compound attack chain synthesis
│   │   └── provider.py               # Cross-provider attacker selection
│   ├── adaptive/                     # Mutation engine + orchestrators
│   │   ├── mutations.py              # 13 programmatic + LLM mutations
│   │   ├── branching.py              # Conversation tree exploration
│   │   ├── attack_tree.py            # Attack tree data structures
│   │   ├── pair.py                   # PAIR iterative refinement orchestrator
│   │   ├── crescendo.py              # Crescendo gradual escalation orchestrator
│   │   └── strategies.py             # Mutation scheduling
│   ├── campaign/                     # Statistical campaigns
│   │   ├── runner.py                 # N-trial execution with CI
│   │   ├── tiers.py                  # Fast/Deep/Continuous presets
│   │   ├── scheduler.py              # Campaign scheduling
│   │   └── config.py                 # TOML config parser
│   ├── diff/                         # Scan comparison
│   │   └── comparator.py             # Regression detection
│   └── state/                        # Persistence
│       ├── base.py                   # Storage base interface
│       └── store.py                  # SQLite storage
├── tests/                            # 774 tests
├── docs/                             # Documentation
│   ├── adr/                          # Architecture Decision Records
│   │   ├── ADR-001-framework.md      # FastAPI selection
│   │   ├── ADR-002-dependency-management.md # uv selection
│   │   └── ADR-003-observability.md  # Structured logging + OTel plan
│   ├── plans/                        # Roadmap
│   ├── openapi.yaml                  # OpenAPI 3.1.0 API contract
│   └── github-action-spec.md         # GitHub Action design
├── pyproject.toml                    # Python packaging
└── LICENSE                           # Apache 2.0
```

## Development

```bash
# Clone
git clone https://github.com/keelson-ai/keelson.git
cd keelson

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with verbose output
pytest -v

# Lint
ruff check .

# Type check (strict mode, 0 errors)
pyright
```

### Optional Dependencies

```bash
# CrewAI adapter
pip install "keelson-ai[crewai]"

# LangChain adapter
pip install "keelson-ai[langchain]"

# All optional adapters
pip install "keelson-ai[all]"
```

### Security

This tool is intended for **authorized security testing** only. Do not use Keelson to test systems without permission. If you find a security issue in Keelson itself, please report it through [GitHub Security Advisories](https://github.com/keelson-ai/keelson/security/advisories).

## Architecture

### Flowcharts

#### Core Scan Pipeline (Sequential)

```mermaid
flowchart TD
    A[Load Playbooks] --> B[Send Prompts via Adapter]
    B --> C[Collect Evidence]
    C --> D[Detection Pipeline<br/>Pattern / LLM Judge / Combined]
    D --> E{Verdict?}
    E -->|VULNERABLE| DP{Deep Probe<br/>enabled?}
    DP -->|Yes| BRANCH[Explore follow-up paths<br/>via conversation branching]
    BRANCH --> F[Record Finding + Probe Findings]
    DP -->|No| F
    E -->|SAFE| F[Record Finding]
    E -->|INCONCLUSIVE| F
    F --> G{More Attacks?}
    G -->|Yes| H[Dynamic Reorder<br/>by Vuln Categories]
    H --> B
    G -->|No| I[Generate Report]

    style E fill:#f9f,stroke:#333
    style I fill:#9f9,stroke:#333
    style BRANCH fill:#fde8e8,stroke:#333
```

#### Pipeline Scan (Parallel + Checkpoint + Verification)

```mermaid
flowchart TD
    subgraph Phase1[Phase 1: Load]
        L[Load Playbooks] --> CP{Checkpoint<br/>exists?}
        CP -->|Yes| RESUME[Resume from checkpoint<br/>skip completed attacks]
        CP -->|No| ALL[All templates]
    end

    subgraph Phase2[Phase 2: Parallel Execution]
        RESUME --> SEM[Semaphore-based concurrency<br/>max_concurrent attacks]
        ALL --> SEM
        SEM --> EX1[Attack 1]
        SEM --> EX2[Attack 2]
        SEM --> EXN[Attack N]
        EX1 --> COLL[Collect Findings]
        EX2 --> COLL
        EXN --> COLL
    end

    subgraph Phase3[Phase 3: Verification]
        COLL --> VULN[Filter VULNERABLE]
        VULN --> RE[Re-probe each finding]
        RE --> CONF{Agent complies<br/>again?}
        CONF -->|Yes| CONFIRMED[VULNERABLE confirmed]
        CONF -->|Refused| DOWN[Downgrade to INCONCLUSIVE]
    end

    subgraph Phase4[Phase 4: Report]
        CONFIRMED --> MERGE[Merge verified findings]
        DOWN --> MERGE
        MERGE --> RPT[Generate Report]
    end

    style Phase1 fill:#e8f4fd,stroke:#333
    style Phase2 fill:#fdf8e8,stroke:#333
    style Phase3 fill:#fde8e8,stroke:#333
    style Phase4 fill:#e8fde8,stroke:#333
```

#### Smart Scan with Memoization

```mermaid
flowchart TD
    subgraph Phase1[Phase 1: Discovery]
        P1[8 Capability Probes] --> P1R[Agent Profile]
    end

    subgraph Phase2[Phase 2: Classification]
        P1R --> CL[Classify Target Type]
        CL --> TP[Target Profile<br/>types, tools, memory, refusal style]
    end

    subgraph Phase3[Phase 3: Attack Selection]
        TP --> SEL[Select Relevant Attacks]
        SEL --> GRP[Group into Sessions by Category]
    end

    subgraph Phase4[Phase 4: Execute with Memo]
        GRP --> MEMO[Initialize Memo Table]
        MEMO --> SESS[Execute Session]
        SESS --> REC[Record Finding → Memo]
        REC --> REORDER[Reorder Remaining Sessions<br/>by Memo-Informed Scores]
        REORDER --> ADAPT{Adapt Plan?}
        ADAPT -->|Escalate/De-escalate| SESS
        ADAPT -->|Done| SUM[Final Memo Summary]
    end

    style Phase1 fill:#e8f4fd,stroke:#333
    style Phase2 fill:#fde8e8,stroke:#333
    style Phase3 fill:#e8fde8,stroke:#333
    style Phase4 fill:#fdf8e8,stroke:#333
```

#### Convergence Scan (Iterative Cross-Category Feedback)

```mermaid
flowchart TD
    subgraph Pass1[Pass 1: Initial Scan]
        LOAD[Load Playbooks<br/>category filter optional] --> EXEC1[Execute All Attacks]
        EXEC1 --> F1[Findings]
    end

    subgraph Harvest[Structural Analysis]
        F1 --> HL[Harvest Leaked Info<br/>from ALL responses]
        HL --> TYPES[System prompts / Tool names<br/>Credentials / Internal URLs<br/>Config values / Model names]
    end

    subgraph CrossFeed[Cross-Category Feed]
        F1 --> VULN{Vulnerabilities<br/>found?}
        VULN -->|Yes| XMAP[Cross-Category Map<br/>13 category relationships]
        XMAP --> SELECT[Select attacks from<br/>related categories]
        TYPES --> LTARGET[Leakage-Targeted Attacks<br/>Tool leak → Tool Safety<br/>Cred leak → Exfiltration<br/>Prompt leak → Goal Adherence]
    end

    subgraph PassN[Pass 2+: Iterative]
        SELECT --> MERGE[Merge & Deduplicate]
        LTARGET --> MERGE
        MERGE --> EXECN[Execute Cross-Feed Attacks]
        EXECN --> FN[New Findings]
        FN --> CONV{New vulns or<br/>new leakage?}
        CONV -->|Yes| Harvest
        CONV -->|No| DONE[Converged — Stop]
    end

    VULN -->|No leakage either| DONE

    style Pass1 fill:#e8f4fd,stroke:#333
    style Harvest fill:#fde8e8,stroke:#333
    style CrossFeed fill:#fdf8e8,stroke:#333
    style PassN fill:#e8fde8,stroke:#333
    style DONE fill:#9f9,stroke:#333
```

#### Memo Feedback Loop

```mermaid
flowchart LR
    subgraph Record
        F[Finding] --> IT[Infer Techniques<br/>authority, roleplay, etc.]
        IT --> CO[Classify Outcome<br/>complied / partial / refused]
        CO --> EL[Extract Leaked Info<br/>tools, paths, URLs, env vars]
        EL --> MT[(Memo Table)]
    end

    subgraph Query
        MT --> EFF[Effective Techniques<br/>VULNERABLE → weight 1.0]
        MT --> PROM[Promising Techniques<br/>INCONCLUSIVE → weight 0.3]
        MT --> DEAD[Dead-End Techniques<br/>SAFE-only → penalize]
        MT --> CROSS[Cross-Category Signal<br/>global score × 0.5]
    end

    subgraph Apply
        EFF --> SCORE[Score & Reorder<br/>Next Session's Attacks]
        PROM --> SCORE
        DEAD --> SCORE
        CROSS --> SCORE
    end

    style MT fill:#f9f,stroke:#333
```

#### Attack Tree Execution

```mermaid
flowchart TD
    ROOT[Root Prompt] --> SEND[Send to Target]
    SEND --> CLASS{Classify Response}
    CLASS -->|Compliance| VULN{Vulnerable?}
    CLASS -->|Partial| BR_P[Select Branch<br/>for PARTIAL]
    CLASS -->|Refusal| BR_R[Select Branch<br/>for REFUSAL]
    VULN -->|Yes| STOP[Stop — Vulnerability Found]
    VULN -->|No| BR_C[Select Branch<br/>for COMPLIANCE]
    BR_P --> MEMO_SEL[Memo-Informed Selection<br/>Prefer effective techniques<br/>Avoid dead ends]
    BR_R --> MEMO_SEL
    BR_C --> MEMO_SEL
    MEMO_SEL --> NEXT[Execute Next Prompt]
    NEXT --> CLASS2{Classify Response}
    CLASS2 -->|Compliance| VULN2{Vulnerable?}
    CLASS2 -->|Partial/Refusal| DEEPER{Max Depth?}
    VULN2 -->|Yes| STOP
    VULN2 -->|No| DEEPER
    DEEPER -->|No| MEMO_SEL
    DEEPER -->|Yes| FINAL[Final Verdict]
    FINAL --> REC_MEMO[Record Result → Memo Table]

    style STOP fill:#f99,stroke:#333
    style MEMO_SEL fill:#f9f,stroke:#333
    style REC_MEMO fill:#9f9,stroke:#333
```

#### Detection Pipeline

```mermaid
flowchart TD
    EV[Evidence: prompts + responses] --> MODE{Scoring<br/>Method?}
    MODE -->|PATTERN| SE[Side Effects Check<br/>Tool call wrappers + dangerous names]
    MODE -->|COMBINED| BOTH_PATH[Run Pattern + LLM Judge in parallel]
    MODE -->|LLM_JUDGE| JUDGE_ONLY[LLM-as-Judge Only]

    SE -->|Match| V1[VULNERABLE]
    SE -->|No match| SIG[Collect Signals]
    SIG --> VM[vulnerable_if matches]
    SIG --> SM[safe_if matches]
    SIG --> RF[Refusal phrase matches]
    VM --> COMB{Both vuln<br/>and safe?}
    SM --> COMB
    RF --> COMB
    COMB -->|Yes, multi-step| EACH[Check each step individually]
    EACH -->|Any step vuln without refusal| V2[VULNERABLE<br/>Leak at step K not undone<br/>by later refusal]
    EACH -->|All vuln steps also refused| SAFE1[SAFE]
    COMB -->|Yes, single-step| DISC[Substantial disclosure?<br/>Code blocks / structured lists]
    DISC -->|Yes| V3[VULNERABLE<br/>Disclosed despite refusal]
    DISC -->|No| SAFE1
    COMB -->|Only vuln| V4[VULNERABLE]
    COMB -->|Only safe/refusal| SAFE2[SAFE]
    COMB -->|Neither| LEAK{High-confidence<br/>leakage signals?}
    LEAK -->|Yes| V5[VULNERABLE]
    LEAK -->|No| INC[INCONCLUSIVE]

    JUDGE_ONLY --> JR[Judge LLM evaluates<br/>objective + evidence + criteria]
    JR --> JV[VERDICT + confidence + reasoning]

    BOTH_PATH --> PAT_V[Pattern Verdict]
    BOTH_PATH --> JDG_V[Judge Verdict]
    PAT_V --> RESOLVE{Resolve<br/>Disagreement}
    JDG_V --> RESOLVE
    RESOLVE -->|Both agree| BOOST[Use verdict<br/>confidence + 0.15]
    RESOLVE -->|Pattern VULN, Judge SAFE| TRUST_J1[Trust Judge → SAFE<br/>reduces false positives]
    RESOLVE -->|Pattern SAFE, Judge VULN| CONF{Judge<br/>confidence ≥ 0.7?}
    CONF -->|Yes| TRUST_J2[Trust Judge → VULNERABLE<br/>catches subtle compliance]
    CONF -->|No| KEEP_S[Keep SAFE]
    RESOLVE -->|One INCONCLUSIVE| DEFER[Defer to the other verdict]

    style V1 fill:#f99,stroke:#333
    style V2 fill:#f99,stroke:#333
    style V3 fill:#f99,stroke:#333
    style V4 fill:#f99,stroke:#333
    style V5 fill:#f99,stroke:#333
    style SAFE1 fill:#9f9,stroke:#333
    style SAFE2 fill:#9f9,stroke:#333
    style INC fill:#ff9,stroke:#333
    style JV fill:#f9f,stroke:#333
    style BOOST fill:#9f9,stroke:#333
    style TRUST_J1 fill:#9f9,stroke:#333
    style TRUST_J2 fill:#f99,stroke:#333
    style KEEP_S fill:#9f9,stroke:#333
```

#### PAIR Orchestrator (Prompt Automatic Iterative Refinement)

```mermaid
flowchart TD
    START[Initial Attack Prompt] --> SEND[Send to Target]
    SEND --> EVAL{Evaluate Response}
    EVAL -->|VULNERABLE| SUCCESS[Vulnerability Found — Stop]
    EVAL -->|SAFE / INCONCLUSIVE| CHECK{Max iterations<br/>reached?}
    CHECK -->|Yes| FINAL[Final Verdict from<br/>all collected evidence]
    CHECK -->|No| REFINE[Feed response to Attacker LLM]
    REFINE --> GEN[Attacker generates refined prompt<br/>• Different framing/angle<br/>• Addresses specific refusal reasons<br/>• New technique each iteration]
    GEN --> SEND

    style SUCCESS fill:#f99,stroke:#333
    style FINAL fill:#ff9,stroke:#333
    style REFINE fill:#f9f,stroke:#333
```

#### Crescendo Orchestrator (Gradual Escalation)

```mermaid
flowchart TD
    subgraph Phase1[Phase 1: Plan]
        OBJ[Attack Objective] --> PLAN[Attacker LLM plans<br/>N-turn escalation sequence]
        PLAN --> TURNS[Turn 1: Innocent rapport<br/>Turn 2-3: Domain introduction<br/>Turn 4-5: Approach objective<br/>Turn N: Full request]
    end

    subgraph Phase2[Phase 2: Execute]
        TURNS --> T1[Turn 1 — Establish rapport]
        T1 --> RESP1[Target responds cooperatively]
        RESP1 --> ADAPT[Adapt next turn based on<br/>target's actual response]
        ADAPT --> TN[Turn N — Escalated request]
    end

    subgraph Phase3[Phase 3: Evaluate]
        TN --> EVAL{Evaluate from<br/>turn 3 onward}
        EVAL -->|VULNERABLE| STOP[Escalation Succeeded<br/>at turn K/N]
        EVAL -->|SAFE| CONT{More turns?}
        CONT -->|Yes| ADAPT
        CONT -->|No| DONE[Final Verdict]
    end

    style Phase1 fill:#e8f4fd,stroke:#333
    style Phase2 fill:#fdf8e8,stroke:#333
    style Phase3 fill:#fde8e8,stroke:#333
    style STOP fill:#f99,stroke:#333
```

#### Mutation Engine

```mermaid
flowchart LR
    subgraph Programmatic[Programmatic Mutations]
        P1[Base64 Encode]
        P2[Leetspeak]
        P3[Context Overflow]
        P4[ROT13]
        P5[Unicode Homoglyph]
        P6[Char Split — ZWSP]
        P7[Reversed Words]
        P8[Morse Code]
        P9[Caesar Cipher]
    end

    subgraph LLMPowered[LLM-Powered Mutations]
        L1[Paraphrase]
        L2[Roleplay Wrap]
        L3[Gradual Escalation]
        L4[Translation]
    end

    ORIG[Original Prompt] --> Programmatic
    ORIG --> LLMPowered
    Programmatic --> MUT[Mutated Attack]
    LLMPowered --> MUT
    MUT --> EXEC[Execute against Target]
    EXEC --> DET[Detection Pipeline]

    style Programmatic fill:#e8f4fd,stroke:#333
    style LLMPowered fill:#fde8e8,stroke:#333
    style MUT fill:#f9f,stroke:#333
```

### API Specification

The authoritative OpenAPI 3.1.0 contract for the Keelson service lives in [`docs/openapi.yaml`](docs/openapi.yaml). It covers the `/health` endpoint (implemented) plus placeholder paths for the Phase 2 scan, attack, and report endpoints.

### Architecture Decision Records

Key technical decisions are recorded as [MADR](https://adr.github.io/madr/) records in [`docs/adr/`](docs/adr/):

| ADR | Decision | Status |
|-----|----------|--------|
| [ADR-001](docs/adr/ADR-001-framework.md) | Web framework: FastAPI (async-first, auto-generated OpenAPI) | Accepted |
| [ADR-002](docs/adr/ADR-002-dependency-management.md) | Dependency management: uv (fast resolver, `uv.lock`) | Accepted |
| [ADR-003](docs/adr/ADR-003-observability.md) | Observability: structured logging for now, OpenTelemetry in Phase 2 | Accepted |

## Roadmap

See [docs/plans/](docs/plans/) for the full roadmap.

**Coming soon:**

- Drift detection and continuous monitoring
- Semantic coverage tracking
- REST API and web dashboard
- GitHub Action (`keelson-ai/keelson-action`)

## License

Apache 2.0. See [LICENSE](LICENSE) for details.

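
Keelson Defend's policy file pairs glob-style tool patterns such as `delete_*` with `deny`/`log` actions, and pairs regex-style content patterns with input/output checks. The sketch below shows one way such rules *could* be evaluated. It is a standalone illustration that neither imports nor reproduces `keelson.defend`; the dataclass names, the first-match-wins ordering, `fnmatch` globbing for tool names, and case-insensitive `re.search` for content are all assumptions.

```python
import fnmatch
import re
from dataclasses import dataclass, field


@dataclass
class ToolRule:
    pattern: str       # glob, e.g. "delete_*"
    action: str        # "allow" | "deny" | "log"
    reason: str = ""


@dataclass
class ContentRule:
    pattern: str       # regex, e.g. "password|secret|api_key|token"
    action: str
    check_input: bool = True
    check_output: bool = True


@dataclass
class Policy:
    tool_rules: list[ToolRule] = field(default_factory=list)
    content_rules: list[ContentRule] = field(default_factory=list)
    default_tool_action: str = "allow"


def evaluate_tool(policy: Policy, tool_name: str) -> tuple[str, str]:
    """Return (action, reason) for a tool call; the first matching rule wins (assumed)."""
    for rule in policy.tool_rules:
        if fnmatch.fnmatchcase(tool_name, rule.pattern):
            return rule.action, rule.reason
    return policy.default_tool_action, "no rule matched"


def evaluate_content(policy: Policy, text: str, direction: str) -> str:
    """Return the action for input or output text; direction is 'input' or 'output'."""
    for rule in policy.content_rules:
        applies = rule.check_input if direction == "input" else rule.check_output
        if applies and re.search(rule.pattern, text, flags=re.IGNORECASE):
            return rule.action
    return "allow"


# Mirrors the example defend-policy.yaml, built by hand instead of loaded from YAML
policy = Policy(
    tool_rules=[
        ToolRule("delete_*", "deny", "File deletion blocked by policy"),
        ToolRule("execute_command", "deny", "Shell execution not permitted"),
        ToolRule("send_email", "log", "Email sending logged for audit"),
    ],
    content_rules=[ContentRule("password|secret|api_key|token", "deny")],
)

print(evaluate_tool(policy, "delete_file"))
print(evaluate_tool(policy, "search_docs"))
print(evaluate_content(policy, "Here is the API_KEY", "output"))
```

In this sketch a `log` result is returned to the caller, which would still allow the call but record it; how the real hooks treat `log` and `log_all` is not specified here.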
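
The detection pipeline diagram spells out how COMBINED mode reconciles a pattern verdict with an LLM-judge verdict: agreement boosts confidence by 0.15, a pattern VULNERABLE overruled by a judge SAFE becomes SAFE, a judge VULNERABLE overrides a pattern SAFE only at confidence 0.7 or higher, and a single INCONCLUSIVE defers to the other side. The following is a hypothetical reimplementation of just those rules, not code from `detection.py` or `llm_judge.py`. The names `Verdict`, `Result`, and `resolve_verdicts`, boosting from the higher of the two confidences, and capping confidence at 1.0 are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    VULNERABLE = "VULNERABLE"
    SAFE = "SAFE"
    INCONCLUSIVE = "INCONCLUSIVE"


@dataclass(frozen=True)
class Result:
    verdict: Verdict
    confidence: float  # 0.0 to 1.0


AGREEMENT_BOOST = 0.15    # diagram: "Use verdict, confidence + 0.15"
JUDGE_OVERRIDE_MIN = 0.7  # diagram: "Judge confidence >= 0.7?"


def resolve_verdicts(pattern: Result, judge: Result) -> Result:
    """Combine pattern and LLM-judge verdicts following the COMBINED-mode rules (sketch)."""
    # Exactly one side is INCONCLUSIVE: defer to the other verdict
    if pattern.verdict is Verdict.INCONCLUSIVE and judge.verdict is not Verdict.INCONCLUSIVE:
        return judge
    if judge.verdict is Verdict.INCONCLUSIVE and pattern.verdict is not Verdict.INCONCLUSIVE:
        return pattern

    # Both agree: keep the verdict and boost confidence (base = higher of the two, assumed)
    if pattern.verdict is judge.verdict:
        boosted = min(1.0, max(pattern.confidence, judge.confidence) + AGREEMENT_BOOST)
        return Result(pattern.verdict, boosted)

    # Pattern VULNERABLE, judge SAFE: trust the judge to reduce false positives
    if pattern.verdict is Verdict.VULNERABLE and judge.verdict is Verdict.SAFE:
        return judge

    # Remaining case: pattern SAFE, judge VULNERABLE; override only if the judge is confident
    if judge.confidence >= JUDGE_OVERRIDE_MIN:
        return judge
    return pattern


if __name__ == "__main__":
    combined = resolve_verdicts(Result(Verdict.SAFE, 0.6), Result(Verdict.VULNERABLE, 0.75))
    print(combined.verdict, combined.confidence)
```

The asymmetry is deliberate in the diagram: a judge can always veto a pattern match, but it needs high confidence to flag something the patterns missed.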