toniantunovi/prowl

GitHub: toniantunovi/prowl

Prowl is an autonomous vulnerability discovery and PoC validation tool built on LLMs and Docker sandboxes.


# Prowl

```
pip install "prowl-sec[anthropic]"
export ANTHROPIC_API_KEY=sk-ant-...   # https://console.anthropic.com/settings/keys
prowl scan /path/to/project
```

Prefer a `.env` file? Put `ANTHROPIC_API_KEY=sk-ant-...` in a `.env` file in the directory where you run `prowl scan`; it is read automatically.

Reconnaissance is fully deterministic (tree-sitter, no LLM: the same codebase always produces the same target list). The LLM layers are rubric-constrained, confidence-gated, and cached. PoC generation runs as an agentic loop via [Claw Code](https://github.com/ultraworkers/claw-code) inside a hardened Docker sandbox.

## Prowl vs. SAST

| Prowl | SAST (semgrep, CodeQL) |
|-------|------------------------|
| Reasons about intent, then proves exploitability | Matches syntactic patterns |
| Builds the actual project and runs the real binary/server with crafted inputs (ASAN crashes on the real binary, HTTP requests against the running server) | Flags individual source lines |
| LLM reasoning; costs tokens, takes minutes | Deterministic, fast, free |
| Research workbench | CI-friendly |

Run both. `semgrep --config auto .` catches known patterns at merge time; `prowl scan .` finds the three-step API call sequence that lets an unauthenticated user approve their own refund, and produces the HTTP request sequence that demonstrates it.

| Domain | What Prowl produces |
|--------|---------------------|
| Web application vulnerabilities | Installs dependencies, starts the real server, sends HTTP requests proving unauthorized access / invalid state transitions |
| Injection | Starts the real application and sends crafted input that extracts data through the exact sink |
| Memory safety (C/C++/Rust `unsafe`) | Compiles the real project with ASAN and runs the real binary to trigger an ASAN report |
| Concurrency | Builds and runs the real binary/server and sends concurrent requests to prove the race window |
| Multi-vulnerability chains | Chain identification + end-to-end validation against the running application (SSRF + no internal auth → internal network access) |

## How it works

```
Target codebase
      │
      ▼
RECONNAISSANCE (deterministic, no LLM)
  Tree-sitter parsing → function extraction → risk signal detection →
  vulnerability scoring → call graph → taint tracking → target ranking
      │
      ▼ (ranked target list)
LAYER 1: HYPOTHESIS (LLM)
  Context builder scopes each target function (~4K tokens)
  LLM hypothesizes vulnerabilities against detection rubrics
  Confidence gating: promote (≥0.7) / batch (0.4–0.7) / suppress (<0.4)
      │
      ▼ (confidence-gated hypotheses)
LAYER 2: TRIAGE + CHAIN ANALYSIS (LLM)
  Individual + batch triage: exploitable / mitigated / false_positive / uncertain
  Deterministic chain grouping + LLM chain evaluation
  Severity gating for Layer 3
      │
      ▼ (exploitable + uncertain findings)
LAYER 3: EXPLOIT VALIDATION (Claw Code + Docker sandbox)
  Claw Code agent builds the actual project inside Docker
  C/C++: compiles with ASAN/UBSAN, runs the real binary with crafted inputs
  Python/Node: installs deps, starts the actual server, sends crafted HTTP requests
  Go/Rust/Java: builds with native toolchain, runs with crafted inputs
  Per-vuln-class validation (ASAN crashes, injection markers, auth bypass, ...)
  Optional patch generation with compile + test validation
      │
      ▼
VULNERABILITY REPORT (text / JSON / SARIF / AI / Markdown)
```

**Structured output enforcement on every LLM hop**: a Pydantic schema in the prompt, `model_validate_json` on the response, one retry with the error in context, and the target is skipped on a second failure.

**Dependency injection** for the LLM client and the sandbox manager, so the whole pipeline runs in tests via `MockLLMClient` + `MockSandboxManager` (385 tests, ~1 second).

**Async with bounded parallelism via `anyio.CapacityLimiter`**: 8 concurrent hypotheses, 4 concurrent triage calls, 2 concurrent validations.

## Requirements

- Python 3.11+
- Docker (for Layer 3 PoC validation in sandbox containers)
- An API key for at least one LLM provider (OpenAI, Anthropic, Google), or a local Ollama instance
- [Claw Code](https://github.com/ultraworkers/claw-code) (installed automatically inside the Docker sandbox for Layer 3 PoC validation)

## Installation

From PyPI (published as `prowl-sec`; the CLI command and import name remain `prowl`):

```
pip install "prowl-sec[anthropic]"   # Anthropic (default)
pip install "prowl-sec[openai]"      # OpenAI
pip install "prowl-sec[google]"      # Google
pip install "prowl-sec[ollama]"      # Ollama (local models)
pip install "prowl-sec[all-llm]"     # All providers
```

From source (editable install, for development):

```
cd prowl
pip install -e ".[dev,anthropic]"    # Anthropic (default)
pip install -e ".[dev,openai]"       # OpenAI
pip install -e ".[dev,google]"       # Google
pip install -e ".[dev,ollama]"       # Ollama (local models)
pip install -e ".[dev,all-llm]"      # All providers
```

If an editable install is not possible (for example, setuptools is missing), set `PYTHONPATH` directly:

```
export PYTHONPATH=/path/to/prowl/src
```

Verify the installation:

```
python -m prowl --help
```

## LLM configuration

Set the API key as an environment variable:

```
export ANTHROPIC_API_KEY=your-key   # for Anthropic (default)
export OPENAI_API_KEY=your-key      # for OpenAI
export GOOGLE_API_KEY=your-key      # for Google
```

Configure the provider and model in `prowl.yml`:

```
llm:
  provider: anthropic          # openai | anthropic | google | ollama
  model: claude-opus-4-6
  temperature: 0.0

  # Per-layer model overrides (optional)
  hypothesis:
    model: claude-haiku-4-5-20251001   # fast/cheap for high-volume Layer 1
  triage:
    model: claude-opus-4-6             # strong reasoning for Layer 2
  validation:
    model: claude-opus-4-6             # code generation for Layer 3
```

Local models via Ollama:

```
llm:
  provider: ollama
  model: llama3
  base_url: http://localhost:11434
```

## Usage

### CLI

```
# Full scan of a project
prowl scan /path/to/project

# Scan with verbose logging
prowl -v scan /path/to/project

# Only specific vulnerability categories
prowl scan --categories memory,auth,injection /path/to/project

# Output formats: text (default), json, sarif, ai, markdown
prowl scan --format json /path/to/project
prowl scan --format sarif /path/to/project > results.sarif
prowl scan --format ai /path/to/project
prowl scan --format markdown /path/to/project

# Write the report to a file
prowl scan --format markdown -o report.md /path/to/project
prowl scan --format json -o results.json /path/to/project

# Generate patches for confirmed findings
prowl scan --fix /path/to/project

# Resume an interrupted scan
prowl scan --resume /path/to/project

# Force a full rescan (ignore the cache)
prowl scan --no-cache /path/to/project

# Override the PoC iteration budget
prowl scan --iterations 8 /path/to/project
```

### Managing findings

```
# Check scan status
prowl status

# List findings (from the last scan)
prowl findings
prowl findings --severity critical,high
prowl findings --category auth,injection

# Suppress a false positive
prowl suppress prowl-sqli-handler.py-84 --reason "input validated in middleware" --scope function

# Suppression scopes:
#   finding  - this exact finding
#   function - all findings in this function (unaffected by line changes)
#   rule     - all findings matching this rule in this file
#   project  - all findings matching this rule across the project

# Report a vulnerability Prowl missed
prowl missed src/handlers/admin.py:47 --category auth --description "missing admin check on delete endpoint"

# Clear persisted scan state
prowl clean-state
```

## PoC validation via Claw Code

Layer 3 validation uses [Claw Code](https://github.com/ultraworkers/claw-code) as an autonomous agent inside the sandbox. Claw builds and runs the actual target project, then exercises it with crafted inputs to confirm that the vulnerability is real.

**How it works per language:**

- **C/C++**: auto-detects the build system (cmake, autotools, meson, make), injects `ASAN_OPTIONS` and `UBSAN_OPTIONS`, compiles the real project, runs the binary, and checks whether the ASAN trace contains the target function.
- **Python/Node**: installs dependencies (`pip install` / `npm install`), starts the real application server, and sends crafted HTTP requests to the vulnerable endpoint.
- **Go/Rust/Java**: builds with the native toolchain (`go build -race`, `cargo build`, `mvn package`), then runs the binary with crafted inputs.

Each language gets an enhanced Docker image with the full build toolchain. These containers run with increased resources: 2 GB RAM, 1 GB tmpfs, 1024 PIDs, a 30-minute timeout, and at most 50 Claw agent turns. On success, Claw writes an `ARGUS_VALIDATED` marker and saves a `test.sh` script for reproduction.

```
validation:
  claw_timeout_build: 1800     # 30 minutes for full-project builds
  claw_max_turns_build: 50     # Claw agent turns for build+test
  claw_api_key_env: null       # auto-detected from LLM config

sandbox:
  mem_limit_build: "2g"        # 2 GB for compilation
  cpu_quota_build: 400000      # double CPU for build phase
```

The Claw container needs network access to call the LLM API. See the [specification](prowl.md) for the full security model.

## Configuration

Create `prowl.yml` in the project root. All fields are optional.
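Because every field is optional, a `prowl.yml` only needs the keys it changes. One way to picture how a partial file resolves against the defaults is a recursive merge in which user values win and nested sections merge key by key. This is an assumption about `config.py`, not its actual code: `merge_config` and the trimmed `DEFAULTS` dict are hypothetical, and the sketch works on already-parsed dicts so it needs no YAML library.

```python
from copy import deepcopy
from typing import Any

# Hypothetical, trimmed-down defaults (a small subset of the full listing).
DEFAULTS: dict[str, Any] = {
    "scoring": {
        "hypothesis_confidence_threshold": 0.7,
        "batch_confidence_threshold": 0.4,
        "max_promoted_findings": 100,
    },
    "llm": {
        "provider": "anthropic",
        "model": "claude-opus-4-6",
        "temperature": 0.0,
        "hypothesis": {"provider": None, "model": None},
    },
}


def merge_config(defaults: dict[str, Any], user: dict[str, Any]) -> dict[str, Any]:
    """Return defaults overlaid with user values; nested dicts merge key by key."""
    merged = deepcopy(defaults)  # never mutate the shared defaults
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars and lists from the user file replace the default outright.
            merged[key] = deepcopy(value)
    return merged


# A prowl.yml containing only a per-layer model override, already parsed to a dict.
user_file = {"llm": {"hypothesis": {"model": "claude-haiku-4-5-20251001"}}}
config = merge_config(DEFAULTS, user_file)
print(config["llm"]["model"])                # claude-opus-4-6
print(config["llm"]["hypothesis"]["model"])  # claude-haiku-4-5-20251001
```

Under this reading, an override only touches the leaf it names; sibling keys such as `llm.model` keep their default values.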

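Layer 1's confidence gate (promote at 0.7 or above, batch from 0.4 up to 0.7, suppress below 0.4, as shown in How it works) is tuned by `scoring.hypothesis_confidence_threshold` and `scoring.batch_confidence_threshold` in the defaults. A minimal sketch of that three-way split, assuming the thresholds are compared exactly as the diagram states; `Hypothesis` and `gate` are hypothetical names, and the sketch ignores `max_promoted_findings` because the README does not say how that cap is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    target: str        # hypothetical target id, e.g. "handlers/refund.py:approve"
    confidence: float  # LLM-reported confidence in [0, 1]


def gate(
    hypotheses: list[Hypothesis],
    promote_at: float = 0.7,  # scoring.hypothesis_confidence_threshold
    batch_at: float = 0.4,    # scoring.batch_confidence_threshold
) -> tuple[list[Hypothesis], list[Hypothesis], list[Hypothesis]]:
    """Split hypotheses into (promoted, batched, suppressed) lists."""
    promoted: list[Hypothesis] = []
    batched: list[Hypothesis] = []
    suppressed: list[Hypothesis] = []
    for h in hypotheses:
        if h.confidence >= promote_at:
            promoted.append(h)    # >= 0.7
        elif h.confidence >= batch_at:
            batched.append(h)     # 0.4 <= confidence < 0.7
        else:
            suppressed.append(h)  # < 0.4
    return promoted, batched, suppressed


sample = [
    Hypothesis("a", 0.92),
    Hypothesis("b", 0.55),
    Hypothesis("c", 0.40),
    Hypothesis("d", 0.12),
]
promoted, batched, suppressed = gate(sample)
print([h.target for h in promoted])    # ['a']
print([h.target for h in batched])     # ['b', 'c']
print([h.target for h in suppressed])  # ['d']
```

Note the boundary handling: a confidence of exactly 0.4 lands in the batch tier and exactly 0.7 is promoted, matching the `≥0.7` and `<0.4` bounds in the diagram.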
Full defaults:

```
scan:
  include: []                  # paths to scan (default: entire project)
  exclude: []                  # paths to skip
  languages: []                # auto-detected if omitted
  project_type: "auto"         # "auto", "application", "library"
  detection_categories:        # all 9 enabled by default
    - auth
    - data_access
    - crypto
    - input
    - financial
    - privilege
    - memory
    - injection
    - concurrency

reconnaissance:
  min_likelihood_score: 1.0    # skip functions below this
  max_review_chunks: 100       # cap targets per scan
  interaction_targets: true    # detect shared-state interaction targets
  auto_exclude: true           # auto-exclude generated/vendored code
  auto_exclude_override: []    # paths to force-include despite auto-exclude

scoring:
  hypothesis_confidence_threshold: 0.7
  batch_confidence_threshold: 0.4
  max_promoted_findings: 100

triage:
  reachability: true
  chain_analysis: true
  patch: true                  # generate remediation patches
  patch_iterations: 3

validation:
  enabled: true
  severity_gate: "high"        # generate PoCs for high+ severity only
  max_exploits: 10             # cap PoC attempts per scan
  max_iterations_simple: 3     # missing auth, basic injection
  max_iterations_medium: 5     # business logic, race conditions
  max_iterations_memory: 5     # ASAN crash confirmation
  max_iterations_chain: 8      # multi-finding chains
  instrumentation:
    - asan
    - ubsan
    - coverage
  # Claw Code PoC validation settings
  claw_timeout_build: 1800     # 30 minutes for full-project builds
  claw_max_turns_build: 50     # Claw agent turns for build+test
  claw_api_key_env: null       # API key env var forwarded to container (auto-detected)

sandbox:
  runtime: docker
  timeout_default: 180         # seconds
  timeout_race_condition: 720
  timeout_max: 1800
  timeout_startup: 180         # container startup budget
  mem_limit: "512m"
  mem_limit_build: "2g"        # 2 GB for compilation
  cpu_quota: 200000            # 2 CPU cores
  cpu_quota_build: 400000      # double CPU for build phase
  pids_limit: 256
  network: none                # no network egress
  tier3_services: [postgres, mysql, redis]

concurrency:
  max_concurrent_hypotheses: 8
  max_concurrent_triage: 4
  max_concurrent_validations: 2

budget:
  max_tokens_per_scan: null    # null = unlimited
  max_cost_per_scan: null
  layer3_budget_fraction: 0.4

cache:
  enabled: true
  invalidation: "interface"    # "interface" (scoped) or "any_change"
  cross_cutting_invalidation: true

output:
  format: "text"               # text, json, sarif, ai, markdown
  include_poc: true
  include_reasoning: false

resume:
  enabled: true
  state_dir: ".prowl/scan-state"

llm:
  provider: "anthropic"        # openai | anthropic | google | ollama
  model: "claude-opus-4-6"
  api_key_env: null            # auto-detected per provider
  base_url: null               # for ollama/vLLM
  temperature: 0.0
  hypothesis:                  # per-layer overrides (all optional)
    provider: null
    model: null
    temperature: null
    max_tokens: null
  triage:
    provider: null
    model: null
    temperature: null
    max_tokens: null
  validation:
    provider: null
    model: null
    temperature: null
    max_tokens: null
```
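The `concurrency` limits correspond to the bounded parallelism described under How it works, which Prowl implements with `anyio.CapacityLimiter`. To stay dependency-free, this sketch swaps in the standard library's `asyncio.Semaphore`, which enforces the same kind of cap; `run_stage`, `demo`, and `fake_llm_call` are hypothetical. It shows the guarantee the setting provides: however many targets are queued, at most N calls for a stage are in flight at once.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_stage(
    items: list[T], worker: Callable[[T], Awaitable[R]], limit: int
) -> list[R]:
    """Run worker(item) for every item with at most `limit` calls in flight."""
    sem = asyncio.Semaphore(limit)

    async def bounded(item: T) -> R:
        async with sem:  # waits while `limit` workers already hold a slot
            return await worker(item)

    # gather returns results in input order, not completion order.
    return list(await asyncio.gather(*(bounded(i) for i in items)))


async def demo() -> tuple[list[str], int]:
    in_flight = 0
    peak = 0

    async def fake_llm_call(target: str) -> str:
        # Hypothetical stand-in for one Layer 1 hypothesis request.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return f"hypothesis for {target}"

    targets = [f"fn_{n}" for n in range(20)]
    # limit=8 mirrors concurrency.max_concurrent_hypotheses
    results = await run_stage(targets, fake_llm_call, limit=8)
    return results, peak


results, peak = asyncio.run(demo())
print(len(results), peak)  # 20 8
```

Triage and validation would use the same helper with `limit=4` and `limit=2`; a separate limiter per stage keeps slow sandbox runs from starving the cheaper LLM calls.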

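Every LLM hop configured by the `llm` section has its output checked against a schema: Prowl puts a Pydantic schema in the prompt, validates the reply with `model_validate_json`, retries once with the error in context, and skips the target on a second failure. The sketch below mirrors that control flow using only the standard library (`json.loads` plus a hand-written check stand in for Pydantic). `TriageResult`, `parse_triage`, `ask_with_retry`, and the retry prompt wording are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical schema; the real Pydantic models live under src/prowl/models/.
VERDICTS = {"exploitable", "mitigated", "false_positive", "uncertain"}


@dataclass
class TriageResult:
    verdict: str
    reasoning: str


def parse_triage(raw: str) -> TriageResult:
    """Stand-in for model_validate_json: raise ValueError on malformed output."""
    data = json.loads(raw)  # json.JSONDecodeError subclasses ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    verdict = data.get("verdict")
    if not isinstance(verdict, str) or verdict not in VERDICTS:
        raise ValueError(f"verdict must be one of {sorted(VERDICTS)}, got {verdict!r}")
    reasoning = data.get("reasoning")
    if not isinstance(reasoning, str):
        raise ValueError("reasoning must be a string")
    return TriageResult(verdict, reasoning)


def ask_with_retry(llm: Callable[[str], str], prompt: str) -> Optional[TriageResult]:
    """First attempt, one retry that carries the validation error, then give up."""
    try:
        return parse_triage(llm(prompt))
    except ValueError as err:
        retry_prompt = f"{prompt}\n\nYour previous reply was invalid: {err}\nReply with JSON only."
    try:
        return parse_triage(llm(retry_prompt))
    except ValueError:
        return None  # second failure: the caller skips this target


# Scripted fake LLM: the first reply omits "reasoning", the second is valid.
replies = iter([
    '{"verdict": "exploitable"}',
    '{"verdict": "exploitable", "reasoning": "no ownership check on refund"}',
])
prompts: list[str] = []


def fake_llm(prompt: str) -> str:
    prompts.append(prompt)
    return next(replies)


result = ask_with_retry(fake_llm, "Triage the refund approval hypothesis.")
print(result)
# TriageResult(verdict='exploitable', reasoning='no ownership check on refund')
```

Returning `None` rather than raising lets one malformed reply cost a single target instead of aborting the whole scan.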
### Custom rubrics

Add custom detection rules by placing YAML files in `.prowl/rubrics/`:

```
# .prowl/rubrics/custom-ssrf.yml
category: injection
detection_rules:
  - name: internal_ssrf
    instruction: "Check if any URL parameter is used to make server-side HTTP requests without restricting the target to allowed hosts."
calibration:
  test_cases:
    - file: "tests/vulns/ssrf_vulnerable.py"
      function: "fetch_url"
      expected: true
    - file: "tests/vulns/ssrf_safe.py"
      function: "fetch_url"
      expected: false
```

Custom rubrics extend the built-in rubrics; they do not replace them.

## Output formats

### Text (default)

Human-readable terminal output with severity-sorted findings, attack scenarios, and PoC validation status.

### JSON

The full structured report, via `--format json`. Includes every finding field, chain analysis, scan progress, and budget usage.

### SARIF 2.1.0

The standard static-analysis interchange format, via `--format sarif`. Compatible with VS Code, GitHub Code Scanning, DefectDojo, and other SARIF consumers. Each finding maps to a SARIF result carrying the rule ID, severity level, location, and the PoC in its properties.

### AI

Output structured for consumption by downstream LLM agents, via `--format ai`. Includes a natural-language attack narrative and actionable remediation context for each finding.

### Markdown

A detailed, self-contained report via `--format markdown`, designed for sharing with security teams and for reproducibility. Each finding includes:

- Severity, category, confidence, and location metadata
- Full description, attack scenario, and analysis reasoning
- **Complete PoC source code in fenced code blocks with language highlighting**
- **Step-by-step reproduction instructions tailored to the vulnerability class** (e.g., ASAN compile flags for memory bugs)
- **stdout/stderr execution output from the sandbox validation run**
- **Sanitizer output for memory safety bugs (with violation type and details)**
- **Suggested patch, when available**
- Attack chain relationships between findings

Write it to a file with `-o`:

```
prowl scan --format markdown -o report.md /path/to/project
```

## Project state

Prowl stores runtime state in `.prowl/` at the project root:

| Path | Purpose | VCS |
|------|---------|-----|
| `.prowl/suppressions.json` | Suppressed findings | Commit (shared with the team) |
| `.prowl/missed.json` | Reported false negatives | Commit (shared with the team) |
| `.prowl/cache/` | LLM result cache | Gitignore |
| `.prowl/scan-state/` | In-progress scan state for resume | Gitignore |
| `.prowl/calibration/` | Confidence calibration data | Gitignore |

## Vulnerability categories

| Category | Weight | What Prowl looks for |
|----------|--------|----------------------|
| `auth` | 1.5 | Missing checks, broken access control, privilege escalation, session fixation |
| `data_access` | 1.0 | Unscoped queries, IDOR, SQL injection |
| `input` | 1.0 | Type confusion, missing validation, unsafe deserialization |
| `crypto` | 1.2 | Weak randomness, wrong algorithms, timing side channels |
| `financial` | 1.3 | Invalid state transitions, double spending, missing idempotency |
| `privilege` | 1.4 | Incomplete privilege dropping, TOCTOU at privilege boundaries |
| `memory` | 1.5 | Buffer overflows, use-after-free, integer overflows, format strings |
| `injection` | 1.5 | Command injection, SSTI, LDAP injection, header injection |
| `concurrency` | 1.0 | Race conditions, TOCTOU, double fetch |

## Supported languages

Prowl uses tree-sitter for parsing and supports:

Python, JavaScript, TypeScript, TSX, Java, Go, Rust, C, C++, Ruby, PHP

Languages are auto-detected from file extensions. Override them with `scan.languages` in the config.

## Development

### Running tests

```
# All tests (385 tests, ~1 second)
pytest tests/ -v --ignore=tests/fixtures

# A specific module
pytest tests/test_recon/ -v

# A single test class
pytest tests/test_recon/test_signals.py::TestSqlInjectionSignals -v

# With coverage
coverage run -m pytest tests/ && coverage report
```

### Linting

```
ruff check src/prowl/
ruff check src/prowl/ --fix   # auto-fix
```

### Project structure

```
src/prowl/
  models/            # Pydantic v2 data models (core, scan, context, finding, etc.)
  recon/             # Reconnaissance: parsing, extraction, signals, scoring, call graph
  context_builder/   # Context assembly for each layer, framework/sanitizer detection
  rubrics/           # YAML detection/triage/exploit rubrics (28 files, 9 categories)
  hypothesis/        # Layer 1: parallel hypothesis generation + confidence gating
  triage/            # Layer 2: classification, chain analysis
  validation/        # Layer 3: Claw Code agentic PoC generation + sandbox execution
  sandbox/           # Docker container lifecycle, security policy, instrumentation
  llm/               # LangChain multi-provider LLM client, schema validation, retry, budget, calibration
  cache/             # Content-addressed cache with cross-cutting invalidation
  pipeline/          # Orchestrator, resume, concurrency management
  suppression/       # False positive/negative management
  output/            # Text, JSON, SARIF 2.1.0, AI, Markdown output formats
  cli.py             # Click CLI
  config.py          # prowl.yml loading

tests/
  fixtures/               # Intentionally vulnerable codebases (Python, C, Node.js)
  test_recon/             # Parser, exclusions, extractor, signals, scorer, call graph
  test_context_builder/
  test_hypothesis/
  test_triage/
  test_cache/
  test_suppression/
  test_output/
  test_pipeline/          # Integration with mocked LLM + sandbox
  test_llm/               # LangChain client, provider routing, config
  test_sandbox/           # Docker sandbox manager, image build, instrumentation
  test_validation/        # Claw backend, result checking, patch generation
  test_integration/
```

## Limitations

- **No kernel-mode validation.** The Docker sandbox runs user-space code. Kernel vulnerabilities receive Layer 1-2 analysis but no PoC confirmation.
- **No cross-service chains.** Analysis is confined to a single repository.
- **No cross-language taint tracking.** In polyglot projects (e.g., Python calling C via FFI), each language is analyzed independently.
- **Approximate call graph.** Dynamic dispatch, callbacks, and metaprogramming leave gaps. The LLM compensates with context, but precision varies by language.
- **Not a CI gate.** Prowl is a research tool, not a linter. CI pipelines should use deterministic tools.

## Specification

The full specification is in [`prowl.md`](prowl.md).
