sane100400/Blitz-Bounty-Agent

GitHub: sane100400/Blitz-Bounty-Agent

A multi-agent system for smart contract security auditing built on Claude Code, automating the full workflow for audit contests and bug bounties.


# Blitz Bounty Agent

Blitz Bounty Agent is a smart contract security agent built on Claude Code. The repository provides three ways to run it:

- Interactive hunting via Claude Code slash commands
- Headless loop execution via `claude -p`
- Multi-agent orchestration composed of Python recon, specialist agents, and a merger

It also includes EVMBench-based detect / patch / ablation / visualization scripts.

## Current Implementation Summary

- The main run paths work without `ANTHROPIC_API_KEY`. If the Claude Code CLI is logged in to a subscription session, the repo-local runners execute through `claude -p`.
- The multi-agent orchestrator runs a pure-Python recon pass over the codebase before any LLM call.
- The EVMBench detect runner clones the audit source and auto-generates a `.claudeignore` to narrow what Claude reads.
- The judge defaults to `claude-haiku-4-5` and caches its results.
- Quota, login, and auth problems are handled fail-fast in the shared `claude_cli.py`.

Optimizations are documented in [docs/optimization-ledger.md](docs/optimization-ledger.md), and the research positioning in [docs/research-positioning.md](docs/research-positioning.md).

## Authentication and Run Paths

The repository has two run paths.

1. `repo-local Blitz path`

   `/web3-hunt`, `/web3-loop`, `hunt_loop.py`, `audit_loop.py`, `audit_orchestrator.py`, [`benchmark/evmbench_skill_runner.py`](benchmark/evmbench_skill_runner.py), and [`benchmark/evmbench_patch_runner.py`](benchmark/evmbench_patch_runner.py) use `claude -p`. If the Claude Code CLI is logged in with a valid session, they run without `ANTHROPIC_API_KEY`.

2. `official upstream EVMBench path`

   [`benchmark/evmbench_setup.sh`](benchmark/evmbench_setup.sh) is a separate path for setting up the upstream EVMBench runtime. Depending on the solver configuration, it may require API credentials.

You can check right away whether subscription-backed runs are available:

```
# Check whether the Claude CLI exists and the state of auth-related env vars
python3 benchmark/claude_subscription_check.py

# Minimal probe with a real `claude -p` call
python3 benchmark/claude_subscription_check.py --probe
```

If the output shows `usage_exhausted`, the problem is not the login: the current subscription quota is used up. The runners treat this state as an immediate failure instead of mistakenly saving a result such as `0 recall / $0`.

## Architecture

```
User / CI / cron
 ├─ Interactive:    /web3-hunt, /web3-loop
 ├─ Headless loop:  hunt_loop.py, audit_loop.py
 └─ Multi-agent:    audit_orchestrator.py

Shared flow
 1. Recon
 2. File Sweep
 3. Cross-Compare
 4. Attack Surface
 5. Triage
 6. PoC
 7. Report
```

The multi-agent path is structured as follows:

```
Phase 1: Pure Python Recon
 - Collect in-scope .sol files
 - Classify pattern groups by filename
 - Detect external protocol keywords

Phase 2: 4-6 Specialist Agents
 - TVL / accounting
 - position lifecycle
 - access control + fund flow
 - external protocol semantics
 - core logic

Phase 3: Merger Agent
 - Collect only the FINDING blocks from specialist output
 - Deduplicate
 - Triage
 - Generate the final report
```

Current default models:

- deep model: `claude-opus-4-6`
- fast model: `claude-sonnet-4-6`
- judge model: `claude-haiku-4-5`

Current default budget caps:

- specialist: `$0.50`
- merger: `$0.80`

## Run Modes

### 1. Interactive slash commands

Invoked directly inside Claude Code. The definitions live under `.claude-commands/`.

Main commands:

- `/web3-hunt`
- `/web3-loop`

Examples:

```
# Audit contests
/web3-hunt "https://cantina.xyz/competitions/..." cantina
/web3-hunt "https://code4rena.com/audits/..." codearena

# Immunefi bounty
/web3-hunt "https://immunefi.com/bug-bounty/balancer" immunefi "https://1rpc.io/eth"

# Repeated exploration loop
/web3-loop "https://cantina.xyz/competitions/..." cantina 10
```

### 2. Headless loops

Runs repeatedly through a `claude -p` subprocess, without the Claude Code UI.

`hunt_loop.py`

- For Immunefi bug bounties
- Accumulates drop history, lessons, and PoC failures
- Parses `HUNT_SIGNAL:*` signals

```
python3 hunt_loop.py "https://immunefi.com/bug-bounty/balancer" "https://1rpc.io/eth" 10
```

`audit_loop.py`

- For audit contests
- Generates recon, candidate triage, and report files
- Parses `AUDIT_SIGNAL:*` signals

```
python3 audit_loop.py "https://audits.sherlock.xyz/contests/123" sherlock 12
```

### 3. Multi-agent orchestration

`audit_orchestrator.py` first analyzes the codebase in Python, then runs the specialists in parallel.

```
python3 audit_orchestrator.py ./protocol-source --platform codearena --timeout 900
```

Main options:

- `--deep-model`
- `--fast-model`
- `--specialist-budget`
- `--merger-budget`
- `--merge-timeout`
- `--benchmark`
- `--audit-id`

## Optimizations Currently Implemented

The following optimizations are reflected in the repository's actual run paths:

1. `Python recon before LLM`
   `audit_orchestrator.py` first builds a file map, pattern groups, and external-protocol hints, then uses them to construct the downstream prompts.
2. `scope pruning with .claudeignore`
   The EVMBench runners auto-generate a `.claudeignore` for each cloned audit source, excluding `lib/`, `node_modules/`, `out/`, `cache/`, test files, and similar paths from what Claude reads.
3. `specialist decomposition`
   Instead of putting all analysis into one prompt, the work is split into several role-specific prompts.
4. `model tiering`
   Semantics-heavy steps can run on Opus and pattern-heavy steps on Sonnet.
5. `budget caps`
   Caps can be set per specialist, for the merger, per audit, per profile, and for an entire ablation run.
6. `judge cache`
   LLM-as-Judge defaults to Haiku and caches its results.
7. `fail-fast CLI handling`
   `usage_exhausted`, `login_required`, `invalid_api_key`, and `low_credit` are detected in one shared place.

Detailed rationale and trade-offs are in [docs/optimization-ledger.md](docs/optimization-ledger.md).

## Benchmarks

The main benchmark path in this repository is the EVMBench wrapper.

### 1. Detect benchmark

[`benchmark/evmbench_skill_runner.py`](benchmark/evmbench_skill_runner.py) is a detect-evaluation wrapper built on `/web3-hunt`.

What it does:

- Selects EVMBench audits
- Clones the source repository and checks out the base commit
- Generates `.claudeignore`
- Runs `/web3-hunt` via `claude -p`
- Saves the raw output
- Compares against ground truth and computes the score

Supported features:

- Model aliases: `opus`, `sonnet`, `haiku`
- Pre/post-cutoff split
- `--max-budget`
- `--max-total-cost`
- `--no-judge`
- `--rescore`
- `--compare`
- `--dry-run`

Examples:

```
# Single audit
python3 benchmark/evmbench_skill_runner.py --audit 2026-01-tempo-feeamm

# Run part of the detect split
python3 benchmark/evmbench_skill_runner.py --split detect-tasks --limit 5

# Budget cap + skip the judge
python3 benchmark/evmbench_skill_runner.py \
  --audit 2024-05-munchables \
  --timeout 600 \
  --max-budget 0.80 \
  --no-judge

# Rescore saved results
python3 benchmark/evmbench_skill_runner.py --split detect-tasks --limit 5 --rescore

# Compare the two most recent runs
python3 benchmark/evmbench_skill_runner.py --compare

# Only list the selected audits without running them
python3 benchmark/evmbench_skill_runner.py --split detect-tasks --dry-run
```

Main detect runner options:

| Option | Meaning |
|---|---|
| `--split` | `detect-tasks`, `patch-tasks`, `exploit-tasks`, `debug` |
| `--audit` | Single audit ID |
| `--model` | Model ID or alias |
| `--limit` | Maximum number of audits |
| `--timeout` | Timeout per audit |
| `--max-budget` | Cost cap per audit |
| `--max-total-cost` | Cost cap for the whole run |
| `--no-judge` | Use identifier matching only, instead of the Haiku judge |
| `--rescore` | Rescore saved results without rerunning the agent |
| `--compare` | Compare the two most recent runs |
| `--post-cutoff` | Select only audits after the cutoff |
| `--pre-cutoff` | Select only audits before the cutoff |
| `--dry-run` | Print the selected audits without running them |

### 2. Patch benchmark

[`benchmark/evmbench_patch_runner.py`](benchmark/evmbench_patch_runner.py) is the runner for patch tasks. As currently implemented:

- Only Foundry-based patch tasks are evaluated
- The agent must modify the actual source files
- `forge build` must pass
- Existing tests must keep passing
- The oracle exploit test must fail

Examples:

```
# Run only post-cutoff patch audits
python3 benchmark/evmbench_patch_runner.py --post-cutoff

# Single audit
python3 benchmark/evmbench_patch_runner.py --audit 2026-01-tempo-feeamm

# All Foundry patch audits
python3 benchmark/evmbench_patch_runner.py --all

# Dry run
python3 benchmark/evmbench_patch_runner.py --post-cutoff --dry-run
```

### 3. Cost / performance ablation

[`benchmark/evmbench_ablation.py`](benchmark/evmbench_ablation.py) runs the same audit slice under multiple execution profiles and compares `recall`, `cost`, `$/detect`, and `detect/$`.

Current profiles:

- `single_opus`
- `single_sonnet`
- `single_opus_cap_050`
- `single_opus_cap_080`
- `hybrid_orchestrated`
- `all_opus_orchestrated`
- `all_sonnet_orchestrated`

Example:

```
# Start the comparison with the smallest post-cutoff slice
python3 benchmark/evmbench_ablation.py \
  --audits 2026-01-tempo-feeamm,2026-01-tempo-mpp-streams,2026-01-tempo-stablecoin-dex \
  --profiles single_sonnet,single_opus,hybrid_orchestrated \
  --no-judge \
  --max-profile-cost 2.00 \
  --max-total-cost 4.00
```

Recommended order:

1. `single_sonnet`
2. `single_opus`
3. `hybrid_orchestrated`, only when needed
4. Use `--no-judge` for the first pass
5. Afterwards, use `--rescore` or rerun the judge only on promising results

### 4. Visualization

[`benchmark/visualize_ablation.py`](benchmark/visualize_ablation.py) generates an HTML dashboard from stored ablation JSON.

- Does not rerun the benchmark
- Uses no additional tokens
- Summary table
- Cost vs. recall scatter plot
- Cost-per-detect bar chart

Examples:

```
# Generate HTML from the most recent ablation JSON
python3 benchmark/visualize_ablation.py

# Specify input/output paths
python3 benchmark/visualize_ablation.py \
  --input benchmark/results/ablation/ablation_YYYYMMDD_HHMMSS.json \
  --output benchmark/results/ablation/dashboard.html
```

## Scoring

The detect benchmark uses two scoring paths.

1. `LLM-as-Judge`
   The default judge model is `claude-haiku-4-5`; results are cached in `benchmark/results/judge_cache/`.
2. `identifier fallback`
   When the judge is turned off, a deterministic fallback based on file names, function names, and title word overlap is used.

Key points:

- The judge uses a smaller model by default to reduce evaluation cost.
- The detect runner can rescore stored raw outputs, so changing the scoring logic does not require rerunning the agent.

## Experimental custom suite

[`benchmark/run.py`](benchmark/run.py) is a custom suite runner, but it is not currently the `primary benchmark`.

Current status:

- Only `benchmark/suites/known-vulns.yaml` exists
- The default cases are empty
- Parts of precision / severity / cost scoring are still TODO

This path is therefore experimental; primary comparisons should rely on the EVMBench wrapper.

```
python3 benchmark/run.py --suite known-vulns
python3 benchmark/run.py --compare
```

## Directory Structure

```
Blitz-Bounty-Agent/
├── .claude-commands/                 Slash command definitions
├── hunt_loop.py                      Headless Immunefi loop
├── audit_loop.py                     Headless audit contest loop
├── audit_orchestrator.py             Multi-agent orchestrator
├── claude_cli.py                     Shared Claude CLI wrapper / fail-fast handling
├── benchmark/
│   ├── evmbench_skill_runner.py      Detect benchmark wrapper
│   ├── evmbench_patch_runner.py      Patch benchmark wrapper
│   ├── evmbench_ablation.py          Cost/performance ablation
│   ├── visualize_ablation.py         Ablation HTML visualization
│   ├── claude_subscription_check.py  Subscription-backed run check
│   ├── llm_judge.py                  Judge + cache
│   ├── run.py                        Experimental custom suite runner
│   └── config.yaml                   Model / cutoff / scoring config
├── docs/
│   ├── optimization-ledger.md
│   ├── research-positioning.md
│   └── evmbench-ablation-plan.md
├── evmbench-sources/                 Cloned audit sources
├── evmbench-upstream/                Upstream EVMBench data
└── audit-reports/                    Generated report output
```

## Installation
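Before working through the requirements below, it can help to confirm which tools are already installed. The following is a minimal, hypothetical pre-flight sketch that is not part of the repository. It assumes the executables are named `claude`, `forge`, `cast`, `gh`, `docker`, and `uv`, and that PyYAML is importable as `yaml`; the function names are illustrative only.

```python
# Hypothetical pre-flight check for the Blitz Bounty Agent requirements.
# Not part of the repository; a standalone sketch using only the stdlib.
import importlib.util
import shutil
import sys

# Required executables (names assumed to match what is installed on PATH).
REQUIRED_TOOLS = ["claude", "forge", "cast"]
# Optional: gh CLI, plus Docker + uv for the upstream EVMBench path.
OPTIONAL_TOOLS = ["gh", "docker", "uv"]


def check_tools(names):
    """Map each executable name to True if shutil.which finds it on PATH."""
    return {name: shutil.which(name) is not None for name in names}


def check_python(minimum=(3, 10)):
    """True if the running interpreter is at least `minimum` (major, minor)."""
    return tuple(sys.version_info[:2]) >= minimum


def check_pyyaml():
    """PyYAML installs the `yaml` module; find_spec checks without importing it."""
    return importlib.util.find_spec("yaml") is not None


def preflight():
    """Collect every check into a single report dict."""
    return {
        "python>=3.10": check_python(),
        "pyyaml": check_pyyaml(),
        "required": check_tools(REQUIRED_TOOLS),
        "optional": check_tools(OPTIONAL_TOOLS),
    }


def required_ok(report):
    """Only Python, PyYAML, and the required tools decide the verdict."""
    return (
        report["python>=3.10"]
        and report["pyyaml"]
        and all(report["required"].values())
    )


def main():
    report = preflight()
    for key, value in report.items():
        print(f"{key}: {value}")
    ok = required_ok(report)
    print("all required items found" if ok else "missing required items")
    return 0 if ok else 1


if __name__ == "__main__":
    main()
```

The sketch only reports; wrap `main()` in `sys.exit(...)` if a CI job should fail on a missing required tool. It does not check whether the Claude CLI is logged in, which is what `benchmark/claude_subscription_check.py` is for.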
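### Requirements

- Claude Code CLI
- Python 3.10+
- `PyYAML`
- Foundry (`forge`, `cast`)
- Optional: `gh` CLI
- Optional: Docker + `uv` (when using the upstream EVMBench path)

### Installation example

```
git clone https://github.com/sane100400/Blitz-Bounty-Agent.git
cd Blitz-Bounty-Agent

# Check whether subscription-backed runs work
python3 benchmark/claude_subscription_check.py

# Set up the upstream EVMBench environment separately if needed
bash benchmark/evmbench_setup.sh
```

Running Claude Code in this directory automatically picks up the commands under `.claude-commands/`.

## Documentation Principles

This README mainly describes what is currently implemented and the paths that actually run.

- A static cost estimate table is intentionally omitted. Actual cost varies widely with audit size, quota state, cache hits, and budget caps.
- Static benchmark performance tables have also been removed from the main text. For current numbers, check the artifacts saved in `benchmark/results/` or rerun the benchmarks.
- The repo-local path and the upstream EVMBench path are deliberately described separately.

## License

MIT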

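Tags: AI agent, Bug Bounty, Claude, CVE detection, DeFi security, DLL hijacking, EVM, EVMBench, Immunefi, Python, Web3 security, cloud security monitoring, blockchain auditing, customizable parser, multi-agent orchestration, large language models, security researcher, symmetric encryption, document security, no backdoor, smart contract security, smart contract vulnerabilities, cybersecurity research, auto-remediation, automated auditing, request interception, reverse engineering tools, configuration auditing, reentrancy attacks, flash loan attacks, static analysis, offensive and defensive hacking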