cargopete/bulwark

# Bulwark

Multi-pass, multi-agent smart contract audit pipeline. Rust CLI + Docker container with Slither, Forge, Halmos, Claude Code, and 70 AI audit skills.

## Quick Start

### 1. Set your API key

    # .env (gitignored)
    ANTHROPIC_API_KEY=sk-ant-...

### 2. Build and run

    docker compose build
    docker compose up

The pipeline starts automatically. To interact with the container while it runs:

    docker exec -it bulwark bash

### 3. Manual control inside the container

    bulwark run                      # Full 6-pass pipeline
    bulwark run --pass 1             # Recon only (no AI, no auth needed)
    bulwark run --pass 1-3           # Recon + Agents + PoC Gate
    bulwark run --pass 2 --agent red # Single agent
    bulwark status                   # Which passes completed
    bulwark findings                 # List findings with severity
    bulwark findings --severity high
    bulwark report                   # Regenerate final report
    bulwark doctor                   # Check tool availability

## Pipeline

    Pass 1: Reconnaissance ──────── Deterministic (Slither, Forge, Rust)          ~1 min
        │
    Pass 2: Multi-Agent Analysis ── 3× parallel Claude (RED/BLUE/GOLD)            ~16 min
        │
    Pass 3: PoC Gate ────────────── "No PoC, no finding" validation (Forge)       ~5–15 min
        │
    Pass 4: Fuzzing Campaign ────── Foundry invariant tests (AI-generated)        ~8 min
        │
    Pass 5: Formal Verification ─── Halmos bounded model checking                 ~12 min
        │
    Pass 6: Adversarial Review ──── Fresh Claude session challenges all           ~3 min
        │
        └──► final-report.md + final-report.json

### Pass 1: Reconnaissance

Deterministic, no AI required. Produces structured JSON consumed by all later passes.

- Compiles all in-scope packages (`forge build`)
- Slither static analysis: H/M/L severity triage
- Maps all external/public state-changing entry points
- Extracts storage layouts via `forge inspect --json`
- Builds inheritance/dependency graph
- Enumerates access control modifiers and roles
- Inventories arithmetic operations (division, multiplication)
- Identifies proxy patterns
- Optional AI-assisted vulnerability scan (`/tob-scv-scan`)

**On Graph Protocol:** 55 entry points across 5 contracts, H:28 M:75 L:48 from Slither.

### Pass 2: Multi-Agent Adversarial Analysis

Three independent Claude Code sessions run in parallel. Agents cannot see each other's output. Each reads Pass 1 recon data, context files, and source code.

| Agent | Persona | Focus |
|-------|---------|-------|
| RED | Attacker | Exploits that steal funds. Rewarded per critical finding. |
| BLUE | Systematic verifier | Verify/refute each property in PROPERTIES.md. |
| GOLD | DeFi economist | Rounding errors, MEV, flash loans. Numbers required. |

After completion, findings are merged and deduplicated. Severity disagreements are tracked. Variant analysis (`/tob-variant-analysis`) runs on all high/critical findings to find similar patterns.

### Pass 3: PoC Gate

Every finding must earn a validated proof-of-concept or it is discarded.

1. False-positive check (`/tob-fp-check`) filters obvious FPs before spending PoC budget
2. Claude generates a Foundry test (`[PASS]` = attack succeeded convention)
3. `forge build`: must compile; errors fed back for retry (up to 2 retries)
4. `forge test`: `[PASS]` = validated, `[FAIL]` = inconclusive
5. Inconclusive findings get one more retry with full test output as feedback
6. Findings that exhaust retries are discarded; inconclusive High/Critical capped to Medium

### Pass 4: Fuzzing Campaign

Claude (Sonnet) generates Foundry invariant tests from PROPERTIES.md. No bash access: Claude writes files only, and the pipeline handles compilation.

Generated files are sanitised (unicode quotes replaced with ASCII) before compilation to prevent Solidity parse errors. Tests run with `forge test --match-test invariant_` at 10,000 fuzz runs each. Missing remappings are auto-detected from build errors and patched before compilation.

Medusa and Echidna integration is wired up but not yet installed in the container.

**On Graph Protocol:** 5 test files, 28 invariant functions generated, 14 passing.

### Pass 5: Formal Verification

Claude (Sonnet) generates symbolic tests for critical properties. Tests run in an isolated directory (import-free pure arithmetic contracts) so Halmos doesn't fight the full project.

Halmos runs bounded model checking (`--loop 5`, `--solver-timeout 300s`) per property:

| Result | Meaning |
|--------|---------|
| VERIFIED | No counterexample exists within the loop bound |
| VIOLATED | Concrete counterexample found: genuine bug |
| TIMEOUT | Solver budget exhausted; property may hold but Z3 couldn't close it |

**On Graph Protocol:** P-10, P-15, P-16, P-19 VERIFIED. P-1 TIMEOUT (cross-multiplication arithmetic exceeds Z3's budget at this bound; not a finding).

### Pass 6: Adversarial Review

## Results on The Graph Protocol (Horizon)

| Finding | Severity | Source | Status |
|---------|----------|--------|--------|
| Service provider front-runs slash by thawing tokens (P-10 violation) | Critical | Pass 2 | PoC validated |

The slash front-run allows a service provider to call `thaw()` before a slash transaction lands, reducing their slashable provision and forcing delegators to absorb losses that should fall on the provider. P-10 (provider-first slash ordering) is formally verified to hold in the pure arithmetic model; the violation is at the protocol interaction level, not the accounting math.

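
The distinction between the accounting math and the interaction-level violation can be illustrated with a toy model. The sketch below is not taken from Bulwark or from the Horizon contracts: `SlashSplit`, `split_slash`, and `slashable_after_thaw` are hypothetical names, and the model assumes only what the paragraph above states, namely that thawing lowers the provider's slashable stake and that the provider is slashed before delegators.

```rust
/// Toy model only: how a slash is divided between the provider's own
/// slashable stake and the delegation pool.
#[derive(Debug, PartialEq, Eq)]
pub struct SlashSplit {
    pub from_provider: u128,
    pub from_delegators: u128,
}

/// Provider-first ordering: the provider's slashable stake absorbs the slash
/// first; only the remainder (capped at the pool size) reaches delegators.
pub fn split_slash(amount: u128, provider_slashable: u128, delegated: u128) -> SlashSplit {
    let from_provider = amount.min(provider_slashable);
    // `from_provider <= amount`, so this subtraction cannot underflow.
    let from_delegators = (amount - from_provider).min(delegated);
    SlashSplit { from_provider, from_delegators }
}

/// Hypothetical helper: slashable stake left after the provider thaws `thawed`
/// tokens, under the assumption that thawed tokens are no longer slashable.
pub fn slashable_after_thaw(provision: u128, thawed: u128) -> u128 {
    provision.saturating_sub(thawed)
}
```

In this model the ordering rule in `split_slash` is always respected, yet a provider who thaws 120 of 150 tokens before a slash of 100 pays 30 and leaves 70 to delegators. That mirrors why a property can be VERIFIED in isolation while the finding still stands.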
## Model Configuration

    model = "haiku"          # Global default (cheapest, used for most passes)

    [passes.fuzzing]
    model = "sonnet"         # Better at generating compilable Solidity tests

    [passes.formal]
    model = "sonnet"         # Better at generating symbolic test structure

Swap `haiku` → `sonnet` globally for higher-quality agent analysis at ~4× the cost.

## Installed AI Skills

The container auto-installs ~70 audit skills at startup from three sources:

| Source | Count | What |
|--------|-------|------|
| [Trail of Bits skills](https://github.com/trailofbits/skills) | 36 | fp-check, variant-analysis, entry-point-analyzer, etc. |
| [Trail of Bits skills-curated](https://github.com/trailofbits/skills-curated) | 28 | scv-scan (36 Solidity vuln classes), and more |
| [forefy/.context](https://github.com/forefy/.context) | 6 | smart-contract-security-audit, foundry-poc, etc. |

### Pipeline-integrated skills

| Skill | Where | Purpose |
|-------|-------|---------|
| `/tob-scv-scan` | Pass 1 (post-Slither) | 36-class vulnerability scan |
| `/tob-fp-check` | Pass 3 (pre-PoC) | False positive gate |
| `/tob-variant-analysis` | Pass 2 (post-merge) | Pattern search for high/critical findings |
| `/tob-scv-scan` | RED + GOLD agents | Structured scan before manual analysis |
| `/tob-fp-check` | BLUE agent | Self-challenge on potential violations |
| `/tob-token-integration-analyzer` | GOLD agent | Token-handling edge case detection |
| `/tob-spec-to-code-compliance` | BLUE agent | Cross-check property vs implementation |
| `/tob-variant-analysis` | RED agent | Post-analysis variant search |

All integrations degrade gracefully: if a skill is not installed, the pipeline continues.

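
One way to picture this graceful degradation is a planning step that marks each requested skill as runnable or skipped instead of failing. This is a minimal sketch, not Bulwark's actual code: `SkillOutcome` and `plan_skills` are hypothetical names, and how the real pipeline detects installed skills is not described here.

```rust
use std::collections::HashSet;

/// Hypothetical outcome of deciding whether a skill can run.
#[derive(Debug, PartialEq, Eq)]
pub enum SkillOutcome {
    /// The skill is installed and will be invoked.
    Ran(String),
    /// The skill is missing; the pipeline carries on without it.
    Skipped(String),
}

/// For each requested skill, run it if installed, otherwise skip it.
/// A missing skill never produces an error, so later steps still execute.
pub fn plan_skills(requested: &[&str], installed: &HashSet<String>) -> Vec<SkillOutcome> {
    requested
        .iter()
        .map(|name| {
            if installed.contains(*name) {
                SkillOutcome::Ran((*name).to_string())
            } else {
                SkillOutcome::Skipped((*name).to_string())
            }
        })
        .collect()
}
```

Returning a per-skill outcome rather than a `Result` keeps the skipped skills visible for reporting while making it impossible for an absent skill to abort a pass.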
## Context Files

Pre-populated for The Graph Protocol in `context/`:

- **AUDIT_CONTEXT.md**: Protocol overview, deployment, trust model, economic parameters
- **PROPERTIES.md**: 22 security invariants (P-1 through P-22)
- **KNOWN_ISSUES.md**: 4 accepted risks, 5 fixed issues, 3 focus areas
- **ATTACK_PATTERNS.md**: 10 known patterns from previous audits and bounties

Copied into the audit directory at container startup. Replace with your own context for a new target.

## Output Structure

    audit-workspace/
    ├── recon/                          # Pass 1
    │   ├── entry-points.json
    │   ├── storage-layouts.json
    │   ├── slither-results.json
    │   ├── dependency-graph.json
    │   ├── math-operations.json
    │   └── access-control.json
    ├── findings/                       # Pass 2
    │   ├── red-agent-raw.json
    │   ├── blue-agent-raw.json
    │   ├── gold-agent-raw.json
    │   ├── merged-deduplicated.json
    │   └── variant-analysis.json
    ├── pocs/                           # Pass 3
    │   ├── F-001.t.sol
    │   ├── validated-findings.json
    │   └── discarded-findings.json
    ├── fuzzing/                        # Pass 4
    │   ├── invariant-tests/
    │   ├── fuzzing-campaign-results/
    │   └── fuzzing-findings.json
    ├── formal/                         # Pass 5
    │   ├── Symbolic*.t.sol
    │   ├── verification-summary.json
    │   └── formal-findings.json
    ├── review/                         # Pass 6
    │   └── adversarial-review.json
    ├── final-report.md
    ├── final-report.json
    └── pipeline-status.json

Reports are also exported to `./reports/` on the host after each run.

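
For tooling that post-processes a run, the layout above implies a fixed mapping from pass number to output directory. The sketch below encodes only that mapping; `pass_dir_name` and `pass_dir` are hypothetical helpers for illustration and are not part of the Bulwark CLI.

```rust
use std::path::{Path, PathBuf};

/// Directory name each pass writes to, following the workspace tree above.
/// Returns `None` for pass numbers outside 1..=6.
pub fn pass_dir_name(pass: u8) -> Option<&'static str> {
    match pass {
        1 => Some("recon"),
        2 => Some("findings"),
        3 => Some("pocs"),
        4 => Some("fuzzing"),
        5 => Some("formal"),
        6 => Some("review"),
        _ => None,
    }
}

/// Full path to a pass's output directory inside a given workspace root.
pub fn pass_dir(workspace: &Path, pass: u8) -> Option<PathBuf> {
    pass_dir_name(pass).map(|name| workspace.join(name))
}
```

For example, `pass_dir(Path::new("audit-workspace"), 3)` yields the `pocs` directory, where `validated-findings.json` and `discarded-findings.json` are listed.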
## Configuration Reference

`bulwark.toml` controls everything:

    [target]
    repo = "https://github.com/graphprotocol/contracts.git"
    branch = "main"
    scope = ["packages/horizon", "packages/subgraph-service"]
    core_contracts = ["HorizonStaking", "GraphPayments", "PaymentsEscrow"]

    model = "haiku"          # Global default; "sonnet" or "opus" for more depth

    [passes.recon]
    scv_scan = true
    scv_scan_max_turns = 20

    [passes.agents]
    max_turns = 80
    agents = ["red", "blue", "gold"]
    timeout_minutes = 60
    variant_analysis = true

    [passes.poc]
    max_turns = 30
    max_retries = 2
    fp_check = true

    [passes.fuzzing]
    fuzz_runs = 10_000
    max_turns = 40
    model = "sonnet"

    [passes.formal]
    solver_timeout = 300
    loop_bound = 5
    target_properties = ["P-1", "P-10", "P-15", "P-16", "P-19"]
    max_turns = 30
    model = "sonnet"

    [passes.review]
    max_turns = 60

## Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `ANTHROPIC_API_KEY` | For AI passes | Or authenticate via `bulwark login` inside the container |
| `AUDIT_TARGET` | No | Override git repo URL |
| `AUDIT_BRANCH` | No | Override branch |

## Development

    cargo check              # Type check
    cargo test               # Unit tests
    cargo clippy             # Lint
    cargo build --release    # Build release binary

## Running as a service

[bulwark-cloud](https://github.com/cargopete/bulwark-cloud) wraps this CLI in an AWS-native service: submit audits via HTTPS, results stored in S3 and DynamoDB, pipeline runs on ECS Fargate. No local Docker required.

    POST /v1/audits                  { repo, branch, scope, model }
      → { job_id, status: PENDING }

    GET  /v1/audits/{job_id}
      → { status: COMPLETED, findings_count: {CRITICAL: 1, HIGH: 3} }

    GET  /v1/audits/{job_id}/report?format=md
      → { url: "https://s3.../signed-url" }

## License

[MIT](LICENSE)