devpedrois/crabwalk

# Crabwalk

[![Build](https://img.shields.io/badge/build-passing-brightgreen)](#)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)
[![Rust Version](https://img.shields.io/badge/rust-1.75%2B-orange)](https://www.rust-lang.org)
[![Security Auditing](https://img.shields.io/badge/security-defense--in--depth-red)](#security)

**File system snapshot differ for security auditing and deploy verification.**

Crabwalk takes point-in-time snapshots of directories (BLAKE3 hash, permissions, timestamps, ownership) and compares them to show exactly what changed. Built for infrastructure security teams, sysadmins, and compliance workflows.

    crabwalk snap /etc --output baseline.json
    # ... deploy, incident, or time passes ...
    crabwalk snap /etc --output current.json
    crabwalk diff baseline.json current.json --format table

    Crabwalk Diff Report
    Summary: Added: 1 | Removed: 1 | Modified: 2 | Metadata: 3 | Total: 7
    ┌──────────────────┬─────────────────────────┬────────────────────────────────┐
    │ Status           │ Path                    │ Details                        │
    ├──────────────────┼─────────────────────────┼────────────────────────────────┤
    │ ADDED            │ etc/cron.d/backdoor     │                                │
    │ REMOVED          │ etc/app.conf            │                                │
    │ MODIFIED         │ bin/sshd                │ hash: a1b2c3... → f4e5d6...    │
    │ META             │ etc/sudoers             │ permissions: 0440 → 0777       │
    └──────────────────┴─────────────────────────┴────────────────────────────────┘

## Table of Contents

- [Use Cases](#use-cases)
- [Requirements](#requirements)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Commands](#commands)
- [Ignore Patterns](#ignore-patterns)
- [Output Formats](#output-formats)
- [Architecture](#architecture)
- [Security](#security)
- [Performance](#performance)
- [Contributing](#contributing)
- [License](#license)

## Use Cases

- **Deploy verification** - compare filesystem state before and after a release
- **Security auditing** - detect unauthorized changes to binaries, configs, or permissions
- **Drift detection** - spot configuration drift between environments or over time
- **Compliance evidence** - persist deterministic, reproducible audit snapshots as artifacts
- **Incident response** - establish exactly what changed and when during an incident

## Requirements

- Rust 1.75 or newer
- Cargo (comes with Rust)

Install Rust via [rustup](https://rustup.rs):

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

## Installation

### From source

    git clone https://github.com/devpedrois/crabwalk
    cd crabwalk
    cargo build --release

Binary will be at `target/release/crabwalk`. Copy it wherever you need:

    cp target/release/crabwalk /usr/local/bin/crabwalk

### Run without installing

    cargo run -- snap /etc --output baseline.json
    cargo run -- diff baseline.json current.json

## Quick Start

**1. Snapshot a directory:**

    crabwalk snap /etc --output baseline.json

**2. Make changes, snapshot again:**

    crabwalk snap /etc --output current.json

**3. Diff the two snapshots:**

    crabwalk diff baseline.json current.json --format table

**4. Export as HTML report:**

    crabwalk diff baseline.json current.json --format html --output report.html

## Commands

### `crabwalk snap`

Walk a directory recursively and produce a JSON snapshot with BLAKE3 hashes, permissions, timestamps, and ownership for every entry.

    crabwalk snap [OPTIONS]

    OPTIONS:
      -o, --output           Write snapshot to file (default: stdout)
          --max-size <SIZE>  Skip files larger than SIZE (e.g. 100MB, 1GB)
          --no-ignore        Do not load .crabwalkignore files
          --follow-symlinks  Follow symlinks [default: off, see Security]
          --threads          Rayon worker threads (default: number of CPU cores)

Examples:

    # Snapshot /etc, skip files over 10MB
    crabwalk snap /etc --max-size 10MB --output etc-snap.json

    # Use 4 threads
    crabwalk snap /var/www --threads 4 --output www-snap.json

    # Snapshot without respecting .crabwalkignore
    crabwalk snap /app --no-ignore --output app-snap.json

### `crabwalk diff`

Compare two snapshot files and report what changed. Categorizes every change as `Added`, `Removed`, `Modified` (content), or `MetadataChanged` (permissions, ownership, mtime).
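
The four categories above can be illustrated with a minimal sketch. It is not Crabwalk's actual `diff.rs`: the `Entry` struct, the `Change` enum, and the `diff` function below are hypothetical stand-ins, ownership is left out for brevity, and the only assumption taken from this README is that entries are indexed by path in a `HashMap` and that a hash difference takes priority over a metadata difference.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical, simplified snapshot entry (the real `FileEntry` may differ).
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    hash: String, // content hash, e.g. hex-encoded BLAKE3
    mode: u32,    // permission bits
    mtime: i64,   // modification time in seconds
}

/// Hypothetical change categories mirroring the names used in this README.
#[derive(Debug, PartialEq)]
enum Change {
    Added,
    Removed,
    Modified,
    MetadataChanged,
}

/// Compare two path-indexed snapshots and return the changes sorted by path.
fn diff(
    old: &HashMap<String, Entry>,
    new: &HashMap<String, Entry>,
) -> BTreeMap<String, Change> {
    let mut changes = BTreeMap::new();
    for (path, old_entry) in old {
        match new.get(path) {
            // Path is gone from the new snapshot.
            None => {
                changes.insert(path.clone(), Change::Removed);
            }
            // Content differs: report as Modified even if metadata also changed.
            Some(new_entry) if new_entry.hash != old_entry.hash => {
                changes.insert(path.clone(), Change::Modified);
            }
            // Same content, but some other field differs.
            Some(new_entry) if new_entry != old_entry => {
                changes.insert(path.clone(), Change::MetadataChanged);
            }
            // Identical entry: nothing to report.
            Some(_) => {}
        }
    }
    // Paths that exist only in the new snapshot.
    for path in new.keys() {
        if !old.contains_key(path) {
            changes.insert(path.clone(), Change::Added);
        }
    }
    changes
}

fn main() {
    let entry = |hash: &str, mode: u32| Entry { hash: hash.to_string(), mode, mtime: 0 };

    let mut old = HashMap::new();
    old.insert("etc/app.conf".to_string(), entry("aaa", 0o644));
    old.insert("etc/sudoers".to_string(), entry("bbb", 0o440));
    old.insert("bin/sshd".to_string(), entry("a1b2c3", 0o755));

    let mut new = HashMap::new();
    new.insert("etc/sudoers".to_string(), entry("bbb", 0o777));
    new.insert("bin/sshd".to_string(), entry("f4e5d6", 0o755));
    new.insert("etc/cron.d/backdoor".to_string(), entry("ccc", 0o644));

    for (path, change) in &diff(&old, &new) {
        println!("{change:?}\t{path}");
    }
}
```

Collecting results into a `BTreeMap` keeps the report ordered by path, which matches the README's emphasis on deterministic output; a real implementation could equally sort a `Vec` at the end.
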

    crabwalk diff [OPTIONS]

    OPTIONS:
      -f, --format           Output format: table, json, html (default: table)
      -o, --output           Write output to file (default: stdout, required for html)
          --ignore-mtime     Ignore mtime-only differences
          --ignore-metadata  Suppress metadata-only changes, show content changes only
          --only             Filter change types: added,removed,modified,metadata

Exit codes:

| Code | Meaning |
|------|---------|
| `0` | No changes detected |
| `1` | Changes detected |
| `2` | Execution or input error |

Examples:

    # Human-readable table in terminal
    crabwalk diff before.json after.json

    # JSON output to file
    crabwalk diff before.json after.json --format json --output diff.json

    # Show only content changes, ignore permission/mtime drift
    crabwalk diff before.json after.json --ignore-metadata

    # Show only added and removed files
    crabwalk diff before.json after.json --only added,removed

    # HTML report
    crabwalk diff before.json after.json --format html --output report.html

## Ignore Patterns

Create a `.crabwalkignore` file in the directory you are snapshotting. Uses the same syntax as `.gitignore`, powered by the same `ignore` crate used by ripgrep.

    # .crabwalkignore

    # Ignore build artifacts
    target/
    *.log

    # Ignore version control metadata
    .git/

    # Negation: always include this specific log
    !important.log

    # Ignore cache directories
    **/__pycache__/
    **/node_modules/

Pass `--no-ignore` to disable this behavior entirely.

## Output Formats

### Table (default)

Color-coded terminal output. Best for interactive use and quick reviews.

- Green: `ADDED`
- Red: `REMOVED`
- Yellow: `MODIFIED`
- Cyan: `META`

Example:

    crabwalk diff old.json new.json

### JSON

Machine-readable structured output. Useful for piping into other tools, storing as artifacts, or building dashboards.

    crabwalk diff old.json new.json --format json

Output structure:

    {
      "old_snapshot": "baseline.json",
      "new_snapshot": "current.json",
      "timestamp": "2026-05-20T12:00:00Z",
      "summary": {
        "added": 1,
        "removed": 1,
        "modified": 2,
        "metadata_changed": 3,
        "total": 7
      },
      "entries": [
        {
          "path": "etc/sudoers",
          "change_type": "MetadataChanged",
          "details": ["permissions: 0440 -> 0777"]
        }
      ]
    }

### HTML

Self-contained HTML report with summary cards and interactive filters. Requires `--output`.

    crabwalk diff old.json new.json --format html --output report.html
    open report.html

## Architecture

    crabwalk/
    ├── Cargo.toml                  # Dependencies and package metadata
    ├── .crabwalkignore.example     # Example ignore pattern file
    │
    ├── src/
    │   ├── main.rs                 # Entry point: CLI parse -> dispatch to snap/diff
    │   ├── cli.rs                  # Clap structs: SnapArgs, DiffArgs, OutputFormat
    │   │
    │   ├── walker.rs               # Recursive directory traversal via walkdir
    │   │                           # Applies filters before hashing
    │   │                           # Never follows symlinks outside root
    │   │
    │   ├── hasher.rs               # BLAKE3 streaming hash in 64KB chunks via BufReader
    │   │                           # Never loads entire file into memory
    │   │                           # Send + Sync for rayon parallelism
    │   │
    │   ├── snapshot.rs             # Snapshot + FileEntry structs with serde
    │   │                           # Entries always sorted by path (determinism)
    │   │
    │   ├── diff.rs                 # Diff engine: HashMap-indexed comparison
    │   │                           # Categorizes: Added, Removed, Modified, MetadataChanged
    │   │
    │   ├── filter.rs               # .crabwalkignore via `ignore` crate
    │   │                           # --max-size enforcement before hashing
    │   │                           # Special file detection and skip
    │   │
    │   ├── output/
    │   │   ├── mod.rs              # OutputFormatter trait + format dispatch
    │   │   ├── table.rs            # comfy-table + colored terminal output
    │   │   ├── json.rs             # serde_json pretty output to stdout or file
    │   │   └── html.rs             # Askama compiled template -> HTML report
    │   │
    │   └── error.rs                # CrabwalkError enum via thiserror
    │                               # Io, Walk, Hash, Snapshot, Diff, Filter, Output
    │
    ├── templates/
    │   └── report.html             # Askama HTML diff report template
    │
    ├── tests/
    │   ├── snap_test.rs            # Snapshot determinism, metadata, errors
    │   ├── diff_test.rs            # All change categories
    │   ├── filter_test.rs          # Ignore patterns and max-size
    │   ├── security_test.rs        # Symlinks, special files, path traversal
    │   └── integration_test.rs     # End-to-end: snap -> modify -> snap -> diff
    │
    └── benches/
        └── walker_bench.rs         # Criterion: sequential vs parallel, various sizes

### Data Flow

    PATH
      |
      v
    [walker.rs]   -- recursive walk, metadata collection
      |
      v
    [filter.rs]   -- .crabwalkignore, --max-size, skip special files
      |
      v
    [hasher.rs]   -- rayon par_iter: BLAKE3 streaming hash per file
      |
      v
    [snapshot.rs] -- sort by path, serialize to JSON
      |
      v  (two snapshots)
    [diff.rs]     -- HashMap comparison, categorize changes
      |
      v
    [output/]     -- table | json | html

### Key Dependencies

| Crate | Version | Purpose |
|-------|---------|---------|
| `clap` | 4 | CLI parsing with derive macros |
| `walkdir` | 2 | Recursive directory traversal |
| `blake3` | 1 | Streaming cryptographic hash (faster than SHA-256) |
| `rayon` | 1.10 | Data parallelism via `par_iter()` |
| `serde` + `serde_json` | 1 | Snapshot serialization |
| `ignore` | 0.4 | `.gitignore`-style pattern matching (same as ripgrep) |
| `comfy-table` + `colored` | 7 / 2 | Terminal table output with colors |
| `askama` | 0.12 | Compile-time type-safe HTML templates |
| `thiserror` | 2 | Ergonomic error type derivation |
| `chrono` | 0.4 | ISO 8601 timestamps |

## Security

Crabwalk is a security auditing tool. It treats the filesystem as hostile by default.

### Defense in Depth

| Layer | Threat | Mitigation |
|-------|--------|-----------|
| 1. Symlink safety | Symlink escape to `/etc/shadow` | Symlinks not followed by default. `--follow-symlinks` is opt-in and stays within root |
| 2. Path normalization | Path traversal via crafted paths | All snapshot paths are relative to root, never absolute |
| 3. Streaming hash | OOM on large files | BLAKE3 via 64KB `BufReader` chunks. Constant memory for any file size |
| 4. Special file protection | Infinite read on `/dev/zero`, blocking on pipes | Char devices, block devices, FIFOs, and sockets are skipped and reported |
| 5. Error reporting | Silent failures hiding tampering | Every I/O error is recorded in `FileEntry.error`. `total_errors` in snapshot metadata |
| 6. TOCTOU awareness | File changes between walk and hash | Files that vanish mid-scan produce an error entry, never a panic |
| 7. Snapshot integrity | Tampered snapshot files | Deterministic JSON ordering. Schema version field for compatibility checks |
| 8. Input validation | Malformed paths or options | All CLI inputs validated before processing. Strict serde deserialization |

### What Crabwalk Does Not Do

- Does not verify its own binary integrity (use your package manager or a TPM for that)
- Snapshots are not cryptographically signed (GPG-sign them yourself if needed)
- Does not operate in real-time (it is a point-in-time tool, not a daemon)

## Performance

Crabwalk hashes files in parallel using Rayon across all available CPU cores. Memory footprint is constant regardless of file sizes.

### Benchmark Results

Reference run on 2026-05-20 (`cargo bench --bench walker_bench`):

| Benchmark | Time |
|-----------|------|
| `snapshot_small / sequential` | ~9.91 ms |
| `snapshot_small / parallel` | ~4.46 ms |
| `snapshot_medium / sequential` | ~107.06 ms |
| `snapshot_medium / parallel` | ~41.21 ms |
| `hash_single_file / 1KB` | ~43.36 µs |
| `hash_single_file / 100KB` | ~76.67 µs |
| `hash_single_file / 1MB` | ~316.56 µs |
| `hash_single_file / 10MB` | ~4.13 ms |
| `diff_engine / 100 entries` | ~35.75 µs |
| `diff_engine / 1000 entries` | ~404.13 µs |
| `diff_engine / 10000 entries` | ~6.83 ms |

### Run Benchmarks

    cargo bench --bench walker_bench

Compare sequential vs parallel with Hyperfine:

    hyperfine \
      "crabwalk snap ./target --threads 1 --output /tmp/seq.json" \
      "crabwalk snap ./target --threads 8 --output /tmp/par.json"

## License

MIT. See [LICENSE](./LICENSE).