# Crabwalk
**File system snapshot differ for security auditing and deploy verification.**
Crabwalk takes point-in-time snapshots of directories (BLAKE3 hash, permissions, timestamps, ownership) and compares them to show exactly what changed. Built for infrastructure security teams, sysadmins, and compliance workflows.
crabwalk snap /etc --output baseline.json
# ... deploy, incident, or time passes ...
crabwalk snap /etc --output current.json
crabwalk diff baseline.json current.json --format table
Crabwalk Diff Report
Summary: Added: 1 | Removed: 1 | Modified: 2 | Metadata: 3 | Total: 7
┌──────────────────┬─────────────────────────┬────────────────────────────────┐
│ Status           │ Path                    │ Details                        │
├──────────────────┼─────────────────────────┼────────────────────────────────┤
│ ADDED            │ etc/cron.d/backdoor     │                                │
│ REMOVED          │ etc/app.conf            │                                │
│ MODIFIED         │ bin/sshd                │ hash: a1b2c3... → f4e5d6...    │
│ META             │ etc/sudoers             │ permissions: 0440 → 0777       │
└──────────────────┴─────────────────────────┴────────────────────────────────┘
## Table of Contents
- [Use Cases](#use-cases)
- [Requirements](#requirements)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Commands](#commands)
- [Ignore Patterns](#ignore-patterns)
- [Output Formats](#output-formats)
- [Architecture](#architecture)
- [Security](#security)
- [Performance](#performance)
- [Contributing](#contributing)
- [License](#license)
## Use Cases
- **Deploy verification** - compare filesystem state before and after a release
- **Security auditing** - detect unauthorized changes to binaries, configs, or permissions
- **Drift detection** - spot configuration drift between environments or over time
- **Compliance evidence** - persist deterministic, reproducible audit snapshots as artifacts
- **Incident response** - establish exactly what changed and when during an incident
## Requirements
- Rust 1.75 or newer
- Cargo (comes with Rust)
Install Rust via [rustup](https://rustup.rs):
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
## Installation
### From source
git clone https://github.com/devpedrois/crabwalk
cd crabwalk
cargo build --release
The binary is written to `target/release/crabwalk`. Copy it to a directory on your `PATH`:
cp target/release/crabwalk /usr/local/bin/crabwalk
### Run without installing
cargo run -- snap /etc --output baseline.json
cargo run -- diff baseline.json current.json
## Quick Start
**1. Snapshot a directory:**
crabwalk snap /etc --output baseline.json
**2. Make changes, snapshot again:**
crabwalk snap /etc --output current.json
**3. Diff the two snapshots:**
crabwalk diff baseline.json current.json --format table
**4. Export as HTML report:**
crabwalk diff baseline.json current.json --format html --output report.html
## Commands
### `crabwalk snap`
Walk a directory recursively and produce a JSON snapshot with BLAKE3 hashes, permissions, timestamps, and ownership for every entry.
crabwalk snap [OPTIONS]
OPTIONS:
-o, --output Write snapshot to file (default: stdout)
--max-size Skip files larger than SIZE (e.g. 100MB, 1GB)
--no-ignore Do not load .crabwalkignore files
--follow-symlinks Follow symlinks [default: off, see Security]
--threads Rayon worker threads (default: number of CPU cores)
Examples:
# Snapshot /etc, skip files over 10MB
crabwalk snap /etc --max-size 10MB --output etc-snap.json
# Use 4 threads
crabwalk snap /var/www --threads 4 --output www-snap.json
# Snapshot without respecting .crabwalkignore
crabwalk snap /app --no-ignore --output app-snap.json
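The `--max-size` option takes values such as `100MB` or `1GB` and is enforced before hashing. The sketch below shows one way such a value could be parsed and applied. It is not crabwalk's actual parser: `parse_size` and `exceeds_max_size` are hypothetical names, and the choice of binary multiples (1KB = 1024 bytes) is an assumption.

```rust
/// Parse a human-readable size such as "10MB" or "1GB" into bytes.
/// Assumption: units are binary multiples (1KB = 1024 bytes). The real
/// tool may use decimal units or accept other suffixes.
fn parse_size(input: &str) -> Option<u64> {
    let s = input.trim().to_ascii_uppercase();
    // Split the numeric prefix from the unit suffix.
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (digits, unit) = s.split_at(split);
    // An empty numeric prefix (e.g. "MB") fails to parse and yields None.
    let value: u64 = digits.parse().ok()?;
    let multiplier: u64 = match unit.trim() {
        "" | "B" => 1,
        "KB" => 1 << 10,
        "MB" => 1 << 20,
        "GB" => 1 << 30,
        _ => return None,
    };
    // Reject values that would overflow u64 instead of wrapping.
    value.checked_mul(multiplier)
}

/// Decide whether a file should be skipped before any hashing happens.
/// With no limit configured, nothing is skipped.
fn exceeds_max_size(file_len: u64, max_size: Option<u64>) -> bool {
    max_size.map_or(false, |limit| file_len > limit)
}
```

Checking the length from file metadata first means an oversized file is never opened for reading.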
### `crabwalk diff`
Compare two snapshot files and report what changed. Categorizes every change as `Added`, `Removed`, `Modified` (content), or `MetadataChanged` (permissions, ownership, mtime).
crabwalk diff [OPTIONS]
OPTIONS:
-f, --format Output format: table, json, html (default: table)
-o, --output Write output to file (default: stdout, required for html)
--ignore-mtime Ignore mtime-only differences
--ignore-metadata Suppress metadata-only changes, show content changes only
--only Filter change types: added,removed,modified,metadata
Exit codes:
| Code | Meaning |
|------|---------|
| `0` | No changes detected |
| `1` | Changes detected |
| `2` | Execution or input error |
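Because "changes detected" is reported as exit code `1`, a wrapper has to distinguish it from a real failure. The following sketch shows how a Rust caller might map the documented codes. `DiffOutcome`, `interpret_exit_code`, and `run_diff` are illustrative names, not part of crabwalk.

```rust
use std::process::Command;

/// Outcome of a `crabwalk diff` run, derived from the documented exit codes.
#[derive(Debug, PartialEq)]
enum DiffOutcome {
    NoChanges,
    ChangesDetected,
    Failed,
}

/// Map a process exit code to an outcome.
fn interpret_exit_code(code: Option<i32>) -> DiffOutcome {
    match code {
        Some(0) => DiffOutcome::NoChanges,
        Some(1) => DiffOutcome::ChangesDetected,
        // 2 is documented as an execution or input error. Any other value,
        // or termination by a signal (None), is treated as a failure too.
        _ => DiffOutcome::Failed,
    }
}

/// Run `crabwalk diff <old> <new>` and classify the result.
/// Assumes the `crabwalk` binary is on PATH.
fn run_diff(old: &str, new: &str) -> std::io::Result<DiffOutcome> {
    let status = Command::new("crabwalk").args(["diff", old, new]).status()?;
    Ok(interpret_exit_code(status.code()))
}
```

In a CI job, `ChangesDetected` would typically fail the pipeline with the report attached, while `Failed` signals that the comparison itself did not run.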
Examples:
# Human-readable table in terminal
crabwalk diff before.json after.json
# JSON output to file
crabwalk diff before.json after.json --format json --output diff.json
# Show only content changes, ignore permission/mtime drift
crabwalk diff before.json after.json --ignore-metadata
# Show only added and removed files
crabwalk diff before.json after.json --only added,removed
# HTML report
crabwalk diff before.json after.json --format html --output report.html
## Ignore Patterns
Create a `.crabwalkignore` file in the directory you are snapshotting. It uses the same syntax as `.gitignore` and is powered by the `ignore` crate, which ripgrep also uses.
# .crabwalkignore
# Ignore build artifacts
target/
*.log
# Ignore version control metadata
.git/
# Negation: always include this specific log
!important.log
# Ignore cache directories
**/__pycache__/
**/node_modules/
Pass `--no-ignore` to disable this behavior entirely.
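The negation rule in the example only works because `.gitignore` syntax lets the last matching pattern win, so `!important.log` must come after `*.log`. The sketch below illustrates that ordering rule with a deliberately simplified matcher. It is not how the `ignore` crate is implemented: `is_ignored` is a hypothetical helper that only understands `*.ext` globs and literal file names, and it skips any pattern containing `/`.

```rust
/// Returns true if `file_name` is ignored by `rules`, using the gitignore
/// convention that the last matching rule wins.
/// Simplification: handles only `*.ext` globs and literal names. Patterns
/// containing `/` (such as `target/` or `**/node_modules/`) are skipped.
fn is_ignored(rules: &str, file_name: &str) -> bool {
    let mut ignored = false;
    for line in rules.lines() {
        let line = line.trim();
        // Blank lines and comments carry no pattern.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // A leading `!` re-includes anything the pattern matches.
        let (negated, pattern) = match line.strip_prefix('!') {
            Some(rest) => (true, rest),
            None => (false, line),
        };
        if pattern.contains('/') {
            continue; // directory patterns are out of scope for this sketch
        }
        let matches = match pattern.strip_prefix('*') {
            Some(suffix) => file_name.ends_with(suffix),
            None => pattern == file_name,
        };
        if matches {
            // Later rules overwrite the decision of earlier ones.
            ignored = !negated;
        }
    }
    ignored
}
```

With the rules `*.log` followed by `!important.log`, `debug.log` is ignored and `important.log` is kept. Reversing the two lines would ignore both.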
## Output Formats
### Table (default)
Color-coded terminal output. Best for interactive use and quick reviews.
- Green: `ADDED`
- Red: `REMOVED`
- Yellow: `MODIFIED`
- Cyan: `META`
crabwalk diff old.json new.json
### JSON
Machine-readable structured output. Useful for piping into other tools, storing as artifacts, or building dashboards.
crabwalk diff old.json new.json --format json
Output structure:
{
  "old_snapshot": "baseline.json",
  "new_snapshot": "current.json",
  "timestamp": "2026-05-20T12:00:00Z",
  "summary": {
    "added": 1,
    "removed": 1,
    "modified": 2,
    "metadata_changed": 3,
    "total": 7
  },
  "entries": [
    {
      "path": "etc/sudoers",
      "change_type": "MetadataChanged",
      "details": ["permissions: 0440 -> 0777"]
    }
  ]
}
### HTML
Self-contained HTML report with summary cards and interactive filters. Requires `--output`.
crabwalk diff old.json new.json --format html --output report.html
open report.html
## Architecture
crabwalk/
├── Cargo.toml # Dependencies and package metadata
├── .crabwalkignore.example # Example ignore pattern file
│
├── src/
│ ├── main.rs # Entry point: CLI parse -> dispatch to snap/diff
│ ├── cli.rs # Clap structs: SnapArgs, DiffArgs, OutputFormat
│ │
│ ├── walker.rs # Recursive directory traversal via walkdir
│ │ # Applies filters before hashing
│ │ # Never follows symlinks outside root
│ │
│ ├── hasher.rs # BLAKE3 streaming hash in 64KB chunks via BufReader
│ │ # Never loads entire file into memory
│ │ # Send + Sync for rayon parallelism
│ │
│ ├── snapshot.rs # Snapshot + FileEntry structs with serde
│ │ # Entries always sorted by path (determinism)
│ │
│ ├── diff.rs # Diff engine: HashMap-indexed comparison
│ │ # Categorizes: Added, Removed, Modified, MetadataChanged
│ │
│ ├── filter.rs # .crabwalkignore via `ignore` crate
│ │ # --max-size enforcement before hashing
│ │ # Special file detection and skip
│ │
│ ├── output/
│ │ ├── mod.rs # OutputFormatter trait + format dispatch
│ │ ├── table.rs # comfy-table + colored terminal output
│ │ ├── json.rs # serde_json pretty output to stdout or file
│ │ └── html.rs # Askama compiled template -> HTML report
│ │
│ └── error.rs # CrabwalkError enum via thiserror
│ # Io, Walk, Hash, Snapshot, Diff, Filter, Output
│
├── templates/
│ └── report.html # Askama HTML diff report template
│
├── tests/
│ ├── snap_test.rs # Snapshot determinism, metadata, errors
│ ├── diff_test.rs # All change categories
│ ├── filter_test.rs # Ignore patterns and max-size
│ ├── security_test.rs # Symlinks, special files, path traversal
│ └── integration_test.rs # End-to-end: snap -> modify -> snap -> diff
│
└── benches/
└── walker_bench.rs # Criterion: sequential vs parallel, various sizes
### Data Flow
PATH
|
v
[walker.rs] -- recursive walk, metadata collection
|
v
[filter.rs] -- .crabwalkignore, --max-size, skip special files
|
v
[hasher.rs] -- rayon par_iter: BLAKE3 streaming hash per file
|
v
[snapshot.rs] -- sort by path, serialize to JSON
|
v (two snapshots)
[diff.rs] -- HashMap comparison, categorize changes
|
v
[output/] -- table | json | html
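The `diff.rs` step above can be sketched as a HashMap-indexed comparison. This is a minimal illustration, not the project's code: the `FileEntry` fields, the `Change` enum, and `diff_snapshots` are assumed shapes, and a content change is given priority over a metadata change when both occur.

```rust
use std::collections::{HashMap, HashSet};

/// Assumed minimal shape of a snapshot entry (the real struct has more fields).
#[derive(Debug, Clone, PartialEq)]
struct FileEntry {
    path: String,
    hash: String,
    permissions: u32,
    mtime: i64,
}

/// The four change categories named in the README.
#[derive(Debug, PartialEq)]
enum Change {
    Added,
    Removed,
    Modified,
    MetadataChanged,
}

/// Compare two snapshots and return the changes sorted by path.
fn diff_snapshots(old: &[FileEntry], new: &[FileEntry]) -> Vec<(String, Change)> {
    // Index the old snapshot by path for O(1) lookups.
    let old_by_path: HashMap<&str, &FileEntry> =
        old.iter().map(|e| (e.path.as_str(), e)).collect();
    let new_paths: HashSet<&str> = new.iter().map(|e| e.path.as_str()).collect();

    let mut changes = Vec::new();
    for entry in new {
        match old_by_path.get(entry.path.as_str()) {
            None => changes.push((entry.path.clone(), Change::Added)),
            // A differing hash means the content changed.
            Some(prev) if prev.hash != entry.hash => {
                changes.push((entry.path.clone(), Change::Modified))
            }
            // Same content, different metadata.
            Some(prev) if prev.permissions != entry.permissions || prev.mtime != entry.mtime => {
                changes.push((entry.path.clone(), Change::MetadataChanged))
            }
            Some(_) => {}
        }
    }
    for entry in old {
        if !new_paths.contains(entry.path.as_str()) {
            changes.push((entry.path.clone(), Change::Removed));
        }
    }
    // Sort so the report is deterministic regardless of HashMap ordering.
    changes.sort_by(|a, b| a.0.cmp(&b.0));
    changes
}
```

Indexing by path keeps the comparison linear in the number of entries, and the final sort keeps the output order stable between runs.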
### Key Dependencies
| Crate | Version | Purpose |
|-------|---------|---------|
| `clap` | 4 | CLI parsing with derive macros |
| `walkdir` | 2 | Recursive directory traversal |
| `blake3` | 1 | Streaming cryptographic hash (faster than SHA-256) |
| `rayon` | 1.10 | Data parallelism via `par_iter()` |
| `serde` + `serde_json` | 1 | Snapshot serialization |
| `ignore` | 0.4 | `.gitignore`-style pattern matching (same as ripgrep) |
| `comfy-table` + `colored` | 7 / 2 | Terminal table output with colors |
| `askama` | 0.12 | Compile-time type-safe HTML templates |
| `thiserror` | 2 | Ergonomic error type derivation |
| `chrono` | 0.4 | ISO 8601 timestamps |
## Security
Crabwalk is a security auditing tool. It treats the filesystem as hostile by default.
### Defense in Depth
| Layer | Threat | Mitigation |
|-------|--------|-----------|
| 1. Symlink safety | Symlink escape to `/etc/shadow` | Symlinks not followed by default. `--follow-symlinks` is opt-in and stays within root |
| 2. Path normalization | Path traversal via crafted paths | All snapshot paths are relative to root, never absolute |
| 3. Streaming hash | OOM on large files | BLAKE3 via 64KB `BufReader` chunks. Constant memory for any file size |
| 4. Special file protection | Infinite read on `/dev/zero`, blocking on pipes | Char devices, block devices, FIFOs, and sockets are skipped and reported |
| 5. Error reporting | Silent failures hiding tampering | Every I/O error is recorded in `FileEntry.error`. `total_errors` in snapshot metadata |
| 6. TOCTOU awareness | File changes between walk and hash | Files that vanish mid-scan produce an error entry, never a panic |
| 7. Snapshot integrity | Tampered snapshot files | Deterministic JSON ordering. Schema version field for compatibility checks |
| 8. Input validation | Malformed paths or options | All CLI inputs validated before processing. Strict serde deserialization |
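Layer 4 can be illustrated with the standard library alone. The sketch below classifies an entry as a char device, block device, FIFO, or socket so that it can be skipped before any read is attempted. It is Unix-only, and `skip_reason` and `special_file_kind` are hypothetical names rather than crabwalk's API.

```rust
use std::fs::{self, FileType};
use std::io;
use std::os::unix::fs::FileTypeExt; // Unix-only extension trait
use std::path::Path;

/// Returns why an entry should be skipped, or None if it is not a special file.
fn skip_reason(file_type: &FileType) -> Option<&'static str> {
    if file_type.is_char_device() {
        Some("char device")
    } else if file_type.is_block_device() {
        Some("block device")
    } else if file_type.is_fifo() {
        Some("fifo")
    } else if file_type.is_socket() {
        Some("socket")
    } else {
        None
    }
}

/// Classify a path without opening it.
fn special_file_kind(path: &Path) -> io::Result<Option<&'static str>> {
    // symlink_metadata does not follow symlinks, so a link that points at a
    // device is reported as a link, not as the device behind it.
    let file_type = fs::symlink_metadata(path)?.file_type();
    Ok(skip_reason(&file_type))
}
```

Because the check relies on metadata only, it never blocks on a pipe or reads from an endless device such as `/dev/zero`.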
### What Crabwalk Does Not Do
- Does not verify its own binary integrity (use your package manager or a TPM for that)
- Snapshots are not cryptographically signed (GPG-sign them yourself if needed)
- Does not operate in real-time (it is a point-in-time tool, not a daemon)
## Performance
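Crabwalk hashes files in parallel with Rayon, using all available CPU cores by default (override with `--threads`). Hashing memory stays constant regardless of file size, because each file is streamed in 64KB chunks rather than loaded whole.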
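The constant-memory claim follows from reading each file through a fixed-size buffer. The sketch below shows that pattern using only the standard library. To avoid external crates it substitutes FNV-1a, a small non-cryptographic hash, for BLAKE3. `Fnv1a` and `hash_stream` are illustrative names and do not reflect the contents of `hasher.rs`.

```rust
use std::io::{self, BufReader, Read};

/// Buffer size matching the 64KB chunks described in the README.
const CHUNK_SIZE: usize = 64 * 1024;

/// FNV-1a (64-bit), used here ONLY as a stand-in for BLAKE3.
/// It is not cryptographically secure.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV offset basis
    }

    /// Feed more bytes. The result depends only on the byte sequence,
    /// not on how it was split into chunks.
    fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV prime
        }
    }

    fn finalize(&self) -> u64 {
        self.0
    }
}

/// Hash any reader in fixed-size chunks, so memory use is independent
/// of the input length.
fn hash_stream<R: Read>(reader: R) -> io::Result<u64> {
    let mut reader = BufReader::with_capacity(CHUNK_SIZE, reader);
    let mut buf = vec![0u8; CHUNK_SIZE];
    let mut hasher = Fnv1a::new();
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            // Retry if a signal interrupted the read.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        hasher.update(&buf[..n]);
    }
    Ok(hasher.finalize())
}
```

Passing `std::fs::File::open(path)?` to `hash_stream` hashes a file of any size with one 64KB buffer. With the `blake3` crate, the `update` and `finalize` calls would go to its hasher instead.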
### Benchmark Results
Reference run on 2026-05-20 (`cargo bench --bench walker_bench`):
| Benchmark | Time |
|-----------|------|
| `snapshot_small / sequential` | ~9.91 ms |
| `snapshot_small / parallel` | ~4.46 ms |
| `snapshot_medium / sequential` | ~107.06 ms |
| `snapshot_medium / parallel` | ~41.21 ms |
| `hash_single_file / 1KB` | ~43.36 µs |
| `hash_single_file / 100KB` | ~76.67 µs |
| `hash_single_file / 1MB` | ~316.56 µs |
| `hash_single_file / 10MB` | ~4.13 ms |
| `diff_engine / 100 entries` | ~35.75 µs |
| `diff_engine / 1000 entries` | ~404.13 µs |
| `diff_engine / 10000 entries` | ~6.83 ms |
### Run Benchmarks
cargo bench --bench walker_bench
Compare sequential vs parallel with Hyperfine:
hyperfine \
"crabwalk snap ./target --threads 1 --output /tmp/seq.json" \
"crabwalk snap ./target --threads 8 --output /tmp/par.json"
## License
MIT. See [LICENSE](./LICENSE).