tombaldwin/candor


# candor

**A cheap, honest map of what every function in a Rust codebase actually does** — which functions reach the network, filesystem, a database, subprocesses, the clock, the environment; *transitively*; and where it honestly *can't* tell. A capability/effect checker built as a [dylint](https://github.com/trailofbits/dylint) lint — the Rust reference implementation of [candor-spec](https://github.com/tombaldwin/candor-spec) (the same idea, specified across languages).

**Built for AI coding agents.** An agent re-derives "what does this do?" every session and pays per token to read code. candor precompiles it into a report read in seconds — and marks calls it can't resolve `Unknown` rather than guessing. In a pilot ([EVAL.md](EVAL.md)) a JSON-only agent scoped a refactor ~3× cheaper and ~6.5× faster than one reading source.

### Get an agent using it — one paste, from nothing

Give your coding agent (Claude Code, Cursor, …) this:

[AGENTS.md](AGENTS.md) is self-contained — it installs candor, runs it on this project, and explains the report and the trust rule (`inferred` is authoritative; `unresolved`/`Unknown` → read the source). Single source of truth for agents.

### Claude Code: see it work, automatically

The paste above asks the agent to use candor — but you can't *see* whether it did. The [Claude Code integration](integrations/claude-code/) gives you a deterministic, un-fakeable **receipt** in your transcript whenever your Rust changes — function count, effect breakdown, a freshness hash, and a coverage warning when a dependency looks effectful but isn't calibrated:

    candor · 143 fns · 54 Db, 16 Net, 27 Fs · 0 unresolved · fresh @8c4c9053 · coverage ✓

A `Stop` hook auto-refreshes it on every turn that touches Rust (silent otherwise); `/candor` shows it on demand.

Install: `integrations/claude-code/install.sh` from your project — it installs thin stubs that delegate to this clone, so `cargo candor update` refreshes the engine, the scripts, and `AGENTS.md` together (every receipt is stamped with the engine commit, so they can't silently desync). See its [README](integrations/claude-code/README.md) for the trust model and honest limits.

**Opt-in edit-time self-review.** Set `CANDOR_REVIEW=1` (in `.candor/config`) and the Stop hook does more than inform the human: when the agent's edits give a function a *new* effect vs your committed baseline, it hands that delta *back to the agent* as a self-review checkpoint — "your edits gave `foo` a new `Net` (which propagates to its callers); intended?". Each effect prompts once, it never loops, and it's off by default. This is the difference between candor *informing* an agent and *changing what it does* — see [BACKLOG.md](BACKLOG.md) P0.

**MCP server.** [`integrations/mcp/`](integrations/mcp/) exposes candor's instant queries (`candor_effects` / `candor_where` / `candor_callers` / `candor_diff`) as native MCP tools, so an agent calls candor reflexively — in one cheap call instead of grepping and reading source. Pair it with `cargo candor watch` so every call serves from a fresh report.

*Humans:* [Quick start](#quick-start-humans) · *Detail:* [what it detects](#what-it-detects) · [PRINCIPLES](PRINCIPLES.md) · [CRITIQUE](CRITIQUE.md)

## Layout

| Path | What |
|---|---|
| `src/lib.rs` | the entire lint — classifier, per-function call-graph fixpoint, the three modes |
| `sample/` | a small crate written in the capability discipline, for trying conformance mode |
| `rust-toolchain` | pins the nightly the lint links against (`rustc-dev`) |

## Setup

    cargo install cargo-dylint dylint-link   # once per machine
    cargo build                              # builds the lint; first build downloads the nightly + rustc-dev

The build produces `target/debug/libcandor@-.dylib` (`.so` on Linux).

## Quick start (humans)

Put this repo on `PATH` (or symlink `cargo-candor` into one) and use the wrapper, which finds and builds the dylib for you:

    cargo candor audit                       # at-a-glance effect profile of the whole project
    cargo candor audit --all                 # the full per-function lint (spans in context)
    cargo candor snapshot .candor/baseline   # write a JSON report
    cargo candor guard .candor/baseline      # fail on functions that gained an effect
    cargo candor diff .candor/baseline       # describe the per-function effect delta (--json)
    cargo candor watch                       # keep the report fresh in the background → instant `diff`
    cargo candor show my_function            # a function's effects, instant (read from the report)
    cargo candor where Net                   # which functions perform an effect, instant
    cargo candor callers my_function         # which functions call this one, instant (who depends on it)
    cargo candor explain my_function         # trace WHY a function has each effect (the call path)
    cargo candor policy .candor/policy       # enforce effect boundaries (deny/pure rules)
    cargo candor risk                        # heuristic: effects on caller-derived input (advisory)
    cargo candor strict my_module            # conformance, scoped to a module
    cargo candor no-ambient my_module        # flag direct ambient-authority use

`cargo candor audit` aggregates the project's crates into a one-screen profile — how many functions perform each effect, which make calls candor can't resolve, any uncalibrated dependencies, and the functions with the broadest reach into the outside world:

    candor @62a9383
    143 effectful functions · 7 pgman.Executable · 136 pgman.Rlib
    effects
      56 Db · 53 Clock · 47 Log · 37 Env · 27 Fs · 23 Exec · 21 Clipboard · 18 Net
    broadest effect surface
      app::App::run { Clipboard Clock Db Env Exec Fs Log Net }
      main { Clipboard Clock Db Env Exec Fs Log Net }
      run_batch { Clock Db Env Exec Fs Log Net }
      …

`cargo candor policy` enforces **architectural effect boundaries** — the failure mode AI agents have most, because they edit one function without seeing the whole effect graph.
A policy file declares invariants and candor flags any *transitive* violation:

    # .candor/policy
    deny Net Db Fs domain   # the domain layer must reach no I/O — even through a helper
    pure parse              # parsing must be side-effect-free
    deny Exec               # nothing may spawn a subprocess

    [AS-EFF-006] `domain::checkout` performs { Db }, forbidden by policy (scope `domain`): `deny Net Db Fs domain`

`checkout` need not touch the database *directly* — candor catches it reaching `Db` through any callee, the boundary break a local diff would hide. See [examples/candor-policy](examples/candor-policy).

`cargo candor risk` is an **advisory, heuristic** nudge toward the injection class — an effect whose argument derives from a function parameter (`fs::read(format!("/var/cache/{key}"))`, `Command::new(name)`):

    [AS-EFF-007] `read_user_file` performs { Fs } on caller-derived input (an injection surface — …)

It is *not* sound taint analysis: a syntactic, intra-procedural check that over- and under-flags (it misses flow through struct fields and across functions, and flags a parameter that's actually validated). Use it to find surfaces worth reviewing — never as a gate.

## All modes (explicit invocation)

From any Rust project root, with `LINT` set to the dylib's absolute path:

    # AUDIT (default): every function's transitive effect set. No code changes needed.
    cargo dylint --lib-path "$LINT"

    # JSON: machine-readable report, one file per crate+type: ...json
    CANDOR_JSON=/tmp/report cargo dylint --lib-path "$LINT"

    # CONFORMANCE: enforce inferred ⊆ declared.
    CANDOR_STRICT=1 cargo dylint --lib-path "$LINT"            # whole crate
    CANDOR_STRICT=mymod::sub cargo dylint --lib-path "$LINT"   # one module (incremental adoption)

    # ENFORCEMENT (cap-std-aligned): flag any DIRECT reach for ambient authority.
    CANDOR_NO_AMBIENT=mymod cargo dylint --lib-path "$LINT"    # AS-EFF-004 per direct ambient call

    # REGRESSION GUARD: fail if any function gained an effect since a saved snapshot.
    CANDOR_JSON=.candor/baseline cargo dylint --lib-path "$LINT"       # 1. snapshot (commit it)
    CANDOR_BASELINE=.candor/baseline cargo dylint --lib-path "$LINT"   # 2. in CI: AS-EFF-005 on regressions

    # Flags that combine with any mode:
    CANDOR_CONFIG=candor.rules cargo dylint --lib-path "$LINT"   # extra classifier rules
    CANDOR_PARANOID=1 cargo dylint --lib-path "$LINT"            # treat generic trait dispatch as Unknown

Or register it in a project's `Cargo.toml` so plain `cargo dylint` finds it — by local path, or **straight from git with no clone** (dylint fetches and builds it against candor's pinned toolchain). This is dylint's equivalent of a dependency; dylint loads libraries only from `git` or `path` sources, not crates.io, so candor is **not** (and need not be) published there.

    [workspace.metadata.dylint]
    # clone-free — pin a tag/rev for reproducibility:
    libraries = [{ git = "https://github.com/tombaldwin/candor", tag = "v0.1.0" }]
    # …or a local checkout:
    libraries = [{ path = "/abs/path/to/candor" }]

## What it detects

candor answers two questions about a codebase:

1. **What effects does each function perform?** — network (AWS SDK, `reqwest`/`ureq`/`isahc`, raw `std`/`tokio` sockets), databases (`sqlx`/`rusqlite`/`postgres`/…), local IPC (Unix sockets), filesystem, process spawn, env, clock, randomness, logging, clipboard — including effects inherited transitively through the functions it calls.
2. **Are the signatures honest?** — once you thread capability tokens (or use cap-std) through a module, it flags any function performing an effect it doesn't declare.

It resolves every call's `DefId` and classifies the crate/path it lands in. That type resolution is the point: a bare `.send()` is meaningless syntactically, but the resolved method tells us it belongs to `aws_sdk_*` → a network effect.

## The capability discipline (conformance mode)

A function declares the effects it may perform by taking the matching **capability token** as a parameter (`&Fs`, `&Env`, …).
Tokens are unforgeable — a private field means they can only be *received*, never constructed outside their defining module — and are minted once at the entry point. See `sample/src/main.rs` for the pattern. The checker then flags:

- **AS-EFF-001** — a function performs an effect it does not declare.
- **AS-EFF-002** — a function declares a capability it never uses.
- **AS-EFF-003** — a function makes a call candor cannot resolve (dynamic dispatch, fn-pointer, or callback through `impl Fn`), so its effect set is not provably complete and cannot be certified.
- **AS-EFF-004** (`CANDOR_NO_AMBIENT`) — a function reaches for *ambient authority* directly (`std::fs`, `std::net`, `std::env`, `std::process`, the clock, …) instead of receiving a capability. This is the cap-std-aligned, *enforceable* alternative to the advisory tokens: it fires even on functions that hold a token, because holding `&Fs` doesn't stop you calling `std::fs`. The fix is to route the call through an injected capability (e.g. a cap-std handle).
- **AS-EFF-005** (`CANDOR_BASELINE`) — an existing function *gained* an effect it didn't have in a saved snapshot. The lowest-friction adoption path: no token threading, no rewrite — just catch the PR that makes a previously-pure function start doing network/disk/etc. I/O. (New functions are not flagged; they're reviewed as new code.)

Adopt incrementally: scope `CANDOR_STRICT` / `CANDOR_NO_AMBIENT` to one module, fix until it reports zero, then move to the next.

### Or use real capabilities: cap-std

candor recognises [cap-std](https://github.com/bytecodealliance/cap-std) capability *types* as declarations and its operations as the matching effect. A function that takes a `&Dir` and reads through it (`dir.read_to_string(..)`) is conformant — its declared `Fs` matches its inferred `Fs` — while a sibling that reaches for ambient `std::fs` is flagged.
Unlike candor's own advisory tokens, cap-std capabilities are unforgeable and compile-enforced; candor just makes the effect surface *visible* on top. See `sample-capstd/`.

Mapped today: `Dir`→Fs, `Pool`/`TcpStream`→Net, `SystemClock`→Clock, `UnixStream`→Ipc.

## CI guardrail (lowest-friction adoption)

You don't have to adopt the capability discipline to get value. The cheapest win is the regression guard: snapshot the effect report, commit it, and fail CI when a function's effect surface grows.

    # once, on a known-good commit — then `git add .candor/`
    CANDOR_JSON=.candor/baseline cargo dylint --lib-path "$LINT"

    # in CI: fail only on AS-EFF-005 (a function gained an effect) — see examples/candor-guard.yml
    out=$(CANDOR_BASELINE=.candor/baseline cargo dylint --lib-path "$LINT" 2>&1); echo "$out"
    echo "$out" | grep -q AS-EFF-005 && { echo "effect surface grew"; exit 1; } || true

Now a PR that makes a parser suddenly open a socket, or a render function start reading the filesystem, fails review automatically — no tokens, no rewrite. Refresh the baseline deliberately (re-run the snapshot command) when a new effect is intended. This is equally useful to a human reviewer and to an AI agent reviewing a diff.

## How well does it actually help an agent? (the honest version)

A controlled pilot ([EVAL.md](EVAL.md)) pitted a JSON-only agent against a source-only one on the same scoping task. The JSON was ~3× cheaper and ~6.5× faster — *and* it surfaced a real lesson: the source-only agent was more **accurate** in one spot, because candor had silently misclassified some `reqwest` HTTP calls (a classifier gap, since fixed). So: the report is cheap and genuinely useful, but **only as correct as its classifier** — which is exactly why `Unknown`/`unresolved` exists, and why an agent should treat flagged-uncertain functions as "go read the source," not "trust me."

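
The trust rule stated above (use the inferred effects when nothing is unresolved, otherwise read the source) can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: `FnEntry`, `Verdict`, `triage`, and `entry` are hypothetical names invented here, and the struct is a simplified in-memory stand-in, not candor's actual report schema or API.

```rust
use std::collections::BTreeSet;

/// Hypothetical view of one function's entry in a report.
/// Field names are illustrative, not candor's real JSON schema.
#[derive(Debug)]
struct FnEntry {
    name: String,
    inferred: BTreeSet<String>,
    unresolved: bool,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    /// No unresolved calls: the inferred effect set can be used as-is.
    TrustReport,
    /// At least one unresolved call: the effect set may be incomplete.
    ReadSource,
}

/// Apply the trust rule: anything flagged unresolved or carrying an
/// `Unknown` effect sends the reader back to the source.
fn triage(entry: &FnEntry) -> Verdict {
    if entry.unresolved || entry.inferred.contains("Unknown") {
        Verdict::ReadSource
    } else {
        Verdict::TrustReport
    }
}

/// Small constructor to keep the example compact (hypothetical helper).
fn entry(name: &str, effects: &[&str], unresolved: bool) -> FnEntry {
    FnEntry {
        name: name.to_string(),
        inferred: effects.iter().map(|e| e.to_string()).collect(),
        unresolved,
    }
}

fn main() {
    let report = vec![
        entry("load_config", &["Env", "Fs"], false),
        entry("dispatch", &["Net"], true),
    ];
    for e in &report {
        println!("{}: {:?}", e.name, triage(e));
    }
}
```

The point of the sketch is the asymmetry: a function with no unresolved calls is answered from the report alone, while a single unresolved call is enough to fall back to reading the code.
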
## Unresolved calls (honest soundness)

A call candor cannot trace to a concrete callee — `dyn Trait` dispatch, a function pointer, a closure reached through a generic `impl Fn` parameter — could perform *any* effect. candor records these as an **`Unknown`** effect rather than silently assuming purity. You'll see `Unknown` in audit output and the JSON `unresolved` flag; in conformance mode it raises AS-EFF-003. (Measured cost of *not* doing this: on a real ~8k-line codebase, 22% of functions make at least one unresolved call.)

Residual gap: statically-dispatched **generic** trait calls (`t.method()` where `t: T: Trait`) are assumed to honour their bound rather than marked `Unknown` — otherwise every `.clone()` / `.to_string()` / iterator adaptor would drown the report. See `CRITIQUE.md`.

## Extending the classifier

`classify()` in `src/lib.rs` is a curated table mapping crates/paths to effects. To recognise your own effectful crates without rebuilding, point `CANDOR_CONFIG` at a rules file, one rule per line (an effect, then `crate` or `path`, then the crate name or path to match, as in the examples here):

    # project effect rules
    Net crate reqwest
    Fs path mycrate::storage::

Match the actual I/O boundary, not the whole crate — e.g. only `.send()` for an SDK, only `Command`/`Child` for `std::process` — or you will over-report.

## Known limitations

- **Dynamic dispatch / fn-pointers / callbacks** can't be resolved to a concrete callee. These are surfaced honestly as `Unknown` (→ AS-EFF-003) rather than silently dropped, but candor still can't tell you *which* effects hide behind them. Exception: `dyn` over conventionally-pure std traits (`Display`, `Debug`, `Error`, `ToString`, `Clone`, …) is treated as pure, not `Unknown` — otherwise ubiquitous patterns like `dyn Error` formatting would flood reports with false positives.
- **Generic static dispatch** (`t.method()` for `t: T: Trait`) is assumed to honour its bound — a deliberate residual unsoundness to keep the report readable (see `CRITIQUE.md`).
- **Advisory, not enforced**: a `&Fs` token doesn't actually gate `std::fs`; candor only reports. For real enforcement use [cap-std](https://github.com/bytecodealliance/cap-std).
- **Macro-generated functions are invisible.** Items whose span is from a macro expansion are skipped (to drop noise like tracing's `__CALLSITE` statics) — so a function *generated* by a macro (`async_trait`, derive, a declarative macro) is not analyzed in any mode. Code you wrote by hand is unaffected.
- **Capabilities must be direct parameters.** `declared_caps` recognizes a capability (`&Fs`, a cap-std `&Dir`) only as a top-level parameter. A capability reached *through* a struct field (`fn f(ctx: &AppContext)` where `ctx` holds the `Dir`) is not counted as declared — that function would be flagged in strict mode despite holding the capability.
- **Generic static dispatch over non-local traits** is assumed to honour its bound (CHA only sees through *local* traits); `CANDOR_PARANOID` flags the rest at the cost of noise.
- Logging via macros is deduped per function but counts every function that logs.

## Documentation

- **[candor-spec](https://github.com/tombaldwin/candor-spec)** — the language-agnostic spec candor implements (effect vocabulary, report schema, trust contract; the Java/C#/Go ports share it).
- **[AGENTS.md](AGENTS.md)** — self-contained instructions for an AI agent (install → run → read).
- **[PRINCIPLES.md](PRINCIPLES.md)** — the ideas candor (and its development) are built on.
- **[CRITIQUE.md](CRITIQUE.md)** — an honest, critical self-assessment + comparison to prior art (Cackle, cap-std, the Rust effects initiative).
- **[EVAL.md](EVAL.md)** — a controlled pilot of whether the report actually helps an AI agent.
- **[BACKLOG.md](BACKLOG.md)** — what's done, what's deferred, and the concrete reason for each.
- **[CONTRIBUTING.md](CONTRIBUTING.md)** — build/test, and how to teach the classifier a new crate.
- **[SECURITY.md](SECURITY.md)** — why candor is *not* a security boundary, and how to report a false-negative (the bug class that matters most).

## Tests

`cargo test` runs unit tests over the *classifier* precision rules (e.g. `std::net::TcpStream` is `Net` but `std::net::SocketAddr` is not) plus a load smoke-test. The **stateful core** (call-graph fixpoint, CHA, conformance) isn't unit-tested — it needs the dylint harness, which has no bless support — so it's covered instead by the `sample/`+`sample-capstd/` crates and a CI *behavioural* check that asserts real audit output (so a "candor emits nothing" regression fails CI). The lint also fails *gracefully* (never an ICE) on expressions outside a typechecked body.

## Status

Prototype. Validated on a real ~8k-line codebase (the `ebman` AWS Elastic Beanstalk TUI): audit tagged ~445 functions; a leaf module was converted to the capability discipline and brought to zero conformance violations while still building on stable.

candor also **guards itself**: CI runs candor over candor against `.candor/baseline`, so its three existing effectful functions (config/baseline reads + the report write, all `Env`/`Fs`) can't gain a *new* effect unnoticed. Note the guard's scope, honestly: per AS-EFF-005's design it flags *regressions in existing functions*, not brand-new functions (those are reviewed as new code), so a newly-added effectful function wouldn't trip it. Refresh with `cargo candor snapshot .candor/baseline` when a new effect is intended.

## License

Dual-licensed under [MIT](LICENSE-MIT) or [Apache-2.0](LICENSE-APACHE), at your option.