fireharp/coherence

GitHub: fireharp/coherence

A Git-native consistency checker for AI-edited codebases that detects structural drift between code and its docs, tests, and artifacts.


# Coherence

Git-native drift detector for agent-assisted repositories.

**Coherence is not an AI reviewer. It is a repo consistency harness for AI-edited codebases.**

Tests pass. The repo still drifts. Coherence catches the broken links between code, docs, ADRs, tests, metrics, generated files, endpoints, and evidence - especially after AI-agent edits.

Coherence runs locally. Deterministic checks do not send code anywhere. The optional LLM pass is disabled by default and only runs when `COHERENCE_LLM=1` or `--llm` is set.

## How is this different?

| Tool/category | Positioning | Coherence differentiation |
| --- | --- | --- |
| [Fiberplane Drift](https://fiberplane.com/blog/drift-documentation-linter/) | Binds Markdown specs to code anchors and flags docs as stale when bound code changes. | Broader repo-graph drift across ADRs, tests, metrics, generated artifacts, endpoints, and evidence. |
| [`drift-analyzer`](https://pypi.org/project/drift-analyzer/) | Detects deterministic architectural erosion and structural drift in AI-accelerated codebases. | Adds traceability and semantic repo consistency, not only structural analysis. |
| [AgentSys `/drift-detect`](https://github.com/agent-sh/agentsys) | Compares documented plans and project docs with actual implementation using deterministic collectors plus one LLM analysis call. | Deterministic CLI/JSON-first checks with an optional LLM pass. |
| [AgentLint](https://www.agentlint.app/) | Audits the agent harness: `AGENTS.md`, `CLAUDE.md`, CI, hooks, and related rule surfaces. | Checks whether changed repo artifacts still support each other. |

## 30-second demo

```sh
# install from the latest GitHub release
curl -fsSL https://github.com/fireharp/coherence/releases/latest/download/install.sh | sh

# add repo rules, a pre-commit hook, a drift baseline, and the Codex skill
coherence init --template=agent-repo

# review local worktree drift before handing off or committing
coherence review --base=HEAD --worktree --json
```

One concrete regression looks like this:

```json
{
  "safe_to_commit": true,
  "review_recommended": true,
  "drift_verdict": "telemetry",
  "drift_regression_count": 1,
  "drift_regressions": [
    {
      "kind": "newly_orphaned_endpoint",
      "id": "endpoint:GET:/api/orders",
      "suggested_action": "add or restore a test that verifies the source file defining endpoint:GET:/api/orders"
    }
  ],
  "recommended_next_command": "coherence drift --json"
}
```

That is the gap Coherence is built for: the commit can be technically safe, but it still removed a traceable support path that an agent or reviewer should look at.

## Requirements

- Go 1.26.3 or newer (to build)
- Git
- Optional: `GROQ_API_KEY` for the LLM pass

## Install

```sh
# latest release binary; writes ~/.local/bin/coherence
curl -fsSL https://github.com/fireharp/coherence/releases/latest/download/install.sh | sh

# fallback: install from the latest tagged source
go install github.com/fireharp/coherence/cmd/coherence@latest

# local development build from a clone
go build -o bin/coherence ./cmd/coherence
```

## GitHub Actions

Run Coherence in PR CI with strict drift gating:

```yaml
name: coherence
on:
  pull_request:
jobs:
  coherence:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Install Coherence
        run: curl -fsSL https://github.com/fireharp/coherence/releases/latest/download/install.sh | sh
      - name: Review repo drift
        run: ~/.local/bin/coherence review --base=origin/main --worktree --json --strict
```

## Command reference

```sh
coherence init --template=go-cli                     # scaffold ontology + hook
coherence templates                                  # list available templates
coherence bench                                      # run shipped template eval suite
coherence scan --staged                              # pre-commit gate
coherence check --ref=HEAD~1                         # tracked diff-range check
coherence check --ref=HEAD --include-untracked       # diff + untracked union
coherence review --base=HEAD --worktree --json       # combined local/agent review
coherence review --base=origin/main --staged --json  # PR-shaped review
coherence watch --once --json                        # one-shot local worktree signal
coherence doctor                                     # validate ontology + hook + state
coherence index                                      # write .coherence/snapshot.json + graph.json
coherence diff                                       # compare current snapshot vs baseline
coherence drift                                      # compute drift meters → .coherence/drift.json
coherence drift --summary                            # print a 1-line drift summary
coherence drift --strict                             # same, but exit 1 on telemetry too (zero-drift CI gate)
coherence status                                     # rewrite .coherence/STATUS.md
coherence status --json                              # same data as STATUS.md but structured for agents
coherence report                                     # print the last stored report
coherence version                                    # print build/module version
coherence help                                       # usage
```

`scan`, `check`, and `review` write `.coherence/last-report.json`. The `.coherence/` directory is gitignored.

### JSON outcome contract

`scan`, `check`, and `review` accept `--json` and emit a stable top-level vocabulary so pre-commit hooks and agents can decide what to do next without parsing prose:

```json
{
  "safe_to_commit": true,
  "review_recommended": true,
  "blocking_error": false,
  "telemetry_only_movement": false,
  "staged": "clean",
  "worktree": "dirty",
  "untracked_files_excluded": true,
  "untracked_file_count": 17,
  "recommended_next_command": "coherence review --base=HEAD --worktree --json"
}
```

Notable behaviors:

- `scan --staged --json` passes (`safe_to_commit: true`) when nothing is staged, but reports `review_recommended: true` plus a `recommended_next_command` when the worktree is dirty. A clean staged set does not mean local work has been reviewed.
- `check` excludes untracked files by default; pass `--include-untracked` to fold them in.
  When excluded, the JSON reports `untracked_files_excluded`, `untracked_file_count`, and a next-command hint.
- `review --worktree` includes untracked files; `review --staged` mirrors pre-commit but also folds in the `base..HEAD` diff so it can flag rule fires that the staged set alone misses.

## Pre-commit hook

`.githooks/pre-commit` runs `coherence scan --staged`. `coherence init` sets `git config core.hooksPath .githooks` automatically when the repo has no conflicting hook path. If init reports that hook config was skipped, run:

```sh
git config core.hooksPath .githooks
```

The hook expects `coherence` to be on `PATH`. To point at a different binary, edit `.githooks/pre-commit` directly (e.g. change it to `./bin/coherence scan --staged` if you prefer to build into the repo).

## Tests

```sh
go test ./...
```

## Rules

Rules live in `ontology.yml`:

```yaml
version: 1
commands:
  test: [go test ./...]
  build: [go build ./cmd/coherence]
rules:
  - id: fixture-generator-needs-output
    when:
      - "frontend/scripts/build-fixtures.mjs"
    expect_any:
      - "frontend/public/fixtures/dashboard.json"
    severity: error
    message: "Fixture source changed; outputs must be regenerated and co-staged."
    suggested_commands:
      - node frontend/scripts/build-fixtures.mjs
      - git add frontend/public/fixtures
```

Paths are Git-relative. A rule fires when any `when` glob changed and none of the `expect_any` globs changed in the same staged set or diff.

`suggested_commands` on a rule are surfaced in both human and `--json` output when the rule fires, and aggregated under top-level `suggested_commands` in the report payload, so agents see exactly what shell commands the rule authors recommend.

Use `--ontology=path/to/file.yml` with `scan`, `check`, `review`, or `status` to load a non-default ontology.

## Init and templates

`coherence init [--template=] [--force] [--skill-install=auto|native|off] [--no-baseline] [--no-hooks-config] [--json]` scaffolds a fresh repository.
When `--template` is omitted, the command auto-detects from layout tells (`pnpm-workspace.yaml`, `go.mod`, `pyproject.toml`, `apps/`+`packages/`, etc.) and falls back to `generic` if nothing strong matches. The detected template name prints to stderr (or the `template` field in `--json` mode) so users see what shape was inferred.

`coherence init`:

- writes `ontology.yml` (template-specific rules + `commands:` + per-rule `suggested_commands`),
- writes `.githooks/pre-commit` (executable; finds the binary on PATH or falls back to `$HOME/go/bin/coherence`),
- ensures `.coherence/` is listed in `.gitignore`,
- creates the local `.coherence/` state directory,
- builds an initial `.coherence/snapshot.json` + `graph.json` baseline so the first `coherence drift` / `diff` compares against real state rather than an empty baseline, and the user does not have to remember to run `coherence index` after init,
- installs the Codex project skill at `.agents/skills/coherence/SKILL.md`.

It is idempotent: existing files are skipped without `--force`. After init, run `coherence doctor` to verify.

Skill installation defaults to `auto`, which tries `npx --yes skills add ... --agent codex --copy -y` and falls back to native file writes. Use `--skill-install=native` to skip `npx`, or `--skill-install=off` to skip the skill.
Available templates (`coherence templates`):

| name | kind | shape |
| --- | --- | --- |
| `generic` | starter | minimal baseline: docs + code coupling |
| `go-cli` | starter | `cmd//main.go` + `internal/` + `go.mod`/`go.sum` |
| `typescript-app` | starter | `package.json` + `src/` + `tsconfig` |
| `python-package` | starter | `pyproject.toml` + `src/` + `tests/` |
| `data-pipeline` | starter | schema/migrations/dbt-style projects |
| `docs-site` | starter | markdown-heavy repos with an index/nav file |
| `infra-terraform` | starter | `.tf` modules + runbooks |
| `monorepo` | starter | `packages/*` + `apps/*` workspaces |
| `agent-repo` | starter | AI/automation agents with task/evidence traceability |
| `markdown-index` | overlay | KB content/* with index files + frontmatter schema |
| `privacy-collectors` | overlay | privacy-sensitive Go collectors + redaction policy |

**Starter** templates are intended as the `init` baseline. **Overlay** templates are composition examples: copy their rules into an existing ontology when the relevant repo shape applies (you might run a `go-cli` starter and merge in `privacy-collectors` rules for a service that handles PII; or run `docs-site` and merge in `markdown-index` rules for a structured knowledge base).

Every template ships `commands:` (test/build/lint where applicable), at least two rules carrying `suggested_commands`, and an `eval/scenarios.yml` fixture that the `coherence bench` runner uses to guard against regression.
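Since every template is ultimately a set of rules, it may help to see the firing condition from the Rules section (any `when` glob changed, no `expect_any` glob changed) written out. The following is a minimal sketch, not Coherence's implementation: the `Rule` struct, `anyMatch`, and `fires` are hypothetical names, and it uses Go's standard `path.Match`, which has no `**` support, so the real matcher may behave differently.

```go
package main

import (
	"fmt"
	"path"
)

// Rule holds the two ontology fields that decide whether a rule fires.
// Hypothetical type for illustration; not Coherence's internal struct.
type Rule struct {
	ID        string
	When      []string
	ExpectAny []string
}

// anyMatch reports whether any changed path matches any of the globs.
// Assumption: path.Match semantics ("*" does not cross "/", no "**").
func anyMatch(globs, changed []string) bool {
	for _, g := range globs {
		for _, c := range changed {
			if ok, err := path.Match(g, c); err == nil && ok {
				return true
			}
		}
	}
	return false
}

// fires is true when a `when` glob changed and no `expect_any` glob changed.
func fires(r Rule, changed []string) bool {
	return anyMatch(r.When, changed) && !anyMatch(r.ExpectAny, changed)
}

func main() {
	r := Rule{
		ID:        "fixture-generator-needs-output",
		When:      []string{"frontend/scripts/build-fixtures.mjs"},
		ExpectAny: []string{"frontend/public/fixtures/dashboard.json"},
	}
	// Only the generator changed: the rule fires.
	fmt.Println(fires(r, []string{"frontend/scripts/build-fixtures.mjs"}))
	// Generator and output changed together: the rule stays quiet.
	fmt.Println(fires(r, []string{
		"frontend/scripts/build-fixtures.mjs",
		"frontend/public/fixtures/dashboard.json",
	}))
}
```

The first call prints `true` and the second prints `false`, which is the "co-staged" behavior the example rule's message describes.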
## Bench

`coherence bench` runs any of the shipped scenario or evaluation suites:

```sh
coherence bench                              # default: template eval suite
coherence bench --suite=templates            # explicit
coherence bench --suite=coherencebench       # the CB-### internal suite
coherence bench --suite=evidence             # canonical evidence protocol
coherence bench --suite=lifecycle            # compatibility alias for evidence
coherence bench --suite=external             # M7 external-style evaluations
coherence bench --suite=adversarial          # graph-seeded adversarial mutations
coherence bench --suite=all --write-report   # templates + CB + adversarial
coherence bench --template=go-cli            # single template shortcut
coherence bench --suite=external --json      # machine-readable
```

Exit code is `1` when any non-adversarial scenario fails. The adversarial suite is telemetry by default and fails only with `--strict`. `--write-report` writes a human-readable Markdown summary to `.coherence/runs/YYYY-MM-DD/index.md` (linked from `STATUS.md`) for the template/CoherenceBench suites, writes evidence JSON + HTML charts to `.coherence/runs//evidence.{json,html}`, and writes adversarial artifacts under `.coherence/adversarial/`.

### Template eval suite

Every template under `init` ships `eval/scenarios.yml` with at least one "fires" scenario and one "coherent update passes" scenario. The runner calls the same `rules.Evaluate` used by `scan` and compares fires against `expect_fires`. These fixtures also serve as regression guards: editing a template ontology that breaks a scenario surfaces immediately in `bench`.

### CoherenceBench

`coherencebench` is the GOAL.md `CB-###` internal scenario suite (M1). Each scenario is a self-contained directory under `internal/coherencebench/scenarios/CB-###/`:

- `ontology.yml`: the rules the scenario depends on,
- `scenario.yml`: `id`, `name`, `description`, `status`, `changed_files`, and `expected.fires` / `expected.blocking_error`.
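Both the template suite (`expect_fires`) and the CB scenarios (`expected.fires`) come down to comparing the set of rule IDs that fired with the set a scenario expects. A rough sketch of that comparison follows; `diffFires` is a hypothetical helper written for this illustration and is not taken from the Coherence source.

```go
package main

import (
	"fmt"
	"sort"
)

// diffFires compares fired rule IDs with the expected ones and returns
// the IDs that should have fired but did not, and the IDs that fired
// without being expected. Hypothetical helper for illustration only.
func diffFires(fired, expected []string) (missing, unexpected []string) {
	got := make(map[string]bool, len(fired))
	for _, id := range fired {
		got[id] = true
	}
	want := make(map[string]bool, len(expected))
	for _, id := range expected {
		want[id] = true
	}
	for id := range want {
		if !got[id] {
			missing = append(missing, id)
		}
	}
	for id := range got {
		if !want[id] {
			unexpected = append(unexpected, id)
		}
	}
	// Sort so the report is stable regardless of map iteration order.
	sort.Strings(missing)
	sort.Strings(unexpected)
	return missing, unexpected
}

func main() {
	missing, unexpected := diffFires(
		[]string{"docs-need-update"},
		[]string{"docs-need-update", "fixture-generator-needs-output"},
	)
	pass := len(missing) == 0 && len(unexpected) == 0
	fmt.Println(pass, missing, unexpected)
}
```

Under this sketch a scenario passes only when both slices are empty; whether the real runner treats unexpected fires as failures is an assumption here.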
`status:` distinguishes:

- `deterministic` / `scored`: runnable with the current rules/IDs/graph/drift engine. 20 scenarios (CB-001..005, CB-007..021) pass today.
- `skip`: deferred to later milestones (typically LLM-only paths). 1 scenario remains: CB-006 (LLM contradiction). Each skipped stub records the milestone that would enable it.

The shipped totals are: **21 scenarios, 20 pass, 0 fail, 1 skipped**, matching M1's "at least 8 internal scenarios exist" bar.

### Scored scenarios (Files mode)

Scenarios can also be declared with an inline `files:` map to materialize a synthetic git repo and run the full drift pipeline. The bench runner writes each file to a temp directory, auto-adds a minimal `ontology.yml` if the scenario omits one, runs `git init` + `git add -A`, then calls `drift.Compute` and compares the resulting verdict against `expected.drift.verdict`. This closes the M4 "benchmark scenarios have scored expected outputs" box and the M6 "contradiction scenarios have measurable precision/recall" path.

Scenarios can also declare an optional `base_files:` map alongside `files:`. When set, the materializer first writes the baseline, computes its snapshot + graph + writes them under `.coherence/` (which a synthetic `.gitignore` excludes from tracking), then overlays the `files:` map and re-stages. This exercises the **diff-aware** meters (`semantic_movement`, `neighborhood_drift`, `blast_radius`, etc.) against a real before/after pair.

Graduated scored scenarios so far:

- **CB-014** ("ADR superseded but old docs still link as active"): files-only scenario; asserts `stale_decision_links` bumps verdict to `telemetry`.
- **CB-011** ("doc typo-only change classified as semantic no-op"): base+current scenario; baseline has the original prose, current has a typo. `semantic_movement` classifies it as noop; verdict stays `clean` because no semantic edit triggered.
- **CB-015** ("removed file still referenced by docs"): files-only scenario; doc links to a path that isn't in the tracked set. The new `broken_links` meter scans tracked markdown and flags the dangling reference. Verdict bumps to `telemetry`.
- **CB-013** ("generated artifact older than generator/source"): base+current scenario relying on the materializer's baseline `git commit`. The current overlay modifies only the generator source; the artifact stays untouched. With a real `HEAD`, `git diff HEAD` surfaces the source change alone, and the ontology's severity=error rule fires via `required_edge_breakage`. Verdict bumps to `warn`.
- **CB-004** ("code references US-999 but no story exists"): files-only scenario; the `unknown_id_references` meter scans non-Markdown tracked files for typed-id mentions and flags those without a corresponding node in the graph. Verdict bumps to `telemetry`.
- **CB-012** ("test passes but no longer validates changed behavior"): base+current+commit scenario. The new `stale_tests` meter walks the `verifies` edge wired by the Go test extractor, compares baseline + current snapshot content_hashes, and flags the unchanged test whose source did change.
- **CB-008** ("metric renamed in frontend only"): base+current scenario using the new `RemovedFiles` materializer option to model the rename. The new `orphaned_metric_aliases` meter diffs the metric label set between base and current graphs, then substring-scans frontend files (.ts/.tsx/.js/.jsx/.mjs/.cjs/.json) for any orphaned name. Verdict bumps to `telemetry`.

The lone remaining skip is **CB-006 (LLM contradiction)**, which requires a live Groq API key in CI; the materializer is otherwise fully equipped to host any future graduation.

### Evidence protocol

`coherence bench --suite=evidence` is the canonical deterministic benchmark surface.
It materializes a tiny policy/metrics service, evaluates each case against explicit oracles, and reports `hit`, `hit_with_unexpected_meter`, `false_negative`, `false_negative_with_unexpected_meter`, `false_positive`, `skipped`, or `errored` classifications. Each result also carries independent `oracle_hit`, `detection_hit`, and `specificity_clean` booleans: `detection_hit` means an expected meter actually fired and is no longer true for clean negative controls. `--suite=lifecycle` is a compatibility alias that returns the same canonical evidence JSON.

The embedded, versionless matrix has exactly 60 evidence cases: six selected lifecycle meters, ten cases per meter, with four positives, three negative controls, and three known-limit cases per meter. The six original lifecycle cases remain the only managed/unmanaged chart rows and do not affect `scenario_counts.total`.

Output includes schema-only artifact metadata (`artifact_kind: "coherence_evidence_report"`, `schema_version: 1`), `run_id`, `run_metadata`, `claims[]`, `scenario_counts`, `by_meter`, `evidence_rates`, `systematic_errors[]`, `raw_artifacts[]`, `lifecycle_summary`, and `final_health`. False positives are counted as both compatibility `false_positive` / `false_positive_cases` and `false_positive_meter_attributions`; by-meter false positives are attributed to the actual unexpected meter. Boundary reporting keeps `boundary_false_negative_rate` and adds `boundary_known_limit_false_negatives`. Per-meter `supported_recall` stays separate from `overall_recall_including_known_limits`.

`--write-report` emits self-contained JSON + HTML evidence reports under an immutable `.coherence/runs//` directory with relative persisted `report_paths`, claim tables, FP/FN accounting, a systematic-error register, and managed-vs-unmanaged SVG charts.

### External-style evaluations (M7)

`coherence bench --suite=external` runs the M7 evaluation harness.
Per GOAL.md three categories are supported, with at least one sample shipped in each:

| Category | Sample | What it asks |
| --- | --- | --- |
| `swe-bench` | EXT-SWE-001 | Given a changed source file, predict the test + spec doc that should be inspected |
| `tebench` | EXT-TEB-001 | Given a modified source file, predict the tests likely needing updates |
| `doc-code` | EXT-DOC-001 | Given a spec doc, recover the user-story doc it implements |

Each sample materializes a tiny synthetic repo, runs `graph.Build`, then calls a 1-hop graph predictor over the seed. Predictions are scored against gold via precision / recall / F1; per-category averages roll up. The harness is intentionally minimal: extending it with real SWE-bench tasks (issue text + base-commit repo + gold patch) only requires more samples, not more plumbing.

Results are reported **separately from the internal CB suite**, matching M7's acceptance criterion.

### Adversarial evaluations

`coherence bench --suite=adversarial` materializes temporary repos, writes a baseline snapshot + graph, applies graph-seeded mutations, runs the drift pipeline, and scores expected meters against `drift.active_meters`. Source repos are never mutated.

By default the suite uses an embedded agent-style Go/TypeScript repo covering the deterministic meter families. The first 20 built-in mutations are deterministic; the LLM contradiction mutation is ordered last and runs only when enabled. Real corpora are local-manifest only:

```yaml
version: 1
repos:
  - id: coherence-self
    path: .
    tags: [agent-repo, go]
    weight: 2
    include: ["**"]
    exclude: [".coherence/**", "vendor/**"]
```

Useful flags:

```sh
coherence bench --suite=adversarial --iterations=100 --seed=7 --jobs=4
coherence bench --suite=adversarial --corpus-manifest=corpus.yml --json
coherence bench --suite=adversarial --write-report
coherence bench --suite=adversarial --refine-from=.coherence/adversarial/runs/
coherence bench --suite=adversarial --cycles=5 --iterations=20 --write-report
coherence bench --suite=adversarial --llm --llm-specs
coherence bench --suite=adversarial --export-report=docs/adversarial.md
```

Default CI behavior is telemetry: misses are reported in JSON, clusters, and the rolling leaderboard, but the command exits 0 unless `--strict` is passed. The built-in suite is intentionally allowed to contain exploration demos that miss today, so `pass=false` is a research signal, not a failed implementation milestone. For the commit cadence and durable experiment ledger, see [`docs/adversarial-exploration.md`](docs/adversarial-exploration.md).

`--write-report` writes `.coherence/adversarial/runs//` with JSONL, summary JSON, miss clusters, refinement suggestions, and updates `.coherence/adversarial/leaderboard.json` with rolling run, per-meter, and per-mutation hit/FN/FP rates. The summary includes both `by_meter` (expected meters plus unexpected active meters from false positives) and `by_expected_meter` for compatibility.

`--llm-specs` optionally asks Groq for additional mutation specs using graph summaries only; generated specs are recorded under `.coherence/adversarial/specs/` after schema validation and a deterministic dry-run, and the run summary records whether expansion was requested, skipped, accepted, or failed without failing the deterministic bench.

Mutation specs can declare `skip_conditions.require_env`, `skip_conditions.require_files`, and `skip_conditions.require_optional_engines`; unmet preconditions produce `skipped` iterations rather than errors.
Edit paths and required-file paths must stay inside the materialized repo and cannot target `.git/` or `.coherence/`. `--export-report` writes only under the repo root. `--llm` enables LLM contradiction mutations when `GROQ_API_KEY` is also present.

`--refine-from` accepts a run directory or `summary.json`, prioritizes mutations from prior miss clusters or skips/errors, and advances the seed when `--seed` is not supplied. `--cycles=N` runs that same refine loop repeatedly in one command, forcing per-cycle report artifacts so each pass can seed the next hypothesis.

`coherence index` walks the tracked file set (`git ls-files`) and writes `.coherence/snapshot.json`. Each file gets:

- `content_hash`: sha256 of file bytes,
- `semantic_hash`: sha256 of a canonical form for known file types,
- `kind`, `size`, `path`.

Plus a Merkle directory roll-up and a `root_hash`. Two runs over the same tree yield the same root hash; a single byte change anywhere bubbles to the root.

### Diffing snapshots

`coherence diff` computes a fresh snapshot of the worktree and compares it to a base (`.coherence/snapshot.json` by default; override with `--base=path`). It writes `.coherence/last-diff.json` and prints a summary:

```sh
coherence diff          # human summary
coherence diff --json   # machine-readable
coherence diff --base=path/to/old-snapshot.json
```

Per-file `change_type` taxonomy:

| change_type | meaning |
| --- | --- |
| `added` | path in current snapshot, absent in base |
| `removed` | path in base, absent in current |
| `semantic_changed` | content_hash AND semantic_hash both differ |
| `semantic_noop` | content_hash differs but semantic_hash identical (typo-only) |

If there is no base on disk, `coherence diff` writes the current snapshot as the initial baseline and reports `initialized: true`.
After that, the baseline is refreshed only by explicit `coherence index` invocations; `diff` itself never overwrites the baseline.

### Knowledge graph

`coherence index` also writes `.coherence/graph.json`, the M3 knowledge-graph MVP. Each tracked file becomes a `file` node, each directory a `directory` node connected by `contains` edges. Markdown files additionally become `doc` nodes (label = frontmatter title or first heading). Files under `docs/user-stories/` and `docs/decisions/` with `US-###` / `ADR-###` / `IDR-###` ids in their frontmatter (or filename) emit typed `user_story` / `adr` / `idr` nodes connected back via `defines` edges. Inline Markdown links from one doc to another tracked file emit `mentions` edges with provenance.

Code adds two more `mentions` flavors.

(a) Typed-id references: when a non-markdown tracked file contains `US-###` / `ADR-###` / `IDR-###` tokens, a `mentions` edge wires `file:` → the typed-id node. Unknown ids are intentionally skipped here so the `unknown_id_references` drift meter still surfaces them as actionable findings.

(b) Quoted path literals: a non-markdown file with `"some/path.json"`, `'./schemas/user.proto'`, or `` `config.yml` `` that resolves to a tracked file emits a `mentions` edge from source to target. The "must resolve to tracked" filter eliminates almost all noise: random string literals that aren't real repo paths never emit edges. URLs (`http://...`), absolute paths (`/etc/...`), and bare identifiers without a `/` or extension are rejected.

Together these broaden the multi-hop reachability used by `path_loss` and `claim_support`: a concept whose doc mentions a story now reaches code that names the same story or references a config file the story depends on, even without an explicit markdown link.
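The two code-side `mentions` flavors can be approximated with a pair of regular expressions plus the "must resolve to tracked" filter. This is only a sketch under stated assumptions: `###` is read as one or more digits, quoted literals are treated as repo-root relative after `path.Clean`, and `typedIDMentions` / `pathMentions` are hypothetical names rather than functions from the Coherence codebase.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
)

// typedIDRe assumes "###" in the docs stands for one or more digits.
var typedIDRe = regexp.MustCompile(`\b(?:US|ADR|IDR)-\d+\b`)

// quotedRe captures the body of a single-, double-, or backtick-quoted
// literal that contains no whitespace.
var quotedRe = regexp.MustCompile("[\"'`]([^\"'`\\s]+)[\"'`]")

// typedIDMentions splits typed-id tokens into ids that have a graph node
// (edge candidates) and unknown ids (left for a drift meter to report).
func typedIDMentions(src string, known map[string]bool) (edges, unknown []string) {
	seen := map[string]bool{}
	for _, id := range typedIDRe.FindAllString(src, -1) {
		if seen[id] {
			continue
		}
		seen[id] = true
		if known[id] {
			edges = append(edges, id)
		} else {
			unknown = append(unknown, id)
		}
	}
	return edges, unknown
}

// pathMentions keeps only quoted literals that resolve to a tracked file.
func pathMentions(src string, tracked map[string]bool) []string {
	var out []string
	seen := map[string]bool{}
	for _, m := range quotedRe.FindAllStringSubmatch(src, -1) {
		lit := m[1]
		if strings.Contains(lit, "://") || strings.HasPrefix(lit, "/") {
			continue // URL or absolute path
		}
		if !strings.Contains(lit, "/") && path.Ext(lit) == "" {
			continue // bare identifier without a slash or extension
		}
		p := path.Clean(lit) // assumption: literal is repo-root relative
		if tracked[p] && !seen[p] {
			seen[p] = true
			out = append(out, p)
		}
	}
	return out
}

func main() {
	src := `cfg := load("./config/app.yml") // US-001, US-999, "https://example.com/a.json"`
	edges, unknown := typedIDMentions(src, map[string]bool{"US-001": true})
	fmt.Println(edges, unknown)
	fmt.Println(pathMentions(src, map[string]bool{"config/app.yml": true}))
}
```

In this example only `US-001` and `config/app.yml` would become edges; `US-999` is returned separately, and the URL is dropped by the filter.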
Node and edge kinds shipped today:

| Node kinds | `file`, `directory`, `doc`, `user_story`, `adr`, `idr`, `rule`, `command`, `concept`, `claim`, `metric`, `test`, `evidence`, `generated_artifact`, `code_symbol`, `endpoint`, `data_model` |
| --- | --- |
| Edge kinds | `contains`, `defines`, `mentions`, `suggests`, `describes`, `verifies`, `supports`, `generates`, `supersedes`, `depends_on`, `implements`, `expects`, `contradicts`, `mirrors`, `invalidates` |

`rule` and `command` nodes come from `ontology.yml`: every rule becomes a `rule:` node; every entry under top-level `commands:` and every per-rule `suggested_commands` becomes a `command:` node connected to the rule via a `suggests` edge.

Additional `command` nodes come from Makefile/`*.mk` target declarations: each non-pattern, non-`.PHONY` target becomes a `command:make ` node wired via a `defines` edge from the source Makefile. `.PHONY`/`.DEFAULT_GOAL` and other `.`-prefixed special targets are skipped, as are variable assignments (`name = value`, `:=`, `?=`, `+=`, `!=`) and pattern rules (`%.o: %.c`). The canonical filenames `Makefile`, `makefile`, `GNUmakefile`, plus any `*.mk` include file, are scanned.

Shell scripts also surface as commands: `*.sh`/`*.bash`/`*.zsh` files (and extensionless files with a `#!/.../sh`/`bash`/`zsh` shebang) emit `command:bash ` nodes wired back via `defines` edges. Non-shell shebangs (`python`, `node`, etc.) are not promoted. Recipe parsing (the sub-commands a script invokes) is deferred; Pass 13 surfaces existence + path only.

`concept` nodes come from H1 + H2 headings in each Markdown doc, slugified (lowercased, non-alphanumeric → hyphen). Each captured heading emits one concept node + `describes` edge from the source doc.
H3+ are intentionally skipped; they typically denote sub-sub-topics that inflate the concept graph without adding meaningful coverage signal. Cross-doc dedup applies: two docs whose headings slugify to the same value share **one** concept node, each contributing its own `describes` edge. Per-doc dedup also applies: a doc with multiple H2s sharing a slug emits a single describes edge. Each node carries `level` meta (`H1` / `H2`) for downstream filtering.

`claim` nodes come from Markdown bullet items beginning with an assertive verb (`must`, `should`, `shall`, `requires`, `ensures`, `guarantees`, `cannot`, `will`). Each claim is content-addressed (`claim:`) so the same claim text across multiple docs dedupes to one node; each doc contributes a separate `defines` edge. This is the wiring needed for the `claim_support` drift meter.

`metric` nodes come from YAML files under `rill/metrics/` or `metrics/` (including nested subdirs). One metric node per file today, labelled by the slugified filename (`metric:success-rate` from `rill/metrics/success_rate.yaml`). Per-`measures[]` extraction is a follow-up; the current MVP covers the common "one metric per file" convention.

Code-level metric references add `mentions` edges: when a non-markdown tracked file contains a quoted occurrence (single, double, or backtick) of a known metric label, a `mentions` edge wires `file:` → `metric:`. Closes the GOAL.md "string-literal metric names" extraction note. The defining metric YAML itself is skipped (its `defines` edge already represents the relationship).

`test` nodes come from path-pattern detection: Go `*_test.go`, Python `test_*.py` / `*_test.py`, JS/TS `*.test.{ts,tsx,js,jsx}` and `*.spec.{...}`, plus files under `tests/`, `test/`, or `__tests__/` directories. When the source file is reverse-mappable (e.g., `foo_test.go` → `foo.go`, `auth.test.ts` → `auth.ts`/`.tsx`), a `verifies` edge connects the test node to the source file node.
Orphan tests (no matching source in the tracked set) still get a node but no verifies edge.

`evidence` nodes come from `docs/evidence//...` files, one evidence node per bucket regardless of how many files live inside it. When the bucket name matches a typed-id pattern (`US-###`, `ADR-###`, `IDR-###`), a `supports` edge links the evidence node to the matching typed-id node. Date-keyed or otherwise arbitrary buckets get evidence nodes without supports edges; they still surface in the graph as standalone evidence artifacts.

`generated_artifact` nodes come from ontology rules' `expect_any` paths. For each rule, its `expect_any` globs are expanded against the tracked file set (same glob matcher as rule evaluation) and every matched file becomes one artifact node. A `generates` edge is wired from each contributing rule to each artifact. Same artifact referenced by multiple rules dedupes to one node with multiple `generates` edges. Concrete paths and wildcards both work; expected paths missing from the tracked set are skipped.

`code_symbol` nodes come from three shallow extractors today.

(1) A Go AST scan over tracked `*.go` files (`_test.go` skipped). Exported top-level declarations emit one symbol per name: funcs, types, consts, vars. ID format `code_symbol:.` groups symbols across files in the same package. Methods are skipped; only package-scope functions and value declarations are captured. Each node carries `go_kind` (`func`/`type`/`const`/`var`) and `package` meta.

(2) A TypeScript regex-driven scan over `*.ts`/`*.tsx`/`*.mts`/`*.cts` files (test/spec files and `*.d.ts` declarations skipped). Captures `export function`, `export class` (incl. `abstract`), `export interface`, `export type`, `export enum`/`export const enum`, and `export const|let|var`. Default exports of named declarations are captured; anonymous defaults are not. Re-exports (`export { foo } from`, `export *`) are not captured today since they don't introduce a fresh symbol.
ID format uses the file path stem as module: `code_symbol:src/api/auth.User`. Imports of relative specifiers (`./b`, `../shared/x`) that resolve to a tracked file emit `depends_on` edges; bare module specifiers (`react`, `@scope/pkg`) are ignored.

(3) A Python regex scan over `*.py` files (test_*/_test filenames skipped via the same isTestFile rule used by the test node pass). Captures column-0 `def`, `async def`, `class`, and `UPPER_CASE = …` constants. Nested defs/classes and instance assignments inside methods are intentionally skipped; the import surface is top-level names. Comments and triple-quoted blocks are stripped before scanning. ID format mirrors TS: file stem as module (`code_symbol:app/auth.Session`). Relative imports (`from .session`, `from ..config`, `from . import x`) that resolve to a tracked `.py` file emit `depends_on` edges; absolute imports (`from os import path`, `import json`) are not resolved.

A `defines` edge wires from the source file to each symbol.

`endpoint` nodes come from three shallow scans today.

(1) Go AST walks all `CallExpr` for HTTP route registrations: stdlib `http.HandleFunc(path, h)` / `http.Handle(path, h)` (method `*`, catch-all), plus chi/gorilla/fiber-style `.Get(path, h)` / `.Post` / `.Put` / `.Delete` / `.Patch` / `.Head` / `.Options` (method from the call name).

(2) TypeScript regex picks up Express/Fastify/Hono-style `.get('/x', …)` / `.post` / `.put` / `.delete` / `.patch` / `.head` / `.options`. `.use`/`.all`/`.any` are intentionally skipped because they bind router-wide middleware, not single endpoints. Single-quoted, double-quoted, and template-literal paths are accepted; dynamic paths (`PREFIX + "/items"`, `getPath()`) are skipped.

(3) Python regex picks up Flask / FastAPI decorators: `@.get('/x')` / `.post` / `.put` / `.delete` / `.patch` / `.head` / `.options`, plus `@.route('/x')` (catch-all `*`) and `@.route('/x', methods=['GET','POST'])` (one endpoint per listed method).
The path must be the first positional string literal; a non-literal first argument is skipped. All three scans share the ID format `endpoint::`, the `defines` edge from the source file, and `http_method` + `http_path` meta.

`expects` edges are the symmetric complement to `generates`. For each ontology rule, the `when` globs are expanded against the tracked file set (same matcher used by rule evaluation), and one `expects` edge fires from the `rule:` node to each matched trigger file. Together with the `generates` edges (from `expect_any` matches), this encodes the full rule constraint as graph edges: a rule's full semantics is "when these files change, expect those artifacts to follow".

`implements` edges come from three extractors today.

(1) Go AST scan of doc comments on exported declarations. The pattern `(?i)implements[\s:\-]*(US|ADR|IDR)-###` matches both `// implements US-001` and `// Implements: ADR-007` forms. Works for `FuncDecl`, `TypeSpec`, and `ValueSpec` doc comments.

(2) TypeScript line-based scan of raw source. Matches `// implements US-001`, JSDoc `/** @implements ADR-007 */` blocks, and same-line trailing comments (`export class Foo {} // implements IDR-002`). The TS keyword `implements` on `class Foo implements IBar` is rejected because `IBar` isn't a typed-id pattern.

(3) Python line-based scan covering `# implements ADR-007` line comments, triple-quoted module/function docstrings, and same-line claims.

Across all three: edges emit from the `code_symbol` node to the matching typed-id node, repeats within the same source dedupe to one edge, and mere mentions like "see US-001" don't trigger, because the `implements` keyword is required. The line-based extractor attaches claims to the NEXT top-level symbol below them (so a JSDoc block above an export catches it correctly, while a claim above a class catches that class, not a later one).

`depends_on` edges come from Go imports (Go-only MVP today).
The extractor reads the repo's `go.mod`, captures the module path, then for each tracked `*.go` file walks `file.Imports`. Imports matching the module prefix + a tracked directory containing `.go` files emit a `depends_on` edge `file: → directory:`. Stdlib and external dependencies are silently skipped; only in-repo links surface. Multi-file packages produce one edge per importing file (the provenance shows which import resolved). Repos without `go.mod` emit no `depends_on` edges.

`supersedes`, `contradicts`, `mirrors`, `invalidates`, and `implements` edges all come from typed-id frontmatter fields. Scalar (`supersedes: ADR-007`) and inline-list (`contradicts: [ADR-001, US-022]`) forms both parse, and a single doc can declare any combination. Cross-kind references work (`ADR-020 supersedes: IDR-005`), self-references are filtered, and edges emit even when the target id isn't tracked, so dangling claims surface as useful telemetry. Together they encode deliberate decision lineage:

- `supersedes` is "this replaces that";
- `contradicts` is "this asserts something incompatible with that";
- `mirrors` is "this restates that in another scope";
- `invalidates` is "this declares that no longer applies";
- `implements` is "this decision fulfills that story / fixes that requirement" (symmetric with code-level `// implements US-###` annotations).

The LLM-driven flavor of contradiction findings still flows into the `drift.contradiction` meter; the graph edges here capture the deterministic authored claim.

`data_model` nodes come from schema-file regex detection across three formats: `.sql` (CREATE TABLE / VIEW / TYPE / MATERIALIZED VIEW, with IF NOT EXISTS + schema-qualified + quoted variants supported), `.proto` (message / enum / service declarations), and `.graphql` / `.gql` (type / input / interface / enum / union).
The entity name is slugified and deduplicated across sources: defining the same entity in both `.proto` and `.graphql` (a common cross-tier pattern) produces one node with two `defines` edges. Meta carries `source_kind` for downstream filtering.

**M3 catalogue complete:** all 17 node kinds and all 15 edge kinds from GOAL.md's "Knowledge graph ontology" section are now shipping. The remaining work for M3 is breadth: better Makefile / shell extractors, deeper per-language code coverage, and richer per-rule expectation mining.

`coherence status` shows the per-run node/edge count breakdown under "Graph Coverage". `coherence diff` now reports a graph delta alongside the file-level diff:

    graph delta: nodes +10/-0, edges +9/-0
    +node adr adr:ADR-001
    +node rule rule:adr-touched-needs-readme
    +edge defines doc:docs/decisions/ADR-001.md -> adr:ADR-001
    +edge suggests rule:adr-touched-needs-readme -> command:cat README.md

The combined `--json` output is `{snapshot: {…}, graph: {…}}` so agents can read concept-level changes without re-parsing the prose.

### Semantic hash coverage

| kind | semantic hash |
| --- | --- |
| `markdown` | frontmatter + headings + link targets + code-fence languages |
| `.go` | AST via `go/parser` + canonical `go/format` (comments stripped) |
| `.ts/.tsx/.js/.jsx/.java/.kt/.rs/.sql` | `//` + `/* */` stripped, whitespace collapsed, SHA-256 |
| `.py/.rb` | `#` lines + triple-quoted docstrings stripped, whitespace collapsed, SHA-256 |
| `yaml` | placeholder (= content hash); M2 follow-up |
| `other` | placeholder (= content hash) |

So a typo in Markdown prose leaves `semantic_hash` unchanged; a comment-only edit to a Go function (or a JSDoc-only edit to a `.ts` file) does the same. Renaming a heading, swapping a link target, changing function bodies, etc. all change it.
This is what lets `stale_tests` ignore comment-only edits, and it is the foundation for the deferred CB-011 (semantic no-op) and CB-013 (stale generated artifact) scenarios.

## Watch

`coherence watch` runs in two modes:

    coherence watch --once --json             # single-fire snapshot
    coherence watch --interval=500ms --json   # live polling loop (default 1s)

`--once` is the first step in the GOAL.md recommended agent sequence:

    coherence watch --once --json
    coherence drift --base=HEAD --worktree --json
    coherence scan --staged --json

The single-fire mode is equivalent to `review --base=HEAD --worktree`: same drift wiring, same outcome contract, just labelled `subcommand: "watch"` in the JSON so agents can tell the calls apart.

The live loop polls the Merkle root every `--interval` (default 1s). On each detected change it re-runs the review pipeline and emits one JSON document to stdout (newline-delimited; pipe to `jq -c` or stream into any NDJSON consumer). `SIGINT`/`SIGTERM` stops the loop cleanly. The implementation is fsnotify-free: Merkle-root polling is portable and trivially testable, and `snapshot.Compute` is fast enough on real repos.

The human output for `watch` and `review` adds a **changed concepts** block whenever a base graph is on disk and nodes were added or removed. This is what surfaces "a new ADR appeared" without re-parsing prose.

## Drift

`coherence drift` reads the current ontology, snapshot, and graph (building fresh internally and loading `.coherence/{snapshot,graph}.json` as baselines), computes drift meters, and writes `.coherence/drift.json`.
Ships 19 meters today (all 9 GOAL.md M4 meters plus 10 extras):

| Meter | Reads | Today's signal |
| --- | --- | --- |
| `required_edge_breakage` | `ontology.yml` + worktree diff | broken_rules / total_rules |
| `trace_coverage` | base + current graph | user_story nodes referenced (via defining doc) / total; reports `newly_uncovered_stories` + `newly_covered_stories` when a base graph is on disk |
| `neighborhood_drift` | base + current graph | weighted Δ over added/removed nodes and edges |
| `semantic_movement` | base + current snapshot | markdown_semantic_changed / markdown_total (noop excluded) |
| `path_loss` | BFS over typed edges from each concept (base + current) | concepts that don't reach a `test`/`evidence`/`endpoint`/`generated_artifact` via chain; reports `newly_orphaned_concepts` and `newly_supported_concepts` when a base graph is on disk; `convention=false` (no concept ever supported) skips score-based verdict promotion so kickoff projects don't look like 100% regression |
| `blast_radius` | base + current graph | unique 1-hop neighbors of touched nodes (`Score`/`ImpactedNeighbors`) + `CentralityWeight` = sum of touched-node degree (GOAL.md centrality contribution) |
| `staleness` | `git log` per tracked file + graph concept-importance | concept-weighted stale-file share (threshold: 90 days); `weighted=false` falls back to uniform `stale_files / total_files` |
| `claim_support` | BFS over typed edges from each claim (base + current) | claims that don't reach a `test`/`evidence`/`endpoint`/`generated_artifact` via chain; reports `newly_unsupported_claims` and `newly_supported_claims` when a base graph is on disk; `convention=false` (no claim ever supported) skips score-based verdict promotion |
| `contradiction` | optional LLM findings (`--llm`) | count of `llm-contradiction` findings; disabled without LLM |
| `stale_decision_links` | `supersedes` + `mentions` traversal | count of docs citing a superseded id without naming the new one |
| `broken_implements_chains` | `implements` + `supports` traversal | count of code symbols implementing ids with no evidence packet |
| `dependency_cycles` | DFS over `depends_on` (dir-level) | count of import cycles (warn-level: cycles break the build) |
| `orphan_endpoints` | `defines` (reverse) + `verifies` (base + current) | count of HTTP endpoints whose source file has no test; reports `newly_orphaned_endpoints` and `newly_covered_endpoints` when a base graph is on disk; `convention=false` (no verifies edge anywhere) skips score-based verdict promotion so kickoff projects without tests yet don't look like 100% orphan regressions |
| `unimplemented_stories` | user_story nodes + `implements` | stories with no incoming implements claim (gated on convention) |
| `broken_links` | markdown re-scan of tracked .md | inline links to targets missing from the filesystem (untracked-but-on-disk targets like `.gitignore`d LOCAL.md are intentionally allowed) |
| `unknown_id_references` | typed-id regex over non-Markdown production code | code mentions of US/ADR/IDR ids not defined in the graph; test files (`*_test.go`, `*.test.ts`, etc.), `.agents/`, and fixture-shaped dirs (`scenarios/`, `fixtures/`, `testdata/`, `golden/`, `eval/`) are excluded |
| `stale_tests` | `verifies` + base/current snapshot | tests unchanged while their `verifies`-linked source changed |
| `orphaned_metric_aliases` | base+current metric diff + frontend scan | frontend string refs to metric names removed/renamed in current |
| `dangling_imports` | TS + Python source re-scan + relative-path resolution (incl. ESM `.js`/`.ts` suffix swap) | count of `./x` (TS) or `from .x` (Py) imports whose target isn't in the tracked set (warn-level: breaks the build); entries carry `lang: "ts"` / `lang: "py"`. TS resolver follows the Node ESM convention where source imports `./foo.js` and resolves to `./foo.ts` on disk |

Plus two **optional engines** (opt-in via `ontology.yml`, off by default):

| Meter | Input | Output |
|---|---|---|
| `callsite_blast_radius` | base+current snapshot Go-file diff + native `go/ast` extractor | for each changed top-level Go function, direct + transitive caller counts (`score` = max direct production callers). Telemetry-only; doesn't promote the verdict. Native extractor produces correctly package-qualified call edges; see [`docs/meters/callsite_blast_radius.md`](docs/meters/callsite_blast_radius.md). Enable with `optional_engines.callsite_blast_radius.enabled: true` in `ontology.yml`. |
| `dead_code` | full module scan via native `go/ast` extractor | list of unexported Go top-level functions with zero inbound resolved calls (`score` = count). Conservative; function-value passes show up as false positives. Telemetry-only. See [`docs/meters/dead_code.md`](docs/meters/dead_code.md). Enable with `optional_engines.dead_code.enabled: true`. |

Each meter also contributes to a top-level `verdict`:

- `warn`: actionable findings (broken rules or uncovered stories).
- `telemetry`: neighborhood drift exceeded the noise floor, or a support-path regression was detected (any `newly_orphaned_concepts` or `newly_unsupported_claims` since baseline); informative only (matches the `telemetry_only_movement` flag in the JSON outcome contract). A single transition flips the verdict even when the overall score stays below the floor, and the suggested action lists the specific concept / claim that lost support.
- `clean`: nothing to do.
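The support-path regressions behind the `telemetry` verdict come from the reachability check used by `path_loss` and `claim_support`: a BFS over typed edges that asks whether a concept still reaches a `test`, `evidence`, `endpoint`, or `generated_artifact` node. The following is a minimal Go sketch of that idea, not Coherence's actual code. The `Graph` and `Edge` types, the function names, and the node ids are hypothetical, and the traversal is assumed to be undirected.

```go
package main

import "fmt"

// Edge is a hypothetical typed edge; the real graph schema may differ.
type Edge struct {
	Kind     string
	From, To string
}

// Graph is a hypothetical in-memory graph: node id -> node kind, plus edges.
type Graph struct {
	Kinds map[string]string
	Edges []Edge
}

// verifiable lists the node kinds that count as support targets.
var verifiable = map[string]bool{
	"test": true, "evidence": true, "endpoint": true, "generated_artifact": true,
}

// traversable lists the typed edge kinds the BFS is allowed to follow.
var traversable = map[string]bool{
	"describes": true, "mentions": true, "defines": true, "implements": true,
	"supports": true, "verifies": true, "depends_on": true, "generates": true,
	"expects": true,
}

// Supported reports whether start reaches a verifiable artifact, following
// traversable edges in both directions (undirected BFS).
func Supported(g Graph, start string) bool {
	adj := map[string][]string{}
	for _, e := range g.Edges {
		if traversable[e.Kind] {
			adj[e.From] = append(adj[e.From], e.To)
			adj[e.To] = append(adj[e.To], e.From)
		}
	}
	seen := map[string]bool{start: true}
	queue := []string{start}
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		if verifiable[g.Kinds[id]] {
			return true
		}
		for _, next := range adj[id] {
			if !seen[next] {
				seen[next] = true
				queue = append(queue, next)
			}
		}
	}
	return false
}

// NewlyOrphaned returns concepts supported in base but not in current.
func NewlyOrphaned(base, current Graph, concepts []string) []string {
	var out []string
	for _, c := range concepts {
		if Supported(base, c) && !Supported(current, c) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	base := Graph{
		Kinds: map[string]string{
			"adr:ADR-001": "adr", "code_symbol:orders.Create": "code_symbol",
			"file:orders.go": "file", "test:orders_test.go": "test",
		},
		Edges: []Edge{
			{"implements", "code_symbol:orders.Create", "adr:ADR-001"},
			{"defines", "file:orders.go", "code_symbol:orders.Create"},
			{"verifies", "test:orders_test.go", "file:orders.go"},
		},
	}
	// In the current graph the test (and its verifies edge) has been deleted.
	current := Graph{
		Kinds: map[string]string{
			"adr:ADR-001": "adr", "code_symbol:orders.Create": "code_symbol",
			"file:orders.go": "file",
		},
		Edges: base.Edges[:2],
	}
	fmt.Println(NewlyOrphaned(base, current, []string{"adr:ADR-001"})) // [adr:ADR-001]
}
```

Comparing base and current reachability, rather than looking only at the current graph, is what lets a single lost support path surface even when the aggregate score barely moves.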
All 9 GOAL.md M4 meters are now shipping, plus ten extra graph-traversal, link-integrity, id-reference, test-staleness, metric-rename, and TS-import-resolution meters: `stale_decision_links`, `broken_implements_chains`, `dependency_cycles`, `orphan_endpoints`, `unimplemented_stories`, `broken_links`, `unknown_id_references`, `stale_tests`, `orphaned_metric_aliases`, and `dangling_imports`. Together that's 19 meters today. The cycle and dangling-imports meters promote to `warn`; convention-gated meters (like `unimplemented_stories`) stay silent unless the repo actually uses the annotation, avoiding false positives on repos that don't. Of the 9 GOAL.md M4 meters, the 8 deterministic ones always run; `contradiction` is fed by the optional Groq LLM pass: when `review --llm` runs, `llm.Run`'s findings flow into `drift.ComputeWith(opts)` and populate the meter.

`path_loss` and `claim_support` share GOAL.md's multi-hop reachability: undirected BFS from each concept/claim node over the typed {describes, mentions, defines, implements, supports, verifies, depends_on, generates, expects} edge set; a node is supported iff the BFS reaches a verifiable artifact (`test` / `evidence` / `endpoint` / `generated_artifact`).

`blast_radius` exposes both the raw 1-hop impacted-neighbor count (`Score` / `ImpactedNeighbors`) and the GOAL.md-aligned `CentralityWeight`: the sum of degree(touched_node) over distinct touched nodes in the current graph, so changes that touch highly-connected nodes weight higher even if the 1-hop count is the same.

`staleness` now applies GOAL.md's `concept_importance` weighting: each concept's importance = its incoming `describes`-edge count, each file's weight = the max importance over the concepts its doc describes (non-markdown defaults to 1). The JSON `weighted` flag reports whether the graph had any concept nodes; when there are none, the score degrades to the uniform `stale_files / total_files` share.

Exit code: `1` only on `warn`; `telemetry`/`clean` are 0.
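As a rough illustration of the exit-code rule, including the `--strict` promotion described in the next paragraph, here is a small Go sketch. `ExitCode` is a hypothetical helper written for this README, not a function exported by Coherence.

```go
package main

import "fmt"

// ExitCode maps a drift verdict to a process exit code.
// Hypothetical helper: warn always fails, telemetry fails only under strict.
func ExitCode(verdict string, strict bool) int {
	switch verdict {
	case "warn":
		return 1
	case "telemetry":
		if strict {
			return 1
		}
	}
	return 0
}

func main() {
	for _, v := range []string{"clean", "telemetry", "warn"} {
		fmt.Printf("%-9s default=%d strict=%d\n", v, ExitCode(v, false), ExitCode(v, true))
	}
}
```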
Pass `--strict` to `coherence drift`, `coherence review`, or `coherence watch --once` to also exit 1 on `telemetry`. This is useful for CI gates that want zero-drift commits, where any movement (including diff-aware regressions like `newly_orphaned_concepts`) should block the merge. The live `coherence watch` loop ignores `--strict` (it streams events; there's no single exit code to promote).

For agent consumers: the drift report exposes a top-level `active_meters` field listing the names of meters that contributed signal to the verdict (mirrors the verdict-promotion gates). Agents triage with `drift.active_meters.length > 0` rather than inspecting every per-meter score.

The drift report also exposes a top-level `regressions` field aggregating the four diff-aware `newly_*` lists (`newly_orphaned_concepts`, `newly_unsupported_claims`, `newly_uncovered_stories`, `newly_orphaned_endpoints`) plus a `count` total. A single check on `drift.regressions.count > 0` answers "did this commit regress anything?" without navigating four nested meter blocks.

`drift.regressions.entries` is the preferred iteration surface: a flat `[{kind, id, suggested_action}, …]` list (kinds: `newly_orphaned_concept` / `newly_unsupported_claim` / `newly_uncovered_story` / `newly_orphaned_endpoint`). Each entry carries its own `suggested_action` string with the specific node id baked in, so an agent looping over the entries gets both the WHAT and the HOW in one pass, with no separate cross-reference into the top-level `suggested_actions` list needed.

### `review` now includes drift

`coherence review` automatically runs drift after the rules engine and embeds the full drift report in its JSON payload under the `drift` key.
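A minimal sketch of how an agent-side consumer might triage the embedded report in Go. The struct names and the sample payload are hand-written assumptions for illustration; only the JSON keys (`drift`, `active_meters`, `regressions.count`, `regressions.entries`, `kind`, `id`, `suggested_action`) come from the description above, and a real payload carries many more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Regression mirrors one entry of drift.regressions.entries.
type Regression struct {
	Kind            string `json:"kind"`
	ID              string `json:"id"`
	SuggestedAction string `json:"suggested_action"`
}

// Drift holds only the fields this sketch needs from the drift report.
type Drift struct {
	ActiveMeters []string `json:"active_meters"`
	Regressions  struct {
		Count   int          `json:"count"`
		Entries []Regression `json:"entries"`
	} `json:"regressions"`
}

// Review is a hypothetical partial view of the `coherence review --json` payload.
type Review struct {
	Drift Drift `json:"drift"`
}

// Triage returns one to-do line per regression, or nil when nothing is active.
func Triage(payload []byte) ([]string, error) {
	var r Review
	if err := json.Unmarshal(payload, &r); err != nil {
		return nil, err
	}
	if len(r.Drift.ActiveMeters) == 0 && r.Drift.Regressions.Count == 0 {
		return nil, nil
	}
	var todo []string
	for _, e := range r.Drift.Regressions.Entries {
		todo = append(todo, fmt.Sprintf("%s %s: %s", e.Kind, e.ID, e.SuggestedAction))
	}
	return todo, nil
}

func main() {
	// Hand-written sample, not captured tool output.
	sample := []byte(`{"drift":{"active_meters":["orphan_endpoints"],
	  "regressions":{"count":1,"entries":[
	    {"kind":"newly_orphaned_endpoint","id":"endpoint:GET:/api/orders",
	     "suggested_action":"add or restore a test"}]}}}`)
	todo, err := Triage(sample)
	if err != nil {
		fmt.Println("invalid payload:", err)
		return
	}
	for _, line := range todo {
		fmt.Println(line)
	}
}
```

In practice the payload would be read from the stdout of `coherence review --base=HEAD --worktree --json` instead of an inline literal.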
The top-level outcome contract gains four fields:

- `drift_verdict`: `clean` / `telemetry` / `warn`.
- `telemetry_only_movement`: set to `true` when drift is `telemetry` (matching the JSON outcome contract spec).
- `drift_regression_count`: total entries across the four diff-aware regression lists (`newly_orphaned_concepts` + `newly_unsupported_claims` + `newly_uncovered_stories` + `newly_orphaned_endpoints`). Omitted when 0. Agents can gate on `drift_regression_count > 0` for a single-key regression check. When non-zero on a telemetry verdict, the outcome also flips `review_recommended` to `true` and sets `recommended_next_command` to `"coherence drift --json"`; pure movement-driven telemetry stays informational.
- `drift_regressions`: the full typed list of regressions (`[{kind, id, suggested_action}, …]`) inline in the outcome contract. Omitted when empty. Lets an agent reading just the outcome JSON act on the WHAT and the HOW without descending into the full drift report.

`scan` and `check` deliberately skip drift to stay fast, since they're the pre-commit gate. `review` is where the full picture comes together.

## Doctor

`coherence doctor` performs a quick environment check after `init` or before adopting the tool in a new repo:

    coherence doctor          # human output
    coherence doctor --json   # machine-readable

It validates that `ontology.yml` loads, `.githooks/pre-commit` is present and executable, `.coherence/` is gitignored, the local state directory is healthy, and `.agents/skills/coherence/SKILL.md` has valid skill frontmatter. It also warns on legacy `.coherence/skills/agent.md`. Exit code is `1` only when a check is `fail`; `warn` issues are reported but do not block.

## LLM pass

Set `COHERENCE_LLM=1` or pass `--llm` to enable the optional Groq pass. It uses `GROQ_API_KEY`, defaults to `llama-3.3-70b-versatile`, and can be overridden with `COHERENCE_GROQ_MODEL`.
Hard cap: 3 calls per run; findings are always `warn` from the LLM directly, but a contradiction count > 0 also bumps the drift verdict to `warn` so callers see the actionable signal.

### Candidate selection

- `scan` / `check` use **`SelectCandidatesFromStaged`**: staged markdown under `docs/{user-stories,specs}/`. Same behavior as before.
- `review` / `watch` use **`SelectCandidatesFromSnapshotDiff`**: markdown files whose `semantic_hash` flipped between the on-disk snapshot baseline and the current state. Noop typo changes are excluded; new markdown files are included. This closes M6 box 1 ("LLM review consumes graph candidates, not whole repo text") by spending the per-run LLM budget on files with real semantic edits rather than every staged markdown. When no base snapshot is available (no prior `coherence index`), review falls back to the staged-glob selector so the LLM pass still runs sensibly.
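The snapshot-diff selection can be pictured with a short Go sketch. This is an illustration under assumptions, not the real `SelectCandidatesFromSnapshotDiff`: the `Snapshot` type (path to semantic hash), the `SelectCandidates` name, and the boolean fallback signal are all hypothetical, and the 3-call LLM cap is not modelled.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Snapshot is a hypothetical shape: tracked path -> semantic hash.
type Snapshot map[string]string

// SelectCandidates returns markdown paths that are new or whose semantic
// hash differs from the baseline. ok is false when there is no baseline,
// signalling that the caller should fall back to the staged-glob selector.
func SelectCandidates(base, current Snapshot) (paths []string, ok bool) {
	if base == nil {
		return nil, false
	}
	for path, hash := range current {
		if !strings.HasSuffix(path, ".md") {
			continue // only markdown is considered
		}
		if old, seen := base[path]; !seen || old != hash {
			paths = append(paths, path)
		}
	}
	sort.Strings(paths) // map iteration order is random; sort for stable output
	return paths, true
}

func main() {
	base := Snapshot{
		"docs/user-stories/US-001.md": "h1",
		"docs/specs/api.md":           "h2",
		"main.go":                     "g1",
	}
	current := Snapshot{
		"docs/user-stories/US-001.md": "h1-edited", // semantic edit
		"docs/specs/api.md":           "h2",        // typo-only edit: hash unchanged
		"docs/specs/new.md":           "h3",        // new file
		"main.go":                     "g2",        // not markdown
	}
	paths, ok := SelectCandidates(base, current)
	fmt.Println(paths, ok) // [docs/specs/new.md docs/user-stories/US-001.md] true
}
```

Because a typo-only edit leaves the semantic hash untouched, `docs/specs/api.md` never reaches the candidate list, which is how the per-run LLM budget ends up spent on real semantic edits.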
