hinanohart/conformlock


# conformlock

`conformlock` is a small CPU-only Python library that wraps a streaming ML predictor with three runtime checks plus a tamper-evident audit log:

1. **Split conformal prediction** (and an adaptive variant in the style of Gibbs & Candès 2021) for per-decision prediction intervals / sets.
2. **Finite-trace temporal-logic property automata** (LTLf, evaluated incrementally) for "the system did *X* before *Y*" rules over the recent decision stream.
3. **Online drift detectors** (ADWIN, CUSUM, Page-Hinkley, sliding KS, PSI) to flag when calibration is no longer trustworthy.
4. **Append-only audit ledger** using BLAKE3 hash chaining + ULID identifiers, so a verifier can later show that a recorded decision was not edited after the fact.

The ledger is **tamper-evident only against in-place edits under a single-writer assumption**: an attacker who can replace the whole file undetected, or two writers in different processes appending concurrently, can rewrite history. External-verifiability anchoring (Sigstore/Rekor or a public chain) is on the v0.2 roadmap. Within a single process, `Ledger.append` is thread-safe as of v0.1.0a3.

If any check rejects a decision, `conformlock` returns an `abstain` verdict instead of the model's prediction; the caller decides what to do (escalate to a human, return a default, retry, etc.).

## What this is, and is not

| Is | Is not |
|---|---|
| An engineering convenience layer over well-known statistical methods | A formal verification system |
| CPU-only, deterministic given seeds | A guarantee of safety, correctness, or compliance |
| Useful for streaming tabular / time-series inference | An LLM agent oversight tool (see [Subjunctor](https://github.com/hinanohart/subjunctor) for that scope) |
| A starting point for runtime monitoring policies | A drop-in replacement for human review |

The word "lock" in the name is metaphorical; this library does not lock anything down.

## Install

Not on PyPI yet; install from the GitHub release tag:
```bash
pip install "conformlock @ git+https://github.com/hinanohart/conformlock@v0.1.0a3"

# or with notebook ML extras:
pip install "conformlock[ml] @ git+https://github.com/hinanohart/conformlock@v0.1.0a3"
```

Extras: `[ml]` adds `scikit-learn` and `torch` (notebook examples only); the core has no ML framework dependency.

## 30-second example

```python
import numpy as np

from conformlock import (
    ConformalCalibrator,
    Decision,
    LTLfSpec,
    Verifier,
)

# 1. Calibrate split conformal on a held-out set.
cal = ConformalCalibrator(alpha=0.1)  # target 90% coverage
cal.fit(scores=np.array([0.12, 0.08, 0.31, 0.05, 0.22]))

# 2. Write the LTLf rule:
#    "if a 'risky' decision is made, an 'audit' decision must follow within 5 steps".
spec = LTLfSpec.parse("G (risky -> F[0,5] audit)")

# 3. Build a verifier. The calibrator already carries the target ``alpha``;
#    the verifier just orchestrates conformal → property → drift.
v = Verifier(calibrator=cal, spec=spec)

# 4. Use it on the streaming predictor. ``stream`` is whatever your caller
#    yields; one ``Decision`` per inference is the contract.
for record_id, score, atom in stream:  # e.g. atom in {"risky", "audit", ""}
    decision = Decision(
        record_id=record_id,
        score=score,
        atoms=frozenset({atom}) if atom else frozenset(),
    )
    verdict = v.observe(decision)
    if verdict.action == "abstain":
        handle_escalation(verdict)  # e.g. route to a human reviewer
    else:
        act_on(verdict)  # the caller decides what acting means
```

Numbers in the snippet above are illustrative inputs, not benchmark claims; `stream`, `handle_escalation`, and `act_on` are placeholders the caller supplies. See `examples/` for runnable scripts with deterministic seeds and printable output, and `tests/test_readme_example.py` for the CI-enforced regression test that runs this exact snippet end-to-end.

## What `conformlock` does *not* try to do

- It does **not** prove that the underlying model is calibrated, fair, or correct.
- It does **not** claim coverage guarantees in the strict statistical sense once the data-generating distribution drifts; drift detection only tells you that the exchangeability assumption behind those guarantees has likely broken.
- It does **not** verify LLM agents, function-call traces, or tool use; see [Subjunctor](https://github.com/hinanohart/subjunctor).
- It is **not** evaluated against any regulatory certification scheme.

## Why another conformal library?

| Library | Latest release | Streaming online conformal | LTLf/MTL property layer | Tamper-evident ledger | License |
|---|---|---|---|---|---|
| [MAPIE](https://github.com/scikit-learn-contrib/MAPIE) | v1.4.0 (2026-04-30) | Partial (batch time-series only, no ACI) | No | No | BSD-3-Clause |
| [crepes](https://github.com/henrikbostrom/crepes) | active | No streaming hook | No | No | BSD-3-Clause |
| [nonconformist](https://github.com/donlnz/nonconformist) | maintenance only | No | No | No | MIT |
| **conformlock** | **v0.1.0a3 (2026-05-24)** | **Yes (split CP + ACI; ACI advances through `Verifier.record_outcome`)** | **Yes (self-implemented LTLf, finite-trace re-evaluator; no DFA pre-compile)** | **Yes (BLAKE3 chain; single-writer threat model; see the ledger note above)** | **MIT** |

### Adjacent OSS we are *not* trying to replace

The combination *split conformal + temporal-logic monitor + drift + ledger* is the unit of value `conformlock` claims to add. On each of those four axes taken individually, several mature OSS projects already exist, and `conformlock` is not trying to replace them:

- [alibi-detect](https://github.com/SeldonIO/alibi-detect): drift and outlier detection (no conformal, no temporal logic, no ledger).
- [Frouros](https://github.com/IFCA-Advanced-Computing/frouros): 31 drift-detection methods (no conformal, no temporal logic, no ledger).
- [NannyML](https://github.com/NannyML/nannyml): drift + performance estimation under absent labels (no conformal interval, no temporal logic, no ledger).
- [Evidently](https://github.com/evidentlyai/evidently), [deepchecks](https://github.com/deepchecks/deepchecks), [whylogs](https://github.com/whylabs/whylogs): ML-monitoring dashboards (no per-decision conformal interval, no temporal logic).
- [river](https://github.com/online-ml/river): online learning primitives (no conformal, no ledger).

If you only need the *drift* axis, look at those projects first. `conformlock` exists for the case where you specifically need per-decision conformal abstention, a finite-trace temporal-property monitor, and a tamper-evident log together. No equivalent OSS combination was found at scaffold time (2026-05-24); please file an issue if one exists.

## Regulatory framing: read carefully

The EU AI Act ([Article 15: Accuracy, Robustness and Cybersecurity](https://artificialintelligenceact.eu/article/15/), enforceable for high-risk AI systems from **2 August 2026**) requires high-risk systems to "achieve an appropriate level of accuracy, robustness and cybersecurity, and perform consistently … throughout their lifecycle," and requires the relevant accuracy metrics to be declared in the instructions for use.

`conformlock` is **designed with reference to** that text: it gives operators a programmatic way to produce per-decision uncertainty bounds, detect distributional drift, and retain an audit log. It does **not** by itself make any AI system "Article 15 compliant"; compliance is an organisational and process determination made by an operator's notified body or supervisory authority, not by a library.

Similarly, the library is **not** certified against ISO/IEC 23894:2023, NIST AI RMF, FDA SaMD Good Machine Learning Practice, or any other framework. We deliberately avoid marketing language that would imply such certification (see `docs/honest-marketing-policy.md` for the exact CI-enforced exclusion list).
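The per-decision uncertainty bounds mentioned above come from split conformal prediction, with an ACI-style adaptive variant that the comparison table says advances through `Verifier.record_outcome`. The following is a minimal sketch of the textbook formulas, not `conformlock`'s implementation: the function names `split_conformal_threshold` and `aci_update`, the default `gamma`, and the convention that a prediction set contains every candidate whose nonconformity score is at most the threshold are all assumptions made for illustration.

```python
import math
from typing import Sequence


def split_conformal_threshold(scores: Sequence[float], alpha: float) -> float:
    """Standard finite-sample split conformal threshold (hypothetical helper).

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score.
    Returns +inf when that rank exceeds n (too few scores, so the prediction
    set must be unbounded) and -inf when it is <= 0 (alpha >= 1, empty set).
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    if k <= 0:
        return -math.inf
    return sorted(scores)[k - 1]


def aci_update(alpha_t: float, target_alpha: float, covered: bool,
               gamma: float = 0.005) -> float:
    """One adaptive conformal inference step in the style of Gibbs & Candes (2021).

    alpha_{t+1} = alpha_t + gamma * (target_alpha - err_t), where err_t is 1
    if the realised outcome fell outside the prediction set and 0 otherwise.
    A miss lowers alpha_t (wider future sets); a hit raises it slightly.
    The default gamma is a hypothetical choice, not a conformlock setting.
    """
    err_t = 0.0 if covered else 1.0
    return alpha_t + gamma * (target_alpha - err_t)


# The five illustrative calibration scores from the 30-second example:
print(split_conformal_threshold([0.12, 0.08, 0.31, 0.05, 0.22], alpha=0.1))  # inf
```

Under this standard rule the five illustrative scores in the 30-second example yield an unbounded threshold at `alpha=0.1`, because a finite threshold needs ceil((n + 1)(1 - alpha)) <= n, i.e. n >= 1/alpha - 1 = 9 calibration scores. Whether `ConformalCalibrator` follows exactly this quantile rule for small calibration sets is an assumption to check against its own documentation.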
## How this release was assembled: read carefully

`conformlock` v0.1.0a1 and v0.1.0a3 were assembled by an LLM-driven autonomous workflow under the project author's account (`hinanohart`). The only human in the loop is the author; no third party had independently reviewed the implementation or the marketing claims at release time. The git tag `v0.1.0a3` supersedes `v0.1.0a1`, which is retained on GitHub solely to keep the audit trail intact. New users should install `v0.1.0a3` and treat the whole release line as unreviewed until an external reviewer signs off. Issues filed against either tag are welcome.

## Related work and prior art

- Vovk, Gammerman, Shafer (2005). *Algorithmic Learning in a Random World*. Springer. Split / inductive conformal prediction.
- Gibbs & Candès (2021). *Adaptive Conformal Inference Under Distribution Shift*. NeurIPS 2021. ACI update rule.
- De Giacomo & Vardi (2013). *Linear Temporal Logic and Linear Dynamic Logic on Finite Traces*. IJCAI 2013. LTLf semantics.
- Bauer, Leucker, Schallhart (2011). *Runtime Verification for LTL and TLTL*. ACM TOSEM 20(4). Three-valued LTL₃ semantics; this library's permanent-verdict heuristic follows the same spirit.
- Lindemann, Qin, Fan, Pappas, Bastani (2022). *Conformal Prediction for STL Runtime Verification*. [arXiv:2211.01539](https://arxiv.org/abs/2211.01539). Academic predecessor targeting **STL** (Signal Temporal Logic, continuous-time); `conformlock` targets **LTLf** (discrete-step, finite-trace) and ships a public MIT implementation.
- Bifet & Gavaldà (2007). *Learning from Time-Changing Data with Adaptive Windowing*. ADWIN.
- O'Connor, Aumasson, Neves, Wilcox-O'Hearn (2020). *BLAKE3*. Hash function used for the ledger.
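The BLAKE3 entry above and the single-writer caveat in the introduction can be made concrete with a minimal hash-chain sketch. It is not `conformlock`'s ledger format: it uses the standard-library `hashlib.blake2b` as a stand-in for BLAKE3 (which needs the third-party `blake3` package), omits ULIDs, and the `GENESIS` value, field names (`payload`, `prev`, `hash`), and canonical-JSON encoding are hypothetical choices.

```python
import hashlib
import json
from typing import Any, Dict, List

# Hypothetical all-zero "previous hash" used for the first entry.
GENESIS = "0" * 64


def _digest(prev_hash: str, payload: Dict[str, Any]) -> str:
    # Canonical JSON so an identical payload always hashes identically.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    # Stand-in: stdlib BLAKE2b with a 32-byte digest instead of BLAKE3.
    return hashlib.blake2b((prev_hash + body).encode("utf-8"), digest_size=32).hexdigest()


def append(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    # Each entry commits to the previous entry's hash, forming the chain.
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev_hash, "hash": _digest(prev_hash, payload)})


def verify(chain: List[Dict[str, Any]]) -> bool:
    # Recompute every link; any edited payload or broken link fails.
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


demo: List[Dict[str, Any]] = []
append(demo, {"record_id": "r1", "action": "accept"})
append(demo, {"record_id": "r2", "action": "abstain"})
print(verify(demo))                       # True
demo[0]["payload"]["action"] = "abstain"  # simulate an in-place edit
print(verify(demo))                       # False
```

Recomputing every hash from scratch after editing a payload produces a chain that `verify` accepts, which is why the ledger claim in the introduction is limited to in-place edits by a single writer; anchoring the latest hash somewhere an attacker cannot rewrite (the v0.2 Sigstore/Rekor plan) is what closes that gap.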
## Project layout

```text
conformlock/
├─ src/conformlock/   # core library (numpy + scipy + blake3 + python-ulid)
├─ tests/             # unit + property + ledger-tamper tests
├─ examples/          # runnable CPU-only examples
├─ notebooks/         # optional [ml] extra: scikit-learn / torch
├─ docs/              # background and design notes
├─ CHANGELOG.md
├─ ROADMAP.md
└─ LICENSE            # MIT
```

## Roadmap

See [ROADMAP.md](ROADMAP.md). Highlights: a more general MTL fragment, an offline `conformlock-replay` tool to re-verify an existing ledger, and Hugging Face dataset bindings (deferred to v0.1.1). The names `verielle` and `decisionscope` appear in the roadmap as **v0.3 backlog** items, not promises.

## License

MIT License; see [LICENSE](LICENSE).