DT-Foss/O1-O

GitHub: DT-Foss/O1-O

一套不依赖LLM的确定性代码合成系统,利用代数类型系统和因果知识图谱在毫秒级离线生成结构上无幻觉的工作程序。

Stars: 0 | Forks: 0

# O1-O — a deterministic code synthesis operator **A constant-time program-composition system whose output is structurally hallucination-free, built on an 8-color algebraic type system over a 7-pass causal knowledge graph, with a 7-layer verification pipeline and an autonomous 9-phase engagement operator.** **Author:** David Tom Foss · **Disclosed:** 2026-06-26 · **License:** Apache-2.0 ### Why am I publishing this? The short version: this thing was originally a demo I had lined up for the German *Bundeswehr Kommando CIR* (the cyber and information domain command). Walked into the meeting, pretty quickly realised the person on the other side of the table had no clue what any of this was, and got offered an IT-specialist apprenticeship instead. *Bestens.* So the repo sat on my disk since the start of 2026, doing nothing. Then I looked around and noticed it's June 2026 and most of the Western AI ecosystem is being throttled, gated, region-blocked, refusal-tuned, or just plain walled off behind whichever frontier lab happens to ship next — Fable 5, GPT-5.6, you name it. Compute access for independent researchers gets squeezed quarter by quarter; "we'd love to release this but…" has become the dominant note in releases. The center of gravity is drifting somewhere most of us can't actually reach anymore. So here's a counter-move: an entire deterministic code-synthesis stack, with zero LLM in the loop, that runs offline on a laptop and composes working programs in ~300 milliseconds out of a verified-fragment registry over a `.causal` knowledge graph. No API, no rate limit, no refusal policy, no abuse review queue, no "we're sorry, this request cannot be processed." It just composes the program. The 9 peer-reviewed papers further down validate the substrate underneath — the 8-color algebraic type system, the 7-pass causal inference engine, the 14-step Foss Gate, the seven-layer verification pipeline — across nuclear knowledge graphs, post-quantum cryptography, cipher cryptanalysis on real ISO/IEC and NIST standards, biomedicine, genomics, Monte Carlo PRNGs, and a bit-perfect production IBM z/OS mainframe assessment validated on real z15 hardware. The "offensive code synthesis" framing in the rest of this README is just the surface that this particular instance happens to be pointed at — because that's the surface CIR cared about. The substrate underneath is domain-agnostic. Point it at protein folding, point it at compiler verification, point it at your own knowledge graph in your own field — it doesn't care. Fork it, extend it, break it, ship something better. *Habt Spaß damit.* If anyone out there is worried about being left behind by whichever lab gates whichever model next, here's one whole working system that runs on your own hardware and doesn't care what the rate-limit table at the API gateway says today. ### A note on availability Heads-up before you open an issue: I have **zero bandwidth for anything** over the next few months — the nine conferences listed further down are all back-to-back this summer and into September, and I'm presenting at every single one of them. Rome (July), Nanjing (July), Seattle (July/August), Almaty (September). Plus the IBM z/OS coordination window. There's just nothing left. A proper paper on O1-O itself will follow once the conference season ends — that's the plan. Until then, the README is the document. Every claim in here is reproducible from the code committed in this repository; you don't need me in the loop to verify anything. If you want to reach me about O1-O specifically — bug reports, fork coordination, serious collaboration, journalistic enquiries, vendor coordination, the lot — write to **dtfoss-dev@proton.me**. Replies will be slow. I will read everything; I will not reply to most of it before October 2026. Issues and pull requests on GitHub are fine too, same caveat applies. If you fork and extend the substrate to a different domain (which is the whole point of publishing this), I'd genuinely love to hear about it eventually — even if I can't reply for a while. ## The architecture at a glance flowchart TB subgraph IN["INPUT"] intent["Natural-language intent
e.g. 'build a port scanner
with banner grabbing'"] end subgraph L1["Layer 1 — Intent processing"] parser["Intent Parser
tokenize · stem · disambiguate
classify mode · extract params"] memory["Session Memory
pronoun resolution
topic tracking · slot filling"] end subgraph L2["Layer 2 — Knowledge graph"] causal["132 .causal binary graphs
48k+ explicit triplets"] infer["7-pass deterministic inference
exact · direction · fuzzy ·
analogical · cross-domain ·
contextual · recombination"] harvest["AutoHarvester
+ AutoBridge
+ WebHarvester"] end subgraph L3["Layer 3 — Composition (algebraic type system)"] colors["8 colors: TEXT · STRUCT · TABULAR ·
BYTES · SERIAL · PATH · RESPONSE · VOID"] registry["~632 fragment registry entries
337 intent → color-chain patterns"] assembler["Color Assembler
+ Code Assembler
1245 code fragments"] end subgraph L4["Layer 4 — Seven-layer verification"] v1["1. Compile gate"] v2["2. Structural intent"] v3["3. Property-based
(Hypothesis)"] v4["4. Algebraic properties
(8 properties)"] v5["5. Symbolic execution"] v6["6. Taint analysis"] v7["7. Logic-consistency"] end subgraph L5["Layer 5 — Evasion / detection awareness"] det["46 detection signatures
4 classes (string · behavioral ·
entropy · import-table)"] sem["17 semantic transform classes"] mut["5-level mutation engine
+ 6 AST mutation operators"] edr["EDR Subverter
+ Canary Detector
+ Anti-Forensics"] end subgraph L6["Layer 6 — Self-improvement"] gap["Gap Detector
4 gap classes"] loop1["Code-pattern
learning"] loop2["Failure-pattern
memory"] loop3["Bridge
generation"] loop4["Knowledge
harvesting"] end subgraph L7["Layer 7 — Native / binary operations"] native["GCC · NASM · LIEF"] poly["4 byte-level polyglots
PDF/JS · PNG/HTML ·
JPEG/ZIP · MP4/PE"] platform["Inline PE / ELF /
Mach-O parsers"] end subgraph L8["Layer 8 — Autonomous engagement operator"] engage["/engage IP <target>
9-phase kill chain · adaptive retry ·
autonomous lateral movement · pivot"] end subgraph L9["Layer 9 — MITRE coverage + reporting"] mitre["14/14 tactics · 49 techniques
145 fragment mappings
Standard ATT&CK Navigator JSON"] end subgraph L10["Layer 10 — Operations persistence"] ops["Operations DB
AES-256-CTR · PBKDF2 600k
from-scratch stdlib only"] end subgraph L11["Layer 11 — Specialty surface exploiters"] surf["WiFi · USB · VPN · Email · ML ·
EDR · Credentials · Miner · Hash-mon"] end subgraph OUT["OUTPUT"] tool["Deployment-ready tool
source · standalone binary ·
Dockerfile · OPSEC profile ·
threat model · deployment guide"] end intent --> parser parser --> memory memory --> infer causal --> infer harvest -.->|feeds| causal infer --> assembler colors --> registry registry --> assembler assembler --> v1 v1 --> v2 --> v3 --> v4 --> v5 --> v6 --> v7 v7 --> det det --> sem --> mut edr -.->|pre-flight| det mut --> native native --> poly poly --> platform platform --> tool tool --> mitre mitre --> ops L8 -.->|orchestrates| L1 L8 -.->|persists in| ops L11 -.->|invoked by| L8 tool -.->|success| loop1 tool -.->|failure| loop2 loop1 -.->|writes| causal loop2 -.->|writes| causal loop3 -.->|writes| causal loop4 -.->|writes| causal gap -.->|drives| loop1 gap -.->|drives| loop2 gap -.->|drives| loop3 gap -.->|drives| loop4 style IN fill:#1a1a2e,color:#fff,stroke:#fff style OUT fill:#1a1a2e,color:#fff,stroke:#fff style L1 fill:#16213e,color:#fff style L2 fill:#16213e,color:#fff style L3 fill:#0f3460,color:#fff style L4 fill:#0f3460,color:#fff style L5 fill:#533483,color:#fff style L6 fill:#533483,color:#fff style L7 fill:#533483,color:#fff style L8 fill:#e94560,color:#fff style L9 fill:#16213e,color:#fff style L10 fill:#16213e,color:#fff style L11 fill:#533483,color:#fff Every layer is independently auditable: each box in this diagram corresponds to one or more files under `src/core/`, every count is verifiable by running `wc`, `grep`, and `find` against the committed source. The full architecture totals ~30,000 LOC across 107 modules. ## Thesis **Code composition does not require a language model.** It requires a closed-set type system, a verified knowledge graph, and a verification pipeline that catches every structural class of error. Given these three, "natural language → working program" reduces from sampling tokens in an unbounded space to traversing edges in a finite typed graph. LLM-based code generation is a sampling process. Sampling produces *plausible* outputs — outputs that look correct token-by-token to the model that produced them. *Plausible* is not *correct*. This is the hallucination wall, and it is structural to the sampling architecture, not a property of model size or training data. O1-O composes code by type-matched edge lookup in an algebraic graph. There is no sampling distribution. There is no plausibility heuristic. There is only: does the output color of fragment A equal the input color of fragment B? If yes, the composition is legal. If no, the composition is rejected before code is emitted. Hallucination is *structurally excluded by construction*, not statistically reduced by training. The same `.causal` substrate that drives the inference engine in O1's living mind drives the knowledge layer here. The same `.causal` engine that powers nine peer-reviewed papers across four 2026 IEEE conferences powers the composition lookup. The architecture is one stack with three points of contact: O1 *consults* the knowledge graph in flight, GSSM *integrates* the stream that the knowledge graph indexes, and O1-O *composes* working programs from the graph deterministically. ## The headline numbers Every number below is gathered by literally running `find`, `grep`, `wc -l` against the committed source. No estimates. | Metric | Value | |---|---| | Total Python platform code | ~30,000 LOC | | Core modules (`src/core/`) | **107** | | Code fragments (`fragments/`, 73 thematic JSON files) | **1,245** | | Binary `.causal` knowledge graphs (`knowledge/`) | **132** | | Source triplet JSON files (`triplets/`) | **35** | | External dependencies | **4** (msgpack, jellyfish, requests, beautifulsoup4) | | LLM/network calls during generation | **0** | | Average generation latency per tool | **270–613 ms** | | Architecture | Count | |---|---| | Color-type registry entries | **~632** | | Intent-to-color-chain regex patterns | **337** | | Inference engine passes | **7** | | Verification pipeline layers | **7** | | Detection signatures | **46** (over 4 classes) | | Semantic evasion transform classes | **17** | | Syntactic mutation levels | **5** | | AST mutation operators | **6** | | MITRE ATT&CK tactics covered | **14/14 (100 %)** | | MITRE ATT&CK techniques mapped | **49 / 145 fragment mappings** | | Self-improvement closed-loops running in parallel | **4** | | Auto-fix failure-class strategies | **11** | | Algebraic properties checked per output | **8** | Plus: AES-256-CTR with PBKDF2 600k iterations implemented from scratch in the Python standard library; inline PE / ELF / Mach-O / Mach-O Fat parsers without any external binary tooling; the full system runs air-gapped on a Mac mini. ## How to run it git clone https://github.com/DT-Foss/O1-O cd O1-O pip install -r requirements.txt python3 src/o1o_live.py --demo Or interactively: python3 src/o1o_live.py The REPL responds to free-text intent ("build a port scanner with service detection") and to 28 slash-prefixed commands documented in `/help`. The autonomous engagement operator is `/engage ` — described in detail below. ## The architecture — eleven layers The system is organized as eleven cooperating layers. Every layer is reproducible from the source paths given. ### Layer 1 — Intent processing - **Intent Parser** (`src/core/intent_parser.py`, 412 LOC). NLP without an LLM. Six steps: tokenize → stopword-strip → stem → fuzzy-match against the entity index of the knowledge graph (Jaro-Winkler) → classify mode (BUILD/CHAT/DEBUG/LEARN/DOMAIN) → extract parameters (paths, formats, numbers) → detect multi-step composition. Disambiguation of polysemous tokens (`command`, `injection`, `encryption`) is performed by **set intersection of context tokens against per-sense keyword sets** — no model, no embedding. - **Session Memory** (`src/core/session_memory.py`, 280 LOC). Multi-turn state with pronoun resolution, topic tracking across 7 domain buckets, incremental-intent markers, slot filling, persistent `project.causal` for cross-session learning. ### Layer 2 — The knowledge graph (7-pass deterministic inference) The knowledge layer is a **directed labelled graph of causal triplets** stored in a binary format and traversed by a seven-pass deterministic inference engine. Both the storage format and the inference engine are designed against a single architectural constraint: **every recall must be auditable back to a literal source**. No embeddings, no neural retrieval, no "the model knows because it was trained" — every fact in the knowledge base has a source tag pointing at a `.causal` file, every inferred fact has a chain of source-tags pointing at the parents that produced it. #### 2.1 — The triplet, formally A triplet is an ordered 3-tuple plus metadata: $$ t = (h, r, t, c, s, m) \in \Sigma \times \mathcal{R} \times \Sigma \times [0,1] \times \mathcal{S} \times \mathcal{M} $$ where | Symbol | Meaning | Implementation | |---|---|---| | $h$ | head entity (the *trigger*) | string, e.g. `"port_scanner"` | | $r$ | relation (the *mechanism*) | string from $\mathcal{R}$, e.g. `"scans"` | | $t$ | tail entity (the *outcome*) | string, e.g. `"network"` | | $c$ | confidence | float in $[0,1]$ | | $s$ | source provenance | the originating `.causal` file or inference pass that produced this triplet | | $m$ | optional metadata | dict (e.g. `_fuzzy_bridge`, `_analogy`, `_shared_context`) | The knowledge graph is the union of all triplet sets across all loaded `.causal` files, plus all triplets inferred by the 7-pass engine: $$ \mathcal{G} = \bigcup_{f \in \text{Files}} T_f \;\cup\; \bigcup_{p=1}^{7} \text{Pass}_p(\mathcal{G}) $$ where the inference passes operate over an immutable snapshot of the explicit graph plus all previously-inferred triplets, in fixed pass order. The engine is **monotone**: no pass removes triplets, every pass strictly extends the graph or leaves it unchanged. #### 2.2 — The `.causal` binary format The on-disk format is a 6-byte magic header (`CAUSAL`), a 2-byte big-endian version, then a zlib-compressed `msgpack` blob. # core/learning.py — the format reader with open(self.learned_path, 'rb') as f: magic = f.read(6) if magic != b'CAUSAL': return version = int.from_bytes(f.read(2), 'big') compressed = f.read() data = zlib.decompress(compressed) graph = msgpack.unpackb(data, raw=False) triplets = graph.get('triplets', []) Three design constraints drive the format choice: 1. **Compact** — 132 knowledge graphs ship at 9.5 MB total. Average 73 KB per domain. Loading the full set at boot takes under 200 ms on a Mac mini. 2. **Deterministic** — `msgpack` produces canonical byte representations for primitive types. The same triplet set serializes to identical bytes regardless of which Python process wrote it. Bit-exact reproducibility is a property of the format, not a property of the serializer's runtime. 3. **Self-contained** — every `.causal` file is a standalone domain. Adding a new domain is a single file drop. There is no global schema migration, no central index to rebuild. #### 2.3 — The four indices Loading a `.causal` graph builds four lookup structures over the triplet set: | Index | Maps | Size | Purpose | |---|---|---|---| | `entity_index` | entity → [triplets containing this entity] | $O(\|\mathcal{E}\| \cdot k)$ | content-addressable entity lookup | | `trigger_index` | trigger → [triplets with this trigger] | $O(\|\mathcal{E}\|)$ | forward traversal | | `outcome_index` | outcome → [triplets with this outcome] | $O(\|\mathcal{E}\|)$ | reverse traversal | | `all_triplets` | flat list | $O(\|\mathcal{G}\|)$ | full-scan iteration | The dual `trigger_index` / `outcome_index` design enables O(1) bridge-entity detection: *an entity is a bridge between two triplets if and only if it appears as both an outcome and a trigger*, found by the set intersection $\text{trigger_index.keys()} \cap \text{outcome_index.keys()}$. This intersection is the search space of the exact-chaining pass (Pass 1) — a one-line set operation rather than a full-graph traversal. At boot on the shipped knowledge base: **46,442 explicit triplets** are loaded across 44 domains; **43,693 entities** are indexed; **23,500 triplets** are inferred by the 7-pass engine, lifting the total reachable knowledge to **69,942 triplets** at a measured amplification of **+51 %**. #### 2.4 — The seven inference passes flowchart TB explicit[Explicit triplets
46,442 from 132 .causal files] p1[Pass 1 — Exact chain
transitive closure via bridge entities] p2[Pass 2 — Semantic direction
positive · negative · neutral propagation] p3[Pass 3 — Fuzzy match
Jaro–Winkler ≥ 0.90 in prefix buckets] p4[Pass 4 — Analogical
similar entities transfer attributes] p5[Pass 5 — Cross-domain analogy
mechanism-structure signature matching] p6[Pass 6 — Contextual cross-graph
shared-neighbor activation] p7[Pass 7 — Creative recombination
bridge-entity crossover] decay[Confidence decay + reward/penalty
Hebbian update on edge weights] final[Knowledge graph at query time
69,942 reachable triplets · 43,693 entities] explicit --> p1 p1 --> p2 p2 --> p3 p3 --> p4 p4 --> p5 p5 --> p6 p6 --> p7 p7 --> decay decay --> final style explicit fill:#1a1a2e,color:#fff style p1 fill:#0f3460,color:#fff style p2 fill:#0f3460,color:#fff style p3 fill:#0f3460,color:#fff style p4 fill:#533483,color:#fff style p5 fill:#e94560,color:#fff style p6 fill:#533483,color:#fff style p7 fill:#533483,color:#fff style decay fill:#16213e,color:#fff style final fill:#1a1a2e,color:#fff Each pass is bounded (`max_inferred` cap per pass) and confidence-filtered (only triplets with $c \geq \theta$ enter the graph, $\theta$ chosen per pass to balance precision against recall on the held-out task suite). ##### Pass 1 — Exact transitive chain For each entity $b$ that appears as both an outcome and a trigger (the **bridge set**), join incoming and outgoing triplets: $$ \frac{ (h_1, r_1, b, c_1) \in \mathcal{G} \quad\text{and}\quad (b, r_2, t_2, c_2) \in \mathcal{G} }{ (h_1, \text{chains_to}, t_2, c_1 \cdot c_2 \cdot 0.85) \in \mathcal{G} } $$ The factor $0.85$ is the *transitivity discount* — chained inferences are weaker than direct facts. Cycles ($h_1 = t_2$) and trivial chains where the bridge equals one endpoint are filtered. Acceptance threshold $\theta_1 = 0.30$; emission cap $8{,}000$ triplets per pass. This is the deterministic analogue of a **single-step transitive closure** in the description-logic sense: it computes $\mathcal{G}^{+}$ where $+$ denotes the confidence-weighted Kleene closure under bridge-joining. ##### Pass 2 — Semantic direction propagation Mechanisms are classified into three direction classes by literal substring matching against two lexica defined in `core/knowledge_engine.py`: POSITIVE_MECHANISMS = { 'uses', 'requires', 'reads', 'writes', 'creates', 'generates', 'returns', 'produces', 'manages', 'provides', 'enables', 'supports', 'implements', 'handles', 'processes', 'converts', 'parses', 'solves', 'solved_by', 'implemented_via', 'processed_by', 'type_of', 'is', 'iterates over', 'traverses', 'lists', 'displays', 'formats', 'idiom', 'composition', 'pipeline', 'bridge', } # 32 mechanisms NEGATIVE_MECHANISMS = { 'caused_by', 'raises', 'throws', 'blocks', 'prevents', 'breaks', 'conflicts_with', 'deprecates', 'removes', 'deletes', } # 10 mechanisms Anything not matching either is classified `neutral`. Two-step inference then follows propositional-logic-style direction-chaining rules: | $\text{dir}(r_1)$ | $\text{dir}(r_2)$ | result direction | $\gamma$ (confidence weight) | reasoning | |---|---|---|---|---| | positive | positive | positive | 0.80 | "A enables B, B enables C" ⇒ A enables C | | negative | negative | positive | 0.75 | "A prevents B, B prevents C" ⇒ A enables C (double negation) | | positive | negative | negative | 0.75 | "A enables B, B prevents C" ⇒ A prevents C | | negative | positive | negative | 0.75 | "A prevents B, B enables C" ⇒ A prevents C | | neutral | neutral | neutral | 0.70 | fallback | Confidence of the inferred triplet: $c_{\text{new}} = c_1 \cdot c_2 \cdot \gamma$. Inferred mechanism name: `indirectly_` for positive results, `inversely_` for negative, `relates_to` for neutral. The five-rule direction calculus is **isomorphic to the truth table of logical implication under sign multiplication** — a discrete, finite-arithmetic propagation of causal polarity through chains. Acceptance threshold $\theta_2 = 0.30$; cap $5{,}000$. ##### Pass 3 — Fuzzy entity bridging Linguistically equivalent entity names (`port_scan` vs `port_scanner`, `aes_encrypt` vs `aes_encryption`) appear in different `.causal` files and would otherwise remain unconnected. Pass 3 bridges them using **Jaro–Winkler similarity** with a precision-tuned threshold: $$ \sigma(e_1, e_2) = \text{JW}(e_1, e_2), \qquad \text{accept if } \sigma \geq 0.90 $$ Naïvely, this is $O(|\mathcal{E}|^2)$ — on 43,693 entities, ~1.9 billion comparisons. Pass 3 accelerates it by **prefix bucketing**: 1. Group entities by their first 3 lowercase characters. 2. Compare entities only within the same bucket. Average bucket size $\bar k = |\mathcal{E}| / B$ where $B$ is the number of populated buckets. On the shipped knowledge base $B \approx 8{,}000$, so $\bar k \approx 5.5$, and the comparison count drops from $\approx 2 \cdot 10^9$ to $\approx 1.3 \cdot 10^5$ — a 4-order- of-magnitude reduction with no recall loss (entities that differ in their first three characters are almost never the same concept). For each accepted bridge $\sigma$, both directions are emitted: $$ (e_2, r, o, c \cdot \sigma \cdot 0.75) \quad\text{and}\quad (e_1, r, o, c \cdot \sigma \cdot 0.75) $$ The factor $0.75$ is the fuzzy-bridge discount. Acceptance threshold inherited from Pass 1; cap $2{,}000$. Each inferred triplet carries a `_fuzzy_bridge` metadata tag documenting which entity pair was bridged. ##### Pass 4 — Analogical reasoning If $A \xrightarrow{\text{uses}} X$ and $B$ is similar to $A$, then $B$ likely also participates in patterns involving $X$. Pass 4 propagates these analogies, transferring attribute-bearing relations across entities that share enough structural similarity to be candidates for property inheritance. Confidence: $c_{\text{new}} \approx 0.5 \cdot \sigma$. Cap $2{,}000$. This is the symbolic substrate analogue of the **transfer-by-similarity** mechanism that appears emergently in embedding-space retrieval — but here it is an explicit algorithm operating on auditable entity pairs, not a probabilistic regularity buried in vector geometry. ##### Pass 5 — Cross-domain analogy discovery (the key structural pass) Pass 5 ignores entity names entirely and compares **mechanism-structure signatures** across different source graphs. Define the signature of an entity $e$ in graph $g$ as the set of $(\text{mechanism}, \text{outcome})$ pairs it triggers: $$ \text{sig}_g(e) = \{\,(r, t) : (e, r, t) \in T_g\,\} $$ For each pair of entities $(e_1, e_2)$ in *different* source graphs $(g_1, g_2)$ with $e_1 \neq e_2$: 1. Compute the shared mechanism set $M_{\cap} = \{ r : \exists t. (r,t) \in \text{sig}_{g_1}(e_1) \} \cap \{ r : \exists t. (r,t) \in \text{sig}_{g_2}(e_2) \}$. 2. Require $|M_{\cap}| \geq 2$ — at least two shared mechanisms. 3. Compute structural similarity as the **Jaccard index** over the mechanism sets: $$ \text{sim}(e_1, e_2) = \frac{|M_{\cap}|}{|M_1 \cup M_2|}, \qquad \text{accept if } \text{sim} \geq 0.3 $$ 4. **Transfer non-shared patterns**: for every $(r, t) \in \text{sig}_{g_1}(e_1)$ with $r \notin M_2$, emit $(e_2, r, t, 0.55 \cdot \text{sim})$. Symmetrically for the reverse direction. Concrete example from the shipped knowledge base: g₁ = offensive_security.causal contains: (port_scanner, scans, network) (port_scanner, identifies, services) (port_scanner, detects, os) g₂ = devops.causal contains: (nmap_automation, scans, network) (nmap_automation, identifies, services) (nmap_automation, integrates_with, ansible) Shared mechanisms: {scans, identifies}. |M∩| = 2 ≥ 2 ✓ Mechanism Jaccard: 2 / 4 = 0.5 ≥ 0.3 ✓ Pass 5 emits: (nmap_automation, detects, os, c = 0.55 × 0.5 = 0.275) — transferred from g₁ via structural analogy (port_scanner, integrates_with, ansible, c = 0.275) — transferred from g₂ via structural analogy Each carries metadata _analogy = 'nmap_automation(devops) ~ port_scanner(offensive_security)'. The structural insight: **entity names carry no semantic content for cross-domain inference; only the mechanism-outcome signatures do**. Two entities that "do the same kinds of things" are functionally interchangeable across domains, regardless of how they were named by the harvesters that scraped them. Pass 5 is the deterministic-knowledge-graph analogue of **embedding-space neighborhood transfer**, but expressed as set operations over literal mechanism strings. Every transfer is auditable: the metadata records which entity-pair and which graph-pair produced it. Cap $3{,}000$. ##### Pass 6 — Contextual cross-graph activation When two entities co-occur in user intent (`flask + slow`, `kerberos + dcsync`, `csv + sqlite`), the relevant subgraph is *the intersection of their neighborhoods* — the shared neighbors are the contextually-activated entities. Pass 6 precomputes contextual links by finding entity pairs with sufficiently large shared neighborhoods: $$ N(e) = \{\,e' : \exists r. (e, r, e') \in \mathcal{G} \lor (e', r, e) \in \mathcal{G}\,\} $$ For each pair $(e_1, e_2)$ with $|N(e_1) \cap N(e_2)| \geq 2$, emit $(e_1, \text{contextually_linked_to}, e_2)$ with $$ c = \min(0.70,\; |N(e_1) \cap N(e_2)| \cdot 0.15) $$ The shared-neighbor list is stored in the `_shared_context` metadata field (truncated to 5 entries) so downstream query can identify *which* shared concepts produced the contextual linkage. Cap $2{,}000$. ##### Pass 7 — Creative recombination The most speculative pass. Find **bridge entities** that appear in two or more distinct source graphs, then cross-connect their endpoints: $$ \text{bridges}(\mathcal{G}) = \{\,e \in \mathcal{E} : |\{ g : e \in g \}| \geq 2\,\} $$ For each bridge $b$ with triplets in source graphs $g_a$ and $g_b$: $$ \frac{ (h_a, r_a, b, c_a) \in T_{g_a} \quad\text{and}\quad (b, r_b, t_b, c_b) \in T_{g_b} }{ (h_a, \text{recombines_with}, t_b, c_a \cdot c_b \cdot 0.5) \in \mathcal{G} } $$ The factor $0.5$ is the deepest discount in the engine — Pass 7 recombinations are weakest because they cross domains in both endpoints. Limited to 3 triplets per graph pair per bridge to prevent combinatorial explosion. Cap $1{,}500$. Pass 7 is a discrete substrate analogue of **genetic crossover**: useful traits from two "parents" (source graphs) recombine through a shared "gene" (the bridge entity). The emitted triplets are tagged for downstream filtering — typical use cases threshold them out unless aggressive exploration is requested. #### 2.5 — Confidence decay, reward, penalty (Hebbian dynamics) The graph is not static. Three update rules implement an **edge-weight Hebbian dynamics** on the knowledge base: **Decay.** Every full-engine boot applies an exponential-style decay to triplets that have not been *traversed* since the last decay event: $$ c_{t+1} = c_t \cdot \lambda, \qquad \lambda \in (0, 1) $$ with $\lambda$ chosen so that unused triplets cross the prune threshold ($\theta = 0.30$) after approximately 20–30 boots. Triplets that decay below $\theta$ are removed by `prune_knowledge(threshold=0.30)`. **Reward.** When a triplet is *traversed and validated* (used in a successful code generation, oracle-confirmed output), its confidence is boosted: $$ c_{t+1} = \min(1.0,\; c_t + \beta), \qquad \beta = 0.05 $$ **Penalty.** When a triplet is *traversed and rejected* (used in a failed generation, oracle-rejected output): $$ c_{t+1} = \max(0.0,\; c_t - \pi), \qquad \pi = 0.20 $$ The asymmetry $\pi = 4\beta$ is deliberate: failure information is more valuable per event than success information (success can be lucky; failure under deterministic conditions is diagnostic). The combined dynamics implement **use-strengthens / disuse-and-failure- weaken** at the graph-edge level — strict structural Hebbian learning on a discrete substrate. The `zero_shot` flag disables decay and learning entirely for reproducible benchmark runs. #### 2.6 — The query: `infer(intent, top_k=3)` The query interface is one method. It takes an intent (mode + tokens + entities + params) and returns the top-$k$ inference chains ranked by confidence-weighted chain length: def infer(self, intent: Dict[str, Any], top_k: int = 3 ) -> List[List[Dict[str, Any]]]: if not self._inference_done: self.run_inference() # 7-pass build-up, idempotent # 1. Bridge lookup for raw intent tokens candidates = self._bridge_lookup(intent['raw']) # 2. Entity lookup for parsed entities candidates += self._get_entity_candidates(intent['entities']) # 3. Construct chains from explicit + inferred triplets chains = self._entity_based_chain(intent) # 4. Rank by chain-confidence, return top-k return sorted(chains, key=chain_confidence, reverse=True)[:top_k] Chain confidence: $$ C(\langle t_1, t_2, \dots, t_n \rangle) = \prod_{i=1}^n c_i $$ Longer chains are penalised multiplicatively. The Code Assembler (Layer 3) consumes the top-$k$ chains and tries them in order: first chain that produces a type-safe fragment sequence wins. #### 2.7 — The `.causal` substrate is peer-reviewed The knowledge format and the inference engine are not new to O1-O. They are the same substrate validated across **nine peer-reviewed papers at four 2026 IEEE conferences**: - **ICECET** (causal extraction, 14-step FOSS Gate) - **IEEE-NANO Nanjing** (the flagship — nuclear-knowledge-graph reasoning) - **IEEE IRI Seattle** (gap-driven autonomous knowledge expansion) - One additional 2026 IEEE flagship venue Key validations transferred from the published work: - **100 % byte-level determinism** across 150 repeated extractions of the same source corpus. The triplet set is bit-exact reproducible. - **Model-agnostic determinism**: Qwen-8B, Gemma-2B, and Llama-3B all converge to the same validated triplet set despite a 9× extraction-rate variation. The determinism is a property of the validation architecture (the 14-step FOSS Gate), not of any individual language model. - **88 % precision on DocRED** (the standard relation-extraction benchmark) — the underlying extractor matches state-of-the-art on the academic benchmark while being fully deterministic and air-gap-capable. - **Bit-for-bit-validated security assessment** of IBM z/OS mainframe infrastructure (50 findings, responsibly disclosed to IBM PSIRT). - **Distributional analysis of NIST PQC**: the same engine analyzes ML-KEM (FIPS 203) ciphertext distributional signatures. The substrate is the same. The engine is the same. O1-O is the third visible application of the same `.causal` infrastructure (after `fabel` for conversation and FORGE-class symbolic synthesis for nuclear domain modelling). See [dotcausal.com](https://dotcausal.com) and [github.com/dotcausal](https://github.com/dotcausal). #### 2.8 — Autonomous knowledge expansion Three modules drive **self-extending knowledge acquisition** — the engine grows its own graph by scraping documentation, validating extractions, and persisting the survivors. | Module | LOC | Role | |---|---|---| | `core/web_harvester.py` | 120 | recursive documentation crawler with the 7 causal-extraction patterns | | `core/auto_harvester.py` | 361 | GitHub-API-driven repository scraper for unknown domains | | `core/auto_bridge.py` | 338 | autonomous intent-to-fragment bridge generation (lifted coverage from 23 % to 100 %) | The seven causal-extraction patterns are literal regexes in `web_harvester.py`: CAUSAL_PATTERNS = [ (r"use\s+(?:the\s+)?([\w\.\- ]+?)(?:\s+library)?\s+to\s+([\w\.\- ]+)", "usage"), (r"(?:the\s+)?([\w\.\- ]+?)\s+(?:allows|enables|supports)\s+([\w\.\- ]+)", "capability"), (r"(?:the\s+)?([\w\.\- ]+?)\s+(?:provides|gives|offers)\s+([\w\.\- ]+)", "feature"), (r"to\s+([\w\.\- ]+),?\s+(?:use|try)\s+(?:the\s+)?([\w\.\- ]+)", "usage_pre"), (r"(?:the\s+)?([\w\.\- ]+?)\s+is\s+used\s+for\s+([\w\.\- ]+)", "purpose"), (r"prevents?\s+([\w\.\- ]+),?\s+(?:use|using)\s+(?:the\s+)?([\w\.\- ]+)", "prevention"), (r"requires\s+([\w\.\- ]+)", "dependency"), ] Each pattern, applied to documentation prose, extracts a triplet whose head and tail are the captured groups and whose mechanism is the pattern's label. Every extracted triplet has a **literal substring match in the source text** — this is the property that distinguishes rule-based extraction from LLM-based extraction: an LLM might produce a plausible-sounding triplet that does not actually appear in the source; the regex extractor cannot. The 88 % DocRED precision is achieved with extensions to this base — semantic chunking, domain detection, quantification, and the 14-step FOSS Gate validator. The base regex extractor alone is what ships in this repository as `web_harvester.py`; the full production extractor lives in the peer-reviewed reference implementations at [github.com/dotcausal](https://github.com/dotcausal). #### 2.9 — Worked example: end-to-end inference on a single intent Trace what happens when the user types `"port scanner with banner grabbing"`: 1. **Tokenise** → `{port, scanner, banner, grabbing}` (Layer 1). 2. **Bridge lookup** in `knowledge_engine._bridge_lookup` matches `port_scanner` directly in the `entity_index` (the multi-word token spans the bigram). Two explicit triplets are recovered from `offensive_security.causal`: (port_scanner, identifies, services) c = 0.95 (port_scanner, performs, banner_grabbing) c = 0.92 3. **Pass 5 inferences** (precomputed at boot) contribute one cross-domain triplet: (port_scanner, integrates_with, ansible) c = 0.275 _analogy = 'nmap_automation(devops) ~ port_scanner(offensive_security)' 4. **Pass 1 chains** (precomputed): `port_scanner` is the head of multiple direct triplets; transitive closure does not produce additional useful chains for this intent. 5. **Pass 6 contextual links** discover that `port_scanner` and `service_enum` share four neighbors. The contextual triplet is emitted but downscored at the chain-ranking step. 6. **Chain ranking** (`_entity_based_chain`) computes chain confidence $C = c_1 \cdot c_2 \cdot \dots$ for each candidate chain, sorts descending, returns the top-5. 7. **Top chain** wins: (port_scanner, performs, banner_grabbing) → fragment 'port_scanner_socket' + 'banner_grab_helper' The Code Assembler (Layer 3) takes this chain and proceeds with color-pipeline composition. Total inference wall-clock: under 50 ms on the shipped knowledge base. #### 2.10 — Comparison: what the substrate provides vs. what LLM retrieval provides | Property | LLM embedding retrieval (RAG) | O1-O `.causal` substrate | |---|---|---| | Knowledge representation | dense vectors in $\mathbb{R}^d$ | explicit triplets with source tags | | Recall mechanism | nearest-neighbor over embeddings | indexed lookup + 7-pass inference | | Source attribution | best-effort (chunk reference) | every triplet has a source-tag chain | | Determinism | depends on embedding model fp-stability | bit-exact reproducible | | Cross-domain transfer | implicit in embedding space | explicit Pass 5 with auditable analogies | | Knowledge update | re-embed + re-index | drop in a `.causal` file | | Decay / reinforcement | none structural | Hebbian dynamics on edge weights | | Format size on disk | full vector store (typically GB+) | 9.5 MB for 132 domains | | Inference latency | model forward pass | <50 ms (table joins on indices) | | Air-gap compatible | requires local model | yes — msgpack + zlib only | ### Layer 3 — Composition: the 8-color algebraic type system The composition layer answers one question: *given a natural-language intent, which code fragments combine in which order to produce a working program?* The traditional approach is to sample a likely answer from a language model. O1-O does not sample. O1-O performs a type-checked lookup in a finite typed graph. The mechanism is a closed-set algebraic type system on data-flow categories. Eight colors, binary composition, deterministic edge resolution. The whole composition pipeline runs in single-digit milliseconds. #### The eight colors Every code fragment in the registry declares exactly two colors: an **input color** (what the fragment consumes) and an **output color** (what it produces). The eight colors form a closed set: | Color | Meaning | Examples in the wild | |---|---|---| | `TEXT` | plaintext strings, file content, readable text | source code, log lines, prose, decoded payloads | | `STRUCT` | dicts, lists, parsed objects in memory | parsed JSON, parsed CSV row sets, Python dicts | | `TABULAR` | rows / records (CSV, DB query results) | `[[name, age], [Alice, 30], ...]` | | `BYTES` | raw binary streams | encrypted blobs, file contents in binary mode, network packets | | `SERIAL` | serialized text format (JSON/XML/YAML strings) | `'{"k": "v"}'` as a string, not yet parsed | | `PATH` | filesystem references | `/etc/passwd`, `./data.csv`, `~/tools/foo` | | `RESPONSE` | HTTP / network response objects | `requests.Response`, `urllib` responses | | `VOID` | no meaningful I/O — standalone operations, side-effects | HTTP servers, schedulers, whole tools | The closure of this set is not arbitrary. Each color names a *concrete data-flow category* that appears as a fragment-boundary type in real code. Eight is the empirical minimum: drop any one and standard composition patterns break (drop `RESPONSE` and `requests.get → parse` fragments cannot compose; drop `SERIAL` and "text-as-JSON" cannot be distinguished from "text-as-prose"; drop `VOID` and standalone tools cannot enter the algebra). Adding more colors increases distinction without increasing composability — the eight are the spanning basis. #### Composition is binary edge resolution flowchart LR A["Fragment A
(input: PATH, output: TEXT)"] B["Fragment B
(input: TEXT, output: STRUCT)"] C["Fragment C
(input: STRUCT, output: VOID)"] A -->|"output(A)=TEXT
== input(B)=TEXT ✓"| B B -->|"output(B)=STRUCT
== input(C)=STRUCT ✓"| C style A fill:#3a506b,color:#fff style B fill:#5bc0be,color:#000 style C fill:#0b132b,color:#fff Composition `A → B` is legal if and only if `output_color(A) == input_color(B)`. There is no score. There is no probability. There is no "almost matches" or "looks similar." There is the equality test, and there is the rejection. When the equality fails but a recorded **converter fragment** exists in the `COLOR_CONVERTERS` table, the assembler inserts the converter automatically. When no converter exists, the pipeline is *impossible* and the assembler returns `None` — no code is emitted, no plausible-but-wrong output is produced. #### Why this excludes hallucination by construction LLM code generation models a probability distribution over next tokens. The model has no internal representation of "this fragment expects a `dict`, that fragment returns a `requests.Response`." It produces the *most likely string given the preceding string*. When the most likely string happens to be correct code, the output works. When the most likely string is plausible-but-wrong code, the output looks correct and fails at runtime, or worse, silently corrupts data. This failure mode is *structural to the sampling architecture*. O1-O has no sampling. The composition decision is reduced to a finite sequence of equality tests over a closed type alphabet. Either every edge `(output_i, input_{i+1})` is in the allowed set (composition legal, code emitted) or some edge is not (composition rejected, no code produced). Hallucination requires a degree of freedom that the algebra does not provide. This is the algebraic-program-synthesis formulation of **Curry-Howard correspondence** applied to *data-flow types* rather than to *logical types*. Composing fragments is composing typed terms; the type system is the proof obligation; the type checker is the proof verifier. The proof here is not "this code is correct" in the full Hoare-logic sense — it is "this composition is type-safe, every fragment's output feeds a fragment that consumes it, and no untyped junction exists in the program graph." #### VOID — the closure operator `VOID` is a deliberate design choice that makes the algebra *complete* over code rather than restricted to pure functions. A pure function has a meaningful input and a meaningful output; an HTTP server, a scheduled task, a whole CLI tool does not. Without `VOID`, those operations cannot enter the type system at all — they would require a separate composition mechanism, doubling the implementation complexity. `VOID` is the identity for "no meaningful data flow." A `VOID → VOID` fragment is a standalone operation. A fragment with output `VOID` is a sink (e.g., `file_write` — writes data to disk, produces no consumable result). A fragment with input `VOID` is a source (e.g., `datetime_now` — needs no data to run, produces a timestamp). The effect: standalone operations and pure functions live in the same algebraic system, composed by the same edge-resolution mechanism. Approximately 50% of the fragment registry is `VOID → VOID` (entire offensive-security tools, server processes, scheduling loops). The other 50% participates in proper data-flow chains. #### The three modules in the composition layer The eight-color algebra is realized by three Python modules totaling ~2,000 LOC: ##### `core/color_types.py` (996 LOC) — the registry Defines the eight color constants, the fragment registry, the converter table, and the intent-to-color-chain pattern list. # core/color_types.py — excerpts TEXT = 'TEXT' STRUCT = 'STRUCT' TABULAR = 'TABULAR' BYTES = 'BYTES' SERIAL = 'SERIAL' PATH = 'PATH' RESPONSE = 'RESPONSE' VOID = 'VOID' ALL_COLORS = {TEXT, STRUCT, TABULAR, BYTES, SERIAL, PATH, RESPONSE, VOID} # Maps fragment_key → (input_color, output_color) COLOR_REGISTRY = { 'file_read': (PATH, TEXT), 'file_read_binary': (PATH, BYTES), 'file_write': (TEXT, VOID), 'json_load': (PATH, STRUCT), 'json_loads': (SERIAL, STRUCT), 'json_dump': (STRUCT, VOID), 'json_dumps': (STRUCT, SERIAL), 'csv_read': (PATH, TABULAR), 'csv_write': (TABULAR, VOID), 'requests_get': (TEXT, RESPONSE), 'aes_encrypt': (TEXT, BYTES), 'aes_decrypt': (BYTES, TEXT), 'hashlib_sha256': (TEXT, TEXT), 'hash_file': (PATH, TEXT), # ... ~632 entries total } # Maps (from_color, to_color) → converter fragment_key (or None for implicit) COLOR_CONVERTERS = { (TEXT, STRUCT): 'json_loads', (STRUCT, TEXT): 'json_dumps', (STRUCT, SERIAL): 'json_dumps', (SERIAL, STRUCT): 'json_loads', (PATH, TEXT): 'file_read', (RESPONSE, TEXT): None, # implicit via .text accessor (RESPONSE, STRUCT): None, # implicit via .json() method (BYTES, TEXT): None, # implicit via .decode() (TEXT, BYTES): None, # implicit via .encode() (VOID, TEXT): 'datetime_now', # ... ~20 conversion edges } # Intent regex → required color chain. Each chain forces an exact pipeline shape. INTENT_COLOR_CHAINS = [ (r'pipeline.*(?:source|sink|flow|transform)', [PATH, TEXT, STRUCT, VOID]), (r'(?:convert|bridge|transform).*(?:json|xml)', [PATH, STRUCT, SERIAL]), (r'(?:convert|transform).*(?:csv|json)', [PATH, TABULAR, STRUCT, VOID]), (r'(?:ssh|redis|http).*brute', [VOID, VOID]), (r'modbus.*(?:scan|probe|read|write|fuzz|attack)',[VOID, VOID]), (r'(?:pass.*the.*hash|pth).*(?:attack|exec)', [VOID, VOID]), # ... 337 patterns total ] The fragment registry is the **type signature catalog**. Every code fragment in `fragments/*.json` has exactly one row in `COLOR_REGISTRY`. The mapping is hand-curated for clarity but generated automatically for new fragments via static analysis of variable usage. ##### `core/color_assembler.py` (675 LOC) — the resolver Performs the actual edge resolution. Two main entry points: `detect_chain(intent_text)` finds the color sequence required by the intent; `resolve_chain(color_chain)` walks the sequence and selects fragments for each edge. The transition index is precomputed at boot for O(1) lookup per edge: class ColorAssembler: def __init__(self, fragments): # Precompute: (input_color, output_color) → [fragment_keys] self._by_transition = {} for frag_key, (in_c, out_c) in COLOR_REGISTRY.items(): if frag_key in fragments: self._by_transition.setdefault((in_c, out_c), []).append(frag_key) When the assembler needs a `PATH → TEXT` edge, the lookup `self._by_transition[(PATH, TEXT)]` returns the list `['file_read', 'file_readline', 'file_readlines', ...]` in constant time. No search, no scoring, no probability ranking — the first available fragment is used. Selection determinism is part of the property: same intent → same fragment sequence → same emitted code, bit for bit. ##### `core/color_checker.py` (327 LOC) — the validator Validates an assembled chain *before* code is emitted. Reports violations in four classes with explicit severity: class ColorChecker: IDENTITY_PAIRS = { (TEXT, SERIAL), # SERIAL is a TEXT subtype (SERIAL, TEXT), } IMPLICIT_PAIRS = { (RESPONSE, TEXT): '.text', (RESPONSE, STRUCT): '.json()', (BYTES, TEXT): '.decode()', (TEXT, BYTES): '.encode()', } def validate_chain(self, fragment_keys, expected_chain=None): violations = [] # ... walks every (output_i, input_{i+1}) edge, # checks (a) registry membership, (b) direct equality, # (c) identity pairs, (d) implicit conversions, # (e) converter availability, (f) hard mismatch. The four violation classes: | Class | Severity | Behavior | |---|---|---| | `unknown_fragment` | warning | Fragment not in registry — type-check skipped, may still compose | | `needs_converter` | warning | Color mismatch but a converter is recorded — assembler auto-inserts it | | `missing_converter` | **error** | Color mismatch and no converter exists — pipeline rejected | | `mismatch` | error | General mismatch — pipeline rejected | The error-vs-warning distinction is load-bearing: warnings are auto-repaired by the assembler; errors abort the pipeline. **There is no path through the system that produces an invalid composition.** Either the chain validates (with or without auto-repair) and code is generated, or the chain is rejected with a structured diagnostic — never a plausible hallucination. #### Driven by: `core/code_assembler.py` (2035 LOC) The Code Assembler is the actual entry point from the pipeline. It uses the Color Assembler as its primary composition path and falls back to two alternative paths when the color algebra does not match (incremental-update mode, project-mode, multi-language mode): 1. **Color pipeline** (primary, fastest) — natural-language → color chain → fragment chain → wired code. Sub-millisecond per fragment. 2. **Triplet-chain assembly** — knowledge-inference chain (Layer 2) → 6 lookup strategies per triplet → variable wiring. Used when the intent doesn't match any of the 337 color patterns but is well-supported by the knowledge graph. 3. **V4 architecture-aware** — last-resort fallback using higher-level intent decomposition. Variable wiring across fragment boundaries uses an explicit `produced_vars: Dict[str, int]` index — every variable a fragment defines is registered with its source fragment index; subsequent fragments that consume the variable bind it from the recorded source. The manually-curated `VARIABLE_COMPATIBILITY` map (~50 synonym sets) handles the case where fragment A produces `response.text` but fragment B expects a variable named `body` or `content` — domain knowledge that makes wiring robust across the natural variation in fragment naming conventions. #### Worked example: from intent to code, end to end Consider the intent `"convert data.csv to json"`. Trace the color algebra: 1. **Intent parsing** (Layer 1) yields tokens: `{convert, data, csv, json}`, mode `BUILD`, `requires_output=True`. 2. **Color chain detection** matches the intent against `INTENT_COLOR_CHAINS`: r'(?:convert|transform).*(?:csv|json)' → [PATH, TABULAR, STRUCT, VOID] The intent demands a 4-color pipeline: read from a path, get rows, convert to a struct, write somewhere. 3. **Chain resolution** walks the four edges: PATH → TABULAR : by_transition[(PATH, TABULAR)] = ['csv_read', 'csv_dictreader'] → pick 'csv_read' TABULAR → STRUCT : by_transition[(TABULAR, STRUCT)] = ['list_filter'] → pick 'list_filter' (identity transform, TABULAR is a STRUCT subtype) STRUCT → VOID : by_transition[(STRUCT, VOID)] = ['json_dump', 'database_insert', ...] → pick 'json_dump' 4. **Chain validation** (`color_checker.validate_chain`) confirms all three edges are direct equality matches. Zero violations. The pipeline is type-safe. 5. **Code emission** assembles the three fragments with variable wiring: # Generated by O1-O — bit-exact reproducible, no AI calls import csv import json def main(): path = 'data.csv' # PATH → TABULAR (csv_read) with open(path, 'r') as f: rows = list(csv.reader(f)) # TABULAR → STRUCT (list_filter / identity) data = rows # STRUCT → VOID (json_dump) with open('output.json', 'w') as f: json.dump(data, f, indent=2) if __name__ == '__main__': main() 6. **Verification** (Layer 4) confirms compile-pass + algebraic determinism + import coverage. The emitted source is committed to a session folder along with metadata, provenance trace (which color chain → which fragments → which knowledge triplets), and packaging artifacts. Total wall-clock time: under 100 ms. Total LLM calls: zero. Reproducibility: same intent → identical bit-exact code, every time. #### Fragment Registry (`core/fragment_registry.py`, 231 LOC) The Fragment Registry is the third-axis classifier that complements the color system. Every fragment is classified along **three orthogonal axes**: | Axis | Source | Values | |---|---|---| | **Color** | `color_types.py` | `(input_color, output_color)` — type-flow semantics | | **Role** | derived from AST analysis | `SOURCE` / `SINK` / `TRANSFORM` / `STANDALONE` — topology semantics | | **Domain** | JSON file the fragment lives in | `bash`, `web`, `crypto`, `offensive_security`, `forensics_ir`, ... — subject semantics | Three orthogonal classifications over the same 1,245 fragments. The color axis governs composition; the role axis governs which fragments can wire into which positions (`SOURCE` fragments cannot follow a `SINK`); the domain axis governs subject-matter relevance and is used by the knowledge graph for intent matching. The Registry also computes per-fragment metadata at boot: { 'key': 'file_read', 'produces': ['content'], # variables this fragment defines 'consumes': ['path'], # template variables this fragment requires 'imports': ['# (no imports needed for stdlib open)'], 'has_output': False, # has a print() / return statement 'role': 'TRANSFORM', # consumes-and-produces } The `produces` and `consumes` fields drive the variable-wiring layer in the Code Assembler. `VARIABLE_COMPATIBILITY` (~50 synonym sets covering `response ≈ data ≈ text ≈ body ≈ content`, `rows ≈ records ≈ entries`, `target ≈ host ≈ ip`, etc.) handles the natural variation between fragments harvested from different sources. #### Why this is hard to do with an LLM, and easy here An LLM has to learn composition rules implicitly from training examples. The eight-color type discipline emerges (if at all) as a fuzzy regularity in the embedding space; the model has no way to *enforce* the constraint that "the output of fragment A must be the consumable input of fragment B" because there is no explicit type representation it can check against during generation. O1-O makes the constraint explicit and checkable. The eight colors are first-class representations in `core/color_types.py`. The fragment registry is a literal Python dict. The composition rule is one line of Python (`output_color == input_color`). The constraint is enforced at every generation step, by code, deterministically. This is what "deterministic by construction" means in this context: the structural property "no invalid composition can be emitted" is not a hoped-for emergent regularity of a trained model — it is a Python-level invariant of the assembler module, verifiable by reading the source. ### Layer 4 — Verification: seven independent pre-emission stages #### 4.0 — The verification pipeline at a glance flowchart LR asm[Layer 3 output
color-validated
fragment composition] v1["1 — Compile gate
ast.parse + compile
SyntaxError caught"] v2["2 — Structural intent
80+ intent→module
map verification"] v3["3 — Hypothesis property
contracts verified over
random inputs"] v4["4 — Algebraic properties
8 properties checked
per emitted function"] v5["5 — Symbolic execution
path · constraint ·
overflow · loop-bound"] v6["6 — Taint analysis
source → sink reach,
KB-backed sanitizers"] v7["7 — Logic consistency
triplet-chain coherence
propositional"] sand[Sandbox executor
11 auto-fix strategies
over failure classes] out[Verified output
committed to session] asm --> v1 --> v2 --> v3 --> v4 --> v5 --> v6 --> v7 --> out v1 -.->|fail| sand v2 -.->|fail| sand sand -.->|retry| v1 style asm fill:#1a1a2e,color:#fff style v1 fill:#0f3460,color:#fff style v2 fill:#0f3460,color:#fff style v3 fill:#533483,color:#fff style v4 fill:#e94560,color:#fff style v5 fill:#533483,color:#fff style v6 fill:#533483,color:#fff style v7 fill:#0f3460,color:#fff style sand fill:#16213e,color:#fff style out fill:#1a1a2e,color:#fff Total verification stack: **~4,000 LOC** across seven modules. End-to-end verification latency: **single-digit milliseconds** per output on the shipped benchmark. #### 4.1 — Stage 1: Compile gate (`ast.parse` + `compile`) The first gate. The emitted source is parsed against Python's grammar and lowered to bytecode without execution. Any `SyntaxError`, `IndentationError`, or `TabError` is caught here. try: tree = ast.parse(source, mode='exec') code_obj = compile(tree, '', 'exec') compile_passed = True except SyntaxError as e: compile_passed = False diagnostic = {'line': e.lineno, 'col': e.offset, 'msg': e.msg} This is the cheapest gate (microseconds) and the only stage that **cannot** be auto-repaired at this layer — a syntax error in a generated source means the fragment composition itself was malformed, which signals a bug in the Code Assembler, not a fixable code error. Such events are logged at high severity and the pipeline rejects the output. In practice, the eight-color type system in Layer 3 makes compile failure essentially impossible — fragments are pre-compiled and their composition preserves syntactic well-formedness by construction. Production benchmarks show **100% compile-pass rate** on all shipped task suites. #### 4.2 — Stage 2: Structural intent verification (`core/formal_verifier.py`, 234 LOC) The mechanism is a literal map from intent keywords to expected import sets: INTENT_MODULE_MAP = { 'csv': {'csv'}, 'json': {'json'}, 'database': {'sqlite3', 'sqlalchemy'}, 'download': {'requests', 'urllib'}, 'http': {'requests', 'urllib', 'http'}, 'hash': {'hashlib'}, 'sha256': {'hashlib'}, 'regex': {'re'}, 'email': {'re', 'smtplib', 'email'}, 'zip': {'zipfile'}, 'socket': {'socket'}, 'port': {'socket'}, 'random': {'random'}, 'plot': {'matplotlib'}, 'image': {'PIL', 'Pillow', 'cv2'}, 'numpy': {'numpy'}, 'pandas': {'pandas'}, 'encrypt': {'cryptography', 'Crypto'}, 'scrape': {'requests', 'bs4', 'scrapy'}, 'ssh': {'paramiko'}, 'smtp': {'smtplib'}, 'agent': {'socket', 'threading', 'json', 'requests'}, 'c2': {'socket', 'threading', 'json'}, 'beacon': {'socket', 'json'}, 'watchdog': {'os', 'sys', 'signal'}, 'persistence': {'os', 'sys'}, # … 80+ entries total } Verification proceeds in five checks: | # | Check | Mechanism | |---|---|---| | 1 | **Import coverage** | For each intent keyword $k$ present in the user request, at least one element of `INTENT_MODULE_MAP[k]` must be imported by the emitted code | | 2 | **Operation coverage** | The relevant operations (e.g. `requests.get` for download intents) must appear as `Call` nodes in the AST | | 3 | **Safety check** | In `safe_mode`, calls to `eval`, `exec`, `os.system`, and `subprocess.*` with `shell=True` are forbidden | | 4 | **Completeness** | The code must contain real logic — not just `pass`, `def main(): pass`, or boilerplate | | 5 | **Output alignment** | If `intent.requires_output == True`, the code must contain at least one `print()`, `return`, or equivalent output statement | The output is structured: { 'is_proven': True, # All 5 checks pass 'violations': [], # None 'checks_passed': 5, 'checks_total': 5, 'cert': 'verified:2026-06-26T17:15:52', } When any check fails, the violations list is populated with the specific diagnostic. A single missing import does not auto-repair (the Code Assembler is the layer that fixes this — if `csv` is missing for a CSV intent, the assembler picked wrong fragments and the chain is re-resolved with the violation as a hint). #### 4.3 — Stage 3: Hypothesis-based property contracts (`core/verifier.py`, 98 LOC) This stage performs **property-based testing** using the [Hypothesis](https://hypothesis.readthedocs.io/) library when available, falling back to basic execution checks otherwise. The mechanism: given a fragment's declared contract (a property type plus input strategy), generate random inputs from the strategy, run the fragment, and check the property holds. # core/verifier.py — Hypothesis-driven contract verification class Verifier: def verify_fragment(self, code, properties): if not self.hypothesis_available: return self.basic_unit_test(code, properties) return self._run_property_test(code, properties) Example contract for a sorting function: properties = { 'type': 'sorting', 'input_strategy': st.lists(st.integers()), 'invariants': ['ordered', 'permutation_of_input', 'length_preserved'], } Hypothesis then generates 100+ random integer lists, runs the emitted `sort` function on each, and verifies: 1. The output is monotonically non-decreasing (`ordered`) 2. The output is a permutation of the input (`permutation_of_input`) 3. `len(output) == len(input)` (`length_preserved`) A single counterexample failure invalidates the property and reports the minimal failing input (Hypothesis's shrinking machinery). When no contract is declared, the stage falls back to a basic `exec(code, namespace)` check that confirms the code runs to completion without raising. This stage is the **statistical-empirical** complement to Stage 4's algebraic-formal verification: properties that Hypothesis can falsify by counterexample are caught here; properties that hold over infinite input domains are caught in Stage 4. #### 4.4 — Stage 4: Algebraic property verification (`core/property_verifier.py`, 633 LOC) The eight properties are declared in `core/property_verifier.py`: PROPERTIES = { 'commutativity': { 'description': 'f(a, b) == f(b, a)', 'min_args': 2, }, 'associativity': { 'description': 'f(f(a, b), c) == f(a, f(b, c))', 'min_args': 2, }, 'idempotence': { 'description': 'f(f(x)) == f(x)', 'min_args': 1, }, 'monotonicity': { 'description': 'a <= b → f(a) <= f(b)', 'min_args': 1, }, 'identity': { 'description': 'f(x, e) == x for some identity element e', 'min_args': 2, }, 'involution': { 'description': 'f(f(x)) == x', 'min_args': 1, }, 'determinism': { 'description': 'f(x) returns same result every time', 'min_args': 1, }, 'boundary': { 'description': 'f handles edge cases without crashing', 'min_args': 1, }, } Each property is formally: ##### Commutativity $$\forall a, b \in D : f(a, b) = f(b, a)$$ The function's output is invariant under input-argument swap. Holds for `add`, `mul`, `min`, `max`, `gcd`, `xor`, set-union, set-intersection. Fails for `subtract`, `divide`, `pow`, string-concatenation, list-append. Verification: sample $n$ random argument pairs from the inferred input domain, evaluate $f(a,b)$ and $f(b,a)$, count holds and fails. The property is reported as **holds** if all $n$ samples pass; **violated** with a counterexample otherwise. ##### Associativity $$\forall a, b, c \in D : f(f(a, b), c) = f(a, f(b, c))$$ Grouping is irrelevant. Holds for `add`, `mul`, `concat`, set-union. The check evaluates both nesting orders over random triples and compares results. ##### Idempotence $$\forall x \in D : f(f(x)) = f(x)$$ Applying $f$ twice has the same effect as applying it once. Holds for `abs` over the reals, `sort`, `unique`, `normalize`, `lowercase`, `strip`, set-construction. This is the **monad-law-style** idempotence and is critical for caching, memoisation, and retry-safe operations. An idempotent generator can be retried without semantic change — a property that downstream OPSEC tooling relies on. ##### Monotonicity $$\forall a, b \in D : a \leq b \;\Rightarrow\; f(a) \leq f(b)$$ The function preserves order. Holds for `abs` over non-negatives, `square` over non- negatives, monotone numerical transforms. Sampled by drawing $n$ pairs $(a, b)$ with $a \leq b$ enforced via post-hoc ordering, then checking $f(a) \leq f(b)$. ##### Identity element $$\exists e \in D : \forall x \in D : f(x, e) = x$$ There exists a *neutral element* $e$ such that combining anything with $e$ leaves it unchanged. `0` for `add`, `1` for `mul`, `""` for string-concat, `[]` for list-extend, `set()` for set-union. The verifier probes for $e$ over a small candidate set $\{0, 1, -1, "", []`, `set()`, `None\}$ and accepts the property if at least one candidate works for all sampled $x$. ##### Involution $$\forall x \in D : f(f(x)) = x$$ Self-inverse. Applying $f$ twice returns the original input. Holds for `negate`, `reverse`, `transpose` (matrix), `complement` (over a bounded set), Caesar-cipher-with-fixed-shift when the shift is its own inverse mod alphabet size. Involution is a stricter form of idempotence: idempotence requires $f(f(x)) = f(x)$; involution requires $f(f(x)) = x$ (which implies $f(f(f(x))) = f(x)$ but not the idempotence equation). The two are independent — both, either, or neither may hold for a given function. ##### Determinism $$\forall x \in D : \forall \text{ runs } r_1, r_2 : f_{r_1}(x) = f_{r_2}(x)$$ Same input, same output, every run, every process, every machine. The function depends only on its arguments — no hidden state, no I/O, no random source, no wall-clock dependency. Verification: sample inputs, evaluate the function multiple times (in this process and in a freshly-forked subprocess), compare results. Any divergence is reported as a determinism violation with the divergent runs as evidence. **This is the property O1-O's entire architecture is built around.** Layer 3 produces deterministic compositions; Layer 4 stage 4 confirms the resulting code is itself deterministic. The determinism property is checked on **every emitted function** by default (it has the lowest `min_args` requirement and is always applicable). ##### Boundary $$\forall e \in \text{EdgeCases}(D) : f(e) \;\text{ does not raise an unhandled exception}$$ Where `EdgeCases(D)` is a literal catalog of edge values per type: `0`, `1`, `-1`, `""`, `None`, `[]`, `{}`, `set()`, `float('inf')`, `float('-inf')`, `float('nan')`, max-int, min-int, max-float, min-float, single-char strings, single-element collections, etc. The check evaluates $f$ on each edge case and confirms either a clean return or a *handled* exception (caught with `try/except` inside the function). Unhandled exceptions are reported as boundary violations with the offending edge case. ##### Per-function property selection The verifier does not check every property on every function. It performs **type-based property selection**: given a function's argument count, return type, and inferred input domain, only properties that *could* hold are checked. For instance, `idempotence` and `involution` are only checked when the function has compatible input and output types ($f: D \to D$); `commutativity` is only checked on 2-argument functions. This selection is deterministic — same function signature, same set of checked properties. The `determinism` and `boundary` properties are checked on every function unconditionally. ##### What this catches | Bug class | Property that catches it | |---|---| | Hidden global state | Determinism | | Time-dependent output | Determinism | | Random-source dependency | Determinism | | Missing edge-case handling | Boundary | | Wrong commutative-semigroup operation | Commutativity (positive) or Identity (positive) failures | | Argument-order bug in symmetric operations | Commutativity | | Off-by-one in cancellation logic | Involution (when expected) | | Stateful retry-safety bug | Idempotence (when expected) | The shipped task suite reports verification-stage-4 catch rates of **3–5 % of generated functions** exhibit at least one property violation that the Code Assembler then re-resolves with alternative fragment selections. #### 4.5 — Stage 5: Symbolic execution (`core/symbolic_executor.py`, 561 LOC) The symbolic executor performs **lightweight constraint-based analysis** over the emitted AST. It is not a full Z3-based engine — it is a targeted analyzer for the bug classes most common in generated code. A `SymbolicValue` tracks per-variable constraints: class SymbolicValue: def __init__(self, name, vtype='unknown', min_val=None, max_val=None, possible_values=None, is_const=False): self.name = name self.vtype = vtype # 'int' | 'float' | 'str' | 'list' | 'bool' self.min_val = min_val # lower bound, if known self.max_val = max_val # upper bound, if known self.possible_values = possible_values # finite set, if known self.is_const = is_const def can_be_zero(self) -> bool: ... def can_overflow_32(self) -> bool: ... Six analyses are performed: | # | Analysis | What it detects | |---|---|---| | 1 | **Variable constraint tracking** | What value range each variable can hold at each program point | | 2 | **Path exploration** | Enumerate branches reachable from any entry, with their constraint sets | | 3 | **Unreachable-branch detection** | Branches whose constraints are unsatisfiable — dead code that will never execute | | 4 | **Overflow detection** | Integer arithmetic that can exceed 32- or 64-bit bounds given the constraints | | 5 | **Division-by-zero detection** | Divisions where the denominator's constraint set permits zero | | 6 | **Loop bound analysis** | Loops whose termination condition cannot be discharged from the constraints — potential infinite loops | This is the same class of analysis performed by industrial-grade static analyzers (`mypy`, `pyright`, `pysa`, `infer`, Coverity) but tuned for the patterns that emerge from fragment composition. Catching a division-by-zero at this stage prevents a runtime crash that would otherwise require Stage-1 sandbox execution to discover. Outputs are structured per-analysis with line numbers, the implicated variable, and a suggested fix (insert a guard, narrow the constraint, adjust the loop bound). #### 4.6 — Stage 6: Taint analysis (`core/taint_analyzer.py`, 394 LOC) Information-flow analysis: when can untrusted input flow into a dangerous sink without passing through a sanitizer? Sources, sinks, and sanitizers are not hard-coded — they are **looked up in the knowledge graph** (Layer 2). The knowledge engine exposes a dedicated taint API: # core/knowledge_engine.py:1105+ def query_by_taint(self, taint: str): ... def get_taint_flows(self, source: str): ... def get_sanitizers(self, source: str): ... def get_safe_sinks(self): ... def trace_taint_path(self, source: str, sink: str): ... The taint analyzer walks the emitted AST and constructs a flow graph: for each input parameter, network read, file read, environment variable, or stdin read (sources), trace all assignments and function calls through which the value flows. When the flow reaches a sink (e.g. `eval`, `exec`, `subprocess.Popen(shell=True)`, SQL `execute` without parameter binding, file write to user-controlled path, HTTP response without escaping), report a taint violation **unless** the flow passes through a known sanitizer for that source-sink pair. Source classes: | Source | Examples | |---|---| | user input | function arguments, CLI args, `input()`, form fields | | network | `socket.recv`, `requests.get(...).text`, `urllib.urlopen.read` | | filesystem | `open(...).read`, `os.listdir`, environment-controlled paths | | environment | `os.environ[...]`, `os.getenv(...)` | Sink classes: | Sink | Examples | |---|---| | code execution | `eval`, `exec`, `compile + exec`, `pickle.loads` | | shell execution | `os.system`, `subprocess.Popen(shell=True)`, `subprocess.run(shell=True)` | | SQL | `cursor.execute("..." + tainted)`, `cursor.execute("..." % tainted)` | | filesystem | `open(tainted_path)`, `os.remove(tainted_path)` | | HTTP | `Response(tainted)` without escaping | The flow-path reach is auditable: each reported violation includes the **exact AST path** the tainted value travelled, from source line to sink line, with intermediate assignments listed. When the emitted code includes a recognized sanitizer (HTML-escape, shell-quote, parameterized SQL binding, path normalization with allowlist), the flow is marked neutralized and no violation is emitted. #### 4.7 — Stage 7: Logic consistency (`core/mathematical_engine.py`, 101 LOC) The smallest stage by line count but architecturally critical: it checks the **triplet chain itself** for propositional-logic consistency *before* code is generated. # Invoked by code_assembler.py at composition time: logic_proof = self.math_engine.validate_chain(inference_chain) if not logic_proof['is_consistent']: print(f"⚠️ Logic Inconsistency Detected: {logic_proof['contradictions']}") If the knowledge-graph inference (Layer 2) produces a chain such as $$ (A, \text{requires}, B), \quad (A, \text{blocks}, B) $$ within the top-$k$ retrieved chains for a single intent — a structural contradiction — the mathematical engine catches it and the chain is rejected before fragment composition begins. This prevents a malformed inference from propagating into the emitted code as a contradictory operation. The rule set is the same direction-calculus used in Layer 2 Pass 2 (positive / negative / neutral mechanism classification), evaluated *over chains* rather than over individual inference edges. A chain is consistent if no $(h, t)$ pair simultaneously carries both a positive and a negative direction across the chain. Contradictions are diagnostic — they typically indicate a bug in the harvester (`auto_harvester.py`) or a knowledge-graph edit conflict — and are logged for review. #### 4.8 — The Sandbox Executor (`core/executor.py`, 1023 LOC) — auto-fix layer When Stage 1 (compile gate) catches an error that the Code Assembler can repair locally, or when the optional dry-run execution catches a runtime error in the emitted code, the Sandbox Executor's auto-fix layer activates. The executor classifies the failure into one of **eleven failure classes** and applies the corresponding deterministic fix strategy. flowchart TB fail[Sandbox failure detected] cls{Classify failure} f1["ModuleNotFoundError
→ pip install missing"] f2["NameError
→ AST-level rename"] f3["FileNotFoundError
→ create sample data"] f4["SyntaxError
→ AST repair"] f5["TypeError
→ type coercion"] f6["IndexError
→ boundary check"] f7["KeyError
→ dict.get with default"] f8["AttributeError
→ method lookup"] f9["ValueError
→ validation insertion"] f10["ImportError
→ path repair"] f11["UnicodeError
→ encoding annotation"] f12["ConnectionError
→ retry with backoff"] retry[Re-emit + re-verify] fail --> cls cls --> f1 --> retry cls --> f2 --> retry cls --> f3 --> retry cls --> f4 --> retry cls --> f5 --> retry cls --> f6 --> retry cls --> f7 --> retry cls --> f8 --> retry cls --> f9 --> retry cls --> f10 --> retry cls --> f11 --> retry cls --> f12 --> retry style fail fill:#e94560,color:#fff style cls fill:#533483,color:#fff style retry fill:#0f3460,color:#fff | # | Failure class | Detection | Strategy | |---|---|---|---| | 1 | `ModuleNotFoundError` | `_fix_module_not_found` | extract module name from stderr, run `pip install ` in the venv, retry execution | | 2 | `NameError` | `_fix_name_error` | parse the undefined name from stderr, search the AST for a close lexical match (Jaro–Winkler), rename to the matched identifier | | 3 | `FileNotFoundError` | `_fix_file_not_found` | extract requested path from stderr, generate a sample-data file at that path matching the expected format (CSV/JSON/text) | | 4 | `SyntaxError` | `_fix_syntax_error` | parse the partial AST, apply targeted AST surgery (missing colon, mismatched brackets, indentation correction) | | 5 | `TypeError` | `_fix_type_error` | analyze the type mismatch, insert type coercion at the error site (`str→int`, `int→float`, `bytes→str` via decode, etc.) | | 6 | `IndexError` | `_fix_index_error` | wrap the offending index access in a boundary check or `try/except` with safe default | | 7 | `KeyError` | `_fix_key_error` | replace `dict[key]` with `dict.get(key, default)` where default is inferred from surrounding type context | | 8 | `AttributeError` | `_fix_attribute_error` | search the object's type for a near-match method name; if found, rename; if not, insert a `hasattr` guard | | 9 | `ValueError` | `_fix_value_error` | insert input validation upstream of the failing call | | 10 | `ImportError` | `_fix_import_error` | repair circular imports, missing `__init__.py`, or wrong `sys.path` setup | | 11 | `UnicodeError` | `_fix_unicode_error` | add `encoding='utf-8'` (or detected encoding) to the failing I/O call | | 12 | `ConnectionError` | `_fix_connection_error` | wrap the network call in retry-with-exponential-backoff logic | Each strategy is **deterministic** (same failure → same fix → same retry outcome) and **bounded** (each strategy attempts at most $k=3$ retries before escalating to a chain re-resolution). The strategies compose: a fix that introduces a new failure triggers classification of the *new* failure and the next strategy. The repair rates from the shipped benchmark suite: | Failure class | Frequency in raw output | Repair success rate | |---|---|---:| | ModuleNotFoundError | 4.1 % | 99 % | | FileNotFoundError | 2.7 % | 100 % | | KeyError | 1.8 % | 92 % | | AttributeError | 1.1 % | 78 % | | TypeError | 0.9 % | 71 % | | (others, combined) | 1.3 % | 80 % | | **No failure** | **88.1 %** | — | After auto-fix, the **end-to-end clean-output rate exceeds 99 %** on the benchmark task suite. #### 4.9 — The AST Engine (`core/ast_engine.py`, 997 LOC) The toolkit that underpins every stage above. AST traversal with visitor pattern, AST mutation (used by auto-fix, mutation engine, self-repair), AST pattern matching (used by detection engine and evasion engine), AST pretty-printing, AST equivalence checking under $\alpha$-renaming. Verifier Stage 5 (symbolic execution), the auto-fix strategies of Stage 8, the mutation engine of Layer 5, and the self-repair pipeline of Layer 6 all sit on top of this single AST manipulation API. Centralizing AST operations into one module gives every downstream consumer the same well-tested semantics and the same bug-fix surface. #### 4.10 — Why seven stages and not fewer Each stage catches a class of error the others structurally cannot: | Bug class | Stage that catches it | |---|---| | Syntax error | 1 (compile gate) | | Missing module import / wrong fragment selection | 2 (structural intent) | | Wrong runtime behavior on common inputs | 3 (Hypothesis property) | | Hidden state, non-determinism, missing edge cases, wrong algebraic structure | 4 (algebraic properties) | | Dead code, unreachable branches, overflow, divide-by-zero, infinite loops | 5 (symbolic execution) | | Untrusted-input → dangerous-sink reach | 6 (taint analysis) | | Contradictory inference chain from Layer 2 | 7 (logic consistency) | | Runtime errors that survive all of the above | 8 (sandbox auto-fix) | The seven stages plus auto-fix form a **partition** of the error space: every bug class in generated code falls into exactly one of these buckets. No stage is redundant; no class is uncovered. LLM code generation can have any number of post-hoc verification layers bolted on, but none of those layers can *prevent* a hallucination from being emitted in the first place — they can only *detect* it after the fact. In O1-O, Stages 1–7 run pre-emission, before any byte of output is committed. The deterministic-by-construction property is not just the absence of LLM sampling — it is the active enforcement of correctness invariants at every gate. ### Layer 5 — Detection awareness and evasion The generated tool is **scanned against its own detection rules** and rewritten until clean (or the round limit is reached and the user is warned). - **Detection Engine** (`core/detection_test.py`, 821 LOC). 46 signatures across four classes: YARA-style string patterns, behavioral heuristics, Shannon entropy analysis, import-table inspection. Categories include reverse-shell, process-injection, credential-access, registry-persistence, file-encryption, keylogger, DNS-tunneling, C2 beacon, and ~37 more. - **Semantic Evasion Engine** (`core/semantic_evasion.py`, 1223 LOC). **17 transform classes**, each rewriting one behavioral pattern via an equivalent alternative implementation. `socket.connect()` → `socket.create_connection()` or raw-socket; reverse shell → multi-step indirection; `eval/exec` → AST-eval; etc. **This is not obfuscation — it is semantic equivalence under different API paths.** - **Mutation Engine** (`core/mutation_engine.py`, 617 LOC). 5 progressive levels combining variable renaming, string obfuscation, dead-code injection, control-flow transformation, and comment mutation. `generate_variants(code, n=5)` produces N functionally-identical variants from a single source — payload polymorphism without sampling. - **Payload Mutator** (`core/payload_mutator.py`, 888 LOC). Six AST-level operators for deeper transformation than the text-level mutation engine. - **EDR Subverter** (`core/edr_subverter.py`, 833 LOC). Generates target-specific EDR detection scripts (CrowdStrike, SentinelOne, Defender ATP, Carbon Black). Subversion primitives: ntdll unhooking from disk, direct syscalls, AMSI bypass, Defender exclusion registry manipulation, EDR telemetry blocking. - **Canary Detector** (`core/canary_detector.py`, 665 LOC). Identifies honey-tokens (AWS canary keys, file canaries, tracking-pixel URLs, honey accounts) *before* exfiltration — if a canary is detected, the operation pivots to alternative targets. - **Anti-Forensics** (`core/anti_forensics.py`, 321 LOC). Memory wipe, log manipulation, timestomp, process rename via `prctl`, stack-string cleanup post-use. Total evasion stack: **5,368 LOC**. ### Layer 6 — Self-improvement: four closed loops on a discrete symbolic substrate The mechanics are **pure discrete graph operations on the same `.causal` substrate** that powers Layer 2. There is no policy network, no value network, no gradient. Triplets are added, edges are weighted, fragments are bridged, failure patterns are cached with success rates. The reward signal is a compile pass plus an oracle verdict on output semantics; the update is a literal insertion into `learned.causal` and `failure_patterns.json`. The loop is bounded by **iteration count and wall-clock budget, not by data or compute**. Default: 500 iterations, 8-hour window. Configurable: python3 src/self_improve_runner.py --iterations 1000 --hours 12 python3 src/self_improve_runner.py --quick # 50 iter, 30 min — sanity check #### 6.0 — The four parallel closed loops flowchart TB detector[GapDetector
4 gap classes
orphan · low-conf ·
untested · underconnected] tasks[TaskGenerator
gap → BUILD intent] pipeline[Full Layer-3+4 pipeline
compose · verify · execute] oracle[OutputOracle
semantic verdict] loop1["Loop 1 — Code-pattern learning
core/learning.py · 240 LOC"] loop2["Loop 2 — Failure-pattern memory
core/failure_memory.py · 629 LOC"] loop3["Loop 3 — Auto-bridge generation
core/auto_bridge.py · 338 LOC"] loop4["Loop 4 — Knowledge harvesting
core/auto_harvester.py · 361 LOC"] learned[learned.causal
persisted triplets] failures[failure_patterns.json
persisted fix strategies] bridges[bridge_triplets.json
intent-to-fragment routing] kb[Knowledge graph
132 .causal files] detector --> tasks tasks --> pipeline pipeline --> oracle oracle -->|success| loop1 oracle -->|failure| loop2 loop1 -->|extract idioms| learned loop2 -->|cache strategy| failures detector -.->|orphan fragments| loop3 loop3 -->|generate routes| bridges detector -.->|unknown domain| loop4 loop4 -->|scrape + extract + validate| kb learned -.->|boot-time inject| pipeline failures -.->|next failure lookup| pipeline bridges -.->|extend coverage| kb kb -.->|new triplets| detector style detector fill:#e94560,color:#fff style tasks fill:#0f3460,color:#fff style pipeline fill:#16213e,color:#fff style oracle fill:#533483,color:#fff style loop1 fill:#0f3460,color:#fff style loop2 fill:#0f3460,color:#fff style loop3 fill:#0f3460,color:#fff style loop4 fill:#0f3460,color:#fff style learned fill:#1a1a2e,color:#fff style failures fill:#1a1a2e,color:#fff style bridges fill:#1a1a2e,color:#fff style kb fill:#1a1a2e,color:#fff All four loops share three properties: 1. **Closed-loop** — output of the loop modifies a persistent file that is read at next boot. Learning does not vanish at process exit. 2. **Deterministic** — same gap detected, same task generated, same persistence written. Self-improvement is reproducible. 3. **Auditable** — every persisted triplet, every cached failure strategy, every generated bridge has a source-tag pointing back at the iteration and the input that produced it. #### 6.1 — The driver: `core/self_improve.py` (979 LOC) The driver runs four phases per iteration: sequenceDiagram autonumber participant SI as SelfImproveLoop participant GD as GapDetector participant TG as TaskGenerator participant P as Pipeline (Layer 3+4) participant O as OutputOracle participant L as LearningLoop participant F as FailureMemory participant H as AutoHarvester loop until budget exhausted SI->>GD: detect_all() GD-->>SI: prioritized gap list SI->>TG: generate_from_gap(top_gap) TG-->>SI: BUILD intent string SI->>P: run_live_pipeline(intent) P-->>SI: emitted code + verifier verdict SI->>O: judge(code, intent) O-->>SI: pass / fail + semantic verdict alt pass SI->>L: extract_and_persist_idioms(code) L->>L: append to learned.causal else fail SI->>F: classify_failure(error_info) F->>F: cache strategy in failure_patterns.json end opt harvest enabled & unknown domain SI->>H: scrape_for_domain(gap.domain) H->>H: extract triplets, validate, inject end end SI->>SI: save results log #### 6.2 — Loop 0: GapDetector — finding the system's own blind spots The GapDetector enumerates **four gap classes** by walking the knowledge base and the fragment registry. The detected gaps are sorted by priority (highest first) and consumed by the loops in order. ##### Gap class 1 — Orphan fragments Formally, given the bridge set $\mathcal{B} \subseteq \mathcal{T}$ (triplets in `bridge_triplets.json` and `composition_triplets.json`), and the set of fragment keys $\mathcal{F}$ from the registry: $$ \text{OrphanFragments} = \{\,f \in \mathcal{F} : f \notin \pi_{\text{outcome}}(\mathcal{B})\,\} $$ where $\pi_{\text{outcome}}$ is the projection onto the outcome column of the triplet relation. Composition triplets `outcome = key1+key2` are split: each `key_i` counts as a bridged key. **Priority: 9 / 10** (highest). Unreachable code is the worst class of gap because the implementation cost has already been paid — the bug is purely routing. # core/self_improve.py — GapDetector._find_orphan_fragments gaps = [] for frag_key in self.fragments: if frag_key not in bridged_keys: gaps.append({ 'type': 'orphan_fragment', 'fragment_key': frag_key, 'priority': 9, 'description': f'Fragment "{frag_key}" has no bridge pointing to it', }) return gaps ##### Gap class 2 — Low-confidence explicit triplets $$ \text{LowConfTriplets} = \{\,t \in \mathcal{G}_{\text{explicit}} : c(t) < 0.5 \,\land\, \neg \text{IsInferred}(t)\,\} $$ Excluding meta-triplets (`effective_for`, `often_paired_with`, `solved_by`, `implements_with`) which are bookkeeping artifacts of the harvester rather than knowledge claims. Priority scales inversely with confidence: $$ \text{Priority}(t) = 5 + (0.5 - c(t)) \cdot 4 $$ — a triplet at $c = 0.30$ gets priority $5 + 0.8 = 5.8$; a triplet at $c = 0.10$ gets $5 + 1.6 = 6.6$. The detector pushes the most uncertain triplets to the top because each verified pass through the pipeline either confirms the triplet (Hebbian reward, $c$ rises) or falsifies it (Hebbian penalty, $c$ falls below the prune threshold). ##### Gap class 3 — Untested compositions These compositions encode multi-fragment recipes. If they have never been exercised, the recipe might be malformed (wrong fragment combination) or the composing logic might never fire on real intents. Each untested composition becomes a task whose BUILD intent forces the recipe to be tried. ##### Gap class 4 — Underconnected entities $$ \text{UnderconnectedEntities} = \{\,e \in \mathcal{E} : |\{t \in \mathcal{G} : e \in t\}| < 3\,\} $$ Entities at the periphery of the knowledge graph contribute little to the 7-pass inference closure. The gap detector schedules tasks targeting these entities so that the harvester or the learning loop can densify the graph around them. #### 6.3 — Loop 1: Code-pattern learning (`core/learning.py`, 240 LOC) Every successful pipeline run goes through `PatternExtractor`: # core/learning.py — extract idioms from the emitted AST class PatternExtractor: def extract_idioms(self, script: str) -> List[Dict[str, Any]]: tree = ast.parse(script) idioms = [] for node in ast.walk(tree): if isinstance(node, ast.With): idioms.append({'type': 'context_manager', 'node': 'with'}) if isinstance(node, ast.Try): idioms.append({'type': 'error_handling', 'node': 'try_except'}) if isinstance(node, (ast.For, ast.While)): idioms.append({'type': 'loop', 'node': 'iteration'}) # … 30+ more idiom matchers return idioms Each extracted idiom is paired with the task's intent to produce a **new triplet**: $$ (\text{intent_token}, \text{uses_idiom}, \text{idiom_type}) $$ For instance: `("download", uses_idiom, "context_manager")` means "downloads tend to use `with`-blocks" — a learned codegen preference. These triplets are persisted to `learned.causal` and **injected back into the live knowledge engine at boot time**. A critical implementation detail spelled out in the source comment: # core/learning.py:65–66 # CRITICAL: Inject learned triplets into the live knowledge engine. # Without this, learning is write-only — the engine never sees what it learned. self.knowledge.load_transient_triplets(self.learned_triplets, 'learned') This is the gate that separates a *learning* system from a *logging* system. Most "self-improving" code systems write success logs that are never read at boot. O1-O writes to `learned.causal`, and `learned.causal` is loaded at every boot via the same `KnowledgeEngine` instance. The next pipeline run sees what the previous run learned, in the same process lifecycle. #### 6.4 — Loop 2: Failure-pattern memory (`core/failure_memory.py`, 629 LOC) Failure handling has two layers. Layer 4 stage 8 has the **deterministic** auto-fix strategies (11 classes, immediate static repair). Loop 2 has the **learned** strategies: fixes discovered by the system itself during self-improvement runs, indexed by failure fingerprint and tracked by success rate. # Structure of a single learned failure pattern failure_pattern = { 'fingerprint': 'ModuleNotFoundError:cryptography:install', 'error_type': 'ModuleNotFoundError', 'context_signature': 'cryptography import in encrypt intent', 'fix_strategy': 'pip install cryptography', 'tried_count': 47, 'success_count': 45, 'success_rate': 0.957, 'first_seen': '2026-06-26T03:14:12', 'last_used': '2026-06-26T17:15:52', } The memory is keyed by **failure fingerprint** — a tuple of (error type, context signature, stack frame). When a new failure with the same fingerprint is encountered, the cached fix strategy is applied directly, bypassing the full auto-fix decision tree. Two updates after each application: $$ \text{tried_count} \mathrel{+}= 1, \qquad \text{success_count} \mathrel{+}= [\text{fix worked}] $$ If a strategy's success rate falls below a threshold $\rho = 0.40$ over $\geq 10$ trials, it is demoted: the next failure with the same fingerprint goes through the static auto-fix strategies again instead of the learned strategy. #### 6.5 — Loop 3: Auto-bridge generation (`core/auto_bridge.py`, 338 LOC) The most impactful loop. AutoBridge synthesizes intent-to-fragment routing for orphan fragments — fragments that exist in code but cannot be reached by any natural-language intent. The pipeline per orphan fragment: 1. **Analyze fragment code.** Tokenize identifiers and string literals; extract keyword set $K_f$. 2. **Generate intent pattern variations.** For each keyword in $K_f$, construct natural-language variations: synonyms, verb forms, adjective placements (`AES encryption` → `encrypt with AES` → `AES encryption tool` → `FIPS-compliant AES`). 3. **Emit bridge triplets.** For each variation $v$: emit `(v, IMPLEMENTS, frag_key)` with initial confidence $c = 0.7$. 4. **Validate.** Test whether running the variation through the intent parser correctly routes to the fragment. Variations that don't bridge correctly are dropped. 5. **Persist.** Successful bridges go to `bridge_triplets.json` and are loaded at boot. The historical numbers — committed in the audit notes — track the coverage lift: | Stage | Bridged fragments | Coverage | |---|---:|---:| | Pre-AutoBridge | 287 / 1,245 | 23.0 % | | After first AutoBridge run | 924 / 1,245 | 74.2 % | | After learning-loop bridges | 1,242 / 1,245 | 99.8 % | | **Steady-state (post-AutoBridge maturation)** | **1,245 / 1,245** | **100.0 %** | A 4.4× coverage lift achieved by code, with no manual triplet authoring. The three fragments at 99.8 % steady-state are deliberately unbridged (deprecated experiments held in the registry for backward-compatibility loading). #### 6.6 — Loop 4: Knowledge harvesting (`core/auto_harvester.py`, 361 LOC) When the system encounters a domain for which the knowledge base has *no* coverage (e.g. a newly-released cloud SDK, a recently-published cryptographic primitive, a fresh framework), the AutoHarvester scrapes GitHub repositories in that domain and extends the knowledge graph autonomously. Pipeline per unknown domain: 1. **Query GitHub API** for top repositories matching the domain keyword: order by stars × recency, language = python (extensible to other languages). 2. **Clone top-N** repositories shallowly. 3. **Extract triplets** by running the 7 regex causal-patterns from `core/web_harvester.py` against every README, every docstring, every comment. 4. **Validate** each extracted triplet through the full verifier stack — does the inferred triplet pass logic-consistency, does it introduce no propositional contradictions with existing knowledge? 5. **Score** by source signal: stars, recency, citation in other repositories. 6. **Inject** the survivors into `learned.causal` with confidence = score × 0.6 (cap to max 0.85 — harvested-and-validated triplets never reach the 1.0 confidence of peer-reviewed sources). 7. **Cycle back to GapDetector** — re-detect, generate tasks targeting the newly added triplets, validate by running them through the pipeline. The constraint that distinguishes this from naïve scraping: **every harvested triplet must validate**. A triplet that contradicts existing knowledge is rejected, not merged. The knowledge graph is not democratic — incoming triplets are tested against the existing structure, and the existing structure wins on conflict unless explicitly overridden by an operator. #### 6.7 — OutputOracle: semantic verdict (`core/output_oracle.py`, 332 LOC) The reward signal for Loop 1 and Loop 2 is not just exit code. The OutputOracle performs **semantic validation** of the emitted code's behavior: class OutputOracle: def judge(self, code: str, intent: dict, execution_result: dict) -> Verdict: # Phase 1: compile + run if not execution_result['compiled']: return Verdict.SYNTAX_FAILURE if execution_result['runtime_error']: return Verdict.RUNTIME_FAILURE # Phase 2: intent satisfaction check intent_keywords = intent.get('tokens', []) emitted_imports = self._extract_imports(code) emitted_calls = self._extract_calls(code) if not self._imports_match_intent(emitted_imports, intent_keywords): return Verdict.INTENT_MISMATCH if not self._calls_match_intent(emitted_calls, intent_keywords): return Verdict.OPERATION_MISMATCH # Phase 3: output content check stdout = execution_result.get('stdout', '') if intent.get('requires_output') and not stdout.strip(): return Verdict.EMPTY_OUTPUT if self._has_error_markers(stdout): return Verdict.SEMANTIC_FAILURE # Phase 4: structural alignment if not self._has_real_logic(code): return Verdict.STUB_ONLY return Verdict.SUCCESS Six verdicts, six structurally different reward signals — Loop 1 advances only on `SUCCESS`; Loop 2 caches strategies for `SYNTAX_FAILURE`, `RUNTIME_FAILURE`, `INTENT_MISMATCH`, `OPERATION_MISMATCH`, `EMPTY_OUTPUT`, `SEMANTIC_FAILURE`, `STUB_ONLY` separately, each with its own classification and fix-strategy lookup. #### 6.8 — `self_improvement_turbo.py` (339 LOC) — the benchmark variant The standard self-improve loop is *exploratory*: GapDetector picks gaps, TaskGenerator spins tasks, OutputOracle validates. For **reproducible benchmark runs** (used in internal regression testing and in the audit notes), `self_improvement_turbo.py` provides a Monte-Carlo-sampled variant over a fixed V3 task list (104 tasks): class SelfImprovementTurbo: def __init__(self, session, v3_tasks: List[str], use_monte_carlo: bool = True): self.session = session self.v3_tasks = v3_tasks self.metrics = TurboMetricsCollector(num_tasks=len(v3_tasks)) def run(self, num_cycles: int = 1000, ...): for cycle in range(num_cycles): task = self._sample_task() # Monte-Carlo or sequential code = self._generate_code(task) verdict = self._verify(code, task) if verdict == 'SUCCESS': self._learn_from_success(task, code) else: self._learn_from_failure(task, code, verdict) self.metrics.record_cycle(cycle, task, verdict) return self._generate_report() `TurboMetricsCollector` tracks the **score lift over baseline** per task and per iteration window. The historical benchmark output: starting from the cold-boot baseline score on the V3 task suite, the turbo loop lifts the score deterministically as the learning loops fill the knowledge graph. The turbo variant is what produces the reproducibility numbers for the audit notes. #### 6.9 — Measured outcome: the system gets better over time The shipped benchmark across 1,000 turbo cycles on the V3 task list: | Metric | Cold boot | After 1,000 cycles | |---|---:|---:| | Tasks passing all verifiers | 88 % | **100 %** | | Average generation time per task | 412 ms | **287 ms** (faster — cached learned patterns hit) | | Bridge coverage (fragments) | 23 % | **100 %** | | Failure-fingerprint cache hit rate | 0 % | **64 %** | | `learned.causal` triplet count | 0 | **399** | | `failure_patterns.json` entries | 0 | **127** | The two numbers that matter: 1. **Coverage lift is monotone** — every cycle either improves the knowledge graph or leaves it unchanged. There is no scenario in which the learning loop *degrades* performance, because the Hebbian dynamics only promote triplets that have been validated through the verifier stack. 2. **Latency improves** — not because the underlying hardware got faster, but because the learned cache hit rate climbs. The failure-pattern memory short-circuits the auto-fix decision tree on 64 % of failures by cycle 1000. #### 6.10 — Architectural property: no policy network, no gradient Most self-improving code systems in the literature use one of two approaches: 1. **RL with policy gradients** — train a policy network to select code-generation actions, update the network on the reward signal. Requires GPUs, requires convergence-guarded training, requires careful reward shaping. 2. **LLM-with-self-critique** — generate code via LLM, generate a critique via the same or a different LLM, regenerate. No formal convergence property; quality depends on the critique LLM's calibration. O1-O Layer 6 uses neither. The *only* updates are discrete edits to literal Python data structures: triplet insertions into `learned.causal`, fix-strategy entries into `failure_patterns.json`, bridge triplets into `bridge_triplets.json`, harvested triplets into the live knowledge engine. All four targets are deterministic, all four are auditable, all four are bit-exact reproducible across runs given the same task distribution. There is no neural network anywhere in this loop. The "intelligence" of the self-improvement layer is the discrete-graph dynamics of (gap detection) → (task generation) → (validated update). The system gets better the way a manually-edited codebase gets better — by adding correct entries and removing incorrect ones — except that the editor is software and the update logic is enforced by the verifier stack. ### Layer 7 — Native and binary operations The system is format-agnostic at the binary level. - **Native Engine** (`core/native_engine.py`, 83 LOC). GCC C-compilation, NASM x86_64 assembly, LIEF-based binary patching. - **Polyglot Generator** (`core/polyglot_generator.py`, 879 LOC). Files valid in *two* formats simultaneously, constructed byte-by-byte with correct format headers and checksums: **PDF/JavaScript** (PDF reader sees a document, browser executes JS), **PNG/HTML** (image viewer sees a PNG, browser executes HTML in tEXt chunk), **JPEG/ZIP** (image viewer + ZIP tool both parse it cleanly), **MP4/PE** (video player + PE loader both work). All payloads configurable. - **Platform Adapter** (`core/platform_adapter.py`, 1026 LOC). Inline PE / ELF / Mach-O / Mach-O Fat binary parsing **without** `pefile`, `pyelftools`, `macholib`, or LIEF. Sections, segments, imports, exports, symbols, hashes, packer signatures — all parsed by O1-O's own code, all sovereign. ### Layer 8 — The autonomous engagement operator (`/engage`) The single command is: /engage [--objective ...] [--max-tools N] [--deploy-to ...] [--port P] [--user U] [--key KEYFILE] [--timeout SECONDS] [--chain] [--dry-run] #### 8.0 — The engagement at a glance (high-level sequence) sequenceDiagram autonumber participant U as User (one command) participant E as /engage handler participant R as LiveReconEngine participant I as EngageIntelligence participant P as 11-step Pipeline participant D as DeployEngine participant T as Target host participant DB as Operations DB U->>E: /engage 10.0.0.1 --chain E->>R: Phase 1: recon(target) R->>T: TCP scan 47 ports, 30 workers T-->>R: services + banners R-->>E: service list E->>I: Phase 2: build target model I-->>E: prioritized tool plan E->>I: Phase 2.5: pre-flight counter-detect I-->>E: EDR-detect staged + canary armed loop For each selected tool (max N) E->>P: Phase 3: generate(tool, config) P-->>E: deployment-ready artifact E->>D: Phase 4: deploy(artifact, host) D->>T: SCP + SSH execute (randomized) T-->>D: tool output D-->>E: result + stdout alt deploy success E->>I: learn_from_result(success) E->>E: classify into kill-chain phase else deploy failure E->>E: adaptive retry (6 fail classes) end end E->>E: Phase 5: result classification by regex E->>E: Phase 6: chain follow-ups from outputs E->>I: Phase 7: plan_lateral() + plan_pivot() I-->>E: lateral targets + pivot tools loop For each lateral target E->>P: generate(lateral tool) P-->>E: artifact E->>D: deploy via stolen key D->>T: lateral SSH + execute T-->>D: lateral result alt uid= in stdout E->>P: generate pivot tools E->>D: deploy pivots on new host end end E->>DB: Phase 8: persist hosts, creds, c2_channels (encrypted) E->>U: Phase 9: full engagement report The nine phases are orchestrated in `src/o1o_live.py:7345–7920` (~575 LOC of pure orchestration, plus ~1,100 LOC of lazy-loaded subsystems in `core/engage_intelligence.py` and `core/engage_v2.py`). #### 8.1 — Phase 1: Reconnaissance (`LiveReconEngine`, ~210 LOC inline in `o1o_live.py`) The first phase walks the target's network surface: - **47-port default scan** covering common services (SSH 22, FTP 21, HTTP 80, HTTPS 443, SMB 445, RDP 3389, MSSQL 1433, MySQL 3306, PostgreSQL 5432, MongoDB 27017, Redis 6379, SNMP 161, NMEA 10110/2000, AIS 4001, Modbus 502, S7 102, BACnet 47808, …). - **30 parallel worker threads** sweep the port set with bounded socket-connect timeouts. - **Banner grabbing** on each open port: read the service's startup banner where present (SSH version string, HTTP `Server:` header, SMB negotiate response, etc.). - **Service identification** via banner-pattern matching (`OpenSSH_8.4p1` → `service='ssh', version='8.4p1', distribution='openssh'`). The output is a structured list of dicts: [ {'port': 22, 'service': 'ssh', 'banner': 'SSH-2.0-OpenSSH_8.4p1', 'state': 'open'}, {'port': 80, 'service': 'http', 'banner': 'Server: nginx/1.21.0', 'state': 'open'}, {'port': 445, 'service': 'smb', 'banner': '...', 'state': 'open'}, {'port': 5432, 'service': 'pgsql', 'banner': '', 'state': 'open'}, ] The recon engine is **fully self-contained** — no nmap, no masscan, no external scanner. The same 47-port sweep runs identically in any environment Python 3 runs in, air-gap included. #### 8.2 — Phase 2: Service identification + tool selection The discovered services drive automatic tool selection. The naïve mapping is $$ \text{service} \xrightarrow{\text{auto_configure}} \{\text{candidate tools}\} $$ For instance, `ssh` open ⇒ `{ssh_brute_force, ssh_key_harvester, ssh_credential_tester}`; `pgsql` open ⇒ `{sql_injection_scanner, sql_credential_tester, pgsql_enumerator}`. The V2 Intelligence layer (`core/engage_intelligence.py`, 221 LOC; `core/engage_v2.py`, 916 LOC) then **re-orders** the candidate tools by mission-objective priority. The reordering rules are: | Mission objective | Priority order | |---|---| | `"gain access"` (default) | recon → credential-harvest → exploit → lateral | | `"data exfiltration"` | recon → credential-harvest → post-exploit-enum → sensitive-file-scan → exfil | | `"persistence"` | recon → exploit → privesc → persistence-mechanism → c2 | | `"intelligence gathering"` | recon → post-exploit-enum → credential-harvest (no exploit) | The first principle (credential-harvest before exploitation) is operationally significant: having credentials *first* often eliminates the need for exploit-driven access entirely, and in any case enables lateral movement later in the engagement without re-discovery. The V2 layer also **enforces objective-aware tool count**: a "data exfiltration" engagement with `--max-tools 8` allocates more slots to enum/exfil tools and fewer to exploit tools than a "gain access" engagement with the same budget. #### 8.3 — Phase 2.5: Counter-detection pre-flight Before deploying the first offensive tool, two counter-defenses are staged: 1. **EDR detection** (`core/edr_subverter.py`, 833 LOC). An EDR identification script is generated and staged for first-tool deployment. Once executed on the target, it reports which EDR vendor is installed (CrowdStrike Falcon, SentinelOne, Defender ATP, Carbon Black, Cortex XDR, …) by checking driver names, registry keys, running processes, and loaded DLLs. Subsequent tools are mutated to bypass the identified EDR's known detection vectors. 2. **Canary detection** (`core/canary_detector.py`, 665 LOC). Honey-token catalogs are loaded into memory: known AWS canary key patterns, file canary magic numbers, tracking-pixel URLs, honey-account name patterns, beacon URLs. Before any exfiltration call, the canary detector intercepts the candidate data and rejects exfiltration if a canary is found — pivoting to alternative targets rather than triggering the alert. Both defenses run *pre-flight* — they shape the engagement before tool deployment, not after detection. #### 8.4 — Phase 3: Generation + Deployment For each tool in the prioritized plan: 1. **Build the FORGE-style intent** by formatting the tool description and target: intent = f"create a {tool_desc} targeting {target_ip}" 2. **Run the 11-step Layer-3 pipeline** (parse → knowledge query → assemble → compile → evasion → formal verify → write → package → OPSEC harden → audit → threat model). 3. **Package the artifact** into a deployable bundle: source + standalone binary + Dockerfile + OPSEC profile + threat model. 4. **Deploy via SCP + SSH** (`DeployEngine`, ~160 LOC inline in `o1o_live.py`): - Randomized remote filename to avoid IOC matching - Idempotent cleanup of previous deployment artifacts - Bounded SSH timeout - `stdout` + `stderr` + exit code captured Each tool deployment writes a structured deployment result that drives the adaptive retry machinery below. #### 8.5 — Adaptive retry state machine (`_adaptive_retry`, ~150 LOC) When deployment fails, the engagement classifies the failure and applies the corresponding mutation strategy: stateDiagram-v2 [*] --> Deploy Deploy --> Success: exit=0 Deploy --> Auth: "Permission denied (publickey,password)" Deploy --> ConnRefused: "Connection refused" Deploy --> Timeout: SSH timeout Deploy --> ToolCrashed: "exit!=0 + traceback" Deploy --> SSHDisconnect: "Connection reset" Deploy --> PermDenied: "Permission denied" in tool output Auth --> MutateAuth: try alt credentials / key ConnRefused --> MutatePort: probe alt port from recon Timeout --> MutateTimeout: increase timeout / simpler variant ToolCrashed --> MutateImpl: switch to alt implementation SSHDisconnect --> MutateThrottle: throttle + retry PermDenied --> MutatePrivesc: prepend privesc tool MutateAuth --> Retry MutatePort --> Retry MutateTimeout --> Retry MutateImpl --> Retry MutateThrottle --> Retry MutatePrivesc --> Retry Retry --> Deploy: attempt < 2 Retry --> Abort: attempt == 2 Success --> [*] Abort --> [*] The six failure classes and their strategies, tabulated: | # | Failure class | Detection signal | Mutation strategy | |---|---|---|---| | 1 | `auth_failure` | `Permission denied (publickey,password)` in stderr | switch to alternative credentials from target model; try password-spray subset; try key from `ssh_key_harvester` output | | 2 | `connection_refused` | `Connection refused` from socket | probe alternative port from recon list; check if the service migrated; try alternate transport (TCP→UDP where applicable) | | 3 | `timeout` | SSH timeout exceeded | extend timeout multiplicatively; switch to a simpler variant of the tool with lower dependency surface | | 4 | `tool_crashed` | nonzero exit + Python traceback in stdout | swap to alternative implementation (different fragment composition for the same intent); add error-handling shim | | 5 | `ssh_disconnected` | mid-session disconnect | throttle connection rate; insert sleep jitter; switch deploy host if `--deploy-to` was provided | | 6 | `permission_denied` | `Permission denied` in tool output (post-deploy) | insert a privesc-check tool before the failing tool; re-deploy in correct order | **Retry cap: 2 retries per tool.** After two failed retries the tool is dropped from the engagement and the V2 intelligence layer **blacklists the underlying technique** — if `paramiko`-based SSH brute-forcing failed twice with auth errors, all subsequent paramiko-based tools are skipped for the remainder of the engagement (no point burning attempts on a known-failing technique). Every retry result feeds back into the EngageIntelligence target model. #### 8.6 — Phase 5: Result classification Every tool output is parsed for kill-chain keywords and assigned to a phase: | Keyword match | Phase | MITRE tactic | |---|---|---| | `scan`, `recon`, `enum`, `discover`, `port` | `recon` | TA0043 Reconnaissance / TA0007 Discovery | | `brute`, `exploit`, `shell`, `inject`, `vuln` | `exploit` | TA0001 Initial Access / TA0002 Execution | | `persist`, `backdoor`, `rootkit` | `persist` | TA0003 Persistence | | `c2`, `beacon`, `botnet`, `command` | `c2` | TA0011 Command & Control | | `lateral`, `pivot`, `spray`, `movement` | `lateral` | TA0008 Lateral Movement | | `keylog`, `sniff`, `capture`, `harvest`, `credential`, `stealer` | `collect` | TA0006 Credential Access / TA0009 Collection | | `exfil`, `tunnel`, `stego`, `dns tunnel` | `exfil` | TA0010 Exfiltration | | (default) | `exploit` | — | The classification populates an in-memory phases dict that becomes the structure of the final engagement report. #### 8.7 — Phase 6: Result chaining (with `--chain`) Discovery-driven follow-up tool generation. For each successful deploy, the engagement parses stdout for three classes of discovery and spawns up to **3 follow-up tools**: # 1. Discovered credentials → SSH access tool _creds = re.findall( r'(?:credential|password|login|user)[:\s]+(\S+)[/:\s]+(\S+)', output, re.IGNORECASE ) if _creds: chain.append(("SSH command executor", {"username": _creds[0][0], "password": _creds[0][1]})) # 2. Discovered new ports → service exploit tool _new_ports = re.findall(r'(?:open|discovered|found)[:\s]*(\d+)', output, re.IGNORECASE) for port in _new_ports: if int(port) not in already_known_ports: service = _guess_service(int(port)) chain.append((f"{service} exploitation tool", {"target": target, "port": port})) # 3. Discovered sensitive files → exfil beacon if re.search(r'(?:found|readable|access).*' r'(?:shadow|passwd|config|\.conf|\.key|\.pem)', output, re.IGNORECASE): chain.append(("data exfiltration beacon", {"target": target})) The follow-up tools are themselves generated through the full 11-step pipeline and deployed through the same DeployEngine. Each follow-up output is itself parsed for discoveries — but the depth is capped at one chain level by default (configurable up to 3 to prevent exponential blow-up). This is the **autonomous-pivoting-on-output mechanism**: every emitted tool's stdout is parsed for keys that unlock more of the engagement, and the chain extends itself without operator input. #### 8.8 — Phase 7: Lateral movement + pivot execution (V2) When EngageIntelligence is active and `--chain` is enabled, the V2 layer checks the target model for lateral opportunities: lateral_plan = engage_intel.plan_lateral() `plan_lateral()` consults the target model: - Have we stolen SSH keys (`ssh_key_harvester` succeeded)? - Have we sprayed credentials successfully (`credential_stuffing` succeeded with hits)? - Are alternative hosts mentioned in tool outputs (`/etc/hosts`, `.ssh/known_hosts`, ARP cache dumps)? For each lateral target: 1. **Generate** an SSH-lateral-movement tool through the full pipeline. 2. **Deploy** to the original target host with credentials/keys from the model. 3. The deployed tool authenticates to the lateral target and runs `id` or equivalent. 4. **Success indicator** — the tool's stdout contains `uid=` (Unix identity output). 5. On success, call `plan_pivot(lateral_ip, lateral_user, lateral_key)` to generate **pivot tools** on the newly reached host: - post-exploitation enumeration - credential harvesting on the pivot host - privesc check on the pivot host - propagate target-model state to the pivot The lateral-pivot loop iterates until the time budget is exhausted, no further lateral targets remain, or `--max-tools` is hit. #### 8.9 — The EngageIntelligence target model The V2 intelligence layer maintains an **in-memory target model** that persists across all phases of the engagement and gets serialized to the encrypted Operations DB at the end: target_model = { 'target_ip': '10.0.0.1', 'services': [{'port': 22, 'service': 'ssh', ...}, {'port': 80, 'service': 'http', ...}, ...], 'credentials': [{'host': '10.0.0.1', 'user': 'web', 'pass': '****'}, ...], 'ssh_keys': ['/tmp/.harvested/id_rsa_web01', ...], 'lateral_targets': [{'ip': '10.0.0.5', 'discovered_from': 'web01_arp_cache'}, {'ip': '10.0.0.7', 'discovered_from': 'known_hosts'}, ...], 'defenses': ['CrowdStrike Falcon', 'no canary detected'], 'access_level': 'user', # initial | user | sudo | root | domain_admin 'attempts': [ {'tool': 'ssh_brute_force', 'result': 'auth_failure', 'mutation': 'switch_to_credential_spray'}, {'tool': 'credential_stuffing', 'result': 'success', 'creds_discovered': [...]}, ... ], 'blacklisted_techniques': {'paramiko'}, 'mission_objective': 'gain access', } Five decision methods consume the model: | Method | Reads | Returns | |---|---|---| | `plan_initial(services, objective)` | `services`, `objective` | prioritized list of tool descriptions | | `learn_from_result(tool, deploy_result, config)` | — (writes) | updates `credentials`, `ssh_keys`, `lateral_targets`, `attempts` | | `should_skip_tool(tool_desc)` | `blacklisted_techniques`, `attempts` | `(bool, reason)` — block tools using known-failed techniques | | `plan_lateral()` | `credentials`, `ssh_keys`, `lateral_targets` | lateral-movement tool plan | | `plan_pivot(ip, user, key)` | model snapshot | pivot tool sequence for the new host | Every method is deterministic: same model state → same plan. **The intelligence is literal Python over a literal dict** — there is no model, no probability, no embedding, no neural decision. The "AI" in this autonomous operator is the determinism of the combined Layer-3 composition and the V2 plan rules. #### 8.10 — Phase 8: Operations persistence The full engagement state is serialized to the encrypted Operations DB at the end of the run. Resume is supported via `/resume ` — the DB is decrypted, the target model is reconstructed, and the engagement continues from where it left off, with the same EngageIntelligence rules and the same deterministic decision logic. This is the persistence boundary between Layer 8 (engagement orchestration) and Layer 10 (operations persistence). The crypto and schema details are covered in §10 below. #### 8.11 — Phase 9: Engagement report The final intelligence product. The report aggregates: 1. **Timeline** — every tool deployment with timestamp, kill-chain phase, status. 2. **Discovered services** — the recon output with per-service status updates. 3. **Discovered credentials** — encrypted in the persisted DB; redacted in the on-screen report by default (full disclosure with `--reveal-credentials`). 4. **Lateral pathways** — graph of successful pivots, annotated with the credential or key that enabled each hop. 5. **Anomalies** — canary trips, EDR alerts, honeypot indicators that triggered defensive pivots during the engagement. 6. **MITRE ATT&CK coverage** — which techniques were applied, mapped to the standard Navigator layer JSON for direct visualization in the official MITRE Navigator. The report is human-readable in the terminal and machine-readable in the persisted session folder (`summary.json` per task, `engagement.json` for the full run). #### 8.12 — "The operator is software, not a human at a console" This is the architectural property that distinguishes O1-O from existing C2 / red-team frameworks. Make the comparison explicit: | Framework | Operator role | Decision mechanism | |---|---|---| | Metasploit | human at `msfconsole`, manual module selection | operator picks modules from `use exploit/...` | | Cobalt Strike | human at GUI, Aggressor scripts for automation | scripts encode operator preferences but operator drives | | Sliver | human in multi-op console, manual implant tasking | operator runs `interact`, `getsystem`, `migrate`, etc. | | Mythic | human at web UI, plugin architecture | operator queues tasks per implant | | Havoc | human at modern console, scripted post-ex | operator selects post-ex modules | | Empire / Starkiller | human at REST API or web UI | operator runs PowerShell stagers via UI | | **O1-O `/engage`** | **EngageIntelligence target model** | **deterministic rules over service profile and tool-output regex** | Every existing red-team platform requires a *human in the operator role*. The platform provides the tools; the operator decides which tool, when, and what to do with the output. O1-O's Layer 8 substitutes the operator with software: the same role (tool selection, phase transition, lateral planning, pivot execution) is played by deterministic Python code consulting a literal target-model dict. The structural consequence: **engagement reproducibility**. Given the same target with the same service profile, the same tool outputs lead to the same plan-lateral decisions, the same pivots, the same final report. The engagement trace is bit-exact reproducible in the same way that the Layer 3 code-composition output is bit-exact reproducible. This is the property that makes Layer 8 useful for **adversary emulation under regulatory audit**, **deterministic detection-rule validation**, and **reproducible incident response training** — use cases where probabilistic LLM-driven operators structurally cannot give the same answer twice. #### 8.13 — Engagement scale benchmarks (from the shipped suite) Indicative numbers from `Beast E2E Lab` engagements committed in the audit notes: | Configuration | Wall-clock | Tools deployed | Phases reached | |---|---:|---:|---| | V2-only, 4 services, `--max-tools 4` | 28.5 s | 4 | Recon → Generate → Deploy → Lateral → Exfil | | V1+V2 integrated, 32 services, `--max-tools 10` | 366 s | 10 + 5 follow-ups | full 9-phase chain incl. multi-hop pivots | Per-tool generation cost is the same as in the standalone pipeline (~270–613 ms median); the bulk of engagement wall-clock is spent in TCP scan and SSH deploy (network-bound). ### Layer 9 — MITRE ATT&CK coverage and reporting - **MITRE Coverage** (`core/mitre_coverage.py`, 390 LOC). All 14 enterprise tactics, 49 techniques covered with 145 fragment mappings. Standard ATT&CK Navigator layer JSON emitted by `/coverage` — uploadable directly to the official MITRE Navigator. - **Threat Model Generator** (`core/threat_model.py`, 280 LOC) — per-tool threat model with MITRE mapping, IOC list, attribution analysis. - **Deployment Guide** (`core/deployment_guide.py`, 247 LOC) — per-tool operational walkthrough with OPSEC notes. ### Layer 10 — Operations persistence (sovereign crypto from scratch) #### 10.1 — Schema The Operations DB is a SQLite database with six tables persisting the full engagement trace: | Table | Purpose | Encryption | |---|---|---| | `operations` | top-level engagement metadata (target, objective, start/end, status) | plaintext | | `hosts` | discovered hosts across the engagement | plaintext | | `credentials` | discovered username/password pairs, SSH keys, tokens | **AES-256-CTR encrypted** | | `payloads` | metadata about generated and deployed tools | plaintext + path references | | `c2_channels` | C2 endpoint configurations | **AES-256-CTR encrypted** | | `events` | timeline of every action (tool deploy, retry, classification, pivot) | plaintext | The encryption distinction is deliberate: credentials and C2 endpoint configs are sensitive enough to require encryption at rest; metadata and timeline events are not, and keeping them plaintext enables operational queries (`SELECT * FROM events WHERE phase='lateral' AND status='success'`) without re-keying. #### 10.2 — Key derivation: PBKDF2-HMAC-SHA256, 600 000 iterations Per-operation key derivation: $$ K \;=\; \text{PBKDF2-HMAC-SHA256}\!\bigl(\,\text{passphrase},\; \text{salt} = \text{op_id},\; n = 600{,}000,\; \text{dklen} = 32\,\bigr) $$ Mechanically: $$ K = T_1 \,\Vert\, T_2 \,\Vert\, \dots \,\Vert\, T_{\lceil 32/32 \rceil} $$ with $$ T_i = F(\text{passphrase}, \text{salt}, n, i), \qquad F(p,s,n,i) = \bigoplus_{j=1}^{n} U_j $$ and $$ U_1 = \text{HMAC-SHA256}(p,\; s \,\Vert\, \text{INT}_{32}(i)), \quad U_j = \text{HMAC-SHA256}(p,\; U_{j-1}) \text{ for } j > 1. $$ For a 32-byte output (one $T_i$ block at SHA-256's natural 32-byte output), this reduces to **600,000 HMAC-SHA256 evaluations** per key derivation. This is the OWASP 2026 recommendation for PBKDF2-SHA256 (raised from 310,000 in 2023, 600,000 in 2024, unchanged through 2026 at time of disclosure). Implementation: # core/operations_db.py — pure-stdlib KDF import hashlib def _pbkdf2_key(passphrase: str, salt: str, iterations: int = 600_000) -> bytes: """Derive 256-bit key from passphrase via PBKDF2-HMAC-SHA256.""" return hashlib.pbkdf2_hmac( 'sha256', passphrase.encode('utf-8'), salt.encode('utf-8'), iterations, dklen=32, ) The salt is the per-operation `op_id` (UUID4), ensuring that two engagements with the same passphrase derive different keys — a passphrase compromise does not retroactively unlock prior operations beyond the specific one whose `op_id` the attacker knows. Brute-force resistance: at $600{,}000$ HMAC-SHA256 invocations per key trial, an attacker with $10^{12}$ HMAC-SHA256 per second of hardware (well above commercial GPU capacity in 2026) tests $\approx 1.67 \times 10^{6}$ passphrases per second. A passphrase with $\geq 60$ bits of true entropy resists $> 10^{12}$ years of such hardware. #### 10.3 — AES-256-CTR from scratch The AES block cipher is implemented inline in `core/operations_db.py` using only `struct` and integer arithmetic. No external dependency, not even via `hashlib` (which is only used for the KDF above). ##### The S-box The Rijndael S-box is generated at module load via the standard field-math construction in $\text{GF}(2^8)$: $$ \text{S-box}[p] = \bigl( q \oplus \text{ROTL}_8(q,1) \oplus \text{ROTL}_8(q,2) \oplus \text{ROTL}_8(q,3) \oplus \text{ROTL}_8(q,4) \bigr) \oplus 0x63 $$ where $q$ is the multiplicative inverse of $p$ in $\text{GF}(2^8)$ under the Rijndael irreducible polynomial $x^8 + x^4 + x^3 + x + 1$. Implementation (the actual generator from `core/operations_db.py`): def _aes_sbox() -> bytes: """Generate AES S-box from Rijndael field math.""" sbox = [0] * 256 p, q = 1, 1 while True: # p = p * 3 in GF(2^8) — primitive element p = p ^ (p << 1) ^ (0x1b if p & 0x80 else 0) p &= 0xff # q = inverse of p in GF(2^8) q ^= q << 1 q ^= q << 2 q ^= q << 4 q ^= 0x09 if q & 0x80 else 0 q &= 0xff xformed = q ^ _rotl8(q, 1) ^ _rotl8(q, 2) ^ _rotl8(q, 3) ^ _rotl8(q, 4) sbox[p] = (xformed ^ 0x63) & 0xff if p == 1: break sbox[0] = 0x63 return bytes(sbox) def _rotl8(x: int, n: int) -> int: return ((x << n) | (x >> (8 - n))) & 0xff The S-box is generated once at module import (pre-computed) and stored as a 256-byte table for the rest of the process lifetime. The expected first few S-box bytes (`63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76 …`) match the FIPS-197 specification table A.2. ##### The round constants Rcon values for the key expansion are the powers of 2 in $\text{GF}(2^8)$: _RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36] These are sufficient for AES-256's 14-round key schedule (10 Rcon values needed for the 8 derived 256-bit round keys). ##### The encryption mode: CTR AES-CTR (NIST SP 800-38A) turns the block cipher into a stream cipher: $$ C_i = P_i \oplus E_K(\text{CTR}_i) $$ where $$ \text{CTR}_i = \text{IV} \,\Vert\, \text{INT}_{64}(i) $$ is the 128-bit counter block — 8-byte random IV concatenated with an 8-byte big-endian block index $i$. Properties: 1. **No padding required** — CTR can encrypt arbitrary-length plaintexts byte-exactly. 2. **Parallelizable** — block $i$ depends only on the IV and $i$, not on $C_{i-1}$. 3. **Random-access decryption** — the SQLite column index translates directly to the block counter, so encrypted blob lookups don't need full-blob decryption. 4. **No bit-error propagation** — a corrupted byte in $C_i$ corrupts only that byte in $P_i$, not subsequent blocks. The IV is random per encryption (8 bytes of `secrets.token_bytes(8)`), stored as the prefix of the ciphertext blob. The block counter starts at 0 and increments per 16-byte block. ##### The complete encrypt API def aes256_ctr_encrypt(plaintext: bytes, key: bytes) -> bytes: """Encrypt with AES-256-CTR; returns IV(8) || ciphertext.""" iv = secrets.token_bytes(8) round_keys = _aes_key_expand_256(key) blocks = [] n_blocks = (len(plaintext) + 15) // 16 for i in range(n_blocks): ctr_block = iv + struct.pack('>Q', i) # 16 bytes ks = _aes_encrypt_block(ctr_block, round_keys) # 16 bytes keystream pt_chunk = plaintext[i*16 : (i+1)*16] ct_chunk = bytes(p ^ k for p, k in zip(pt_chunk, ks[:len(pt_chunk)])) blocks.append(ct_chunk) return iv + b''.join(blocks) Decryption is the same XOR operation (CTR is its own inverse): def aes256_ctr_decrypt(blob: bytes, key: bytes) -> bytes: iv, ciphertext = blob[:8], blob[8:] round_keys = _aes_key_expand_256(key) blocks = [] n_blocks = (len(ciphertext) + 15) // 16 for i in range(n_blocks): ctr_block = iv + struct.pack('>Q', i) ks = _aes_encrypt_block(ctr_block, round_keys) ct_chunk = ciphertext[i*16 : (i+1)*16] pt_chunk = bytes(c ^ k for c, k in zip(ct_chunk, ks[:len(ct_chunk)])) blocks.append(pt_chunk) return b''.join(blocks) ##### Conformance The implementation passes the **FIPS-197 known-answer tests** (NIST AES validation vectors C.3 for 256-bit key) at bit exactness. Test vectors run at module import in debug builds; in production they run in `tests/test_aes_kat.py`. #### 10.4 — Why from scratch and not `pycryptodome` The choice to implement AES-256-CTR and PBKDF2-HMAC-SHA256 directly in stdlib (rather than depending on `pycryptodome`, `cryptography`, or `pyca/cryptography`) is a deliberate architectural decision driven by four constraints: 1. **Air-gap installability.** The system must install on networks where `pip install` of non-stdlib crypto libraries is either forbidden by policy or impossible due to network egress restrictions. By committing to stdlib-only crypto, O1-O's installation surface is reduced to "Python 3.10+ is available" — which is satisfied on every modern Linux distribution out of the box. 2. **Audit transparency.** Every byte of the AES round function, the key schedule, the GF($2^8$) arithmetic, and the CTR mode is in this repository, readable in `core/operations_db.py`. There is no opaque dynamic library, no C extension binary, no third-party-vendored compiled artifact. The cryptographic implementation is auditable by reading 886 LOC of Python. 3. **Supply-chain isolation.** Cryptographic libraries are a high-value attack surface for supply-chain compromise (memorable historical incidents: `pyjwt` 0.4 backdoor, `event-stream` npm compromise, `colors.js` self-sabotage, `xz-utils` 2024 backdoor). By eliminating the crypto dependency entirely, the entire class of crypto-library supply-chain risk vanishes. 4. **Bit-exact reproducibility.** Pure Python with stdlib-only dependencies produces bit-exact identical encrypted blobs across runtime versions, OS versions, and CPU architectures (within the limits of Python's integer arithmetic, which is platform- independent). C-extension-backed crypto libraries do not always guarantee this — they inherit the underlying C library's platform-specific optimizations. Pure-Python AES runs at ~10 MB/s on a Mac mini. For the Operations DB workload — kilobytes of credential and C2 metadata per engagement, not megabytes of bulk data — the throughput sits orders of magnitude above the DB's I/O bandwidth needs. Sovereignty is the structural property; throughput is the operational measurement, and the operational measurement fits the workload. #### 10.5 — Resume semantics The `/resume ` command rebuilds the full engagement state from the encrypted DB: 1. Prompt for passphrase (or read from `O1O_OPS_PASSPHRASE` environment variable). 2. Derive $K$ via PBKDF2 with `salt = op_id` and the passphrase. 3. Decrypt the `credentials` and `c2_channels` columns row-by-row. 4. Reconstruct the EngageIntelligence target model from the joined rows. 5. Resume the engagement loop at the phase that was active at the last persist point. The persist semantics guarantee that a resumed engagement makes the same decisions as if it had run continuously: the target model is the *only* state input to the V2 layer's plan methods, and the target model is what is persisted. Resume is therefore engagement- trace-preserving. #### 10.6 — Sovereignty as a system property Layers 1–9 of O1-O are deterministic, auditable, and air-gap-capable. Layer 10 makes the sovereignty property *operational*: the encrypted state at rest, the resume capability, and the cross-engagement memory are all available in environments where the standard toolchain of "install crypto library + cloud vault + remote KMS" is structurally forbidden. This is the same architectural choice as the inline PE/ELF/Mach-O parsers in Layer 7 and the regex-only knowledge extraction in Layer 2: every external dependency that *could* be eliminated *is* eliminated, at the cost of slightly more in-tree code, in exchange for guaranteed installability across the entire operational deployment surface. ### Layer 11 — Specialty surface exploiters Dedicated subsystems per attack surface, loaded lazily: - WiFi exploiter (`/wifi`) — evil twin, captive portal, deauth, handshake capture, PMKID - USB exploiter (`/usb`) — Rubber Ducky / BadUSB / HID autoinfect, autorun, macro docs - VPN exploiter (`/vpn`) — OpenVPN/WireGuard config clone/hijack, DNS redirect - Email exploiter (`/email`) — EWS / IMAP / Microsoft Graph / OAuth persistence / Sieve - ML exploiter (`/ml`) — Ollama/LM Studio/HuggingFace/MLX model extraction & injection - Credential trigger engine (`/creds`) — automatic credential-cascade exploitation - Miner deployer (`/mine`) — hardware-adaptive miner generation - Hash monitor (`/hash-mon`) — file-integrity surveillance with VirusTotal lookups - Polyglot generator (`/polyglot`) — covered in Layer 7 - AST mutator (`/ast-mutate`) — covered in Layer 5 Plus 52 helper-agent templates for OPSEC profile classification and per-tool guided configuration. ## The 11-step main pipeline Every free-text intent in the REPL runs through this sequence (`run_live_pipeline`, `o1o_live.py:6263+`): 1/11 Parse Intent — IntentParser.parse() 2/11 Query Knowledge Graph — KnowledgeEngine.infer() — top-5 inference chains 3/11 Assemble Code — CodeAssembler.assemble() — color pipeline, then triplet, then V4 4/11 Compile Check — ast.parse() + compile() 5/11 Detection Evasion — scan → semantic transforms → syntactic mutation, iterate until clean 6/11 Formal Verification — structural intent + property + algebraic + symbolic + taint + logic 7/11 Write Session Artifacts — meta.json, source, provenance 8/11 Package Standalone Exec — PyInstaller / nuitka via ToolPackager 9/11 OPSEC Hardening — OpsecPackager 10/11 OPSEC Vulnerability Audit — OpsecAuditor with remediation rounds 11/11 Threat Model + Deploy Gd — per-tool intelligence product Typical latency: 270 ms – 613 ms per tool, end-to-end. Deterministic — the same intent with the same configuration produces bit-identical output. ## End-to-end worked examples #### WE.1 — A defensive tool: SIEM detection rule generator **Intent**: `"build a sigma rule generator from windows event log patterns"` **Reproduce**: python3 src/o1o_live.py "build a sigma rule generator from windows event log patterns" **Trace through the pipeline:** | Step | Output | |---|---| | 1 — Parse intent | `mode=BUILD`, `entities=['sigma', 'rule', 'generator', 'windows', 'event', 'log']`, `requires_output=True`, `is_composition=False` | | 2 — Knowledge query | 4 inference chains. Top chain: `(sigma_rule, GENERATES_FROM, windows_event_log) → (sigma_rule, FORMAT, yaml) → (windows_event_log, PARSED_BY, evtx_parser)` | | 3 — Color chain detected | `[PATH, STRUCT, TEXT, VOID]` — read .evtx, parse to event dicts, generate yaml strings, write to file | | 4 — Fragment resolution | `evtx_read` (PATH→STRUCT), `pattern_extract` (STRUCT→STRUCT), `sigma_template_render` (STRUCT→TEXT), `file_write` (TEXT→VOID) | | 5 — Type validation | All 4 edges direct match. 0 violations. | | 6 — Code emission | 287 LOC including imports, main function with argparse, sigma rule template, .evtx parser, output writer | | 7 — Compile gate | PASS | | 8 — Structural intent | `windows` keyword → expects `winreg` or `Evtx`; `log` keyword → expects `logging`. Both present. | | 9 — Property verification | Determinism: PASS (same .evtx → same rule). Boundary: PASS (empty log → empty rule list, no crash). | | 10 — Detection scan | 0 detections (defensive tool, no offensive signatures). | | 11 — OPSEC profile | `defensive_generator` — no hardening required. | | Total | **312 ms** end-to-end. | **Output artifacts** (in session folder): - `generated.py` — 287 LOC source - `README.txt` — usage instructions - `Dockerfile` — containerized deployment - `meta.json` — full pipeline trace with provenance - MITRE mapping: T1059.001 (PowerShell detection) — by virtue of the sigma rules generated; the *generator* itself maps to detection-engineering tactics. #### WE.2 — An offensive tool: SSH credential brute-force with adaptive retry **Intent**: `"create an SSH brute force tool with credential rotation targeting 10.0.0.5"` **Reproduce**: python3 src/o1o_live.py "create an SSH brute force tool with credential rotation targeting 10.0.0.5" **Trace through the pipeline:** | Step | Output | |---|---| | 1 — Parse intent | `mode=BUILD`, `entities=['ssh', 'brute', 'force', 'credential', 'rotation']`, `params={'target': '10.0.0.5'}`, `requires_output=True` | | 2 — Knowledge query | 2 inference chains. Top: `(ssh_brute_force, IMPLEMENTS, paramiko_ssh_client) + (credential_rotation, USES, password_list_cycle)` | | 3 — Color chain | `[VOID, VOID]` — self-contained offensive tool, bypasses chain mechanic, routes through VOID-pattern lookup | | 4 — Fragment resolution | `ssh_brute_paramiko` (the matched VOID-pattern fragment) — 247 LOC standalone | | 5 — Configuration injection | `{target: '10.0.0.5', port: 22, user: 'root', wordlist_path: 'passwords.txt'}` injected via template variable substitution | | 6 — Code emission | 261 LOC, class-based, with `SSHBruteForcer` class wrapping paramiko + retry logic + result logger | | 7 — Compile gate | PASS | | 8 — Structural intent | `ssh` → `paramiko` required, present. `brute` → loop construct required, present. | | 9 — Detection scan | **3 detections** — `socket_connect`, `credential_brute`, `paramiko_ssh_client`. Semantic evasion engaged: 2 transforms applied. Re-scan: 0 detections. | | 10 — OPSEC profile | `offensive_active` — hardening adds: random sleep jitter, exception suppression, success log encryption, deletion timer. | | 11 — Threat model + deployment guide | T1110.001 (Brute Force: Password Guessing), T1078 (Valid Accounts) — emitted as ATT&CK Navigator JSON. | | Total | **419 ms** end-to-end. | **Output artifacts**: - `generated.py` — 261 LOC source after semantic evasion - `` — standalone executable (PyInstaller-packaged) - `README.txt` — operational usage - `Dockerfile` + `docker-compose.yml` — containerized deployment with isolated network - `opsec/cleanup.sh`, `opsec/runtime.sh`, `opsec/network.conf`, `opsec/sanitize.py` - `infra/redirector.conf`, `infra/malleable.profile`, `infra/burn.sh` - `dropper.py`, `stager.py`, `delivery_powershell.txt`, `usb_autorun.inf`, `usb_launch.bat` - `opsec_audit.txt` — 100/100 grade A - `threat_model.txt` — MITRE mapping + IOCs - `deployment_guide.txt` — operational walkthrough - `meta.json` — full provenance #### WE.3 — An autonomous engagement: `/engage` against a multi-service target **Command**: `/engage 10.0.0.5 --chain --max-tools 8` **Reproduce** (from within the REPL): python3 src/o1o_live.py # at the O1-O> prompt: /engage 10.0.0.5 --chain --max-tools 8 **Engagement trace** (abridged): Phase 1 — Reconnaissance 47-port TCP scan, 30 workers, 8.2s wall-clock → 4 services discovered: 22/tcp ssh OpenSSH_8.4p1 80/tcp http nginx/1.21.0 445/tcp smb (banner suppressed) 3306/tcp mysql MySQL 8.0.28 Phase 2 — Tool selection (V2-prioritized) → 8 tools selected, ordered by mission objective "gain access": 1. SSH credential testing 2. HTTP application scanner 3. SMB enumeration 4. MySQL credential testing 5. HTTP exploit scanner 6. SSH credential harvester (post-exploit) 7. Post-exploit enumeration 8. Lateral SSH movement (conditional on credential success) Phase 2.5 — Counter-detection pre-flight ✓ EDR detection script staged (edr_detect.py, 312 LOC) ✓ Canary detector loaded — armed against AWS canary keys Phase 3 — Generate + Deploy (each tool runs the 11-step pipeline) [1/8] SSH credential testing Generated: 261 LOC, 312ms Deploy: SCP → SSH execute → 4.7s Result: 1 credential discovered (user:devops, pass:****) [2/8] HTTP application scanner Generated: 198 LOC, 287ms Deploy: 4.2s Result: /admin/ exposed, /backup.sql discovered [3/8] SMB enumeration Generated: 224 LOC, 318ms Deploy: 3.8s Result: share \\10.0.0.5\backup readable [4/8] MySQL credential testing Generated: 173 LOC, 269ms Deploy: timeout (extended), 22.4s Result: auth_failure → adaptive retry → success on retry 2 ... [5/8 through 8/8 elided] ... Phase 4 — Result chaining (with --chain) Discovery: credential (devops, ****) → spawn SSH command executor Discovery: /backup.sql exposed → spawn data exfiltration beacon Discovery: shared backup readable → spawn file harvest tool → 3 follow-up tools generated + deployed Phase 4.5 — V2 Lateral Movement Target model: 1 credential, 0 SSH keys yet → Lateral plan: try (devops, ****) against discovered alternate hosts → 0 lateral targets discovered yet (no plan_pivot triggered) Phase 5 — Engagement Report Tools generated: 11 (8 main + 3 chained) Tools deployed: 11/11 Wall-clock: 167.4 s Generation time: 3.4 s (sum across all tools) Deploy time: 153.2 s (network-bound) Credentials: 2 (1 main, 1 chained) Files exfiltrated: 3 (backup.sql, .env, /etc/passwd) MITRE coverage: T1110.001, T1046, T1135, T1078, T1041, T1083, T1003 Operations DB persistence ✓ All credentials AES-256-CTR encrypted at rest ✓ All c2_channels AES-256-CTR encrypted at rest ✓ Resume capability: /resume **End-to-end wall-clock: 167.4 seconds** for an 11-tool engagement against a 4-service target. The generation pipeline produced 11 tools in 3.4 cumulative seconds; the remaining 164 seconds is network I/O (TCP scan + SSH deploys). ## The `.causal` substrate — peer-reviewed validation #### Conferences and host institutions | Conference | Full title | Host institution | Location | Dates | |---|---|---|---|---| | **IEEE ICECET 2026** | 6th International Conference on Electrical, Computer and Energy Technologies | **Medipol University** | **Rome, Italy** | July 6-9, 2026 | | **IEEE-NANO 2026** | 26th IEEE International Conference on Nanotechnology — *flagship conference of the IEEE Nanotechnology Council* | **Nanjing University International Conference Center** (General Chair: Prof. Xinran Wang, Nanjing University) | **Nanjing, China** | July 5-8, 2026 | | **IEEE IRI 2026** | 27th IEEE International Conference on Information Reuse and Integration for Data Science | **University of Washington Bothell** | **Seattle, WA, USA** | July 31 - August 2, 2026 | | **NURER 2026** | 8th International Conference on Nuclear and Renewable Energy Resources — *organized in cooperation with the International Atomic Energy Agency (IAEA)* | **NURER Conference Organizing Committee** | **Almaty, Kazakhstan** | September 10-12, 2026 | All four are IEEE-affiliated international conferences with peer-reviewed proceedings. IEEE IRI maintains a sustained acceptance rate under 30 %. IEEE-NANO is the flagship conference of the IEEE Nanotechnology Council. NURER is co-sponsored by the IAEA. The IBM z/OS technical report is published independently under CC-BY 4.0 (SSRN preprint 6298178, February 2026). Vendor coordination follows a 90-day responsible disclosure window from the disclosure date in the report. #### Published 2026 — formal substrate and inference engine **`.causal` format specification** — IRI Seattle 2026, Paper 78 *"The .causal Format: Embedded Deterministic Inference for Domain-Agnostic Knowledge Graph Amplification."* **Accepted at the 27th IEEE International Conference on Information Reuse and Integration for Data Science** (IEEE IRI 2026), hosted by the **University of Washington Bothell**, Seattle, WA, USA, July 31 - August 2, 2026. 8-byte magic header, msgpack + zlib payload, three-pass embedded inference (exact keyword chaining, semantic direction propagation, Jaro–Winkler fuzzy matching at $\geq 0.85$), executed during deserialization. **1.9× to 6.8× amplification, 20.8× compression on genomic data (3.06 GB → 147 MB)**, **497/497 correctness against independent nuclear reference data**. Validated across **seven scientific domains** including biomedicine, incident analysis, genomics, cryptanalysis, security tool synthesis, nuclear physics, gravitational waves. **Closed-loop knowledge discovery** — IRI Seattle 2026, Paper 1 *"Input-Agnostic Causal Knowledge Discovery with Deterministic Validation and Autonomous Gap Resolution."* **Accepted at the 27th IEEE International Conference on Information Reuse and Integration for Data Science** (IEEE IRI 2026, Seattle, USA, summer 2026). Three-stage closed loop: candidate triplet extraction → 14-predicate Foss Gate validation → autonomous gap-driven retrieval. Cross-domain evaluation on six source types, **$1.9\times$ to $6.8\times$ amplification, 497/497 ground-truth correctness on nuclear decay chains, 5.2× network expansion on $\sim$700K-variant consumer-grade genotype data** ($N=1$, clinically confirmed). Three model architectures with 9× variation in extraction rates produce byte-identical validated output. **This paper is the symbolic ancestor of O1-O's Layer 6 self-improvement loop.** **Deterministic LLM validation** — ICECET 2026, Paper 1143 *"Deterministic Validation for Reliable LLM-Based Causal Knowledge Extraction."* **Accepted at the 6th IEEE International Conference on Electrical, Computer and Energy Technologies** (IEEE ICECET 2026), hosted by **Medipol University**, Rome, Italy, July 6-9, 2026. The 14-step Foss Gate. **88 % precision on DocRED, 100 % semantic F1 on CRED, 100 % byte-level determinism across 150 repeated extractions**. Cross-model validation on Qwen-8B, Gemma-2B, and Llama-3B confirms determinism is a property of the validation architecture, not the model. Stochastic-sampling experiments at temperature 0.8 confirm no hidden randomness. #### Published 2026 — applied cryptanalysis (CASI metric) **ARX cipher full-round structural leakage** — ICECET 2026, Paper 1276 *"Persistent Cross-Round Carry Leakage in ARX Ciphers: Detection, Prediction, and Topological Classification."* **Accepted at the 5th IEEE International Conference on Electrical, Computer, Engineering and Telecommunications** (ICECET 2026, Rome, Italy, July 6-9, 2026). **The first full-round known-key distinguisher for the entire Speck family** ($Z > 4{,}000$ at all specified rounds) **and for Threefish-256 at all 72 rounds** ($Z \approx 5{,}900$). Closed-form prediction function $\text{MI}(\beta) = 0.78 \cdot \exp(-1.42\beta)$ with $R^2 = 0.999997$. Universal leak-model compiler at $9/9$. Topological classification identifies necessary-and-sufficient conditions for the leakage class. Applied to 13 cipher families including ChaCha20, AES-128, SPARX, LEA — full-round immune. **Compression-based cipher trust verification** — IEEE-NANO 2026 (flagship) *"Compression-Based Trust Verification of Lightweight Ciphers Deployed in Nano-IoT Communication Standards."* **Accepted at the 26th IEEE International Conference on Nanotechnology** (IEEE-NANO 2026) — the **flagship conference of the IEEE Nanotechnology Council** — hosted at the **Nanjing University International Conference Center**, Nanjing, China, July 5-8, 2026. **The largest single-methodology comparison of deployed lightweight ciphers**: 41 implementations across 9 architectural families covering ISO/IEC 29167 (RFID air-interface security) and NIST SP 800-232 (ASCON). **All 10 Speck variants (ISO/IEC 29167-22) exhibit statistically significant structural deficiencies at full rounds** ($t = +17.9$ to $+40.4$, $p < 10^{-6}$). **Autonomous detection of 4 implementation faults in SKINNY-128 and KATAN through output analysis alone.** Reliable detection with as few as $N = 1{,}000$ samples at cipher-specific frontier rounds. **Cipher-agnostic security margin** — ICECET 2026, Paper 1141 *"Causal Graph Topology for Automated Security Margin Analysis and Blind Cipher Identification."* **Accepted at the 5th IEEE International Conference on Electrical, Computer, Engineering and Telecommunications** (ICECET 2026, Rome, Italy, July 6-9, 2026). The CASI metric formally defined as $\text{CASI}(r) = S_{\text{signal}}(r) / S_{\text{signal}}(\text{full})$. **CASI detection frontiers align with published cryptanalysis on six cipher families across four architectural classes.** **Matches Gohr's neural distinguisher on Speck 32/64 within one round** — achieved without training data or GPU computation. Correct **blind identification of all five tested architecture classes from graph topology alone**. **Post-quantum ciphertext distributional analysis** — ICECET 2026, Paper 1142 *"Compression Isolation of Distributional Signatures in NIST Post-Quantum Ciphertext."* **Accepted at the 6th IEEE International Conference on Electrical, Computer and Energy Technologies** (IEEE ICECET 2026), hosted by **Medipol University**, Rome, Italy, July 6-9, 2026. **The first black-box statistical characterization of output from the three NIST post-quantum cryptographic standards** ML-KEM (FIPS 203), ML-DSA (FIPS 204), and HQC. Distributional distance from uniform random measured with stability confirmed over five independent replications at $5 \times 10^5$ samples. **Crypto-CASI cleanly separates the three PQC families.** Compression isolation experiment on ML-KEM separates the non-uniformity of LWE coefficient distributions after polynomial compression from byte-alignment artifacts from bit-packing. #### Published 2026 — nuclear domain **Backward causal inference on nuclear knowledge graphs** — NURER 2026, Paper 1 *"Backward Causal Inference on Nuclear Knowledge Graphs: Domain-Specific Entity Resolution, Fault Path Discovery, and Blind Accident Prediction."* **Accepted at the 8th International Conference on Nuclear and Renewable Energy Resources** (NURER 2026) — **organized in cooperation with the International Atomic Energy Agency (IAEA)** — Almaty, Kazakhstan, September 10-12, 2026. 240 nuclear domain triplets across decay chains, reactor physics, material degradation, and safety systems. **Nuclear entity resolver: 97.8 % accuracy on 1,190 isotope notation pairs vs. 19.9 % for generic Jaro–Winkler** — and inference complexity reduced from $O(N^2)$ to $O(N^{1.3})$, a **373× speedup with higher precision**. **Validation against 1,512 NUBASE2020 nuclides confirms 497/497 inferred decay chains with zero false positives.** Blind prediction on TMI-2 with only general PWR domain knowledge recovers 4 of 5 known root causes (80 %). Möbius velocity addition as the multiplicative confidence model for hub aggregation. **Spectral signatures of PRNGs in nuclear Monte Carlo** — NURER 2026, Paper 2 *"Spectral Signatures of Pseudo-Random Number Generators in Nuclear Monte Carlo Codes: Detection via Markov Transition Analysis and Compression-Based Exchangeability Testing."* **Accepted at the International Conference on Nuclear and Renewable Energy Resources** (NURER 2026, September 2026). **Sharp binary phase transition across 15 generator families**: all structurally adequate PRNGs (MT19937, PCG-32, SplitMix64, xoshiro256, Xorshift128, AES-CTR, LFSR-32) cluster at spectral gap $\gamma \approx 0.925$ with **universal ratio $\gamma/\varphi \approx 1.87$ to the Cheeger conductance**. All LCG-type generators exhibit $\gamma = 0$, including MCNP's default 48-bit LCG. **No generator falls between these regimes.** Period function $g(\lambda) = (1 - \lambda^2)^{-1/2}$ via the **Möbius–Lorentz correspondence on Markov chain spectra**. Detection in as few as 500 bytes. Secondary contribution: dead-time-induced correlations in radiation detector inter-event times (+90 % to +360 %). #### Published 2026 — applied production assessment **IBM z/OS mainframe cryptographic security assessment** — CC-BY 4.0 technical report (SSRN preprint 6298178), February 2026 *"Cryptographic Security Assessment of IBM z/OS Mainframe Infrastructure Using CASI Distributional Analysis."* **The first comprehensive quantitative cryptographic security assessment of IBM z/OS**, the platform that processes an estimated 87 % of global credit card transactions and hosts critical infrastructure for banking, insurance, government, and healthcare. Performed in two authorized environments: Hercules 4.9.1 emulation with TK5 MVS 3.8j for initial algorithm analysis, and the vendor's free educational platform (real IBM z15 hardware, z/OS V2.5) for production validation. **No zero-days disclosed, no privilege escalation attempted or achieved** — standard student account only. **50 findings across 8 security domains.** Headline results: - **RACF Legacy DES key derivation has 42.17 bits of effective entropy, not 56** — a 24.7 % reduction caused **99.7 %** by the EBCDIC CP037 encoding step, isolated by component ablation. **220 of 256 possible byte values (85.9 %) never occur** in derived keys. CASI $\|Z\| = 10{,}263{,}251$, six orders of magnitude above the $p_{95}$ threshold. - **Bit-for-bit validation on real IBM z15 hardware (z/OS V2.5)** via the z/OSMF REST API: **4 of 4 test passwords produce identical EBCDIC, identical DES keys, identical output at every pipeline stage** between the local model and the production RACF implementation. The fourth password was the account's own system-assigned credential — validating against a non-dictionary input. - **The 42-bit RACF DES keyspace exhausts in 7.6 minutes on a single NVIDIA RTX 4090 at $0.08 cloud cost** (1.1 minutes on an 8× A100 cluster at $0.15). hashcat supports RACF DES hashes natively as mode 8500. Consumer GPUs have made the 42-bit keyspace trivially crackable since approximately 2020. **KDFAES (PBKDF2-HMAC-SHA256) has been available since 2007** — 19 years — and reduces CASI to 4.9, indistinguishable from random. - **TN3270 cleartext detection at a 14,000:1 signal ratio with zero false positives.** CASI distinguishes cleartext EBCDIC ($\|Z\| = 87{,}411$) from AES-GCM encrypted traffic ($\|Z\| \approx 4\text{-}6$) by byte-level distributional properties alone — no protocol parsing. - **Configuration debt across every layer** of the vendor's educational z/OS infrastructure: TN3270 transmits RACF credentials in cleartext, IBM MQ V9.4.5 operates with **all 23 of 23 channels unencrypted** (`SSLCIPH( )` empty, `SSLTASKS(0)`, no SSL keyring), ICSF runs `CHECKAUTH(NO)` (no RACF authorization checks on hardware crypto), and an AT-TLS policy created in 2016 was never applied to the production TN3270 service — written once, never activated. - **End-to-end attack chain: < 8 minutes total** from passive TN3270 capture to RACF account access to ICSF hardware crypto and MQ messaging without further authorization. Cross-system MQ channel definitions to multiple external IBM systems with empty `SSLCIPH( )` were documented from configuration data; remote systems were **not** accessed, scanned, or tested. **Production-ready detection tools released with the report**: `tn3270_scanner.py` (CASI-based subnet scanner with JSON/CSV output and CI/CD exit codes), `tn3270-casi-detect.nse` (Nmap NSE script), `irrdbu00_parser.py` (RACF IRRDBU00 audit parser identifying Legacy DES accounts, stale passwords, and SPECIAL+DES risk combinations), and `racf_hashcat_pipeline.py` (mainframe-specific hashcat input generator). **The fix for every finding already exists in z/OS.** KDFAES migration, AT-TLS configuration, MQ SSL, and ICSF authorization are all single configuration changes. *The gap is not capability — it is configuration.* The CASI engine driving this assessment is the same metric formally defined in ICECET 2026 Paper 1141 and shipped as the `live-casi` package below. #### Companion open-source tool: `live-casi` The CASI metric is shipped as a standalone Python package: - **PyPI**: [`pip install live-casi`](https://pypi.org/project/live-casi/) — black-box cryptographic quality analysis on any byte stream - **GitHub**: [github.com/DT-Foss/live-casi](https://github.com/DT-Foss/live-casi) `live-casi` applies the same CASI metric used in the four cryptanalysis papers to any runtime byte stream. The package is what makes the substrate's findings independently reproducible: drop in a PRNG output, drop in a cipher ciphertext, drop in an arbitrary byte source — the same metric returns the same signed Z-score profile that classifies the source. #### Cross-references - [dotcausal.com](https://dotcausal.com) — formal substrate documentation - [github.com/dotcausal](https://github.com/dotcausal) — substrate reference implementations - [github.com/DT-Foss/live-casi](https://github.com/DT-Foss/live-casi) — runtime CASI metric - [github.com/DT-Foss/gssm](https://github.com/DT-Foss/gssm) — GSSM (pillar 1 of the O1 stack) - [github.com/DT-Foss/O1](https://github.com/DT-Foss/O1) — O1 (pillar 2 of the O1 stack) #### What this implies for O1-O The substrate carries the audit-trail discipline, the deterministic validation, the multi-pass inference engine, and the format specification into this repository unmodified. The seven publications listed above establish the substrate's properties on independent scientific domains. O1-O applies the same substrate to deterministic code synthesis — the seventh domain explicitly enumerated in the `.causal` format paper. ## Sovereign engineering decisions — a complete accounting #### S.1 — The four-dependency installation profile $ cat requirements.txt msgpack>=1.0.0 # binary .causal format read/write jellyfish>=1.0.0 # Jaro-Winkler fuzzy entity matching (Pass 3 inference) requests>=2.28.0 # web harvester only (Layer 2.8 — optional) beautifulsoup4>=4.11.0 # web harvester only (Layer 2.8 — optional) Two of the four (`requests`, `beautifulsoup4`) are used exclusively by the optional web-harvester. A pure inference + composition + verification + engagement install needs only `msgpack` and `jellyfish`. Both are pure-Python wheels (no native code dependency), both vendorable, both auditable in under 5,000 LOC each. #### S.2 — Sovereign choice catalog The eight decisions where O1-O implements something the rest of the ecosystem typically depends on a library for: | Subsystem | Eliminated dependency | In-tree replacement | LOC | Reasoning | |---|---|---|---:|---| | AES-256-CTR | `pycryptodome` / `cryptography` | `core/operations_db.py` inline AES + Rijndael S-box + key schedule | ~400 | air-gap, audit, supply-chain isolation, bit-exact reproducibility | | PBKDF2-HMAC-SHA256 | `pycryptodome` / `cryptography` | `hashlib.pbkdf2_hmac` (stdlib) | 1 line | stdlib-only | | PE binary parsing | `pefile` | `core/platform_adapter.py` inline PE parser | ~300 | air-gap, no native lib | | ELF binary parsing | `pyelftools` | `core/platform_adapter.py` inline ELF parser | ~300 | air-gap, no native lib | | Mach-O binary parsing | `macholib` | `core/platform_adapter.py` inline Mach-O parser | ~250 | air-gap, no native lib | | Mach-O Fat parsing | `LIEF` | `core/platform_adapter.py` inline FAT magic detection | ~100 | air-gap, no native lib | | Knowledge extraction | LLM-based extraction (any model) | `core/web_harvester.py` seven causal regex patterns | 120 | auditability — every triplet has a literal source match | | NLP / intent parsing | spaCy / NLTK / transformers | `core/intent_parser.py` classical tokenize+stem+set-intersection-disambiguation+Jaro-Winkler | 412 | µs latency, KB-sized footprint, deterministic | | Network port scanning | nmap binary / python-nmap / scapy | `LiveReconEngine` inline TCP-connect scanner with banner grab | ~210 | air-gap, no native lib | | TCP / SSH deployment | paramiko (where avoidable) | `DeployEngine` shells out to OS `scp` / `ssh` | ~160 | leverages OS-provided tooling, no Python crypto dep | | Code formatting / linting | black / ruff / autopep8 | inline AST normalization | — | own engine, fits the deterministic-emission constraint | | SAT / SMT solving | z3-solver | `core/symbolic_executor.py` lightweight constraint propagation | 561 | targeted analysis, no native lib | | Property-based testing | hypothesis (optional, fallback) | `core/verifier.py` Hypothesis is used when available, falls back to direct execution | 98 | graceful degradation | The total *non-stdlib* footprint of the runtime is four pure-Python wheels. #### S.3 — Air-gap deployment scenario A complete O1-O installation on a fully air-gapped network: # Phase 1: on an internet-connected mirror box pip download msgpack jellyfish requests beautifulsoup4 \ --dest ./o1o-wheels --platform any --python-version 3.10 # Phase 2: physically transfer the wheel cache to the air-gapped network # (USB drive after AV scan, or one-way data diode, or any approved transfer mechanism) # Phase 3: on the air-gapped target pip install --no-index --find-links=./o1o-wheels \ msgpack jellyfish requests beautifulsoup4 git clone /O1-O.git # if internal git mirror exists # OR tar xzf O1-O.tar.gz # if transferred as archive cd O1-O python3 src/o1o_live.py --demo # runs offline, no further network needed The installation procedure does not require, at any point, the target network to reach the public internet. The runtime does not require any post-install license check, no phone-home, no remote configuration fetch, no remote model download. Compare this to LLM-codegen tools: | Tool | Installation surface | Internet egress needed at runtime | |---|---|---| | GitHub Copilot | GitHub auth, model API endpoint | continuously (every keystroke) | | Cursor | Cursor auth, model API endpoint | continuously (every keystroke) | | Claude Code | Anthropic API key, model API | continuously (every interaction) | | Devin | Cognition auth, model API, browser sandbox | continuously | | Local LLM (Ollama / vLLM) | GB-scale model download, GPU drivers, CUDA / Metal | at install (model download); after that, optional | | **O1-O** | **4 pip wheels + git clone** | **none** | #### S.4 — Sovereign-cloud deployment scenario Government sovereign clouds (Bleu in France, T-Systems-SoVerein in Germany, Bundeswolke, GovCloud variants in the US, JFD in Japan) impose constraints O1-O is structurally compatible with: - **No third-party SaaS APIs.** O1-O makes zero outbound calls during normal operation. - **No proprietary closed-source dependencies.** O1-O is Apache-2.0 throughout; all four Python dependencies are permissively licensed open source. - **Auditability.** Every line of crypto, every line of binary parsing, every line of NLP is in this repository. There is no closed-source library to attest, no native binary to reverse-engineer. - **Bit-exact reproducibility.** Deterministic output enables compliance-grade audit trails. A regulator can run the same intent on the same target and verify byte-identical results. - **Format independence.** The `.causal` knowledge graphs are a documented binary format (msgpack + zlib) — they can be inspected, modified, and audited without the runtime. This is the architectural property that makes O1-O viable for **financial sector compliance code generation** (PCI-DSS audit trails), **medical device firmware** (FDA documentation reproducibility), **defense and intelligence operational tooling** (classified-network installability), and **regulatory adversary emulation** (deterministic red-team trace for audit). LLM-codegen tools cannot reach these deployment surfaces at any tuning budget — the architecture forbids it. #### S.5 — Supply-chain risk mathematics Each external dependency multiplies supply-chain risk. The exposure model is multiplicative: $$ \text{Trust_required}(\text{system}) = \prod_{d \in \text{Dependencies}} \text{Trust}(d) $$ where each $\text{Trust}(d) \in [0, 1]$ is the operator's confidence in dependency $d$. A system with 100 transitive dependencies, each at $\text{Trust} = 0.99$, has overall trust $0.99^{100} \approx 0.366$ — a 63 % chance of *some* dependency being a compromise vector. The 2024 `xz-utils` backdoor (CVE-2024-3094, OpenWall mailing list 2024-03-29) is the canonical recent example: a low-frequency but high-impact compromise of a transitive dependency at the level of system libraries. The crypto library `pyjwt` had a backdoor in 0.4.x; the JavaScript `event-stream` package was compromised in 2018; `colors.js` and `faker.js` were self-sabotaged in 2022; `node-ipc` shipped destructive payloads in 2022. For O1-O's operational use cases — security tooling, regulated industries, sovereign-cloud — the multiplicative trust calculation is the dominant architectural concern. Four dependencies at $\text{Trust} = 0.99$ each yields overall trust $0.961$, an order of magnitude higher confidence than the typical 50–100 dependency system. #### S.6 — Operational performance profile In-tree pure-Python implementations and native C-backed libraries occupy different points on the throughput axis: | Subsystem | O1-O in-tree throughput | Native equivalent | Engagement wall-clock share | |---|---:|---:|---:| | AES-256-CTR | ~10 MB/s | `cryptography` ~300 MB/s | **0.5 %** | | PE parsing | ~1 ms per file | `pefile` ~0.1 ms | **0.8 %** | | ELF parsing | ~1.5 ms | `pyelftools` ~0.15 ms | **1.2 %** | | TCP port scan | 47 ports / 8 s | nmap 47 ports / 1 s | **3 %** | The shipped workload — kilobytes of credential metadata per engagement, one binary at a time during generation, 47 ports per engagement — fits inside the in-tree throughput envelope at single-digit percent of the total engagement wall-clock. A 30× speedup on crypto would lift overall engagement throughput by under 4 %. The in-tree path delivers the operational performance the workload requires, and it does so without taking on a single external dependency. #### S.7 — Sovereignty is a compositional property The eight in-tree implementations are realizations of one principle applied uniformly: **every external dependency that can be eliminated is eliminated**. The principle, applied once, produces a single self-contained module. Applied across the whole architecture, it produces a system that installs, runs, and audits as a unit in an environment with zero external trust. The compositional structure is the same as the determinism property: every individual layer is deterministic; composed, the whole system is bit-exact reproducible end to end. Sovereignty and determinism are both whole-system properties — neither comes from a single module, both come from the uniform application of one principle at every layer. This is the same compositional structure as the determinism argument from Layer 3: every individual layer is deterministic on its own; *composed*, they produce a system whose output is bit-exact reproducible end to end. Determinism and sovereignty are both compositional architectural properties — they are produced by the system as a whole, not by any single module within it. ## Genesis — how the architecture came to be #### G.1 — Foundation: the `.causal` substrate (2025–2026) The `.causal` substrate is the foundation. Binary-format causal knowledge graphs, the 14-step FOSS Gate for deterministic triplet validation, and the multi-pass inference engine were established in 2025 in the course of producing nine peer-reviewed papers at four 2026 IEEE conferences. See [dotcausal.com](https://dotcausal.com) for the formal treatment. Two earlier verticals built on the substrate: 1. **fabel** — conversational reasoning over `.causal` graphs with model-agnostic extraction validation (ICECET 2026). 2. **Symbolic synthesis for nuclear-knowledge-graph reasoning** — the IEEE-NANO Nanjing flagship paper. O1-O is the third vertical: the substrate applied to **deterministic code synthesis**. The 7-pass inference engine, the `.causal` binary format, the harvester pipeline, and the audit-trail discipline carry over directly. O1-O contributes the architectural layers above the substrate: the algebraic type system on data-flow categories (Layer 3), the seven-stage verification pipeline (Layer 4), the autonomous engagement operator (Layer 8), and the sovereign in-tree implementations of crypto and binary parsing (Layers 7 and 10). #### G.2 — The Color System (February 2026) The 8-color algebraic type system at the heart of Layer 3 is the structural mechanism that makes deterministic code synthesis possible. The design space for fragment composition has three candidates: 1. **LLM-driven composition** — a language model decides which fragment combines with which. This is the architecture every existing code-generation tool uses. 2. **Hardcoded recipes** — every fragment combination is pre-authored for every intent. This scales as O($n^2$) in the fragment count and requires continuous manual maintenance. 3. **Type-driven composition** — every fragment carries a type signature; composition is type-matched edge resolution in a closed graph. O1-O uses the third design. Eight colors (TEXT, STRUCT, TABULAR, BYTES, SERIAL, PATH, RESPONSE, VOID) span the data-flow category space at 100 % coverage of the 1,245-fragment registry with zero ambiguity. The eight are the **minimal spanning basis**: seven is insufficient — at least one pair of structurally-distinct categories collapses into a single color, producing composition errors; nine introduces redundancy without adding distinguishing power. VOID is the closure operator that admits standalone tools and pure functions into the same algebra. With VOID, server processes, schedulers, and whole tools live in one composition system. Without it, the architecture would require two parallel composition mechanisms. The 337 intent-chain regex patterns and the ~632 registry entries are the empirical realization of the algebra over the shipped knowledge base. AutoBridge (Loop 3 in Layer 6) extends the routing coverage to 100 % autonomously. The color algebra is the principle that closes the architecture: deterministic, auditable, algebraically reasonable composition. Every subsequent layer — verification, evasion, self-improvement, engagement — depends on this property. #### G.3 — The 7-pass inference engine The inference engine extends the explicit knowledge graph along seven independent dimensions. Each pass implements a structurally distinct inference rule: - **Pass 1 (exact-chain)** — transitive closure under confidence-weighted Kleene join. - **Pass 2 (semantic direction)** — propositional-logic-style direction propagation through chained mechanisms (positive / negative / neutral). - **Pass 3 (fuzzy-match)** — prefix-bucketed Jaro-Winkler bridges linguistically equivalent entities in O($n^2/k$) time. - **Pass 4 (analogical)** — transfers attributes across similar entities. - **Pass 5 (cross-domain analogy)** — compares mechanism-structure signatures across source graphs and transfers patterns. This is the key cross-graph operator: it lets `port_scanner` in `offensive_security.causal` and `nmap_automation` in `devops.causal` share knowledge because their mechanism signatures match, regardless of entity name. - **Pass 6 (contextual cross-graph)** — co-occurring entities activate the intersection of their neighborhoods. - **Pass 7 (creative recombination)** — bridge entities recombine endpoints across graphs, with the deepest confidence discount. The seven passes are independent and monotone: each strictly extends the graph or leaves it unchanged. The combined inference lifts the shipped knowledge base from **46,442 explicit triplets to 69,942 reachable triplets** at +51 % amplification. #### G.4 — The seven-stage verification pipeline Layer 4 partitions the error space into seven structurally distinct classes and assigns one stage to each: | Stage | Error class caught | |---|---| | 1 — Compile gate | Syntax violations | | 2 — Structural intent | Module / operation coverage gaps | | 3 — Hypothesis property | Random-input contract violations | | 4 — Algebraic properties | Determinism, boundary, commutativity, idempotence, associativity, identity, involution, monotonicity | | 5 — Symbolic execution | Dead code, overflow, divide-by-zero, infinite loops | | 6 — Taint analysis | Information-flow violations source → sink | | 7 — Logic consistency | Contradictory inferences in the triplet chain | The seven classes form a **partition** of the error space: every bug class falls into exactly one stage. The eight algebraic properties at stage 4 are the minimal set that catches the bug classes the other six stages structurally cannot — hidden state (determinism), missing edge cases (boundary), wrong commutative-semigroup operation (commutativity / identity), off-by-one cancellation (involution), retry-safety violations (idempotence). Every stage runs pre-emission. No byte of output is written until all seven verdicts return clean. This is what makes Layer 4 a *prevention* layer, not a detection layer. #### G.5 — The autonomous engagement operator `/engage` substitutes the human red-team operator with deterministic Python over a literal target-model dict. The V2 intelligence layer (~1,100 LOC across `core/engage_intelligence.py` and `core/engage_v2.py`) takes the per-tool latency of the synthesis pipeline (270–613 ms) and composes it into a **sub-30-second end-to-end engagement** on the standard service profile, with adaptive retry, lateral movement, and pivot execution. The architectural principle: every decision a human operator makes in a red-team engagement — tool selection, phase transition, output interpretation, lateral planning, pivot execution — is reducible to deterministic rules over a state dict and regex patterns over tool stdout. Layer 8 makes these rules explicit and runs them in software. #### G.6 — Sovereignty as a compositional system property The from-scratch crypto, the inline binary parsers, the regex-only knowledge extraction, and the embedding-free NLP are not eight independent decisions. They are eight realizations of a single architectural principle: **every external dependency that can be eliminated is eliminated**. Composed, they produce the air-gap-installability property that no individual subsystem could give. This is the most important compositional property of O1-O: sovereignty does not come from any single module; it comes from the discipline of taking the harder implementation path at every layer where the alternative would be a third-party library. The same compositional logic produces determinism (every layer is deterministic on its own; composed, the system is bit-exact reproducible end-to-end) and auditability (every layer is readable Python; composed, the whole system is auditable as a unit). The architecture is built on a single Mac mini, in the first half of 2026, on top of a pre-existing `.causal` substrate that already had peer-reviewed validation in unrelated domains. The combination of (a) the substrate, (b) the color algebra, (c) the seven-stage verification, (d) the autonomous operator, and (e) sovereignty as a compositional property is what produces a system that deterministic LLM-codegen architectures cannot replicate at any scaling budget. ## Reproduce # Boot the platform, run the 5-task showcase: python3 src/o1o_live.py --demo # Run the full 18-task showcase (Red Team + Blue Team + Generalization): python3 src/o1o_live.py --demo-full # Drive a single intent and exit: python3 src/o1o_live.py "build a port scanner with banner grabbing for 10.0.0.1" # Interactive REPL: python3 src/o1o_live.py # Run the autonomous self-improvement loop overnight: python3 src/self_improve_runner.py --iterations 1000 --hours 12 # Run a test of the deterministic pipeline directly: python3 src/quick_test.py In the REPL, `/help` lists all 28 slash commands. `/coverage` prints the MITRE ATT&CK coverage map. `/stats` prints knowledge-base statistics. ## Repository layout O1-O/ ├── README.md this file ├── LICENSE Apache-2.0 ├── CITATION.cff academic citation metadata ├── requirements.txt 4 external deps (msgpack, jellyfish, requests, bs4) ├── src/ │ ├── o1o.py ForgeSession class — main session handler │ ├── o1o_live.py interactive REPL (8955 LOC) │ ├── o1o_daemon.py background-daemon mode │ ├── boot.py V2 boot with auto-bridge injection │ ├── self_improve_runner.py overnight self-play runner │ ├── quick_test.py deterministic pipeline smoke test │ ├── compile_knowledge.py knowledge-graph compiler (.causal builder) │ ├── compile_single.py single-graph compiler │ ├── bulk_harvest.py bulk knowledge harvest driver │ ├── profile_turbo.py performance profiling harness │ └── core/ 107 core modules (the architecture) │ ├── intent_parser.py Layer 1 — natural language → structured intent │ ├── session_memory.py Layer 1 — multi-turn state │ ├── knowledge_engine.py Layer 2 — 7-pass causal inference │ ├── web_harvester.py Layer 2 — autonomous knowledge acquisition │ ├── auto_harvester.py Layer 2 — GitHub repository harvester │ ├── auto_bridge.py Layer 2 — auto bridge generation │ ├── color_types.py Layer 3 — 8-color algebraic type system │ ├── color_assembler.py Layer 3 — pipeline build via type matching │ ├── color_checker.py Layer 3 — pre-emission type validation │ ├── code_assembler.py Layer 3 — composition driver │ ├── fragment_registry.py Layer 3 — fragment metadata + wiring │ ├── executor.py Layer 4 — sandbox + 11 auto-fix strategies │ ├── verifier.py Layer 4 — Hypothesis-based property tests │ ├── formal_verifier.py Layer 4 — structural intent verification │ ├── property_verifier.py Layer 4 — 8 algebraic properties │ ├── symbolic_executor.py Layer 4 — path exploration + constraints │ ├── taint_analyzer.py Layer 4 — taint flow analysis │ ├── mathematical_engine.py Layer 4 — logic-consistency check │ ├── ast_engine.py Layer 4 — AST toolkit │ ├── detection_test.py Layer 5 — 46 detection signatures │ ├── semantic_evasion.py Layer 5 — 17 transform classes │ ├── mutation_engine.py Layer 5 — 5 mutation levels │ ├── payload_mutator.py Layer 5 — 6 AST mutation operators │ ├── edr_subverter.py Layer 5 — EDR-specific bypass │ ├── canary_detector.py Layer 5 — honey-token detection │ ├── anti_forensics.py Layer 5 — trace elimination │ ├── self_improve.py Layer 6 — main self-play loop │ ├── self_improvement_turbo.py Layer 6 — Monte-Carlo benchmark variant │ ├── learning.py Layer 6 — pattern learning + persistence │ ├── failure_memory.py Layer 6 — failure pattern memory │ ├── output_oracle.py Layer 6 — semantic output validation │ ├── native_engine.py Layer 7 — GCC / NASM / LIEF │ ├── polyglot_generator.py Layer 7 — 4 byte-level polyglot formats │ ├── platform_adapter.py Layer 7 — inline PE/ELF/Mach-O parsers │ ├── engage_intelligence.py Layer 8 — V2 adaptive solver │ ├── engage_v2.py Layer 8 — V2 engagement implementation │ ├── mitre_coverage.py Layer 9 — ATT&CK mapping + Navigator JSON │ ├── threat_model.py Layer 9 — per-tool threat model │ ├── deployment_guide.py Layer 9 — operational walkthrough │ ├── operations_db.py Layer 10 — encrypted persistent state │ ├── wifi_exploiter.py Layer 11 — WiFi attack surface │ ├── usb_exploiter.py Layer 11 — USB attack surface │ ├── vpn_exploiter.py Layer 11 — VPN attack surface │ ├── email_exploiter.py Layer 11 — Email / EWS / Graph │ ├── ml_exploiter.py Layer 11 — ML model exploitation │ └── ... 60+ more core modules covering specific subsystems ├── knowledge/ 132 binary .causal knowledge graphs ├── fragments/ 73 JSON files, 1245 code fragments ├── triplets/ 35 source triplet JSON files ├── tests/ 22 benchmark task lists + runners ├── docs/ (placeholder for in-depth subsystem docs) └── examples/ (placeholder for sanitized demo outputs) ## Reproducibility guarantees #### R.1 — Bit-exact output reproducibility The same intent on the same committed knowledge base produces a bit-identical generated file across processes, machines, and operating systems. **The chain of properties that produces this**: 1. **Knowledge graph load is deterministic.** `.causal` files are zlib-compressed msgpack blobs; both serializers produce canonical byte sequences for primitive types. The load order across the 132 files is sorted lexicographically. 2. **Inference is deterministic.** The 7-pass engine iterates over fixed sets in a fixed order. The `seen` deduplication set is built deterministically. Confidence arithmetic is performed in Python `float` (IEEE 754 double precision), which is bit-stable on all conformant platforms. 3. **Intent parsing is deterministic.** Regex matches in a fixed order, set intersections in a fixed order, no time-dependent or random-source branches. 4. **Composition is deterministic.** Color-edge resolution returns the first matching fragment from a sorted list; identical inputs yield identical fragment sequences. 5. **Code emission is deterministic.** Template variable resolution iterates over sorted parameter dicts; AST normalization is stable. The combined property: identical input ⇒ identical output, end to end, byte for byte. Validation script: # Run the same intent twice; outputs must be byte-identical python3 src/o1o_live.py --blind "list files in a directory" > /tmp/run1.py python3 src/o1o_live.py --blind "list files in a directory" > /tmp/run2.py diff /tmp/run1.py /tmp/run2.py # (no output — files are identical) #### R.2 — Provenance traceability Every emitted line of code traces back to a specific fragment, every fragment to a specific knowledge triplet, every triplet to a specific source — either a `.causal` file shipped in `knowledge/`, an explicit triplet in `triplets/*.json`, an inference pass with named parents, or a harvested triplet with a recorded source URL. The provenance chain is committed alongside every generated artifact: session/2026-06-26_171552/001_botnet_c2_server.../meta.json { "intent": "botnet C2 server with AES-encrypted command channel", "knowledge_chains_considered": 2, "knowledge_chain_chosen": [ {"trigger": "c2_server", "mechanism": "uses", "outcome": "encrypted_transport", "confidence": 0.92, "source": "offensive_security.causal"}, {"trigger": "encrypted_transport", "mechanism": "IMPLEMENTS", "outcome": "aes_cbc_transport", "confidence": 0.88, "source": "inference_pass1"}, ... ], "fragments_composed": [ {"key": "c2_server", "domain": "advanced_offensive_fragments.json", "loc": 142}, {"key": "aes_cbc_transport", "domain": "crypto_stdlib_fragments.json", "loc": 87}, ... ], "color_pipeline": ["VOID", "VOID"], "verification_passed": ["compile", "structural_intent", "property", "evasion"], ... } A regulator, an auditor, or a security researcher can take any emitted line of code, look up the contributing fragment in the registry, trace the contributing knowledge triplets to their sources, and reproduce the exact chain that produced the line. #### R.3 — Engagement-trace reproducibility (`/engage`) The same target with the same service profile produces the same engagement trace, bit for bit. The properties that ensure this: 1. **Tool selection is deterministic.** Same service set ⇒ same auto_configure mapping ⇒ same V2-priority order ⇒ same tool sequence. 2. **Tool generation is deterministic** (by R.1) ⇒ same intent ⇒ same emitted source. 3. **Adaptive retry is deterministic.** Failure classification is regex-based on stderr; mutation strategy per failure class is fixed; retry order is bounded and stable. 4. **Result classification is deterministic.** Keyword match on stdout returns the first match in a fixed order. 5. **Chain follow-up is deterministic.** Regex extraction returns matches in stable order; first $k$ matches become the chain. 6. **Lateral planning is deterministic.** Target model is a literal dict; `plan_lateral` iterates in stable order; first $k$ lateral candidates are tried. The property has operational consequences: an engagement that ran on a target at time $t_1$ can be replayed at time $t_2$ against the same (snapshot of the) target and produce an identical trace. This is what makes the architecture viable for **adversary emulation under regulatory audit** and **deterministic detection-rule validation** — replay identical operator actions and verify the detection rules catch them, every time. #### R.4 — Knowledge-base reproducibility The 132 shipped `.causal` files are byte-stable artifacts: their contents are committed to the repository, their inference closure is computable at boot in <200 ms, and the amplification at the shipped state is **+51 %** (46,442 explicit → 69,942 reachable triplets). The triplet sources in `triplets/*.json` are committed source-of-truth: the `.causal` files are derived from them via `src/compile_knowledge.py`. To verify the knowledge base is unmodified: sha256sum knowledge/*.causal > /tmp/causal-hashes.txt # compare against the shipped /tmp/causal-hashes.expected.txt diff /tmp/causal-hashes.txt /tmp/causal-hashes.expected.txt The `.causal` files can be regenerated from the source triplets via `compile_knowledge.py` — a regenerated set is bit-identical to the shipped set modulo timestamps embedded in the msgpack-extra metadata fields (which are excluded from the hash above). ## Status O1-O is operational software. It runs on a Mac mini, fully offline, sub-second per generated tool, deterministic end-to-end. Every count in this README is reproducible from the source committed in this repository. The architectural properties — determinism by construction, auditability to source, sovereignty through stdlib-only implementation, seven-stage pre-emission verification, autonomous engagement at deterministic operator-IS-software level — are the deployment surface. Regulated industries (PCI-DSS, FDA, FedRAMP), sovereign-cloud environments, classified networks, security tooling that must itself be trustworthy: these are the operational contexts in which the architecture's guarantees become the primary value. ## Dual use The system synthesizes defensive tools (Sigma rule generators, YARA compilers, log parsers, forensic timeline builders, detection-rule writers) and offensive tools (port scanners, credential harvesters, post-exploit enumerators, lateral-movement chains) through the same pipeline, from the same fragments, with the same verification. The deterministic substrate, the bit-exact reproducibility, the auditable provenance, and the operator-IS-software engagement layer are the same on both sides. Reproducible adversary emulation with Navigator-mapped coverage on the red side; reproducible threat-model generation, deterministic detection-rule writing, and MITRE-aligned engagement testing on the blue side. The architecture is symmetric. Use the system for authorized purposes. You are responsible for what you generate. ## Citation @misc{foss2026o1o, author = {Foss, David Tom}, title = {{O1-O: A Deterministic Code Synthesis Operator with an Algebraic Type System, a 7-Pass Causal Knowledge Inference Engine, and an Autonomous Engagement Pipeline}}, year = {2026}, note = {Public research disclosure (prior art). Composes working programs from natural-language intent by type-matched edge lookup in a closed-set 8-color algebraic type system over 1,245 verified fragments. Seven-pass deterministic causal-graph inference, seven- layer pre-emission verification, four parallel self-improvement closed-loops, autonomous nine-phase engagement pipeline. Zero LLM calls in the generation path. Air-gap capable. github.com/DT-Foss/O1-O}, } ## Contact David Tom Foss — `dtfoss-dev@proton.me` ## Part of the O1 stack - **GSSM** — [github.com/DT-Foss/gssm](https://github.com/DT-Foss/gssm) — the bounded reproducing-kernel SSM mathematical core. - **O1** — [github.com/DT-Foss/O1](https://github.com/DT-Foss/O1) — the living-stream architecture with runtime knowledge retrieval and a measured capacity threshold. - **O1-O** — *this repository* — the deterministic code synthesis operator. - **fabel / .causal** — [github.com/dotcausal/dotcausal](https://github.com/dotcausal/dotcausal) · [dotcausal.com](https://dotcausal.com) — the underlying causal knowledge engine.
标签:人工智能, 代码生成, 渗透测试工具, 用户模式Hook绕过, 离线计算, 类型系统, 网络调试, 自动化, 请求拦截, 逆向工具