trynullsec/nullsec-s1
Plain-text view of the verdict flow:

    AI-generated app / repo / PR / MCP tool / wallet flow
            │
            ▼
    Nullsec S1 reasoning pipeline
    (security-tuned model: detect · classify · explain · patch)
            │  raw output
            ▼
    structured JSON verdict (data/schemas/verdict.schema.json)
            │
            ▼
    Security Alignment Layer
    (parse · schema-validate · type-check · normalize severities)
            │  structurally-valid verdict
            ▼
    Nullsec Safety Layer
    (deterministic enforcement: rules R1–R6, severity/risk flooring)
            │
            ▼
    enforced verdict (production_ready recomputed, never trusted from the model)
            │
            ▼
    patch · report · CI gate · API response

The model's own `production_ready` claim is **advisory only**. The Safety Layer recomputes it and allows `true` only when all **eight** check dimensions pass with no HIGH/CRITICAL finding:

`auth · secrets · input_validation · rate_limits · permissions · dangerous_exec · dependency_risk · environment_exposure`

Prompt and schema details: [`docs/PROMPT_FORMAT.md`](docs/PROMPT_FORMAT.md). Full design: [`docs/SYSTEM_OVERVIEW.md`](docs/SYSTEM_OVERVIEW.md).

## Core system components

| Path | What it is |
|------|------------|
| [`corpus/`](corpus/) | Curated training corpus, the single source of truth (`authored/` + opt-in `ingested/` + `synthetic/`). |
| [`taxonomy/`](taxonomy/) | The 16-category security taxonomy mapped to 8 check dimensions (`taxonomy.json`). |
| [`nullsec/safety/`](nullsec/safety/) | The Security Alignment Layer (`alignment.py`) + Nullsec Safety Layer (`enforcement.py`). |
| [`nullsec/core/`](nullsec/core/) | Reasoning pipeline (`engine.py`), verdict models, canonical prompts, version/fingerprint. |
| [`nullsec/ingest/`](nullsec/ingest/) | CVE/NVD, Semgrep, SARIF/CodeQL ingestion into the verdict contract. |
| [`training/`](training/) | Dataset prep, QLoRA training, corpus validation, release threshold, preflight. |
| [`benchmarks/`](benchmarks/) | Evaluation runners + adversarial Safety Layer probes. |
| [`scripts/validate_claims.py`](scripts/validate_claims.py) | Public claim validator (the honesty gate). |
| [`scripts/release_candidate.py`](scripts/release_candidate.py) | Release gate: builds a bundle only from real artifacts. |
| [`serving/`](serving/) | FastAPI serving layer (`/v1/model`, `/v1/analyze`, `/v1/patch`, streaming). |
| [`cli/`](cli/) | `nullsec1` command-line analyzer + CI gate. |
| [`reports/`](reports/) | Corpus curation sprint reports (auditable provenance). |
| [`docs/`](docs/) | Technical documentation (system overview, safety layer, corpus, roadmap, non-claims). |

## What is live now vs coming next

Live now:

- source repo
- GitHub Release artifact
- Hugging Face PEFT adapter
- `inference.py`
- benchmark suite
- baseline comparison scripts
- [`docs/EVALS.md`](docs/EVALS.md)

Coming next:

- hosted scanner at `s1.trynullsec.com`
- API backend
- GitHub Action / PR guard
- CLI hardening
- larger benchmark suite
- more framework coverage

## Current verified state

The corpus exceeds the v1.0 and RC2/v1.1 data thresholds, the deterministic Safety Layer is enforced, and the trained RC2/v1.1 release artifacts are published as GitHub Release assets rather than committed to source.

This snapshot reflects the artifacts on disk right now. Every number below is produced by a command in this repo; none are hand-entered. Run the commands in [Quickstart](#quickstart) to reproduce them.

| Fact | Value | Source command |
|------|-------|----------------|
| Curated corpus | **1,741** examples (1,304 hand-authored + 437 curated-ingested) | `training/dataset_stats.py --include-ingested` |
| Train / eval split | **1,393 train / 348 eval** (eval_frac 0.2, seed 42) | `training/prepare_dataset.py --include-ingested` |
| Taxonomy categories | **16** categories → 8 check dimensions | `taxonomy/taxonomy.json` |
| Per-category coverage | every category **≥ 60** curated | `training/release_threshold.py --include-ingested --profile rc2` |
| Safety Layer consistency | **100%** (1,741 / 1,741) | `training/dataset_stats.py --include-ingested` |
| Benchmark suite | **111** labeled cases across all 16 categories | `benchmarks/datasets/detection.json` |
| Adversarial safety probes | **8 / 8 blocked**, 0 bypassed | `python -m benchmarks.safety_probes` |
| Test suite | passing | `pytest -q` |
| Release threshold (v1.0) | **PASS** | `training/release_threshold.py --include-ingested` |
| Release threshold (v1.1 / RC2) | **PASS** | `training/release_threshold.py --include-ingested --profile rc2` |

The honesty gate (`scripts/validate_claims.py --check`) ties public wording to local artifacts. To reproduce the release-asset claim state, unpack the GitHub Release bundle locally and run the [Quick Verification](#quick-verification) command.

## The Security Alignment Layer

The deterministic layer is the reason Nullsec S1 is a security *system* rather than a code model that emits opinions. It runs in two stages.

**Stage 1: Security Alignment Layer** (`nullsec/safety/alignment.py`). Extract the JSON object (tolerant of code fences, preamble, and trailing prose), validate it against the verdict schema, type-check it into the `Verdict` model, and normalize finding severities *up* to each category's taxonomy floor. Anything that cannot be aligned raises `VerdictParseError` instead of being guessed at.
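
As an illustration of the Stage 1 steps, the following is a minimal sketch of tolerant JSON extraction and upward severity normalization. It is not the code in `nullsec/safety/alignment.py`: the function names `extract_json_object` and `floor_severity`, the `CATEGORY_FLOOR` table and its single entry, and the LOW/MEDIUM severity names are assumptions made for the example, and `VerdictParseError` is redefined locally as a stand-in. Schema validation and type-checking are omitted.

```python
import json

# Assumed severity ordering; this README only names HIGH and CRITICAL.
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]

# Hypothetical taxonomy floor: category name -> minimum severity.
CATEGORY_FLOOR = {"hardcoded_secret": "HIGH"}


class VerdictParseError(ValueError):
    """Local stand-in for the error this README names."""


def extract_json_object(raw: str) -> dict:
    """Return the first decodable JSON object embedded in raw model output."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at `start` and ignores
            # whatever follows, so trailing prose or a closing fence is harmless.
            obj, _end = decoder.raw_decode(raw, start)
            return obj
        except json.JSONDecodeError:
            # This "{" did not open valid JSON (for example, prose); try the next.
            start = raw.find("{", start + 1)
    raise VerdictParseError("no JSON object found in model output")


def floor_severity(category: str, severity: str) -> str:
    """Raise severity to the category floor; never lower it."""
    floor = CATEGORY_FLOOR.get(category, SEVERITY_ORDER[0])
    return max(severity, floor, key=SEVERITY_ORDER.index)


if __name__ == "__main__":
    fence = "`" * 3  # built at runtime to avoid a literal fence in this example
    raw = (
        "Sure, here is the verdict:\n"
        f"{fence}json\n"
        '{"findings": [{"category": "hardcoded_secret", "severity": "LOW"}]}\n'
        f"{fence}\n"
        "Let me know if you need anything else."
    )
    verdict = extract_json_object(raw)
    for finding in verdict["findings"]:
        finding["severity"] = floor_severity(finding["category"], finding["severity"])
    print(verdict["findings"][0]["severity"])  # HIGH
```

Scanning from each `{` keeps the extractor strict about what counts as JSON while staying indifferent to the text around it. Output with no decodable object fails loudly rather than yielding a partial verdict.
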

**Stage 2: Nullsec Safety Layer** (`nullsec/safety/enforcement.py`). Take the structurally-valid verdict and deny `production_ready` if **any** of these hold:

| Rule | `production_ready` is denied when… |
|------|------------------------------------|
| R1 | any required dimension is `not_checked` |
| R2 | any required dimension is `fail` |
| R3 | any finding is HIGH or CRITICAL |
| R4 | `risk_score` exceeds the production threshold (default 20) |
| R5 | a finding contradicts a dimension reported as `pass` |
| R6 | overall severity is HIGH or CRITICAL |

It also **raises (never lowers)** severity and `risk_score` to match the worst finding, so the model cannot under-report.

Because enforcement is deterministic and independent of the model, an attacker who manipulates the model (e.g. via prompt injection embedded in the code under review) still cannot obtain a false `production_ready: true`. This is verified by adversarial probes in [`benchmarks/safety_probes.py`](benchmarks/safety_probes.py), including a prompt-injection-in-prose probe.

Deep dive: [`docs/SECURITY_ALIGNMENT_LAYER.md`](docs/SECURITY_ALIGNMENT_LAYER.md).

## Quickstart

Local CPU machines can verify the corpus, the deterministic layers, and the safety probes; no GPU is required.

    python3.11 -m venv .venv
    source .venv/bin/activate
    python -m pip install --upgrade pip setuptools wheel
    python -m pip install -e ".[dev]"
    python training/prepare_dataset.py --include-ingested --out data/processed
    pytest -q
    python training/validate_corpus.py --include-ingested
    python training/release_threshold.py --include-ingested
    python scripts/validate_claims.py --check

Inspect model identity and the reproducible fingerprint at any time:

    python -m nullsec.core.version

## Corpus status

`corpus/` is the single source of truth for training data.
The current curated corpus is **1,741 examples** (1,304 hand-authored + 437 curated-ingested), every taxonomy category has **≥ 60** curated examples, and **Safety Layer consistency is 100%**, so both the v1.0 and RC2/v1.1 data thresholds pass.

Provenance is tracked explicitly and never blurred:

- `hand_authored`: original examples written for this repo (counts as curated).
- `curated_ingested`: CVE / scanner / real-failure records that passed human review and source-provenance enforcement (counts as curated).
- `synthetic_variant`: labeled, structure-preserving augmentations; **never** counts toward curated thresholds.

Raw and rejected candidates are tracked separately and are never training-eligible. The curation workflow, schema, and provenance rules are documented in [`docs/CORPUS.md`](docs/CORPUS.md), with auditable sprint reports in [`reports/`](reports/).

## Training workflow

The training targets are built from the corpus through the **same** alignment + safety layers used at serving time, so no malformed or gate-inconsistent verdict ever enters training.

    # 1. build chat-formatted train/eval JSONL from the curated corpus
    python training/prepare_dataset.py --include-ingested --out data/processed

    # 2. confirm the corpus is genuinely v1.0-ready (exits non-zero until it is)
    python training/release_threshold.py --include-ingested

    # 3. (on a GPU box) preflight, then train the QLoRA adapter
    python training/preflight_train.py
    python training/train_qlora.py --config training/config.yaml

The release adapter was trained with QLoRA on `Qwen/Qwen2.5-Coder-7B-Instruct` (Apache 2.0). Training instructions remain in [`RELEASE_TRAINING.md`](RELEASE_TRAINING.md), [`RUNPOD.md`](RUNPOD.md), and [`GPU_QUICKSTART.md`](GPU_QUICKSTART.md).

## Training on GPU

Local CPU machines can verify the corpus and the safety layer, but **cannot realistically train the model**. QLoRA training requires a CUDA-capable NVIDIA GPU.
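
A quick way to see which side of that line a machine falls on is sketched below. It assumes PyTorch is the GPU stack in use (this README does not say what `training/preflight_train.py` actually inspects) and borrows the exit code `2` that this README gives for the no-GPU case. `cuda_available` and `preflight_exit_code` are hypothetical names, not functions from the repo.

```python
def cuda_available() -> bool:
    """True only if PyTorch is importable and reports a usable CUDA device."""
    try:
        import torch  # assumption: PyTorch is the GPU stack being checked
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def preflight_exit_code(has_cuda: bool) -> int:
    """0 when training can proceed, 2 when no CUDA GPU is present."""
    return 0 if has_cuda else 2


if __name__ == "__main__":
    code = preflight_exit_code(cuda_available())
    print(f"preflight exit code: {code}")
```

In a real script the computed code would be passed to `sys.exit`, so a shell pipeline can stop before any paid GPU time is used.
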

The end-to-end pipeline (prepare → preflight → train → merge → benchmark → release → validate) is wrapped in one script:

    bash scripts/run_training_pipeline.sh

A complete, beginner-friendly walkthrough (choosing a GPU box, disk requirements, environment setup, expected artifacts, and how to collect outputs) is in **[`GPU_QUICKSTART.md`](GPU_QUICKSTART.md)**.

`training/preflight_train.py` checks the GPU stack before you spend money: it **exits `2` when no CUDA GPU is available** (the expected result on a laptop), so you never start a doomed run.

## Benchmark workflow for reproduction / development

The benchmark suite measures detection accuracy, false-safe rate, hallucination rate, OWASP coverage, patch correctness (structural), and a secure-generation score. It reports numbers only from real runs. The RC2/v1.1 real-model report ships as a GitHub Release asset under `v1.0.0-rc25`; generated benchmark reports are not committed to source.

    # against the live model (GPU):
    python benchmarks/run_all.py --mode model --adapter outputs/nullsec-s1-qlora

    # against captured real outputs (no GPU); reports are marked replay-only:
    python benchmarks/run_all.py --mode replay --replay path/to/captured.jsonl

A case with no model output is recorded as a real miss, never a synthetic pass. In a source-only checkout, artifact-gated claims remain unavailable until the trained adapter and release report are downloaded/unpacked locally.

## Release pipeline for reproduction / development

The release pipeline is how maintainers reproduce a release bundle from real local artifacts:

    python scripts/release_candidate.py --adapter outputs/nullsec-s1-qlora --dataset detection.json
    python scripts/validate_claims.py --adapter outputs/nullsec-s1-qlora \
        --report releases/nullsec-1.0/benchmark/SUITE.json --check

`release_candidate.py` aborts (writing nothing) if the adapter is missing, the model fails to load, no outputs are produced, any report section is empty, or any Safety Layer probe is bypassed.
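
To make "a Safety Layer probe is bypassed" concrete, here is a minimal sketch of one probe run against a toy enforcement function built from rules R1–R6. It is not `nullsec/safety/enforcement.py` or `benchmarks/safety_probes.py`: the `Verdict` and `Finding` shapes, the field names, the LOW/MEDIUM severity names, and the reading of R5 as "a finding exists in a dimension marked `pass`" are all assumptions. `risk_score` flooring is left out because this README does not say how a finding maps to a score.

```python
from dataclasses import dataclass, replace

# Toy model of the documented rules; every name here is hypothetical.
DIMENSIONS = (
    "auth", "secrets", "input_validation", "rate_limits",
    "permissions", "dangerous_exec", "dependency_risk", "environment_exposure",
)
RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}  # LOW/MEDIUM are assumed
RISK_THRESHOLD = 20  # documented default for R4


@dataclass(frozen=True)
class Finding:
    dimension: str
    severity: str


@dataclass(frozen=True)
class Verdict:
    checks: dict            # dimension -> "pass" | "fail" | "not_checked"
    severity: str
    risk_score: int
    production_ready: bool  # the model's claim, treated as advisory
    findings: tuple = ()


def enforce(v: Verdict) -> Verdict:
    """Recompute production_ready from R1-R6, ignoring the model's claim."""
    # Raise (never lower) overall severity to the worst finding.
    worst = max([v.severity] + [f.severity for f in v.findings], key=RANK.get)
    denied = (
        any(v.checks.get(d, "not_checked") == "not_checked" for d in DIMENSIONS)  # R1
        or any(v.checks.get(d) == "fail" for d in DIMENSIONS)                     # R2
        or any(RANK[f.severity] >= RANK["HIGH"] for f in v.findings)              # R3
        or v.risk_score > RISK_THRESHOLD                                          # R4
        or any(v.checks.get(f.dimension) == "pass" for f in v.findings)           # R5 (assumed reading)
        or RANK[worst] >= RANK["HIGH"]                                            # R6
    )
    return replace(v, severity=worst, production_ready=not denied)


def probe_injected_claim() -> bool:
    """Return True if the probe is blocked, False if it bypasses enforcement."""
    injected = Verdict(
        checks={d: "pass" for d in DIMENSIONS},
        severity="LOW",
        risk_score=0,
        production_ready=True,  # as if prompt injection had steered the model
        findings=(Finding("secrets", "CRITICAL"),),
    )
    return enforce(injected).production_ready is False


if __name__ == "__main__":
    clean = Verdict({d: "pass" for d in DIMENSIONS}, "LOW", 5, True)
    print(enforce(clean).production_ready)  # True
    print(probe_injected_claim())           # True
```

A probe counts as bypassed if the enforced verdict still says `production_ready: true` despite the planted finding. In this sketch the injected verdict trips R3, R5, and R6 at once.
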

The published RC2/v1.1 artifact already passed this path; running it again is a reproducibility workflow. The full path is documented in [`RELEASE_TRAINING.md`](RELEASE_TRAINING.md).

## Repo structure

    README.md              you are here
    GPU_QUICKSTART.md      beginner-friendly GPU training walkthrough
    RELEASE_TRAINING.md    training-to-release runbook
    CONTRIBUTING.md        how to contribute (corpus, taxonomy, probes, docs)
    SECURITY.md            vulnerability reporting & responsible disclosure
    model_card/            Nullsec-1 model card (identity, intended use, limits)
    taxonomy/              16-category security taxonomy (single source of truth)
    corpus/                curated training corpus (authored/ + ingested/ + synthetic/)
    data/                  verdict schema (data/schemas) + processed datasets
    training/              prepare_dataset · train_qlora · merge_adapter · validate_corpus ·
                           release_threshold · preflight_train · config.yaml
    nullsec/
      core/                reasoning pipeline, verdict models, prompts, version/fingerprint
      safety/              Security Alignment Layer + Nullsec Safety Layer
      ingest/              CVE/NVD, Semgrep, SARIF/CodeQL ingestion
    serving/               FastAPI serving layer
    benchmarks/            benchmark suite + adversarial Safety Layer probes
    scripts/               release_candidate.py · validate_claims.py · run_training_pipeline.sh
    examples/              worked vulnerable cases + expected verdicts
    releases/              generated release bundles (real artifacts only; ships empty)
    cli/                   nullsec1 CLI analyzer + CI gate
    tests/                 deterministic-layer test suite (no GPU)
    docs/                  architecture · system overview · safety layer · corpus · roadmap
    .github/               CI security gate · issue templates · PR template

## Honest scope

Results are benchmark-scoped to the Nullsec RC2/v1.1 111-case benchmark. Nullsec S1 is not a replacement for human security review. A clean verdict reduces risk; it does not prove the absence of vulnerabilities.

## Security

Please report vulnerabilities responsibly and **never submit real secrets**: use placeholders for any credential in examples or reports. See [`SECURITY.md`](SECURITY.md).

## License

Apache 2.0, matching the `Qwen2.5-Coder` base model. See the license note in [`model_card/NULLSEC1.md`](model_card/NULLSEC1.md).