nityansh10/nexus-shield-threat-intelligence

GitHub: nityansh10/nexus-shield-threat-intelligence

Stars: 0 | Forks: 0

# NEXUS-SHIELD // Automated Threat Co-Pilot A hybrid AI cybersecurity incident-triage console combining a **fine-tuned Small Language Model (SLM)** adapter layer with a **Retrieval-Augmented Generation (RAG)** NIST policy engine. The system ingests raw, unstructured network telemetry and audit logs and emits structured JSON incident reports with classified threat vectors and recommended operational postures. ## Architecture Overview ┌───────────────────────────────────────────────────────────┐ │ RAW TELEMETRY INPUT │ │ (SSH logs, EDR alerts, DLP events, SIEM correlations) │ └────────────────────────┬──────────────────────────────────┘ │ ┌───────────▼────────────┐ │ DUAL-ENGINE PIPELINE │ └───────────┬────────────┘ │ ┌──────────────┴──────────────────┐ │ │ ┌───────▼────────┐ ┌─────────▼────────┐ │ FINE-TUNED │ │ RAG POLICY │ │ SLM ADAPTER │ │ ENGINE (NIST) │ │ │ │ │ │ Behavioral │ │ Rule retrieval │ │ pattern │◄─ AUGMENTS ──│ for domain- │ │ extraction │ │ specific overrides│ │ (400+ rows) │ │ (NIST 901, etc.) │ └───────┬────────┘ └──────────────────┘ │ ┌───────▼──────────────────────────────────────────────┐ │ STRUCTURED INCIDENT REPORT │ │ { vector_class, target_infrastructure, │ │ operational_posture } │ └──────────────────────────────────────────────────────┘ ### Engine 1 — Fine-Tuned SLM Adapter Weights The behavioral classification layer is trained on **401 labeled cybersecurity incidents** (`nexus_data.jsonl`). Each example follows the instruction-tuning format: { "instruction": "Transform unstructured cyber logs into raw, machine-readable Security Incident JSON format.", "input": "", "output": "{\"incident_report\": {\"vector_class\": \"...\", \"target_infrastructure\": \"...\", \"base_posture\": \"...\"}}" } The model learns to extract three fields from noisy, heterogeneous log formats: | Field | Description | |---|---| | `vector_class` | Attack category (see table below) | | `target_infrastructure` | Affected system identifier | | `base_posture` | Recommended initial response posture | ### Engine 2 — RAG Policy Engine (NIST Compliance) The NIST RAG database is a structured in-memory retrieval store of compliance rules. On each inference pass, the pipeline checks whether the classified threat falls within any governed domain (e.g., financial systems, payroll infrastructure). If a NIST rule matches, its prescribed response **overrides** the base posture from the fine-tuned layer. const NIST_RAG_DATABASE = { "financial_override": "NIST REGULATION 901 MATCHED // CRITICAL DOMAIN // FORCE COMPLIANCE VALUE TO: 'CRITICAL_CREDENTIAL_REVOCATION_REQUIRED'." }; This models the real-world pattern of **RAG-augmented generation**: the retriever fetches the most relevant policy chunk, which is then injected into the output generation context to enforce compliance-critical behaviour. ## Threat Classification Reference ### Vector Classes | `vector_class` | Description | Example Signal | |---|---|---| | `BRUTE_FORCE_ATTEMPT` | Repeated authentication failures, spray patterns | `Failed password for invalid user`, Kerberos pre-auth failures | | `DATA_EXFILTRATION` | Anomalous outbound data volume or DLP alerts | `412GB outbound`, bulk ledger sync to external repo | | `ENDPOINT_COMPROMISE` | Local daemon manipulation, registry hooking, LSASS dump | `mimikatz`, `rclone` spawned with root, binary hash mismatch | | `RANSOMWARE_DEPLOYMENT` | File encryption entropy spikes, VSS deletions | `.locked`/`.crypto` renaming, `README_DECRYPT.txt` generation | | `INSIDER_THREAT` | Off-hours privileged access, removable media dumps | Credential hijacking, off-shift SIEM correlation | ### Operational Postures | `base_posture` / `operational_posture` | Severity | Action | |---|---|---| | `CONTAINMENT_MODE` | Medium | Isolate affected process; monitor lateral movement | | `ISOLATION_POSTURE` | High | Network-segment the node; suspend outbound routing | | `CREDENTIAL_REVOCATION` | High | Invalidate session tokens; force re-authentication | | `CRITICAL_CREDENTIAL_REVOCATION_REQUIRED` | **CRITICAL** | Immediate full credential revocation; RAG rule enforced | ## Training Data **File:** `nexus_data.jsonl` **Format:** JSON Lines (one JSON object per line) **Size:** 401 examples **Coverage:** 5 balanced threat categories across diverse infrastructure nomenclature The dataset uses realistic infrastructure names (`finance-payroll-02`, `prod-ledger-primary`, `corp-cluster-replica`) and log formats drawn from: - Linux `sshd` / PAM authentication logs - Windows Security Event ID 4625 / 4648 - Kernel network hooks and firewall alerts - EDR / SIEM correlation events - VSS and File Integrity Monitor alerts - DLP agent alerts - Kerberos pre-authentication failures To fine-tune a real SLM (e.g., Phi-3 Mini, Mistral 7B, or Llama 3.2 3B) on this dataset, convert to your framework's chat format and use LoRA / QLoRA adapter training. ## Tech Stack | Layer | Technology | |---|---| | UI Framework | React 18 | | Build Tool | Vite 5 | | Styling | Inline CSS (monochrome cyber aesthetic, `#050811` / `#00f0ff`) | | Fine-Tuning Data | JSONL instruction-tuning format | | RAG Store | In-memory JS object (extensible to vector DB) | | Error Handling | React Error Boundary (`App.jsx`) | ## Getting Started ### Prerequisites - Node.js >= 18 - npm >= 9 ### Install npm install ### Development npm run dev Opens at `http://localhost:3000` with hot module replacement. ### Production Build npm run build npm run preview ## Project Structure nexus-finetuning-rag/ ├── index.js # NexusShieldConsole React component ├── nexus_data.jsonl # 401-row instruction-tuning dataset ├── index.html # Vite HTML entry point ├── vite.config.js # Vite configuration ├── package.json ├── .gitignore ├── src/ │ ├── main.jsx # React DOM root mount │ └── App.jsx # Layout wrapper + ErrorBoundary + global styles └── README.md ## Using the Console 1. **Live Ingestion Feed (left panel)** — streams pre-loaded historical log events to simulate a real-time telemetry feed. 2. **Compliance Analysis Gateway (center panel)** — paste any raw log string into the textarea and click **Run Hybrid Inference Pipeline**. 3. **Matrix Compiler View (right panel)** — displays the structured JSON output. The risk posture badge turns **CRITICAL** (red) when the RAG engine fires a compliance override. ### Example Inputs **Brute Force:** Auth failure: sshd[32258]: Failed password for invalid user admin from 192.168.97.128 port 443 ssh2. Continuous retry count=22. **Data Exfiltration:** Kernel network hook captured unexpected outbound connection from db-cluster-primary to remote IP 10.84.2.144. Outbound volume 412GB exceeds baseline operational metrics. **RAG Override (Financial → CRITICAL):** File Integrity Monitor alert on finance-payroll-02: high-frequency encryption behavior detected in ledger directories. ## Extending the Architecture ### Replacing the Simulated SLM with a Real Model The `executeDualEnginePipeline` function in `index.js` contains the simulated behavioral pattern matching. To replace with a real inference call: // Replace the keyword-matching block with: const response = await fetch('/api/classify', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ log: inputLog }), }); const { vector_class, target_infrastructure, base_posture } = await response.json(); Then deploy your fine-tuned SLM behind `/api/classify` using vLLM, Ollama, or a managed inference endpoint. ### Scaling the RAG Store Replace `NIST_RAG_DATABASE` with a vector similarity search: // Embed the input log and retrieve top-k NIST policy chunks const topChunks = await vectorStore.similaritySearch(inputLog, { k: 3 }); const ragContext = topChunks.map(c => c.pageContent).join('\n'); Suitable vector stores: Chroma, Pinecone, Weaviate, pgvector. ## License MIT
标签:自定义脚本