GitHub: ahammadshawki8/DeepSIFT
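A zero-hallucination autonomous DFIR agent that runs on the SANS SIFT Workstation. It uses MCP middleware to structure the output of 148 forensic tools and feed it to an LLM, enabling verifiable digital forensic analysis.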
# DeepSIFT
**AI-Driven Forensic Investigation for SANS SIFT Workstation**
DeepSIFT is a Model Context Protocol (MCP) middleware layer that turns Claude into a
zero-hallucination digital forensics analyst. Instead of letting an LLM guess at raw CLI
output, DeepSIFT parses every SIFT tool response into structured JSON, injects per-tool
forensic discipline (caveats, advisories, corroboration hints), enriches findings with
MITRE ATT&CK tags and RAG-backed threat intelligence, and enforces chain-of-custody audit
logging before the LLM ever sees a single byte of evidence.
**148 typed forensic MCP tools (+ environment preflight self-check) · 23 tool modules · 15 parser modules · Per-tool RAG enrichment · Post-hoc grounding verification · 4-axis quantified confidence scoring · 3,700+ Sigma rules via Hayabusa · 6-type contradiction detection · case-agnostic benchmark runner · zero-dependency Examiner Portal**
### 🧑‍⚖️ For judges (and judging agents)
- **Start here:** [`AGENTS.md`](AGENTS.md) (agent orientation, entry points, 60-second run) and
[`docs/JUDGING.md`](docs/JUDGING.md) (every Stage-2 criterion → exact code + how to verify).
- **Measured, not asserted:** ROCBA **4/4** and FOR500 "Abducted Zebrafish" **4/4** vs Protocol SIFT,
**0 hallucinations, 100 % claim grounding** — scored by `benchmark/scorer.py` against published
ground truth.
- **See real output without running anything:** [`docs/sample/`](docs/sample/) holds committed
example outputs for both cases — grounded `findings.json` + a rendered Examiner report (verdict,
hypothesis ledger, evidence grounding, full chain of custody).
- **Don't trust the score — verify the evidence:** `python3 verify_findings.py` independently
re-checks every claim against the cited raw tool output and recomputes the audit hash chain. The
ground-truth files are *derived from the organizer case scenario* (see each file's `_provenance`),
so trust rests on **reproducible grounding**, not our number.
- **Verify in minutes (no API key):** `python3 preflight.py` · `pytest -q` (74 pass) ·
`python3 examiner_portal.py` (review UI + live audit-chain integrity).
- **Drive it as an agent:** connect Claude Code to the MCP server (`.mcp.json`) and ask it to
investigate `/mnt/evidence` — disk-only is a first-class autonomous case.
## Table of Contents
1. [Why DeepSIFT](#why-deepsift)
2. [Architecture](#architecture)
3. [How DeepSIFT Eliminates Hallucination](#how-deepsift-eliminates-hallucination)
4. [Tool Inventory (155 MCP tools)](#tool-inventory-155-mcp-tools)
5. [Investigation Workflow](#investigation-workflow)
6. [What Sets DeepSIFT Apart](#what-sets-deepsift-apart)
7. [How to Run It (three ways)](#how-to-run-it-three-ways)
8. [Examiner Portal — Human Review](#examiner-portal--human-review)
9. [Architectural Guardrails](#architectural-guardrails)
10. [Validated Results](#validated-results)
    - [ROCBA — FOR508 (memory + disk)](#rocba--for508-memory--disk)
    - [Abducted Zebrafish / Vanko — FOR500 (disk-only)](#abducted-zebrafish--vanko--for500-disk-only)
    - [Production Hardening](#production-hardening)
11. [Setup & Installation](#setup--installation)
12. [Verify It Yourself (no API key)](#verify-it-yourself-no-api-key)
13. [Evidence Integrity & Chain of Custody](#evidence-integrity--chain-of-custody)
14. [RAG Knowledge Base](#rag-knowledge-base)
15. [Benchmarking](#benchmarking)
16. [Project Structure](#project-structure)
17. [MITRE ATT&CK Coverage](#mitre-attck-coverage)
18. [Environment Variables](#environment-variables)
19. [License](#license)
## Why DeepSIFT
Protocol SIFT (the prompt-only baseline) passes raw CLI output directly into LLM context,
relies on natural-language safety rules, and has no structured parsing. This creates the
failure modes that DeepSIFT eliminates architecturally:
| Problem | Protocol SIFT | DeepSIFT |
|---|---|---|
| Raw CLI output → hallucination | Volatility/log2timeline text enters context unparsed | Python parsers produce typed JSON — raw text never reaches the LLM |
| Safety via prompt → bypassable | "Do not write to /cases/" is a suggestion | `guard_output_path()` raises `PermissionError` in server code before any write happens |
| No context → generic analysis | LLM has no threat intel during tool execution | ChromaDB RAG + MITRE ATT&CK injected into every tool response |
| Unverifiable LLM claims | No grounding check — analyst must manually verify | `verify_findings` checks every claim token against raw export bytes |
| Qualitative confidence | "high/low" with no definition | 4-axis 0-100 score: Tool Reliability + Corroboration + IOC Specificity + MITRE Accuracy |
| No Sigma rule coverage | Raw event log text to LLM | Hayabusa 3,700+ Sigma rules → structured MITRE-tagged alerts |
| Contradictions ignored | No cross-artifact consistency check | `detect_contradictions` finds 6 contradiction types (DKOM, ghost PIDs, log wipes, etc.) |
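
The `guard_output_path()` row above describes a check that fails in code rather than in a prompt. A minimal sketch of that idea follows. It is not the project's implementation: the function body, the error message, and the use of `Path.resolve()` are assumptions, and the evidence roots are the ones this README lists under Architectural Guardrails.

```python
from pathlib import Path

# Evidence roots as listed in this README; resolving them once also covers
# hosts where one of them happens to be a symlink.
EVIDENCE_ROOTS = tuple(Path(p).resolve() for p in ("/cases", "/mnt", "/media"))


def guard_output_path(output_path: str) -> Path:
    """Return the resolved path, or raise PermissionError if it lies under an evidence root."""
    resolved = Path(output_path).resolve()
    for root in EVIDENCE_ROOTS:
        # resolve() collapses ".." segments, so "exports/../../cases/x" is caught too.
        if resolved == root or root in resolved.parents:
            raise PermissionError(f"refusing to write under evidence root {root}: {resolved}")
    return resolved
```

Because the check raises before any file handle is opened, a caller that forgets to handle the exception fails closed instead of writing to evidence.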
## Architecture
```mermaid
flowchart TD
    A["Claude Code\n(LLM Agent)"] -->|"Typed MCP calls only\nno generic shell"| B
    B["DeepSIFT MCP Server\nmcp_server/server.py"]
    B -->|"Structured JSON only\nnever raw text"| A
    B --> C["Tool Modules\n23 modules · 148 typed functions"]
    C --> D["SIFT Tools\nVolatility · log2timeline · Sleuthkit\nEZ Tools · YARA · Hayabusa\nbulk_extractor · capa · FLOSS · exiftool"]
    D -->|"raw output"| E["Middleware Parsers\npslist · netscan · malfind · timeline\nbrowser · cloud · document · network_log\nlinux · mitre_auto_map · rag_enrichment"]
    E -->|"structured dict"| F["Forensic Knowledge Envelope\ncaveats · advisories · corroboration"]
    F -->|"enriched JSON"| B
    B --> G["RAG Pipeline\nChromaDB + sentence-transformers"]
    G --> H["Knowledge Sources (case-agnostic)\nMITRE ATT&CK · LOLBAS · Hunt Evil baseline\n+ opt-in per-case IOCs / threat intel"]
    H --> G
    B --> I["Audit Logger\naudit_id · SHA-256 · forensic_audit.log"]
    I --> J["exports/\nRaw tool output SHA-256 indexed\nanalysis/forensic_audit.log"]
```
## Tool Inventory (155 MCP tools)
DeepSIFT exposes **155 MCP tools**: **148 typed forensic tools** across 18 categories, plus
**7 control/utility tools** (preflight self-check, the hypothesis-ledger trio, and the
evidence-index trio). No `run_shell`, no `execute_command` — every tool has a typed signature,
a middleware parser, and returns RAG-enriched structured JSON. Run `python3 preflight.py` first
to see which tool groups are operational in your environment; a tool whose backing binary is
missing returns a clear "unavailable" status with an install hint instead of crashing the
investigation.
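
The "unavailable status with an install hint" behaviour can be sketched with the standard library alone. The helper name `check_binary`, the dictionary keys, and the hint text below are hypothetical and only illustrate the shape of such a response; the real preflight logic lives in `preflight.py` and may differ.

```python
import shutil


def check_binary(binary: str, install_hint: str) -> dict:
    """Report whether a backing binary is on PATH without raising."""
    path = shutil.which(binary)
    if path is None:
        # Structured "unavailable" result: the investigation can continue
        # with other tools instead of crashing on a missing dependency.
        return {"status": "unavailable", "binary": binary, "install_hint": install_hint}
    return {"status": "ok", "binary": binary, "path": path}


if __name__ == "__main__":
    # Hypothetical hint text; substitute the install method your environment uses.
    print(check_binary("yara", "install YARA with your package manager"))
```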
### Memory Forensics — Core (Volatility 3)
| Tool | Purpose | Key Output Fields |
|---|---|---|
| `get_process_list` | EPROCESS walk; SANS Hunt Evil baseline comparison | `suspicious`, `anomaly_details`, `mitre_techniques` |
| `scan_hidden_processes` | pslist vs psscan diff → DKOM detection (T1014) | `hidden_processes`, `dkom_suspected` |
| `find_injected_code` | malfind with injection type classification | `risk_level`, `injection_type`, `mitre_techniques` |
| `get_running_services` | svcscan with suspicious binary path detection (T1543.003) | `suspicious_services` |
| `get_network_connections` | netscan with external IP flagging + MITRE tags | `external_connections`, `mitre_techniques` |
| `get_command_history` | cmdline with suspicious pattern detection | `suspicious_cmdlines`, `mitre_techniques` |
| `get_loaded_dlls` | DLL listing for a specific PID | `dlls`, `unsigned_count` |
| `get_registry_hives` | List hives in memory image | `hives` |
| `get_registry_key` | Read a specific registry key from memory | `key`, `values` |
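
The pslist vs psscan comparison behind `scan_hidden_processes` reduces to a set difference. The sketch below assumes each parser yields rows shaped like `{"pid": int, "name": str}`; that row shape is an assumption made for illustration, while the two output keys come from the table above.

```python
def find_hidden_processes(pslist_rows: list, psscan_rows: list) -> dict:
    """Flag processes that a pool scan finds but the active process list omits."""
    listed_pids = {row["pid"] for row in pslist_rows}
    hidden = [row for row in psscan_rows if row["pid"] not in listed_pids]
    # psscan also recovers terminated processes, so a hit is a lead to
    # corroborate, not proof of DKOM on its own.
    return {"hidden_processes": hidden, "dkom_suspected": bool(hidden)}


pslist = [{"pid": 4, "name": "System"}, {"pid": 612, "name": "lsass.exe"}]
psscan = pslist + [{"pid": 3180, "name": "svch0st.exe"}]
result = find_hidden_processes(pslist, psscan)
```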
### Memory Forensics — Extended (Volatility 3)
| Tool | Purpose | Key Forensic Value |
|---|---|---|
| `get_privileges` | Token privilege enumeration per PID | SeDebugPrivilege on non-system process = T1134 |
| `get_mutexes` | Mutex object scan (mutantscan) | Malware-family mutex fingerprinting |
| `get_env_vars` | Process environment block variables | PATH hijacking, unusual TEMP locations |
| `get_vad_info` | Virtual Address Descriptor tree | Private RWX non-file-backed regions = injection staging |
| `get_ldrmodules` | Compare InLoad / InMem / InInit PEB lists | DLLs absent from all three = reflective injection (T1055.001) |
| `get_ssdt` | System Service Descriptor Table hooks | Non-ntoskrnl hooks = rootkit (T1014) |
| `get_callbacks` | Kernel callback registrations | Unknown driver callbacks = rootkit |
| `get_filescan` | FILE_OBJECT pool scan | Open handles to files not visible in process DLL list |
| `get_timeliner` | Memory-resident timestamp timeline | Process / DLL / registry chronology |
| `get_devicetree` | Kernel device tree | Hidden filter drivers, rootkit stack position |
### Timeline Analysis (log2timeline / Plaso)
| Tool | Purpose |
|---|---|
| `create_super_timeline` | Build a Plaso super-timeline from a disk image (long-running) |
| `filter_timeline` | Extract events for a specific time window; highlights suspicious keywords |
| `get_browser_history` | Extract WEBHIST events (URLs, downloads, searches) from timeline |
### Disk Forensics (Sleuth Kit)
| Tool | Purpose |
|---|---|
| `get_partition_table` | Read partition layout; returns sector offsets for follow-up calls |
| `get_file_listing` | Recursive file listing with deleted-file flags |
| `extract_file` | Extract file by inode number to `exports/` |
| `search_deleted_files` | List only deleted/unallocated entries |
### Windows Artifact Analysis (EZ Tools)
| Tool | Source Artifact | Key Evidence |
|---|---|---|
| `parse_event_logs` | .evtx via EvtxECmd | Logon, service install, task create, PS script blocks, WMI, RDP |
| `parse_shimcache` | SYSTEM hive via AppCompatCacheParser | Executable existence (proves file was on disk) |
| `parse_amcache` | Amcache.hve via AmcacheParser | Execution evidence + SHA1 hash per executable |
| `parse_prefetch` | C:\Windows\Prefetch via PECmd | Execution history with last 8 run times |
| `parse_mft` | $MFT via MFTECmd | Full file-system timeline; detects timestamp anomalies |
| `parse_lnk_files` | Recent Items via LECmd | Recently accessed file paths with timestamps |
| `parse_jump_lists` | AutomaticDestinations via JLECmd | Application-specific recent file access |
| `parse_registry_hive` | Any hive via RECmd | Raw key/value search with pattern matching |
| `parse_recycle_bin` | $Recycle.Bin via RBCmd | Deleted file recovery with original paths |
| `parse_srum` | SRUDB.dat via SrumECmd | Network bytes sent/received per application (exfil quantification) |
| `parse_usn_journal` | $UsnJrnl:$J via MFTECmd | File system change journal; burst deletion detection |
| `lookup_ip_reputation` | AbuseIPDB + VirusTotal APIs | Confidence score, country, ISP, VT malicious count |
### Windows Event Log — Hayabusa / Sigma
| Tool | Purpose | Key Output Fields |
|---|---|---|
| `parse_hayabusa` | Apply 3,700+ community Sigma rules to .evtx directory | `alerts`, `critical_count`, `mitre_techniques` |
| `list_hayabusa_rules` | Show available Hayabusa rule profiles | `profiles`, `rule_count` |
### Static File Analysis
| Tool | Purpose | Key Output Fields |
|---|---|---|
| `get_pe_metadata` | PE header, sections, imports, compile timestamp, entropy | `high_entropy_sections`, `suspicious_imports`, `timestamp_anomaly` |
| `extract_strings` | String extraction + IOC pattern scan (IPs, URLs, base64, registry) | `iocs_found`, `ioc_summary` |
| `detect_packer` | Entropy analysis + UPX/MPRESS/Themida signature detection | `verdict`, `overall_entropy`, `packer_signatures_found` |
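
The entropy analysis mentioned for `get_pe_metadata` and `detect_packer` is typically Shannon entropy over byte frequencies, where values close to 8 bits per byte suggest compressed or encrypted content. A minimal sketch, assuming nothing about the project's thresholds or section handling:

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty or constant input, max 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


uniform = shannon_entropy(bytes(range(256)))   # every byte value once
constant = shannon_entropy(b"A" * 1024)        # a single repeated byte
```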
### Network Traffic Analysis
| Tool | Purpose | Key Output Fields |
|---|---|---|
| `parse_pcap_summary` | TShark PCAP summary — top talkers, exfil signals | `large_transfers`, `external_conversations` |
| `extract_dns_queries` | DNS extraction — DGA detection, beaconing, DNS tunneling | `suspicious_domains`, `beaconing_candidates` |
| `parse_arp_cache` | Volatility netstat as host enumeration proxy | `unique_hosts_seen`, `hosts` |
### Cross-Artifact Correlation
| Tool | Purpose |
|---|---|
| `correlate_artifacts` | Join findings across memory/disk/network/registry by PID, path, IP, user |
| `adversarial_review` | Challenge current hypothesis with counter-arguments before `finish_analysis` |
| `detect_contradictions` | Find UNRESOLVED_CONTRADICTION findings: DKOM, ghost PIDs, log wipes, hidden services |
### Investigation Control & Autonomous Reasoning
| Tool | Purpose |
|---|---|
| `record_hypothesis` | Record an explicit, falsifiable hypothesis before testing it (returns `H1`, `H2`, …) |
| `update_hypothesis` | Confirm / disprove / mark-inconclusive a hypothesis with confidence + evidence `audit_ids` (captures self-correction) |
| `get_investigation_state` | Review the live hypothesis ledger + summary (confirmed/disproved/self-corrections) |
| `verify_findings` | Verbatim token grounding check — every claim vs raw export bytes (run before `finish_analysis`) |
| `finish_analysis` | Structured report with grounding score, 4-axis confidence score, hypothesis ledger, `audit_ids` citation |
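
The verbatim grounding check can be pictured as a substring test of each cited token against the raw export bytes. The sketch below is an assumption about the general approach, not the logic of `verify_findings`: the function name, the idea that a claim carries an explicit list of evidence tokens, and the percentage score are all hypothetical.

```python
def ground_claim(evidence_tokens: list, raw_export: bytes) -> dict:
    """Check that every token a claim cites appears verbatim in the raw tool output."""
    verified = [t for t in evidence_tokens if t.encode("utf-8") in raw_export]
    unverified = [t for t in evidence_tokens if t not in verified]
    score = 100.0 * len(verified) / len(evidence_tokens) if evidence_tokens else 0.0
    return {"verified": verified, "unverified": unverified, "grounding_pct": score}


raw = b"PID 3180 svch0st.exe 10.0.0.5:49712 -> 203.0.113.7:443 ESTABLISHED"
grounded = ground_claim(["3180", "svch0st.exe", "203.0.113.7"], raw)
fabricated = ground_claim(["3180", "198.51.100.9"], raw)
```

A claim whose tokens are not all found would be reported as unverified rather than silently accepted.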
### Scale, Health & Self-Verification
| Tool | Purpose |
|---|---|
| `index_evidence` | Ingest the **full** artifact rows (EZ tools' `exports/*.csv`) into a stdlib SQLite store |
| `query_evidence` | Return only the matching subset from the indexed store — reach a 100k-row MFT without dumping it |
| `evidence_store_stats` | Row counts per indexed artifact source |
| `check_tool_availability` | Preflight: which external tool groups are operational in this environment, with install hints |
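
The index-then-query pattern behind the evidence-index trio can be sketched with the standard-library `sqlite3` module. The table layout, the JSON-per-row storage, and the `LIKE` search below are assumptions chosen for brevity; the function names mirror the tool names above but this is not the project's schema.

```python
import json
import sqlite3


def index_evidence(conn: sqlite3.Connection, source: str, rows: list) -> int:
    """Store every parsed row so the full artifact never has to enter LLM context."""
    conn.execute("CREATE TABLE IF NOT EXISTS evidence (source TEXT, row_json TEXT)")
    conn.executemany(
        "INSERT INTO evidence (source, row_json) VALUES (?, ?)",
        [(source, json.dumps(row, sort_keys=True)) for row in rows],
    )
    conn.commit()
    return len(rows)


def query_evidence(conn: sqlite3.Connection, needle: str, limit: int = 50) -> list:
    """Return only the matching subset; note that % and _ in needle act as LIKE wildcards."""
    cur = conn.execute(
        "SELECT source, row_json FROM evidence WHERE row_json LIKE ? LIMIT ?",
        (f"%{needle}%", limit),
    )
    return [{"source": source, "row": json.loads(row_json)} for source, row_json in cur]


def evidence_store_stats(conn: sqlite3.Connection) -> dict:
    """Row counts per indexed artifact source."""
    cur = conn.execute("SELECT source, COUNT(*) FROM evidence GROUP BY source")
    return dict(cur.fetchall())
```

The `limit` parameter is what keeps a 100k-row artifact from flooding the model: only matching rows, capped at a page, are returned.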
### YARA Hunting
| Tool | Purpose |
|---|---|
| `list_yara_rule_sets` | Enumerate available rule sets |
| `scan_memory_with_yara` | Yarascan via Volatility 3 (finds memory-resident payloads) |
| `scan_file_with_yara` | Static file scan against named rule set |
**Built-in YARA rule sets:** `suspicious_strings` · `webshells` · `ransomware` · `rats` · `packers`
### Memory Forensics — Advanced (Volatility 3)
| Tool | Purpose | Key Output Fields |
|---|---|---|
| `get_modules` | Kernel module list; flags unsigned/suspicious drivers | `suspicious_modules`, `mitre_techniques`, `threat_intel` |
| `get_driverirp` | IRP dispatch table hook detection (rootkit) | `hooked_handlers`, `threat_intel` |
| `get_getsids` | Security identifiers per process (privilege enumeration) | `sids`, `admin_processes` |
| `get_hashdump` | NTLM password hash extraction from SAM in memory | `accounts`, `non_empty_hashes`, `threat_intel` |
| `get_lsadump` | LSA secrets from memory (service account passwords) | `secrets`, `threat_intel` |
| `get_cachedump` | Domain cached credential hashes (DCC2) | `cached_accounts` |
| `get_clipboard` | Clipboard contents at time of acquisition | `clipboard_text` |
| `get_atoms` | Windows atom table (GUI attack staging) | `atoms` |
| `get_sessions` | Terminal Services / RDP session list | `sessions`, `rdp_sessions` |
| `get_mft_memory` | In-memory MFT record extraction | `mft_records` |
| `get_ads_memory` | Alternate Data Stream detection from memory image | `ads_entries` |
| `dump_process` | Dump a suspicious process to disk for static analysis | `output_path`, `sha256` |
### Browser Artifacts
| Tool | Purpose | Key Output Fields |
|---|---|---|
| `parse_chrome_history` | SQLite history + downloads; cloud exfil domain classification | `suspicious_visits`, `suspicious_downloads`, `parser_summary`, `threat_intel` |
| `parse_firefox_history` | places.sqlite history + downloads; threat flags | `suspicious_visits`, `parser_summary`, `threat_intel` |
| `parse_chrome_extensions` | Installed extensions; flags risky permissions | `suspicious_extensions`, `high_risk_count` |
| `parse_browser_cookies` | Cookie store extraction; session token discovery | `cookies`, `suspicious_domains` |
| `run_hindsight` | Full Chrome/Chromium browser artifact extraction | `output_dir`, `summary` |
| `parse_browser_passwords` | Saved password store; credential theft evidence | `credentials`, `domain_count` |
| `parse_ie_edge_legacy_history` | IE/Edge Legacy WebCacheV01.dat history | `visits`, `downloads` |
| `parse_chromium_cache` | Chromium disk cache; cached malware delivery pages | `cache_entries`, `suspicious_urls` |
### Email Artifacts
| Tool | Purpose | Key Output Fields |
|---|---|---|
| `parse_pst_ost` | Outlook PST/OST via readpst; exfiltration email search | `email_count`, `suspicious_emails`, `attachments` |
| `parse_thunderbird` | Thunderbird mbox profile extraction | `emails`, `suspicious_emails` |
| `parse_eml_file` | Single .eml file; header analysis + attachment extraction | `headers`, `attachments`, `iocs` |
| `extract_email_attachments` | Bulk attachment extraction for malware analysis | `extracted_count`, `suspicious_attachments` |
| `analyze_email_headers` | RFC 5322 header forensics; spoofing + routing analysis | `spf_result`, `dkim_result`, `hop_analysis`, `mitre_techniques` |
### Cloud Storage Artifacts
| Tool | Purpose | Key Output Fields |
|---|---|---|
| `parse_dropbox_logs` | Dropbox sync logs; exfiltration risk classification | `sync_events`, `parser_summary`, `threat_intel` |
| `parse_onedrive_logs` | OneDrive sync/activity logs | `sync_events`, `parser_summary`, `threat_intel` |
| `parse_google_drive_logs` | Google Drive desktop sync logs | `sync_events`, `parser_summary` |
| `parse_slack_artifacts` | Slack desktop app data; workspace + channel forensics | `workspaces`, `suspicious_events` |
| `parse_teams_artifacts` | Microsoft Teams SQLite databases; chat + call forensics | `accounts`, `messages`, `suspicious_events` |
| `parse_icloud_logs` | iCloud for Windows sync logs | `sync_events`, `parser_summary` |
### Document Analysis
| Tool | Purpose | Key Output Fields |
|---|---|---|
| `analyze_pdf_doc` | pdfid keyword scan; JavaScript/OpenAction/launch classification | `risk_score`, `suspicious_keywords`, `mitre_techniques`, `threat_intel` |
| `analyze_ole_doc` | oletools VBA macro extraction + malicious pattern detection | `macros`, `classified_risks`, `mitre_techniques` |
| `analyze_rtf_doc` | rtfobj embedded object extraction; malicious CLSID detection | `objects`, `clsid_risks` |
| `analyze_zip_archive` | ZIP entry inspection; password-protected + double-ext detection | `entries`, `suspicious_entries` |
| `detect_dde_payload` | DDE/DDEAUTO command injection in Office documents | `dde_found`, `commands`, `threat_intel` |
### Linux / macOS Forensics
| Tool | Purpose | Key Output Fields |
|---|---|---|
| `get_linux_processes` | Volatility linux.pslist; attack command + LD_PRELOAD detection | `suspicious`, `threat_flags`, `threat_intel` |
| `get_linux_bash_history` | Bash command history with attack pattern classification | `commands`, `classified_suspicious`, `threat_intel` |
| `get_linux_network` | linux.netstat via Volatility | `connections`, `external` |
| `get_linux_modules` | Kernel module list; rootkit LKM detection | `modules`, `suspicious` |
| `get_linux_syscall` | System call table hook detection | `hooks` |
| `get_linux_malfind` | malfind equivalent for Linux memory images | `injected` |
| `get_linux_envars` | Process environment variables | `envars`, `suspicious` |
| `get_linux_mounts` | Mount table; network share + hidden mount detection | `mounts`, `suspicious` |
| `parse_syslog` | Syslog/auth.log parsing; auth failure + sudo classification | `classified_events`, `classified_summary`, `threat_intel` |
| `parse_linux_crontab` | Crontab persistence detection across all users | `cron_entries`, `suspicious_schedules` |
### Network Forensics — Extended
| Tool | Purpose | Key Output Fields |
|---|---|---|
| `parse_zeek_logs` | Zeek conn/dns/http/ssl/files log parsing; DNS tunneling detection | `suspicious_dns`, `external_conns`, `threat_intel` |
| `parse_iis_logs` | IIS W3C access logs; web shell + SQLi + scanner detection | `suspicious_requests`, `web_shells`, `threat_intel` |
| `parse_apache_logs` | Apache access/error logs; same threat classification | `suspicious_requests`, `port_scans` |
| `extract_pcap_files` | Extract files from PCAP via NetworkMiner/tshark | `extracted_files` |
| `parse_firewall_logs` | Firewall deny/allow logs; lateral movement flagging | `suspicious_flows`, `internal_scanning` |
| `decode_rdp_bitmap_cache` | RDP bitmap cache → screenshot reconstruction | `output_dir`, `image_count` |
| `parse_netflow` | NetFlow/IPFIX analysis; top talkers + exfil signals | `top_talkers`, `large_flows`, `exfil_candidates` |
### Anti-Forensics Detection
| Tool | Purpose | Key Output Fields |
|---|---|---|
| `detect_timestomping` | SI vs FN MACB delta comparison; round-number timestamps | `si_fn_delta_anomalies`, `mitre_techniques`, `threat_intel` |
| `detect_log_wiping` | Event ID 1102/104/4719; zero-byte EVTX detection | `log_clear_events`, `threat_intel` |
| `detect_secure_deletion` | SDelete/Eraser/CCleaner traces in prefetch + shimcache | `secure_deletion_indicators`, `threat_intel` |
| `detect_ads_streams` | NTFS Alternate Data Stream discovery | `suspicious_streams`, `threat_intel` |
| `analyze_vss_shadows` | Volume Shadow Copy inventory; deletion evidence | `shadow_copy_count`, `rag_context` |
| `detect_prefetch_anomalies` | Temp path execution + anti-forensics tool execution | `suspicious_entries`, `anti_forensics_tools` |
| `detect_event_log_tampering` | Event ID 1102/4719/7040 audit policy changes | `findings`, `threat_intel` |
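
The `$STANDARD_INFORMATION` vs `$FILE_NAME` comparison used for timestomping detection can be sketched as follows. The record keys (`si_created`, `fn_created`), the one-second tolerance, and the reason labels are hypothetical. The two heuristics themselves (SI creation earlier than FN creation, and zeroed sub-second precision) are common indicators that still produce false positives, so they should be treated as leads.

```python
from datetime import datetime, timedelta


def flag_timestomping(records: list, tolerance: timedelta = timedelta(seconds=1)) -> dict:
    """Flag MFT records whose SI creation time looks back-dated relative to FN."""
    anomalies = []
    for rec in records:
        si = datetime.fromisoformat(rec["si_created"])
        fn = datetime.fromisoformat(rec["fn_created"])
        reasons = []
        if si + tolerance < fn:
            # User-mode tools can rewrite $SI but normally not $FN.
            reasons.append("si_created_before_fn_created")
        if si.microsecond == 0:
            # Tool-set timestamps often lack sub-second precision.
            reasons.append("zero_subsecond_precision")
        if reasons:
            anomalies.append({
                "path": rec["path"],
                "reasons": reasons,
                "delta_seconds": (fn - si).total_seconds(),
            })
    return {"si_fn_delta_anomalies": anomalies}


sample = [
    {"path": "a.exe", "si_created": "2019-01-01T00:00:00", "fn_created": "2024-03-05T10:15:30.123456"},
    {"path": "b.txt", "si_created": "2024-03-05T10:15:30.123456", "fn_created": "2024-03-05T10:15:30.123456"},
]
report = flag_timestomping(sample)
```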
### File Carving and Static Analysis
| Tool | Purpose | Key Output Fields |
|---|---|---|
| `run_bulk_extractor` | Bulk feature extraction: emails, URLs, IPs, CCNs, Base64 | `top_iocs`, `enriched_email_iocs`, `enriched_url_iocs` |
| `carve_files_foremost` | Header/footer file carving from unallocated space | `recovered_files_by_type`, `total_recovered` |
| `carve_files_scalpel` | Configurable signature-based file carving | `recovered_files_by_type` |
| `analyze_with_exiftool` | Metadata extraction (GPS, author, software, revision) | `interesting_fields`, `full_metadata` |
| `calculate_file_hashes` | MD5/SHA1/SHA256/SHA512 + ssdeep fuzzy hash | `hashes`, `ssdeep` |
| `detect_capabilities_capa` | capa: capability detection mapped to MITRE ATT&CK | `capabilities`, `mitre_techniques`, `threat_intel` |
| `extract_floss_strings` | FLOSS: XOR/stack/tight decoded string extraction | `decoded_strings`, `ioc_ips_in_decoded`, `threat_intel` |
| `get_file_type` | Magic byte vs extension mismatch (masquerade detection) | `extension_mismatch`, `mitre_techniques` |
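
The magic-byte vs extension comparison behind `get_file_type` can be illustrated with a small signature table. The table below covers only four well-known formats and the function name is hypothetical; a real implementation would rely on a much larger signature database.

```python
from pathlib import PurePath

# Well-known file signatures mapped to the extensions that legitimately carry them.
MAGIC_SIGNATURES = {
    b"MZ": {".exe", ".dll", ".sys"},
    b"%PDF": {".pdf"},
    b"PK\x03\x04": {".zip", ".docx", ".xlsx", ".pptx", ".jar"},
    b"\x89PNG\r\n\x1a\n": {".png"},
}


def check_extension_mismatch(filename: str, header: bytes) -> dict:
    """Compare the leading bytes of a file with what its extension claims."""
    extension = PurePath(filename).suffix.lower()
    for magic, expected_extensions in MAGIC_SIGNATURES.items():
        if header.startswith(magic):
            return {
                "detected_magic": magic.hex(),
                "extension": extension,
                "extension_mismatch": extension not in expected_extensions,
            }
    # Unknown signature: report no verdict instead of guessing.
    return {"detected_magic": None, "extension": extension, "extension_mismatch": False}


masquerade = check_extension_mismatch("invoice.pdf", b"MZ\x90\x00\x03\x00")
genuine = check_extension_mismatch("report.PDF", b"%PDF-1.7\n")
```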
### Extended Registry Forensics
| Tool | Purpose | Key Output Fields |
|---|---|---|
| `parse_shellbags` | Folder navigation history; deleted dir + USB + share access | `suspicious_path_accesses`, `threat_intel` |
| `parse_windows_timeline` | ActivitiesCache.db: app launches + file opens | `file_opens`, `app_launches` |
| `parse_bam_dam` | BAM/DAM last-execution timestamps per user SID | `suspicious_executions`, `threat_intel` |
| `parse_typed_paths` | Explorer address bar history; network share + admin share paths | `network_share_paths`, `removable_media_paths` |
| `parse_run_mru` | Run dialog (Win+R) execution history | `suspicious_run_commands`, `threat_intel` |
| `parse_open_save_mru` | Open/Save dialog recent file access | `entries` |
| `parse_wordwheelquery` | Windows Search query history; sensitive file discovery | `suspicious_searches`, `threat_intel` |
| `parse_installed_software` | Installed programs; RAT/hacking tool detection | `suspicious_software`, `threat_intel` |
| `parse_sam_hive` | Local user accounts and last logon info | `entries` |
| `parse_logon_history` | Cached domain credentials in SECURITY hive | `entries`, `forensic_note` |
### Extended Disk Forensics
| Tool | Purpose | Key Output Fields |
|---|---|---|
| `get_fs_statistics` | fsstat: block size, volume name, creation/mount timestamps | `fs_type`, `block_size`, `creation_time` |
| `get_image_info` | ewfinfo/mmls: image format, acquisition hash, partition table | `ewf_metadata`, `partition_table` |
| `create_mac_timeline` | mactime: body-file MAC(B) timeline generation | `total_timeline_entries`, `output_path` |
| `read_raw_block` | blkcat: hexdump specific sectors; magic byte detection | `hexdump`, `detected_structure` |
| `analyze_slack_space` | blkls: file slack space extraction + IOC scanning | `ips_in_slack`, `urls_in_slack`, `threat_intel` |
| `verify_image_integrity` | MD5/SHA256 + ewfverify chain-of-custody verification | `integrity_verified`, `chain_of_custody` |
### Threat Intelligence
| Tool | Purpose | Key Output Fields |
|---|---|---|
| `lookup_hash_reputation` | VirusTotal file hash lookup (MD5/SHA1/SHA256) | `detection_ratio`, `verdict`, `mitre_techniques`, `threat_intel` |
| `lookup_domain_reputation` | VirusTotal + WHOIS domain reputation check | `verdict`, `mitre_techniques`, `threat_intel` |
| `search_mitre_technique` | RAG query for MITRE ATT&CK technique details | `rag_results`, `static_knowledge` |
| `search_ioc_database` | Search all IOCs in the RAG knowledge base | `matches`, `match_count` |
| `calculate_fuzzy_hash_similarity` | ssdeep similarity between two files/hashes (malware variants) | `similarity_score`, `interpretation` |
## How DeepSIFT Eliminates Hallucination
```mermaid
flowchart LR
    A["Raw Tool Output\nVolatility / EZ Tools / etc."] --> B["Python Parser\nStructured dict"]
    B --> C["MITRE Auto-Map\nmap_process_anomalies\nmap_injection\nmap_network_connection"]
    C --> D["RAG Enrichment\nChromaDB query\nMITRE · threat intel · case IOCs"]
    D --> E["Forensic Knowledge Envelope\ncaveats · advisories · corroboration"]
    E --> F["audit_id\nSHA-256 of raw output\nTimestamp + export file path"]
    F --> G["Structured JSON\nto LLM Context"]
```
Every tool call generates a unique `audit_id` (e.g. `dsift-2026-06-11-a3f9b2c1`).
`finish_analysis` **requires** an `audit_ids` list — fabricated findings without a
traced audit_id are structurally impossible to submit.
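
A sketch of how an `audit_id` in the format shown above could be minted, and how a report step could refuse uncited findings. How DeepSIFT actually derives the eight-character suffix is not stated here, so the random suffix is an assumption. The `finish_analysis` function below is a simplified stand-in for the real tool, and `known_audit_ids` is a hypothetical parameter.

```python
import hashlib
import secrets
from datetime import datetime, timezone


def new_audit_entry(tool: str, raw_output: bytes) -> dict:
    """Create one audit record: id, timestamp, and SHA-256 of the raw tool output."""
    now = datetime.now(timezone.utc)
    return {
        # Assumption: random 8-hex suffix; the real derivation may differ.
        "audit_id": f"dsift-{now:%Y-%m-%d}-{secrets.token_hex(4)}",
        "tool": tool,
        "timestamp": now.isoformat(),
        "sha256": hashlib.sha256(raw_output).hexdigest(),
    }


def finish_analysis(findings: list, audit_ids: list, known_audit_ids: set) -> dict:
    """Reject a report that cites no audit_ids or cites ids that were never logged."""
    if not audit_ids:
        raise ValueError("audit_ids is required: findings must cite audited tool calls")
    unknown = [a for a in audit_ids if a not in known_audit_ids]
    if unknown:
        raise ValueError(f"unknown audit_ids: {unknown}")
    return {"findings": findings, "audit_ids": list(audit_ids)}


entry = new_audit_entry("get_process_list", b"raw volatility output")
report = finish_analysis(["suspicious process"], [entry["audit_id"]], {entry["audit_id"]})
```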
## Investigation Workflow
### Memory Image
```mermaid
flowchart TD
    A["get_process_list\nHunt Evil baseline + MITRE tags"] --> B["scan_hidden_processes\nDKOM rootkit detection"]
    B --> C["find_injected_code\nmalfind injection classification"]
    C --> D["get_running_services\nsuspicious binary paths"]
    D --> E["get_network_connections\nexternal IP flagging"]
    E --> F["get_command_history\nsuspicious pattern detection"]
    F --> G["lookup_ip_reputation\nAbuseIPDB + VirusTotal"]
    G --> H["correlate_artifacts\ncross-source PID/path/IP joins"]
    H --> I["adversarial_review\nchallenge hypothesis"]
    I --> J["finish_analysis\nobservation + interpretation\naudit_ids required"]
```
### Windows Artifact Analysis
```mermaid
flowchart TD
    A["parse_event_logs\nlogon · service · task · PS · WMI · RDP"] --> B["parse_shimcache\nexecutable existence"]
    B --> C["parse_amcache\nSHA1 hash per executable"]
    C --> D["parse_prefetch\nexecution history x8 runs"]
    D --> E["parse_mft\nfull FS timeline + timestamp anomalies"]
    E --> F["parse_srum\nbytes sent per application\nexfil quantification"]
    F --> G["parse_usn_journal\nburst deletion detection"]
    G --> H["correlate_artifacts"]
    H --> I["adversarial_review"]
    I --> J["finish_analysis"]
```
## What Sets DeepSIFT Apart
The challenge is to take Protocol SIFT — Claude Code wired directly to the SIFT
Workstation — and make it production-grade. DeepSIFT does exactly that. Against the
prompt-only baseline, every dimension a DFIR agent is judged on is upgraded from a
prompt-level suggestion to an **architecturally enforced guarantee**:
| Judging dimension | Protocol SIFT (prompt-only baseline) | **DeepSIFT** |
|---|---|---|
| Tool output → LLM | 10k+ lines of raw CLI text in-context | **Typed JSON from 15 middleware parsers** — the model never sees raw text |
| Hallucination control | Natural-language "be careful" rules | **Per-claim grounding verification** against raw export bytes (`verify_findings.py`) |
| Confidence | Qualitative "high/low" | **4-axis quantified score (0–100)** |
| Safety boundaries | Prompt instructions | **`guard_command` + `guard_output_path` raise exceptions in server code**, so evidence is read-only by construction |
| Audit trail | None | **SHA-256 hash chain + optional HMAC signing** (forgery-resistant), one entry per tool call |
| Threat intel | Training-time memory | **RAG (MITRE ATT&CK + LOLBAS + Hunt Evil) injected into every tool call** |
| Autonomy evidence | Lives in the chat, then lost | **Server-side hypothesis ledger** with confirm/disprove/self-correction + confidence |
| Detection breadth | ~30 event IDs | **3,700+ Sigma rules** (Hayabusa) + 6-type contradiction detection |
| Scale | Dump artifacts into context | **Indexed SQLite evidence store** — query the full set, page only the matches |
| Human review | None | **Interactive Examiner Portal** with HMAC sign-off, drill-down, multi-case |
| Accuracy (must-identify) | 25% on ROCBA, missed disk-only FOR500 | **100% on both, 0 hallucinations, 100% grounding** |
**Capabilities unique to DeepSIFT's design:**
- **Grounding at the tool layer** — every claim token is matched against the raw evidence its
`audit_id` cites; findings are reproducible from first principles, not taken on trust.
- **Quantified, multi-axis confidence** — tool reliability + corroboration + IOC specificity +
MITRE accuracy, not an adjective.
- **Forgery-resistant chain of custody** — hash-chained and HMAC-signable; tampering (modify /
insert / delete) is provably detectable, and signatures cannot be forged without the key.
- **Captured autonomy with no API key** — Claude Code drives the typed tools and records its
reasoning server-side, so the senior-analyst loop is auditable, not anecdotal.
- **Per-tool forensic knowledge envelope** — caveats, advisories, and corroboration hints wrap
every response, so the model reasons with forensic discipline at every step.
- **Client-agnostic** — the same tool surface serves over stdio *or* HTTP (SSE/streamable-http)
to any MCP client or remote agent.
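
To make "quantified, multi-axis confidence" concrete, here is one way four axes could combine into a 0-100 score. The equal 0-25 range per axis and the plain sum are assumptions made purely for illustration; this README does not state the project's actual weights or scoring formula. Only the four axis names come from the text above.

```python
AXES = ("tool_reliability", "corroboration", "ioc_specificity", "mitre_accuracy")
MAX_PER_AXIS = 25  # Assumption: four equally weighted axes summing to 100.


def confidence_score(axis_scores: dict) -> dict:
    """Sum four bounded axis scores into one 0-100 figure, keeping the breakdown."""
    for axis in AXES:
        value = axis_scores.get(axis)
        if value is None or not 0 <= value <= MAX_PER_AXIS:
            raise ValueError(f"{axis} must be between 0 and {MAX_PER_AXIS}, got {value!r}")
    breakdown = {axis: axis_scores[axis] for axis in AXES}
    return {"score": sum(breakdown.values()), "axes": breakdown}


example = confidence_score(
    {"tool_reliability": 22, "corroboration": 18, "ioc_specificity": 25, "mitre_accuracy": 15}
)
```

Returning the per-axis breakdown alongside the total lets a reviewer see why a finding scored as it did, which an adjective such as "high" cannot convey.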
## How to Run It (three ways)
DeepSIFT runs three ways:
- **Claude Code + MCP server** *(how a judge can drive it directly, no extra API key)*: point
Claude Code at the DeepSIFT MCP server via `.mcp.json` and ask it to investigate `/mnt/evidence`.
Claude Code *is* the agent; every action goes through the typed, parsed, audited, guard-railed
tools — it cannot run a raw shell command. The session records its reasoning via
`record_hypothesis`/`update_hypothesis`/`finish_analysis`, producing an auditable autonomy trail
with no API key. **Client-agnostic:** set `DEEPSIFT_MCP_TRANSPORT=sse` to serve the same tools over
HTTP to *any* MCP client (Claude Desktop, Cherry Studio, LibreChat, a remote agent, or a gateway).
- **`investigate.py` — agentic reasoning** *(the senior-analyst mode)*: an LLM forms explicit
**hypotheses**, chooses which typed MCP tool to run next, reads the parsed/audited JSON,
marks each hypothesis **confirmed / disproved / inconclusive with a confidence**, **self-corrects**
when a tool fails or a result contradicts a hypothesis, and reconstructs the **attack chain**.
Works on **any** evidence shape and adapts its first triage step accordingly:
  ```bash
  export ANTHROPIC_API_KEY=sk-ant-...
  # disk-only (no memory image): a first-class autonomous run
  python3 investigate.py --evidence-mount /mnt/evidence
  # memory + disk
  python3 investigate.py --image /cases//memory.raw --evidence-mount /mnt/evidence
  # memory-only
  python3 investigate.py --image /cases//memory.raw
  ```
- **`demo.py` — deterministic pipeline**: fixed multi-agent sequence (no LLM/key) for reproducible,
scriptable benchmark runs.
## Examiner Portal — Human Review
A reviewer or judge can inspect a completed investigation in one command — **no pip installs**
(Python standard library only):
```bash
python3 examiner_portal.py                                      # interactive live UI → http://127.0.0.1:8420
python3 examiner_portal.py --cases-root /cases                  # adds a multi-case picker across investigations
python3 examiner_portal.py --html reports/examiner_review.html  # static read-only file (no server)
```
The portal shows the **verdict + confidence**, the **autonomous-reasoning hypothesis ledger**
(confirmed/disproved/self-corrections), every **finding** (suspicious processes, exfil IOCs, named
MITRE ATT&CK badges, timeline, files accessed), the **evidence-grounding** result (verified vs
unverified claims), and the full **chain of custody** — every audited tool call with the SHA-256 of
its raw output plus a **recomputed hash-chain integrity verdict that detects tampering**. It is
**interactive**: click any audit row to **drill into the raw evidence** (with a live SHA-256 match
check), browse **multiple cases**, and perform an **examiner sign-off** — approve/reject each finding
and produce an **HMAC-signed, tamper-evident manifest** binding the findings hash + audit-chain head.
This directly answers the "usability" and "audit trails" judging criteria.
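The sign-off step can be pictured with a minimal sketch. It assumes the manifest is a JSON object signed with HMAC-SHA256 over a canonical serialization; the function names, field names, and key handling here are hypothetical and are not taken from `examiner_portal.py`.

```python
import hashlib
import hmac
import json


def _canonical(body):
    # Sorted keys give a stable byte string to sign (an assumed serialization).
    return json.dumps(body, sort_keys=True).encode()


def sign_manifest(findings_bytes, chain_head, decisions, key):
    """Bind the findings hash, the audit-chain head, and the examiner's decisions."""
    manifest = {
        "findings_sha256": hashlib.sha256(findings_bytes).hexdigest(),
        "audit_chain_head": chain_head,
        "decisions": decisions,
    }
    manifest["signature"] = hmac.new(key, _canonical(manifest), hashlib.sha256).hexdigest()
    return manifest


def verify_manifest(manifest, findings_bytes, key):
    """True only if the signature is valid and the findings file is unchanged."""
    body = {k: v for k, v in manifest.items() if k != "signature"}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, manifest.get("signature", ""))
    findings_ok = body.get("findings_sha256") == hashlib.sha256(findings_bytes).hexdigest()
    return signature_ok and findings_ok
```

Because the signature covers the findings hash and the chain head together, changing either the findings file or any approve/reject decision after sign-off invalidates the manifest.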
## Architectural Guardrails
Enforced in code, not prompts — these raise exceptions; the model cannot talk its way past them:
- Every tool hard-codes its own forensic binary and builds an **argv list (never `shell=True`,
never a shell string)** — the model cannot choose the binary or smuggle a second command, so an
arbitrary destructive command is unreachable by construction.
- `mcp_server.audit.guard_command` adds defense-in-depth on the parameter-rich binary launchers
(Volatility, EZ Tools / Windows-artifact, and registry exec paths): it blocks destructive/
exfiltration binaries (`rm`, `dd`, `shred`, `mkfs`, `wget`, `curl`, `scp`, `ssh`, `nc`, shells…)
and shell redirection/chaining tokens, and rejects shell-string commands outright.
- `guard_output_path` blocks writes under evidence roots (`/cases/`, `/mnt/`, `/media/`).
- Tool output is parsed to JSON before reaching the LLM; every call is logged with a SHA-256 of
the raw output (`analysis/forensic_audit.log`).
- **Tamper-evident *and* tamper-resistant audit chain.** Entries form a SHA-256 hash chain (any
modify/insert/delete breaks it). Set `DEEPSIFT_AUDIT_KEY` (held off the evidence host) to also
**HMAC-sign** the chain — an attacker who rewrites the entire log still cannot forge valid
signatures without the key. `verify_audit_chain()` reports both; the Examiner Portal shows it live.
- **Token-scale by design + a queryable store.** The LLM only ever sees each tool's parsed, *capped*
summary JSON (the cap is set by `AGENT_TOOL_RESULT_CHARS`); the full raw evidence, which can run to
megabytes, goes to the on-disk audit record and never into the prompt. For full-disk scale,
`index_evidence` ingests the *complete* artifact rows (the EZ Tools' `exports/*.csv`) into a stdlib
**SQLite** store and `query_evidence` returns only the matching subset, so the agent can reach a
100k-row MFT or a full shellbag set without dumping it into context. This is a dependency-light
alternative to standing up OpenSearch.
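The first two guardrails can be sketched as follows. This is an illustration under stated assumptions, not the code in `mcp_server/audit.py`: the deny-list contents, the token list, and the helper `build_pslist_argv` are hypothetical, and only the overall shape (a hard-coded binary, an argv list, a guard that raises) follows the description above.

```python
import os

# Hypothetical deny-lists; the real ones in mcp_server/audit.py may differ.
BLOCKED_BINARIES = {"rm", "dd", "shred", "mkfs", "wget", "curl", "scp", "ssh", "nc", "sh", "bash"}
SHELL_TOKENS = (";", "&&", "||", "|", ">", "<", "`", "$(")


def guard_command(argv):
    """Raise if argv is a shell string, names a blocked binary, or carries shell tokens."""
    if isinstance(argv, str):
        raise PermissionError("shell-string commands are rejected; pass an argv list")
    for arg in argv:
        if os.path.basename(arg) in BLOCKED_BINARIES:
            raise PermissionError(f"blocked binary in command: {arg!r}")
        if any(tok in arg for tok in SHELL_TOKENS):
            raise PermissionError(f"shell token in argument: {arg!r}")
    return list(argv)


def build_pslist_argv(image_path):
    """A typed tool fixes its own binary; the caller supplies only the image path."""
    return guard_command(
        ["python3", "-m", "volatility3", "-f", image_path, "windows.pslist.PsList"]
    )
```

The resulting list would be passed to `subprocess.run(argv)` without `shell=True`, so an argument such as `x; rm -rf /` is never interpreted by a shell even if the guard were bypassed.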
## Validated Results
DeepSIFT has been validated end-to-end on **two organizer-provided SANS cases** — one memory+disk,
one disk-only — each scored by `benchmark/scorer.py` against published ground truth and
**independently reproducible** by a judge via `python3 verify_findings.py`.
| Case | Evidence | Protocol SIFT baseline | **DeepSIFT** | Hallucinations | Claim grounding |
|---|---|:---:|:---:|:---:|:---:|
| **ROCBA** (FOR508) | memory + disk | 0 / 4 (0 %) | **4 / 4 (100 %)** | 0 | **100 %** |
| **Abducted Zebrafish / Vanko** (FOR500) | disk-only | 3 / 4 (75 %) | **4 / 4 (100 %)** | 0 | **100 %** |
### ROCBA — FOR508 (memory + disk)
End-to-end benchmark on the SANS FOR508 **ROCBA** case (`Rocba-Memory.raw` 18 GB +
`rocba-cdrive.e01` 81 GiB C: volume).
The memory image was captured **3 days after** the 2020-11-13 incident, so the break-in evidence
exists only on disk. DeepSIFT's disk + browser analysis reconstructs it with zero hallucinations:
- **Unauthorized access (2020-11-13)** — wave of Event 4625 *Failed Logon* (RDP brute force).
- **IP theft / exfiltration** — LNK artifacts show SRL project files (`Megaforce Specs & Research.docx`,
`Blue Thunder blueprint`, `Files from SRL system`) copied to an external **`F:\` USB drive** on 2020-11-13.
- **Cloud usage + incident-window browsing** — Google Drive + SharePoint (`starkresearchlabs-my.sharepoint.com`)
access on Nov 14 UTC (= Nov 13 evening EST), and a Google search for **`sdelete download`** (anti-forensics).
Reproduce (deterministic, no LLM/API key required):
python3 demo.py \
--image /cases/ROCBA/Rocba-Memory.raw \
--evidence-mount /mnt/evidence \
--baseline benchmark/baselines/protocol_sift_rocba_findings.json \
--ground-truth benchmark/ground_truth/rocba_ground_truth.json
### Abducted Zebrafish / Vanko — FOR500 (disk-only)
A **disk-only** case (physical Microsoft Surface 3 image, no memory capture), run as a first-class
autonomous investigation. DeepSIFT scored **4/4 must-identify, 0 hallucinations, 100 % claim
grounding** and recovered the classified research **subject matter**, the one must-identify
criterion the baseline missed (3/4 in the table above). Every claim below traces to an audited
tool call:
- **Access to classified StarkResearch directories (Level 5–8)** — shellbags (SbECmd, 252 entries)
record Explorer browsing of the `\\192.168.1.5\StarkResearch\Level 5–8 Classified` SMB share and
the locally-staged `Downloads\vacation photos\` cover-named copies.
- **Classified subject matter (the criterion the baseline missed)** — jump lists / LNK recover
`zebrafish.pdf`, `ZF DNA splice test notes.docx`, and `Rapid cell regeneration research.docx`.
- **Tooling & staging** — decoded UserAssist shows just-in-time `7-Zip` (2016-06-29 16:01) and
`VeraCrypt 1.17` (6 runs, last 2016-06-30 01:56), plus `StarkCollector.exe` and `sdelete.exe`.
- **Exfiltration channels** — `parse_usb_history` enumerates 9 USB mass-storage devices (WD
My Passport, SanDisk Cruzer, Verbatim Store-N-Go, PNY, Innostor) and Chrome history shows an
`icloud.com` cloud-storage visit.
This case motivated the [Production Hardening](#production-hardening) work described below.
### Production Hardening
The EZ Tools / registry path was hardened so disk-artifact analysis is correct and
case-isolated on any acquired image (no behaviour is specific to a particular case):
- **Cross-case isolation** — every EZ Tools run clears its own CSV output directory first, so a
prior case's output (e.g. another user profile's LNK history) can never be re-read as the
current case's evidence.
- **Dirty-hive parsing** — RECmd/SbECmd are invoked with `--nl` so live-acquired registry hives
(which ship TxR `.blf` logs, not the `.LOG1/.LOG2` files those tools replay) are parsed
as-acquired instead of aborting and silently returning zero rows.
- **Offline-hive keys** — registry lookups resolve `ControlSet001` (acquired hives have no
`CurrentControlSet` symlink), and single-key (`--kn`) dumps that write to stdout rather than
CSV are still parsed into structured entries.
- **Correct artifact decoding** — UserAssist entries are read from the decoded program-path /
run-count / last-executed columns (not the raw ROT13 value name); SbECmd output (named per
hive) is read by scanning all CSVs it produces.
- **Case-agnostic knowledge base** — the offline RAG corpus contains only general forensic
knowledge (MITRE catalog, LOLBAS, Hunt Evil baseline). Per-case IOCs are opt-in
(`--case-ioc-json` / `--load-rocba`) so one case never biases another.
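To see why the raw UserAssist value name is not usable as evidence text, recall that Windows stores those names ROT13-encoded. The sketch below is only an illustration of that encoding using Python's standard `rot_13` codec; the helper name is hypothetical, and per the bullet above DeepSIFT itself reads the already-decoded CSV columns rather than decoding names this way.

```python
import codecs


def decode_userassist_name(value_name):
    """ROT13-decode a UserAssist value name (letters rotate, digits and punctuation do not)."""
    return codecs.decode(value_name, "rot_13")


# An encoded program name as it would appear in the raw registry value name.
raw_name = "fqryrgr.rkr"
print(decode_userassist_name(raw_name))
```

A keyword search for `sdelete` against the raw value names would find nothing, which is the failure mode the decoded-column fix avoids.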
### Running on SIFT Workstation (Linux) — notes
- **EZ Tools** are invoked as .NET assemblies (`dotnet /opt/zimmermantools/.dll`, subdir-aware),
not Windows `.exe` — works on stock SIFT with the dotnet runtime.
- **Evidence mounting** is read-only; NTFS volume images with a truncated backup-boot sector mount via
the kernel `ntfs3` driver (`mount -t ntfs3 -o ro /mnt/evidence`).
- **Offline / air-gapped RAG** — if a GPU build of `torch`/sentence-transformers or the embedding model
is unavailable, the knowledge base falls back to an offline hashing embedder and seeds from the bundled
Hunt Evil process baseline plus any opt-in case IOCs (no network needed).
- **Event-log scope** — disk_agent parses live security/system/RDP/PowerShell channels plus the most
recent rotated Security archives (bounded), and retains events **date-stratified** so the incident
window is never truncated away.
- **Browser coverage** — all profiles of all installed browsers (Chrome/Edge/Brave + Firefox) are
analysed, auto-discovered from the evidence mount.
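The date-stratified retention mentioned under event-log scope can be pictured with a small sketch. A plain "keep the newest N events" cap would drop an incident day that is followed by several noisy days; capping per calendar day keeps every day represented. The function name, the `timestamp` field, and the per-day cap are assumptions for illustration, not DeepSIFT's actual retention logic.

```python
from collections import defaultdict


def stratify_by_date(events, per_day_cap):
    """Keep at most per_day_cap events for each calendar day, in day order."""
    kept = defaultdict(list)
    for event in events:
        day = event["timestamp"][:10]  # assumes ISO-8601 timestamps (YYYY-MM-DD...)
        if len(kept[day]) < per_day_cap:
            kept[day].append(event)
    return [event for day in sorted(kept) for event in kept[day]]
```

With this shape, a single event on 2020-11-13 survives even if later days contribute thousands of entries.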
## Setup & Installation
### Prerequisites
- SANS SIFT Workstation (Ubuntu 20.04+)
- Python 3.10+
- Volatility 3, log2timeline, Sleuth Kit (pre-installed on SIFT)
- EZ Tools at `/opt/zimmermantools/` (install with SIFT EZ Tools script) — run via the dotnet runtime
### Installation
git clone https://github.com/ahammadshawki8/DeepSIFT
cd DeepSIFT
# Install Python dependencies
pip3 install -r requirements.txt
# Copy environment config
cp .env.example .env
nano .env # Add ABUSEIPDB_API_KEY and VIRUSTOTAL_API_KEY (optional but recommended)
# Initialize RAG knowledge base (first run only, ~3-5 minutes)
python3 rag/ingest/run_all.py
# Run tests
pytest tests/
# Expected: 75 passed, 1 skipped
### Connect to Claude Code
Add to `~/.claude.json` (or a project-level `.mcp.json`):
{
  "mcpServers": {
    "deepsift": {
      "command": "python3",
      "args": ["/path/to/deepsift/mcp_server/server.py"]
    }
  }
}
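With the default stdio transport, Claude Code launches the server itself from this config. To run it manually (for example with `DEEPSIFT_MCP_TRANSPORT=sse`), start it in a separate terminal: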
python3 mcp_server/server.py
## Verify It Yourself (no API key)
Everything below runs offline with **no API key** — a judge can confirm DeepSIFT from
first principles in a few minutes.
# 1. Works on a fresh clone immediately:
python3 preflight.py # which forensic tool groups are operational here (honest, per-host)
pytest -q # full test suite → 75 passed, 1 skipped
# 2. See a completed result instantly (committed sample — no run needed):
# open docs/sample/vanko_examiner_report.html (or rocba_examiner_report.html) in a browser,
# or render the portal view of a sample:
python3 examiner_portal.py --findings docs/sample/vanko_findings.json --html /tmp/review.html
After you run your own investigation (next section) the live results land in `analysis/`, and
these confirm them independently:
python3 verify_findings.py # re-checks every claim vs raw evidence + recomputes the hash chain
python3 examiner_portal.py # live review UI → http://127.0.0.1:8420
**Reproduce the head-to-head benchmark** (deterministic; no LLM/API key required):
python3 demo.py \
--image /cases/ROCBA/Rocba-Memory.raw \
--evidence-mount /mnt/evidence \
--baseline benchmark/baselines/protocol_sift_rocba_findings.json \
--ground-truth benchmark/ground_truth/rocba_ground_truth.json
**Drive a live investigation as the agent** — connect Claude Code to the MCP server
(`.mcp.json`) and ask, e.g.:
Investigate /mnt/evidence for unauthorized access and data exfiltration. Use DeepSIFT
tools only: record hypotheses, confirm or disprove each with the right tool, then call
finish_analysis citing every audit_id.
Claude Code follows the workflow, records its hypotheses and self-corrections, cross-correlates
artifacts, challenges its own conclusions with `adversarial_review`, and calls `finish_analysis`
with a grounded, structured report citing every `audit_id` — no external API key required.
## Evidence Integrity & Chain of Custody
Every tool call appends a hash-chained, tamper-evident audit record to the log:
{
  "audit_id": "dsift-2026-06-11-a3f9b2c1",
  "timestamp": "2026-06-11T14:23:07.412Z",
  "tool": "get_process_list",
  "command": "python3 -m volatility3 -f /cases/ROCBA/Rocba-Memory.raw windows.pslist.PsList",
  "raw_output_sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  "raw_output_file": "exports/get_process_list_2026-06-11T14-23-07-412Z.txt",
  "prev_hash": "…",
  "entry_hash": "…"
}
- **Provenance-gated reporting** — `finish_analysis` requires an `audit_ids` list. Any finding not
traceable to a prior tool call is structurally blocked — the tool errors and no report is written.
- **Tamper-evident chain** — each entry binds the previous entry's hash, so any modify/insert/delete
breaks the chain (`verify_audit_chain()`).
- **Tamper-resistant (optional)** — set `DEEPSIFT_AUDIT_KEY` (held off the evidence host) to
additionally **HMAC-sign** the chain; an attacker who rewrites the whole log cannot forge valid
signatures without the key.
- **Independently checkable** — `python3 verify_findings.py` recomputes both the grounding and the
chain integrity from the on-disk artifacts.
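The chain properties listed above can be sketched in a few lines. This is a minimal illustration, assuming each entry hash is the SHA-256 of the entry's canonical JSON (including `prev_hash`) and that the optional HMAC is computed over the entry hash; the genesis value, the `hmac` field name, and both function names are hypothetical and may not match `verify_audit_chain()`.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed prev_hash for the first entry


def _digest(body):
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(chain, record, key=None):
    """Append a record, binding it to the previous entry's hash."""
    body = dict(record, prev_hash=chain[-1]["entry_hash"] if chain else GENESIS)
    entry = dict(body, entry_hash=_digest(body))
    if key is not None:
        entry["hmac"] = hmac.new(key, entry["entry_hash"].encode(), hashlib.sha256).hexdigest()
    chain.append(entry)
    return entry


def verify_chain(chain, key=None):
    """True only if every link matches and, when a key is given, every signature verifies."""
    prev = GENESIS
    for entry in chain:
        body = {k: v for k, v in entry.items() if k not in ("entry_hash", "hmac")}
        if body.get("prev_hash") != prev or _digest(body) != entry["entry_hash"]:
            return False
        if key is not None:
            expected = hmac.new(key, entry["entry_hash"].encode(), hashlib.sha256).hexdigest()
            if not hmac.compare_digest(expected, entry.get("hmac", "")):
                return False
        prev = entry["entry_hash"]
    return True
```

The unkeyed check catches accidental or partial edits. The keyed check is what defeats a full rewrite: an attacker can recompute every SHA-256 link but cannot produce matching HMAC values without the key.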
## RAG Knowledge Base
The RAG pipeline (ChromaDB + sentence-transformers, with an offline hashing-embedder fallback)
ships a **case-agnostic** corpus — only general forensic knowledge. One case's indicators are
**never** baked in by default, so an investigation is never biased by an unrelated case.
| Source (default corpus) | Coverage |
|---|---|
| MITRE ATT&CK technique catalog | Technique IDs + names mapped by the parsers (kept in sync with `mitre_auto_map`) |
| LOLBAS reference | Commonly abused signed Windows binaries and how attackers misuse them |
| SANS Hunt Evil baseline | Known-normal Windows process baseline for anomaly detection |
Per-case IOCs are **opt-in** and loaded only for that investigation
(`python3 rag/ingest/run_all.py --case-ioc-json `; a bundled ROCBA example pack is
available via `--load-rocba`). RAG context is injected into tool responses **at call time** — the
model sees threat intelligence alongside the parsed artifact data, not as a separate lookup step.
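The "offline hashing embedder" fallback is not specified further here. One common way to build such an embedder is feature hashing: each token is hashed into a fixed-size vector, so no model download is needed. The sketch below assumes that approach; the function names, tokenizer, and dimension are illustrative and are not taken from `rag/knowledge_base.py`.

```python
import hashlib
import math
import re


def hashing_embed(text, dim=256):
    """Map text to a unit-length vector by hashing each token into one of dim buckets."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9_.]+", text.lower()):
        bucket = int.from_bytes(hashlib.sha256(token.encode()).digest()[:8], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))
```

Such an embedder only captures token overlap, not meaning, so retrieval quality is lower than with sentence-transformers; the trade-off is that it works on an air-gapped host.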
## Benchmarking
### Protocol SIFT vs DeepSIFT (ROCBA case)
python3 demo.py \
--image /cases/ROCBA/Rocba-Memory.raw \
--baseline benchmark/baselines/protocol_sift_rocba_findings.json \
--ground-truth benchmark/ground_truth/rocba_ground_truth.json \
--report-output docs/accuracy_report.html
The HTML report shows:
- Side-by-side finding comparison (DeepSIFT vs Protocol SIFT)
- Color-coded MITRE ATT&CK badges
- Precision, recall, and F1 scores vs ground truth
- Chain-of-custody audit trail summary
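For reference, the precision, recall, and F1 figures follow the usual definitions. The sketch below assumes findings and ground truth are compared as sets of exact identifiers; how `benchmark/scorer.py` actually matches a finding to a ground-truth item is not described here, so the matching rule is an assumption.

```python
def score(found, truth):
    """Precision, recall, and F1 for a set of reported items against ground truth."""
    found, truth = set(found), set(truth)
    true_positives = len(found & truth)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Under this definition a hallucinated finding lowers precision, while a missed must-identify item lowers recall.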
### vigia-cases Standardized Benchmark
DeepSIFT supports the `annatchijova/vigia-cases` standardized benchmark dataset used
across multiple hackathon submissions for objective cross-system comparison:
# Clone vigia-cases dataset
git clone https://github.com/annatchijova/vigia-cases
# Run DeepSIFT against all cases
python3 benchmark/vigia_runner.py \
--vigia-root ./vigia-cases \
--results-root ./benchmark/deepsift_results \
--output-json benchmark/reports/vigia_report.json \
--output-md benchmark/reports/vigia_report.md
Scored dimensions: MITRE Recall · IOC Recall · Narrative Recall · Hallucination Rate · Grounding Score · Confidence Score · Contradictions Found
## Project Structure
DeepSIFT/
├── mcp_server/
│ ├── server.py ← MCP server entry point (148 tools, 23 modules)
│ ├── config.py ← Tool paths, environment config
│ ├── audit.py ← audit_id generation, tool counter, chain-of-custody log
│ ├── tools/
│ │ ├── volatility.py ← 12 core Volatility tools + verify_findings + finish_analysis
│ │ ├── volatility_extended.py ← 10 advanced Volatility tools (privileges, VAD, SSDT, callbacks)
│ │ ├── volatility_advanced.py ← 12 Volatility tools (modules, IRP hooks, hashdump, dump_process)
│ │ ├── windows_artifacts.py ← 16 EZ Tools wrappers (event logs, registry, execution artifacts)
│ │ ├── registry_extended.py ← 10 registry tools (shellbags, BAM/DAM, MRU, SAM, timeline)
│ │ ├── browser_artifacts.py ← 8 browser tools (Chrome, Firefox, Edge, Hindsight, cache)
│ │ ├── email_artifacts.py ← 5 email tools (PST/OST, Thunderbird, EML, header forensics)
│ │ ├── cloud_artifacts.py ← 6 cloud tools (Dropbox, OneDrive, Google Drive, Slack, Teams)
│ │ ├── document_analysis.py ← 5 document tools (PDF, OLE/VBA, RTF, ZIP, DDE)
│ │ ├── linux_forensics.py ← 10 Linux tools (processes, bash history, syslog, crontab)
│ │ ├── network_analysis.py ← 3 network tools (PCAP, DNS, ARP)
│ │ ├── network_extended.py ← 7 network tools (Zeek, IIS, Apache, firewall, netflow, RDP)
│ │ ├── anti_forensics.py ← 7 anti-forensics detection tools (timestomp, log wipe, ADS, VSS)
│ │ ├── file_carving.py ← 8 tools (bulk_extractor, foremost, scalpel, capa, FLOSS, exiftool)
│ │ ├── file_analysis.py ← 3 static analysis tools (PE metadata, strings, packer detection)
│ │ ├── disk_extended.py ← 6 disk tools (fsstat, ewfinfo, mactime, blkcat, slack, integrity)
│ │ ├── threat_intel_extended.py ← 5 threat intel tools (VT hash/domain, MITRE search, IOC DB, ssdeep)
│ │ ├── log2timeline.py ← 3 Plaso tools
│ │ ├── sleuthkit.py ← 4 Sleuth Kit tools
│ │ ├── yara_tools.py ← 3 YARA tools
│ │ ├── hayabusa.py ← 2 Hayabusa tools (3,700+ Sigma rules)
│ │ └── correlation.py ← 3 tools: correlate_artifacts, adversarial_review, detect_contradictions
│ └── parsers/
│ ├── pslist_parser.py ← SANS Hunt Evil baseline (31 processes), masquerade detection
│ ├── netscan_parser.py ← External IP extraction and flagging
│ ├── malfind_parser.py ← Injection type classification (PE/shellcode/reflective)
│ ├── timeline_parser.py ← Suspicious keyword detection in Plaso timeline
│ ├── mitre_auto_map.py ← Rule-based MITRE ATT&CK mapping (80+ rules, 19 categories)
│ ├── rag_enrichment.py ← Shared RAG enrichment helpers (enrich_findings, build_rag_summary)
│ ├── browser_parser.py ← Browser URL/download threat classification
│ ├── cloud_parser.py ← Cloud sync exfiltration risk classification
│ ├── document_parser.py ← PDF/OLE/RTF/DDE/ZIP malicious document classification
│ ├── network_log_parser.py ← Web/firewall/DNS log threat classification
│ ├── linux_parser.py ← Linux process/command/syslog threat classification
│ ├── grounding_verifier.py ← Post-hoc verbatim token grounding check
│ ├── confidence_scorer.py ← 4-axis quantified confidence scoring (0-100)
│ └── forensic_knowledge.py ← Per-tool forensic caveats/advisories/corroboration (148 entries)
├── rag/
│ ├── knowledge_base.py ← ChromaDB vector store
│ ├── query.py ← Semantic search interface
│ └── ingest/
│ ├── knowledge_corpus.py ← Case-agnostic offline corpus (MITRE catalog + LOLBAS + Hunt Evil)
│ ├── mitre_attack.py ← MITRE ATT&CK Enterprise ingestion
│ ├── case_history.py ← Per-case findings ingestion (opt-in, per investigation)
│ ├── rocba_iocs.py ← Example case-IOC pack (opt-in via --load-rocba; not auto-loaded)
│ └── run_all.py ← One-command RAG initialization
├── agents/
│ ├── orchestrator.py ← LangGraph multi-agent coordination (deterministic pipeline)
│ └── reasoning_agent.py ← Agentic LLM reasoning loop over the typed tools
├── benchmark/
│ ├── scorer.py ← must-identify / hallucination scoring vs ground truth
│ ├── compare.py ← Case-agnostic side-by-side comparison + HTML report
│ ├── vigia_runner.py ← vigia-cases standardized multi-case benchmark
│ ├── ground_truth/ ← Per-case ground-truth scoring files
│ ├── baselines/ ← Protocol SIFT reference findings
│ └── reports/html_report.py ← Visual HTML comparison report
├── tests/ ← pytest unit tests (75 passing, 1 skipped)
├── yara_rules/
│ ├── suspicious_strings.yar ← T1059.001, T1003, T1218, T1547.001
│ ├── webshells.yar ← T1505.003
│ ├── ransomware.yar ← T1486, T1490
│ ├── rats.yar ← T1219, T1071
│ └── packers.yar ← T1027.002
├── analysis/ ← findings.json + forensic_audit.log (runtime)
├── exports/ ← raw tool outputs SHA-256 indexed (runtime)
├── docs/ ← architecture.md, dataset.md, devpost_submission.md, JUDGING.md
├── investigate.py ← Autonomous agentic investigation (memory / disk-only / both)
├── demo.py ← Deterministic multi-agent pipeline (no LLM/key)
├── examiner_portal.py ← Interactive human-review UI (sign-off, drill-down, multi-case)
├── verify_findings.py ← Independent re-verification of claims + audit chain
├── preflight.py ← Environment self-check (operational tool groups)
├── AGENTS.md ← Orientation for coding/judging agents
├── .env.example ← Environment template
└── requirements.txt
Additional first-class modules: `mcp_server/preflight.py` (dependency map),
`mcp_server/evidence_store.py` (stdlib SQLite evidence index), and the MCP tool modules
`tools/system_health.py`, `tools/investigation_state.py` (hypothesis ledger),
`tools/evidence_index.py` (`index_evidence` / `query_evidence`).
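The index-then-query pattern behind `index_evidence` / `query_evidence` can be sketched with the standard library alone. The table layout, the substring match, and the function signatures below are assumptions for illustration; the real schema in `mcp_server/evidence_store.py` may differ.

```python
import csv
import io
import json
import sqlite3


def index_evidence(conn, artifact, csv_text):
    """Store every CSV row as JSON under an artifact label; return the row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS evidence (artifact TEXT, row_json TEXT)")
    rows = [(artifact, json.dumps(row)) for row in csv.DictReader(io.StringIO(csv_text))]
    conn.executemany("INSERT INTO evidence VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def query_evidence(conn, artifact, needle, limit=50):
    """Return at most `limit` rows of one artifact whose JSON contains `needle`."""
    cursor = conn.execute(
        "SELECT row_json FROM evidence "
        "WHERE artifact = ? AND instr(lower(row_json), lower(?)) > 0 LIMIT ?",
        (artifact, needle, limit),
    )
    return [json.loads(row[0]) for row in cursor]
```

The `limit` parameter is what keeps the result token-bounded: the complete artifact stays on disk and only the matching subset is returned to the model.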
## MITRE ATT&CK Coverage
DeepSIFT's `mitre_auto_map.py` (80+ rules, 19 categories) tags findings at the tool layer:
| Finding | Technique |
|---|---|
| Process injection (PE header in RWX region) | T1055 — Process Injection |
| PowerShell encoding (`-enc`, `-e` flags) | T1059.001 — PowerShell |
| Registry run key modification | T1547.001 — Registry Run Keys |
| Active external network connection from suspicious process | T1071 — Application Layer Protocol |
| LSASS memory access | T1003.001 — LSASS Memory |
| DKOM-hidden process (pslist vs psscan gap) | T1014 — Rootkit |
| Service install (event 7045 / 4697) | T1543.003 — Windows Service |
| Scheduled task (event 4698 / 106) | T1053.005 — Scheduled Task |
| WMI event subscription (event 5860 / 5861) | T1546.003 — WMI Persistence |
| Lateral movement (RDP / SMB) | T1021.001 / T1021.002 |
| Executable in temp dir (shimcache) | T1036.005 — Match Legitimate Name |
| PowerShell script block (event 4104) | T1059.001 — PowerShell |
| Cloud storage upload (SRUM high bytes_sent) | T1567.002 — Exfiltration to Cloud Storage |
| Burst file deletion (USN Journal) | T1070 — Indicator Removal |
| Timestamp anomaly (MFT 0x10 vs 0x30) | T1070.006 — Timestomping |
| Browser visit to cloud exfil domain | T1567.002 — Exfiltration to Cloud Storage |
| DNS query subdomain length > 40 chars | T1048.003 — Exfiltration Over Unencrypted Non-C2 Protocol (DNS tunneling) |
| Web shell URL pattern (cmd.php, shell.aspx) | T1505.003 — Web Shell |
| VBA AutoOpen / Shell / PowerShell call | T1566.001 — Spearphishing Attachment |
| DDE/DDEAUTO in Office document | T1559.002 — Dynamic Data Exchange |
| LD_PRELOAD in process environment | T1574.006 — LD_PRELOAD |
| Linux crontab persistence entry | T1053.003 — Cron |
| History file wiped (`.bash_history` → `/dev/null`) | T1070.003 — Clear Command History |
| Port scan (10+ unique ports from one host) | T1046 — Network Service Discovery |
| IRP hook in driver dispatch table | T1014 — Rootkit |
| Secure deletion tool in prefetch | T1070.004 — File Deletion |
| VSS shadow count = 0 | T1490 — Inhibit System Recovery |
| NTFS Alternate Data Stream | T1564.004 — Hide Artifacts: NTFS ADS |
| File extension / magic byte mismatch | T1036.007 — Masquerading |
| Remote access tool installed (AnyDesk, TeamViewer) | T1219 — Remote Access Software |
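Rule-based tagging of this kind can be sketched as a list of patterns paired with technique IDs. The three rules below are simplified versions of rows from the table above; the regular expressions, the `RULES` structure, and `map_techniques` are illustrative assumptions and not the contents of `mitre_auto_map.py`.

```python
import re

# (pattern, technique ID, technique name): a tiny illustrative subset.
RULES = [
    (re.compile(r"powershell.*\s-(enc|e)\b", re.I), "T1059.001", "PowerShell"),
    (re.compile(r"\bsdelete(64)?\.exe\b", re.I), "T1070.004", "File Deletion"),
    (re.compile(r"\b(cmd\.php|shell\.aspx)\b", re.I), "T1505.003", "Web Shell"),
]


def map_techniques(text):
    """Return the sorted technique IDs whose pattern matches the artifact text."""
    return sorted({technique_id for pattern, technique_id, _ in RULES if pattern.search(text)})
```

Because the mapping is deterministic, the same parsed artifact always yields the same tags, which is what lets the tags be attached at the tool layer rather than left to the model.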
## Hard Rules (Architectural Enforcement)
These are not prompts — they are code:
1. **Read-only evidence** — `guard_output_path()` raises `PermissionError` for any write
attempt under `/cases/`, `/mnt/`, or `/media/`. No prompt override possible.
2. **No shell escape** — There is no `run_command` or `execute_shell` tool on the MCP
surface. The server exposes only the typed tools listed above; each runs a fixed binary via an
argv list (no shell), and `guard_command` additionally blocks destructive/exfiltration binaries
and shell-string commands on the Volatility / EZ Tools / registry exec paths.
3. **Bounded, evidence-driven budget** — `audit.py` counts every tool call; the agent runs until
the evidence is sufficient or the configurable `MAX_ITERATIONS` cap is reached, and then calls
`finish_analysis`. The count is recorded in the report, so depth of analysis is transparent.
4. **Provenance-gated reporting** — `finish_analysis` requires a non-empty `audit_ids`
list. An empty list returns an error — fabricated findings structurally cannot be submitted.
5. **Observation/interpretation split** — `finish_analysis` takes separate `observation`
(factual, what tools showed) and `interpretation` (analytical, what it means) parameters.
This separation reduces hallucination by preventing blending of artifact data with inference.
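Rules 1 and 4 can be sketched as plain functions that raise. This is a minimal illustration under stated assumptions: the path check uses `os.path.abspath` (it normalizes `..` but does not resolve symlinks), and the `known_audit_ids` parameter and the returned dictionary are hypothetical, introduced only to show the gate.

```python
import os

EVIDENCE_ROOTS = ("/cases", "/mnt", "/media")


def guard_output_path(path):
    """Raise PermissionError if path resolves to a location under an evidence root."""
    resolved = os.path.abspath(path)
    for root in EVIDENCE_ROOTS:
        if resolved == root or resolved.startswith(root + "/"):
            raise PermissionError(f"refusing to write under evidence root: {resolved}")
    return resolved


def finish_analysis(observation, interpretation, audit_ids, known_audit_ids):
    """Accept a report only if it cites at least one audit_id, all from prior tool calls."""
    if not audit_ids:
        raise ValueError("audit_ids must be a non-empty list")
    unknown = [a for a in audit_ids if a not in known_audit_ids]
    if unknown:
        raise ValueError(f"audit_ids not found in the audit log: {unknown}")
    return {
        "observation": observation,
        "interpretation": interpretation,
        "audit_ids": list(audit_ids),
    }
```

Both checks run before any side effect, so a rejected call leaves no partial report and no file on the evidence mount.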
## Environment Variables
Copy `.env.example` to `.env` and configure:
# SIFT tool commands (usually pre-configured on SIFT VM)
VOLATILITY_CMD=python3 -m volatility3
LOG2TIMELINE_CMD=log2timeline.py
PSORT_CMD=psort.py
FLS_CMD=fls
MMLS_CMD=mmls
ICAT_CMD=icat
YARA_CMD=yara
# EZ Tools directory (SIFT default)
EZ_TOOLS_DIR=/opt/zimmermantools
# Hayabusa event log analyzer (3,700+ Sigma rules)
HAYABUSA_CMD=hayabusa
# Optional: enables IP reputation (AbuseIPDB) and hash/domain (VirusTotal) lookups
ABUSEIPDB_API_KEY=your_key_here
VIRUSTOTAL_API_KEY=your_key_here
# Investigation constraints
MAX_TOOL_TIMEOUT=300
MAX_ITERATIONS=40
# MCP transport (stdio default; sse / streamable-http expose an HTTP endpoint for any client)
DEEPSIFT_MCP_TRANSPORT=stdio
# Optional: HMAC-sign the chain of custody (held off the evidence host) for forgery resistance
DEEPSIFT_AUDIT_KEY=
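Note that some of these values, such as `VOLATILITY_CMD=python3 -m volatility3`, contain several words. To stay consistent with the argv-list rule, such a value has to be split into separate arguments rather than handed to a shell. A minimal sketch, assuming `shlex.split` is used for this (the helper name `tool_argv` is hypothetical and `mcp_server/config.py` may do it differently):

```python
import os
import shlex


def tool_argv(env_var, default, *args, environ=None):
    """Build an argv list from a possibly multi-word command stored in an env var."""
    environ = os.environ if environ is None else environ
    command = environ.get(env_var) or default  # an empty value falls back to the default
    return shlex.split(command) + list(args)


argv = tool_argv("VOLATILITY_CMD", "python3 -m volatility3", "-f", "/cases/ROCBA/Rocba-Memory.raw")
```

The result can be passed directly to `subprocess.run(argv, timeout=...)`, with the timeout taken from `MAX_TOOL_TIMEOUT`.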
## Development
# Run tests (75 passing, 1 skipped)
pytest tests/ -v
# Syntax check
python3 -m py_compile mcp_server/tools/*.py mcp_server/parsers/*.py
# Seed the case-agnostic RAG knowledge base (MITRE + LOLBAS + Hunt Evil baseline)
python3 rag/ingest/run_all.py
# Optionally load a case's own IOCs for that investigation (per case, opt-in)
python3 rag/ingest/run_all.py --case-ioc-json analysis/findings.json
# (the bundled ROCBA example pack: --load-rocba)
## License
MIT License — see `LICENSE` file.
*DeepSIFT was built for the [Find Evil! hackathon](https://findevil.devpost.com/) hosted by SANS DFIR.*