TIEESUN/Hokage-Intel


# Hokage Intel

[![License: AGPL v3](https://img.shields.io/badge/License-AGPL_v3-orange.svg)](https://www.gnu.org/licenses/agpl-3.0) [![Python](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/downloads/) [![Local-first](https://img.shields.io/badge/architecture-local--first-success.svg)](#) [![Database](https://img.shields.io/badge/database-SQLite-lightgrey.svg)](#) [![Tests](https://img.shields.io/badge/smoke_checks-137-brightgreen.svg)](scripts/smoke.py)

## Why Hokage Intel?

Commercial threat-intel platforms start at $30k/year and require an ops team. Open-source threat-intelligence platforms (TIPs) such as MISP and OpenCTI work, but they demand Docker stacks, Elasticsearch, message queues, and dedicated databases.

Hokage Intel takes the opposite approach: **everything runs in a single FastAPI process backed by SQLite, behind a brutalist red/black/white UI, launched from one batch file**. You bring an internet connection. The platform brings:

- 30+ pre-wired threat-intel feeds with idempotent ingestion
- A live ransomware victim geo-map (a heatmap of compromised organizations by country)
- An infostealer encyclopedia covering 24 baseline families, extensible via Malpedia and ThreatFox
- A threat-actor encyclopedia built from the full MITRE ATT&CK group catalogue plus Malpedia, Maltrail, and OTX
- A CVE encyclopedia with CISA KEV and NVD 2.0 enrichment, auto-linked to actors when evidence connects them
- A unified C2 inventory aggregating ThreatFox (firehose), Feodo Tracker, stealer IOCs, and actor IOCs
- An AI-driven campaign generator that proposes named campaigns by actor or country and attaches your local IOCs as evidence
- 15 enrichment backends, most of which work without API keys
- A Diamond Model view, Pivot Matrix, Pivots Tab, and Admiralty Code grading on every IOC
- Telegram channel monitoring without phone numbers, API keys, or session strings

## Quick Start

### Windows

```bat
git clone https://github.com/yourname/hokage-intel.git
cd hokage-intel
scripts\start.bat
```

The batch script creates a virtualenv, installs requirements, runs migrations, and starts the server on `http://localhost:8000`.

### Linux / macOS

```bash
git clone https://github.com/yourname/hokage-intel.git
cd hokage-intel
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m hokage_intel
```

First boot ingests RSS sources, seeds the 24 infostealer families, creates the SQLite database at `data/platform.db`, and starts the scheduler. Subsequent boots resume where you left off.

## What's in the box

### Pages (17 total)

| Path | What it does |
|---|---|
| `/` | Dashboard: feed counts, IOC counts, C2 count, active sources, ransomware victims, campaigns, IOC severity mix, ATT&CK heatmap |
| `/feeds` | RSS + Telegram feed browser, filterable by severity, source, date, or IOC presence |
| `/iocs` | Extracted IOC table with type/severity filters and Admiralty grades |
| `/enrich` | Runs an IOC through 15 enrichment backends, with per-backend formatted results plus the Pivot Matrix and Pivots Tab |
| `/cert` | Real-time Certificate Transparency stream (CertStream) with keyword highlighting |
| `/recon` | OSINT toolkit: WHOIS, DNS, GitHub leak templates, etc. |
| `/actors` | Threat-actor encyclopedia: MITRE + Malpedia + Maltrail + OTX importers |
| `/actors/{id}` | Per-actor profile: Diamond Model, IOCs, aliases, sectors, references, **Linked CVEs panel** |
| `/stealers` | Infostealer encyclopedia: 24 families with ThreatFox/Maltrail IOC ingestion |
| `/stealers/{id}` | Per-family detail with IOC table |
| `/c2` | Aggregated C2 inventory across the ThreatFox firehose, Feodo, `actor_iocs`, and `stealer_iocs` |
| `/campaigns` | Campaign browser + **AI campaign generator** (by APT or by country) |
| `/campaigns/{id}` | Per-campaign detail with attached IOCs |
| `/cves` | **CVE Encyclopedia**: CISA KEV + NVD enrichment + automatic actor mapping |
| `/extras` | Defang/refang, IOC extraction, hash generators |
| `/sources` | Manage feed sources: add, retire, mark active |
| `/settings` | API keys, AI provider config, abuse.ch Auth-Key, OTX key, VT key, etc. |

### Threat actor encyclopedia (`/actors`)

Four ingestion paths that compound:

1. **MITRE ATT&CK**: the full Groups catalogue (~150 actors) with aliases, descriptions, motivation, target sectors, TTPs, software used, and external references.
2. **Malpedia**: enriches existing actors with vendor descriptions, malware family attributions, and reference links; inserts new actors not in MITRE.
3. **Maltrail**: walks every `apt_*.txt` file in the [stamparm/maltrail](https://github.com/stamparm/maltrail) repo and matches each trail file to an existing actor by name or alias. If no match exists, it **auto-creates a skeleton actor profile** so the IOCs always have a home; re-running MITRE/Malpedia later enriches the skeletons.
4. **OTX (AlienVault)**: pulls pulse IOCs per actor via SSE streaming with per-actor progress. It uses fair queue rotation (actors with the fewest existing OTX IOCs go first), a 60 s per-actor timeout, and 10-actor batches. "Pull for ALL actors" iterates batches client-side until every actor is covered.
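
A minimal sketch of the fair queue rotation from step 4, using the ordering listed later under Key design decisions: never-attempted actors first, then fewest existing OTX IOCs, then the oldest attempt. The `ActorQueueState` record and its fields (`otx_ioc_count`, `last_attempt`) are hypothetical stand-ins, not the project's actual schema, and the sample actors are made up:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActorQueueState:
    # Hypothetical per-actor bookkeeping; the real schema may differ.
    actor_id: str
    otx_ioc_count: int           # OTX IOCs already stored for this actor
    last_attempt: Optional[str]  # ISO-8601 UTC timestamp, or None if never tried


def fair_queue_key(actor: ActorQueueState) -> tuple:
    """Sort key: never-attempted first, then fewest IOCs, then oldest attempt."""
    return (
        actor.last_attempt is not None,  # False sorts first, so never-attempted wins
        actor.otx_ioc_count,             # fewer existing OTX IOCs first
        actor.last_attempt or "",        # same-format ISO strings sort chronologically
    )


def next_batch(actors: list[ActorQueueState], batch_size: int = 10) -> list[str]:
    """Return the IDs of the next batch of actors to pull from OTX."""
    return [a.actor_id for a in sorted(actors, key=fair_queue_key)[:batch_size]]


# Illustrative queue; IDs, counts, and timestamps are invented.
queue = [
    ActorQueueState("actor-a", 120, "2024-05-01T10:00:00Z"),
    ActorQueueState("actor-b", 0, None),
    ActorQueueState("actor-c", 5, "2024-04-01T09:00:00Z"),
    ActorQueueState("actor-d", 5, "2024-03-15T08:00:00Z"),
]
print(next_batch(queue, batch_size=3))  # ['actor-b', 'actor-d', 'actor-c']
```

The 60 s per-actor timeout and the client-side loop over batches are outside this sketch.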

Each actor page renders the **Diamond Model** (Adversary / Capability / Infrastructure / Victim) as a 2D rhombus with a center event marker.

### CVE Encyclopedia (`/cves`)

Three ingest sources plus automatic actor mapping:

| Source | Coverage | API key |
|---|---|---|
| **CISA KEV** | ~1300 Known Exploited Vulnerabilities | None |
| **NVD 2.0** | Description / CVSS / CPE for any CVE | Optional (10× faster with one) |
| **Feed regex** | Scans your feed history for `CVE-YYYY-NNNN` IDs | None |

**Auto-linking to threat actors** draws on four signals, each with its own confidence score:

- `source='feed'`: a feed item mentions both a known actor and a CVE ID (confidence 35)
- `source='cisa_kev'`: CISA's `shortDescription` or `notes` field mentions an actor (confidence 60)
- `source='nvd'`: the NVD description names an actor (confidence 65)
- `source='otx'`: an OTX pulse tags both an adversary and a CVE ID (confidence 70)

Multiple evidence rows for the same (CVE, actor) pair are deduplicated in the UI: you see one entry with aggregated sources (`feed,otx,cisa_kev`) and an evidence count. The stats panel breaks down link counts by source, so you can tell at a glance which import path produced which links.

### C2 Inventory (`/c2`)

Aggregates **four sources** into a unified, filterable host listing:

- **ThreatFox firehose**: pulls all recent C2 IOCs across all malware families (not just the 24 seeded ones) via `get_iocs?days=N`, persisted to the `c2_inventory` table.
- **Feodo Tracker**: the full active-C2 blocklist (~200-500 banking-trojan C2s), persisted to `c2_inventory`.
- **stealer_iocs**: anything Maltrail/ThreatFox imported per stealer family.
- **actor_iocs**: anything MITRE/Malpedia/Maltrail/OTX imported per APT.

The same host:port reported by several sources collapses into one row that shows every source pill. The listing is filterable by malware family, country, source, and hosting company, and each host links to one-click enrichment.
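
The CVE-actor dedup and the C2 host:port collapse described above follow the same pattern: group raw evidence rows by a key, then show the key once with its sources joined and an evidence count. A minimal sketch in plain Python, assuming illustrative row dictionaries with a `source` field (not the project's actual tables); the CVE ID, actor name, and IPs are placeholders:

```python
from typing import Iterable


def collapse(rows: Iterable[dict], key_fields: tuple[str, ...]) -> list[dict]:
    """Group evidence rows by key_fields; join their sources and count the rows."""
    groups: dict[tuple, dict] = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key not in groups:
            groups[key] = {**{f: row[f] for f in key_fields}, "sources": [], "evidence_count": 0}
        group = groups[key]
        if row["source"] not in group["sources"]:  # first-seen order, no repeats
            group["sources"].append(row["source"])
        group["evidence_count"] += 1  # every raw evidence row counts
    # Render sources as a comma-joined string, e.g. "feed,otx,cisa_kev".
    return [{**g, "sources": ",".join(g["sources"])} for g in groups.values()]


# Illustrative rows only: placeholder CVE ID, actor name, and documentation-range IP.
cve_evidence = [
    {"cve_id": "CVE-2099-0001", "actor": "ExampleActor", "source": "feed"},
    {"cve_id": "CVE-2099-0001", "actor": "ExampleActor", "source": "otx"},
    {"cve_id": "CVE-2099-0001", "actor": "ExampleActor", "source": "feed"},
    {"cve_id": "CVE-2099-0001", "actor": "ExampleActor", "source": "cisa_kev"},
]
c2_rows = [
    {"host": "203.0.113.10", "port": 443, "source": "threatfox"},
    {"host": "203.0.113.10", "port": 443, "source": "feodo"},
    {"host": "203.0.113.10", "port": 8080, "source": "threatfox"},
]

print(collapse(cve_evidence, ("cve_id", "actor")))  # one row, sources 'feed,otx,cisa_kev', count 4
print(collapse(c2_rows, ("host", "port")))          # two rows: :443 from two sources, :8080 from one
```

Whether the real UI also carries the highest per-source confidence into the collapsed row is not stated here, so the sketch leaves confidence out.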
### Campaign Generator (AI-driven)

On `/campaigns`, click **⚡ Generate**:

- **By APT name**: type "Lazarus" and the LLM proposes named campaigns (Contagious Interview, AppleJeus, Operation Dream Job, etc.)
- **By country**: type "United States" and the LLM proposes campaigns targeting the USA across actors

Each proposal carries attribution, sectors, countries, a date window, a confidence rating, and citation URLs. The platform attaches local evidence: actor IOCs, family IOCs, and feed mentions. You review the cards, uncheck what you don't want, and accept; accepted proposals become real campaigns with their IOCs linked.

Provider-agnostic via litellm. Configure on `/settings → AI providers`:

- Anthropic (Claude Haiku 4.5)
- OpenAI (GPT-4o-mini)
- Google Gemini (2.5 Flash, best free tier)
- Groq (Llama 3.3 70B)
- xAI (Grok 4)
- OpenRouter (any model)
- Ollama (local)
- Custom (any OpenAI-compatible endpoint)

### Enrichment (`/enrich`)

15 backends, auto-selected by IOC type. Enrichment is cache-aware: it won't re-hit APIs for IOCs enriched in the last 24 hours. Output is formatted per backend instead of being dumped as raw JSON:

| Backend | Coverage | API key |
|---|---|---|
| VirusTotal | IP / domain / URL / hash | Required |
| AbuseIPDB | IP | Required |
| AlienVault OTX | IP / domain / URL / hash / CVE | Optional |
| Shodan InternetDB | IP | None |
| IPinfo Lite | IP | Optional |
| IP Detective | IP | None |
| crt.sh | Domain | None |
| URLhaus | URL / domain / IP / hash | Optional |
| URLscan | URL / domain / IP | Optional |
| Malware Bazaar | Hash | Optional |
| ThreatFox | All types | Optional (shared abuse.ch key) |
| YARAify | Hash | Optional |
| Feodo Tracker | IP (auto-bulk) | None |
| Ransomware.live | Domain / org | None |
| SSL Labs | Domain | None |

Each backend's result card surfaces the fields that matter (VT detection ratio with traffic-light coloring, AbuseIPDB confidence + reports, Shodan ports + CVEs, OTX pulses + adversaries, etc.). Raw JSON is one collapsible click away.
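
The 24-hour enrichment cache can be sketched with the standard library's `sqlite3` module, in keeping with the single-SQLite design. This is an illustration under assumptions: the `enrichment_cache` table, its columns, the `enrich_cached` helper, and the `fetch` callable standing in for a backend are all hypothetical, and the project's real cache may be keyed and stored differently:

```python
import json
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

CACHE_TTL = timedelta(hours=24)


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical enrichment_cache table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS enrichment_cache (
               backend     TEXT NOT NULL,
               ioc         TEXT NOT NULL,
               fetched_at  TEXT NOT NULL,  -- ISO-8601 UTC
               result_json TEXT NOT NULL,
               PRIMARY KEY (backend, ioc)
           )"""
    )
    return conn


def enrich_cached(
    conn: sqlite3.Connection,
    backend: str,
    ioc: str,
    fetch: Callable[[str], dict],
    now: Optional[datetime] = None,
) -> dict:
    """Serve a cached result younger than CACHE_TTL; otherwise call fetch() and store it."""
    if now is None:
        now = datetime.now(timezone.utc)
    row = conn.execute(
        "SELECT fetched_at, result_json FROM enrichment_cache WHERE backend = ? AND ioc = ?",
        (backend, ioc),
    ).fetchone()
    if row is not None and now - datetime.fromisoformat(row[0]) < CACHE_TTL:
        return json.loads(row[1])  # fresh enough: skip the API call
    result = fetch(ioc)
    conn.execute(
        "INSERT OR REPLACE INTO enrichment_cache VALUES (?, ?, ?, ?)",
        (backend, ioc, now.isoformat(), json.dumps(result)),
    )
    conn.commit()
    return result


# Usage with a stand-in backend that records every "API" call.
calls: list[str] = []


def fake_backend(ioc: str) -> dict:
    calls.append(ioc)
    return {"ioc": ioc, "malicious": False}


conn = open_cache()
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
for hours in (0, 23, 25):
    enrich_cached(conn, "shodan_internetdb", "203.0.113.10", fake_backend, now=t0 + timedelta(hours=hours))
print(len(calls))  # 2: the 23-hour lookup was served from the cache
```

Passing `now` explicitly keeps the expiry logic testable without waiting a day.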
### Pivot Matrix + Pivots Tab

Two complementary panels appear after every enrichment:

- **Pivot Matrix** answers "What should I do next?" It suggests investigation steps tailored to what was found and what your platform already knows. Action buttons either run a fresh pivot or open the relevant page.
- **Pivots Tab** answers "What fingerprints and related artifacts did we extract?" Each row is a pivotable artifact: SAN hostnames from crt.sh, related domains from URLscan, companion hashes from Malware Bazaar, ASN from VirusTotal. "→ Pivot" opens a fresh enrichment in-platform; "↗ Open" follows up externally.

The panels are honest about scope: no JARM/JA4 (we don't run an internet scanner) and no "1M hosts share this fingerprint" counts (we don't have Censys-tier data). A "What's not here" expander explains what is missing.

### NATO Admiralty Code grading

Every IOC is graded on the 6×6 reliability × credibility scale of the [NATO Admiralty System](https://en.wikipedia.org/wiki/Admiralty_code). The source type sets the ceiling (MITRE = A, abuse.ch = B, OSINT = C, social = D), and the artifact type sets the baseline (CVEs and SHA-256 hashes rank highest; IPs are capped at D because they rotate). The displayed grade is the weaker of the two, so an IP from MITRE shows as D4 rather than A1: MITRE's A ceiling is pulled down by the IP cap of D.

### Telegram monitoring

Telegram channels are web-scraped, so monitoring needs no phone number, API key, or session string. Scraped posts show up in `/feeds` alongside RSS items.

## Architecture

```
Single FastAPI process
├── SQLite DB (data/platform.db, ~600 KB after bootstrap)
├── APScheduler: runs ingestion jobs every N minutes
├── CertStream WebSocket: live cert log monitoring
├── 15 enrichment backends (lazy-loaded)
└── 17 server-rendered Jinja2 pages
```

No Docker. No Redis. No Postgres. No Elasticsearch. No microservices. No message queue.

### Key design decisions

- **All ingestion is idempotent.** Every insert is `INSERT OR IGNORE` with SHA-256 composite primary keys, so re-running an importer merges new data instead of duplicating it.
- **Cache before fetch.** Enrichment results are cached for 24 hours, GitHub directory listings for 7 days, and NVD records indefinitely once enriched.
- **SSE for slow imports.** OTX, Maltrail, and CVE enrichment all stream per-record progress via Server-Sent Events, so multi-minute jobs don't hit browser timeouts.
- **Fair queue rotation.** The OTX import orders actors by `(never-attempted, fewest existing IOCs, oldest attempt timestamp)` so stuck actors don't starve fresh ones.
- **Auto-create over fail-silent.** When Maltrail has a trail file for an APT we don't have, we create a skeleton actor with `source_dataset='maltrail'` so the IOCs have a home. Later MITRE/Malpedia runs enrich the skeleton.
- **Fallback ladders for fragile APIs.** Maltrail file discovery tries, in order: GitHub Contents API → Trees API → on-disk cache → stale cache → hardcoded seed list. Whichever tier succeeds is reported to the user, so they know which path won.

## API key management

All keys are stored in the local `api_keys` table (SQLite, not environment variables). Save them on `/settings` → Enrichment keys; no restart is required.

| Setting | What it unlocks |
|---|---|
| `virustotal` | VT enrichment |
| `abuseipdb` | AbuseIPDB enrichment |
| `shodan` | Shodan host queries (Shodan InternetDB stays free; no key needed) |
| `urlscan` | URLscan submissions + private scans |
| `alienvault_otx` | OTX pulse enrichment + per-actor IOC import |
| `ipinfo` | IPinfo Lite ASN/country |
| `abusech` | Shared Auth-Key for ThreatFox + URLhaus + MalwareBazaar + YARAify |
| `nvd` | 10× faster NVD enrichment (5 → 50 requests per 30 s) |
| (AI provider) | Pick a provider and model, then paste the key in the AI tab on `/settings` |

Most features work without keys: Hokage Intel gracefully falls back to no-key sources (Shodan InternetDB, IP Detective, Feodo, crt.sh, ransomware.live, SSL Labs, CISA KEV, NVD free tier).
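
The first design decision above (idempotent ingestion via `INSERT OR IGNORE` on SHA-256-derived keys) can be shown with `sqlite3` and `hashlib`. This is a sketch under assumptions: the `sha256_id` helper only borrows the name listed for `utils/timing.py` and its signature here is hypothetical, as are the `iocs` table layout and the `ingest` function:

```python
import hashlib
import sqlite3


def sha256_id(*parts: str) -> str:
    """Stable ID from a record's identifying fields (hypothetical signature)."""
    # Join with a unit separator, unlikely to occur in IOC text, so that
    # ("ab", "c") and ("a", "bc") do not hash to the same ID.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iocs (id TEXT PRIMARY KEY, type TEXT, value TEXT, source TEXT)")


def ingest(rows: list[tuple[str, str, str]]) -> int:
    """Insert (type, value, source) rows; return how many were actually new."""
    before = conn.total_changes
    conn.executemany(
        # Duplicates collide on the primary key and are skipped, not raised.
        "INSERT OR IGNORE INTO iocs (id, type, value, source) VALUES (?, ?, ?, ?)",
        [(sha256_id(t, v, s), t, v, s) for t, v, s in rows],
    )
    conn.commit()
    return conn.total_changes - before


batch = [("ip", "203.0.113.10", "feodo"), ("domain", "example.invalid", "threatfox")]
print(ingest(batch))  # 2
print(ingest(batch))  # 0: re-running the importer adds nothing
```

Including `source` in the hashed key keeps one row per (IOC, source) pair; leaving it out would merge the same IOC across sources into a single row.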
## Data sources (30+ ingestion pipelines)

**RSS feeds** (auto-fetched every 30 min): The Hacker News, Bleeping Computer, Krebs on Security, SANS ISC, Securelist, CrowdStrike Adversary Universe, Mandiant blog, Microsoft Threat Intelligence, Google Threat Intelligence, Recorded Future Insikt, Unit 42, Proofpoint, Trustwave SpiderLabs, Cisco Talos, Trend Micro, ESET WeLiveSecurity, Symantec, Sophos, Check Point, FortiGuard, Lookout, ReversingLabs, Intezer, nao_sec, HACKMAGEDDON, abuse.ch blog, GreyNoise, CISA Alerts, government CERT alerts.

**Telegram channels** (web-scraped): BreachForums-adjacent, ransomware tracking, leak monitoring.

**Threat-actor data**: MITRE ATT&CK, Malpedia, Maltrail (75+ APT files), AlienVault OTX.

**Malware/C2/IOC data**: ThreatFox (per-family + firehose), Feodo Tracker, URLhaus, Malware Bazaar, YARAify, Maltrail (stealers + APTs).

**Vulnerability data**: NVD 2.0, CISA Known Exploited Vulnerabilities (KEV).

**Ransomware**: ransomware.live (victim feed + geo-map).

**Certificate transparency**: CertStream WebSocket (live).

## Smoke testing

```bash
python -m scripts.smoke
```

Wipes the database, bootstraps it fresh, then exercises every page, every API endpoint, every ingestion path, and every dedup/aggregation behavior.

**137 checks** at last count, covering:

- All 17 pages render
- `/ai` and `/alerts` correctly return 404 (removed in this build)
- All API endpoints return the right shape
- C2 aggregation dedup (host:port across sources)
- Maltrail auto-create + `source_dataset` tagging
- Maltrail force-refresh kwarg threading
- Maltrail SSE streaming endpoint
- OTX SSE streaming + no-key error path
- Pivots Tab fingerprint extraction (cert SANs, related hosts, ASN, companion hashes)
- AI campaign generator with a mocked LLM (skipped if litellm is not installed)
- Tolerant JSON parser recovers truncated LLM responses (tested with a mid-record cutoff)
- Admiralty grading (IP baseline caps at D, hash baseline B, CVE A, source-type ceilings)
- CVE encyclopedia: seed-from-feeds discovers actor-CVE pairs
- CVE detail dedupes actors across evidence sources
- `actor_id` filter dedupes CVEs across evidence sources
- Stats report unique pairs vs. raw evidence rows + per-source breakdown
- AI config CRUD across all 8 providers

Returns exit code 0 on success and 1 on any failure. It is the gate run before building a release tarball.
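
The Admiralty rule from the grading section (the displayed reliability is the weaker of the source-type ceiling and the artifact-type baseline) fits in a small function. This sketch covers the reliability letter only, since how the credibility digit is derived is not described here. The source ceilings come from the grading section; the artifact baselines follow the smoke-check list above (CVE A, hash B, IP D), which places hashes below the "highest" wording in the grading section. Dictionary keys and the function name are illustrative:

```python
# Admiralty reliability letters, best (A) to worst (F).
RELIABILITY = "ABCDEF"

# Source-type ceilings as listed in the grading section.
SOURCE_CEILING = {"mitre": "A", "abuse.ch": "B", "osint": "C", "social": "D"}

# Artifact-type baselines as named in the smoke-check list; other types are
# deliberately omitted because their baselines are not documented here.
ARTIFACT_BASELINE = {"cve": "A", "hash": "B", "ip": "D"}


def reliability_grade(source_type: str, artifact_type: str) -> str:
    """Return the weaker (later) letter of the source ceiling and artifact baseline."""
    ceiling = SOURCE_CEILING[source_type]
    baseline = ARTIFACT_BASELINE[artifact_type]
    return max(ceiling, baseline, key=RELIABILITY.index)


for source, artifact in [("mitre", "ip"), ("osint", "hash"), ("mitre", "cve")]:
    print(source, artifact, reliability_grade(source, artifact))
# mitre ip D / osint hash C / mitre cve A
```

Taking the maximum by position in `RELIABILITY` is what makes an IP from MITRE land on D, matching the D4 example's letter.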
## Project layout

```
hokage_intel/
├── app.py                   FastAPI app + routes + middleware
├── c2.py                    C2 listing aggregator
├── c2_ingest.py             ThreatFox bulk + Feodo persist
├── campaigns_generator.py   LLM-driven campaign proposals
├── cves.py                  NVD + CISA KEV + CVE-actor linking
├── config.py                Defaults
├── actors/
│   ├── mitre.py             MITRE ATT&CK importer
│   ├── malpedia.py          Malpedia enrichment
│   ├── maltrail.py          Maltrail apt_*.txt walker w/ auto-create
│   ├── otx.py               AlienVault OTX SSE importer
│   └── browse.py            Actor list/detail queries
├── ai/
│   └── providers.py         Provider-agnostic config (8 providers)
├── api/                     All FastAPI routers
├── cert/
│   └── certstream.py        CertStream WebSocket
├── db/
│   ├── schema.py            All CREATE TABLE statements
│   ├── connection.py        Singleton conn + migrations
│   └── bootstrap.py         First-boot seeders
├── enrichment/              15 backends, all subclass EnrichmentBackend
├── feeds/                   RSS + Telegram fetchers
├── ransomware/              ransomware.live ingester
├── stealers/
│   ├── seed.py              24 baseline families
│   ├── threatfox.py         Per-family ThreatFox queries
│   ├── maltrail.py          Per-family Maltrail trail walker
│   └── browse.py
└── utils/
    ├── admiralty.py         NATO grading
    ├── dynamic_pivots.py    Pivot Matrix
    ├── pivots_tab.py        Pivots Tab fingerprint extractor
    ├── iocs.py              IOC regex + extraction + defang
    ├── http.py              httpx wrapper w/ rate limiting + caching
    ├── keys.py              api_keys table helpers
    └── timing.py            ISO timestamps + sha256_id helper
templates/                   Jinja2 templates (17 pages + base + macros)
static/                      CSS, JS helpers (H namespace), SVG icons
data/                        SQLite DB + cached files (gitignored)
scripts/
├── start.bat                Windows launcher
├── start.sh                 Unix launcher
├── smoke.py                 137-check test suite
└── ...
```

## Roadmap (loose, not committed)

- More actor importers (Mandiant APT report PDFs, Recorded Future taxonomy)
- STIX 2.1 export for sharing IOCs upstream
- Sigma + YARA rule library, optionally synced from a Git repo
- Encrypted SQLite via SQLCipher (replace `data/platform.db` with `.db.enc`)

## License

AGPL-3.0.
See `LICENSE`. If you run a modified version as a network service, the AGPL requires you to offer the users who interact with it the corresponding modified source.

## Acknowledgements

This wouldn't exist without the work of:

- The MITRE ATT&CK team, for the actor and technique catalogue
- [Malpedia](https://malpedia.caad.fkie.fraunhofer.de/) (Fraunhofer FKIE), for malware family attribution
- [Maltrail](https://github.com/stamparm/maltrail) (Miroslav Stampar), for the curated APT IOC trail files
- [abuse.ch](https://abuse.ch/), for ThreatFox, URLhaus, Feodo Tracker, Malware Bazaar, and YARAify; the entire abuse.ch family is the spine of community CTI
- AlienVault OTX, for pulse-based IOC sharing
- [ransomware.live](https://www.ransomware.live/), for the victim feed
- CISA, for the Known Exploited Vulnerabilities catalogue
- NIST NVD, for the CVE registry
- Calidog Security, for the CertStream relay