siva404e/IOC_AUTOMATION

# IOC Threat Intelligence Aggregator

![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg) ![Python: 3.10+](https://img.shields.io/badge/Python-3.10%2B-blue.svg) ![Status: Production Ready](https://img.shields.io/badge/Status-Production%20Ready-brightgreen.svg)

**Automated IOC threat intelligence aggregator** that collects, normalizes, deduplicates, and exports indicators of compromise from **29 open-source threat feeds** into Sumo Logic SIEM-compatible STIX 2.1 CSV format for enterprise threat detection.

**Aggregates from:** AlienVault OTX, PhishTank, ThreatFox, Feodo, Spamhaus, URLhaus, MalwareBazaar, Pulsedive, Hybrid Analysis, CAPE Sandbox, OpenPhish, FireHOL, Botvrij, ThreatView, C2IntelFeeds, C2Tracker, DataPlane, AbuseSSL, IPsum, CINScore, PhishingArmy, VXVault, URLAbuse, TweetFeed, VirusShare, MISP CERT-FR, Blocklist.de, EmergingThreats, and more.

## 📊 SOC Context

A threat intelligence analyst runs this aggregator **weekly** to pull fresh IOCs from 29 feeds and upload to Sumo Logic so L1 analysts get **automatic alerts** when those indicators appear in live logs. This closes the gap between threat feed discovery and SIEM detection, enabling rapid response to known malicious infrastructure.
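As a rough illustration of the normalize-and-deduplicate step named above, the sketch below keeps one record per indicator and lets the highest confidence score win. It is not the script's actual code: the `Ioc` dataclass, the `normalize` and `dedupe` helpers, and the rule of lowercasing domains only are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ioc:
    indicator: str
    ioc_type: str    # e.g. "ipv4-addr", "domain-name", "url"
    source: str
    confidence: int  # per-source trust score


def normalize(ioc: Ioc) -> Ioc:
    """Trim whitespace; lowercase domains only (assumed rule)."""
    value = ioc.indicator.strip()
    if ioc.ioc_type == "domain-name":
        value = value.lower()
    return Ioc(value, ioc.ioc_type, ioc.source, ioc.confidence)


def dedupe(iocs):
    """Keep one record per (type, indicator): the highest-confidence one."""
    best = {}
    for ioc in map(normalize, iocs):
        key = (ioc.ioc_type, ioc.indicator)
        if key not in best or ioc.confidence > best[key].confidence:
            best[key] = ioc
    return list(best.values())


sample = [
    Ioc("Evil.com ", "domain-name", "URLhaus", 85),
    Ioc("evil.com", "domain-name", "PhishTank", 90),
    Ioc("192.0.2.1", "ipv4-addr", "Feodo", 95),
]
result = dedupe(sample)
for ioc in result:
    print(ioc.indicator, ioc.source, ioc.confidence)
```

Keying on `(type, indicator)` rather than the indicator alone is a design choice in this sketch; it avoids merging, say, a domain and a URL that happen to share text.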
## 🎯 Key Features

| Feature | Details |
|---------|---------|
| **Parallel Fetching** | 29 threat intelligence sources with 10 concurrent threads for speed |
| **Confidence Scoring** | Per-source trust scores (range 65–95) based on feed reputation |
| **IOC Deduplication** | Automatic deduplication keeping highest-confidence threat type |
| **5-Week Rolling Window** | Master CSV maintains rolling history; auto-archives oldest week |
| **Timestamp Refresh** | Re-seen IOCs get validity extended so they don't expire in Sumo Logic |
| **STIX 2.1 Export** | Sumo Logic-compatible 10-column CSV format, no header |
| **Smart Splitting** | Split output files at 9,999 rows per file for SIEM compatibility |
| **Auto-Archiving** | Week 6+ IOCs move to permanent Archive_IOC.csv |
| **Cross-Platform** | Runs on Windows (Task Scheduler) and Linux (Cron) |
| **Multi-IOC Type** | IP addresses, domains, URLs, MD5/SHA-1/SHA-256 hashes |

## 🗺️ MITRE ATT&CK Coverage

This aggregator enables detection across multiple attack phases:

| Threat Category | Sources | MITRE ID | Tactic | Use Case |
|-----------------|---------|----------|--------|----------|
| **C2 Infrastructure** | Feodo, C2IntelFeeds, C2Tracker | T1071 | Command & Control | Block outbound comms to known C2 servers |
| **Phishing URLs** | PhishTank, OpenPhish, PhishingArmy | T1566.002 | Initial Access | Alert on phishing landing page visits |
| **Malware Hashes** | MalwareBazaar, Hybrid Analysis, CAPE | T1204.002 | Execution | Detect malware execution by file hash |
| **Botnet IPs** | Spamhaus, FireHOL, DataPlane | T1583.005 | Resource Development | Block traffic from botnet source IPs |
| **Malware Domains** | URLhaus, Botvrij, ThreatView | T1566.002 | Initial Access | Block DNS requests to malware domains |

## 📦 Threat Intelligence Sources

| # | Source | IOC Types | Threat Category | Confidence | API Key? |
|---|--------|-----------|-----------------|------------|----------|
| 1 | **Feodo** | IP | C2 | 95 | No |
| 2 | **PhishTank** | URL / Domain | Phishing | 90 | No |
| 3 | **C2IntelFeeds** | IP | C2 | 90 | No |
| 4 | **OpenPhish** | URL / Domain | Phishing | 88 | No |
| 5 | **MalwareBazaar** | SHA-256 Hash | Malware | 88 | No |
| 6 | **ThreatFox** | IP / Domain / URL / Hash | Malware | 85 | No |
| 7 | **URLhaus** | URL / Domain | Malware | 85 | No |
| 8 | **AbuseSSL** | IP / Hash | Malware | 83 | No |
| 9 | **MISP CERT-FR** | Hash | Malware | 83 | No |
| 10 | **Spamhaus** | IP | Botnet | 80 | No |
| 11 | **OTX (AlienVault)** | IP / Domain / URL / Hash | Malware | 70 | **Yes** |
| 12 | **Pulsedive** | IP / Domain / URL | Malware | 65 | **Yes** |
| 13 | **Hybrid Analysis** | Hash | Malware | 82 | **Yes** |
| 14 | **CAPE Sandbox** | Hash | Malware | 80 | **Yes** |
| 15 | **Botvrij** | IP / Domain / URL / Hash | Malware | 70 | No |

**+ 14 more sources** including FireHOL, Blocklist.de, C2Tracker, DataPlane, ThreatView (5 feeds), Bazaar, IPsum, CINScore, PhishingArmy, VXVault, URLAbuse, TweetFeed, VirusShare, Botvrij Hashes

## 🚀 Quick Start

### Prerequisites

- **Python:** 3.10 or higher
- **OS:** Windows 10+ or Ubuntu 20.04+
- **Internet:** Required (fetches from 29 external sources)
- **Disk:** ~500 MB for master + archive + weekly CSVs

### Installation

    # 1. Clone the repository
    git clone https://github.com/siva404e/IOC_AUTOMATION.git
    cd IOC_AUTOMATION

    # 2. Install Python dependencies
    pip install -r requirements.txt

    # 3. Create config file from template
    cp config.ini.example config.ini

    # 4. Add your API keys (optional but recommended)
    # Edit config.ini and fill in:
    # - otx_api_key (AlienVault OTX)
    # - pulsedive_api_key (Pulsedive)
    # - hybrid_analysis_api_key (Hybrid Analysis)
    # - cape_api_token (CAPE Sandbox)

    # 5. Run the aggregator
    python final_ioc_weekly_split.py

### First Run Output

    19:12:40 INFO Config loaded from : /IOC_Scripts/config.ini
    19:12:40 INFO Output directory : /home/analyst/IOC_Output
    19:12:40 INFO Starting IOC aggregator — 29 sources configured
    19:12:47 INFO [+] ThreatFox fetched
    19:12:50 INFO [+] PhishTank fetched (56294 rows)
    19:12:52 WARNING [-] CAPE skipped — CAPE_API_TOKEN not set
    19:13:28 INFO Total unique IOCs fetched: 438977
    19:13:31 INFO Master updated — 343349 new | 95628 re-seen | 438977 total
    19:13:34 INFO IP 39379 rows → 4 part file(s)
    19:13:34 INFO Domain 273135 rows → 28 part file(s)
    19:13:34 INFO URL 20912 rows → 3 part file(s)
    19:13:34 INFO Hash 9923 rows → 1 part file(s)

## 📂 Project Structure

    IOC_AUTOMATION/
    ├── final_ioc_weekly_split.py                 ← Main aggregator script (1200+ lines)
    ├── config.ini                                ← Your API keys (git ignored)
    ├── config.ini.example                        ← Configuration template
    ├── requirements.txt                          ← Python dependencies
    ├── README.md                                 ← This file
    ├── SOP_IOC_ThreatIntelligence_Aggregator.md  ← Complete operational guide
    ├── .gitignore                                ← Prevents API key leaks
    └── LICENSE                                   ← MIT License

## 📊 Output Files

The aggregator produces Sumo Logic-compatible STIX 2.1 CSV files:

| File | Format | Updated | Purpose |
|------|--------|---------|---------|
| **Master_IOC.csv** | CSV (8 cols) | Every run | Rolling 5-week IOC history with WeekTag |
| **Archive_IOC.csv** | CSV (8 cols) | Week 6+ | Permanent archive of evicted batches |
| **IOC_Weekly_IP_PartN_.csv** | CSV (10 cols) | Every run | IPv4 addresses (max 9,999 rows/file) |
| **IOC_Weekly_Domain_PartN_.csv** | CSV (10 cols) | Every run | Domain names (max 9,999 rows/file) |
| **IOC_Weekly_URL_PartN_.csv** | CSV (10 cols) | Every run | Malware/phishing URLs (max 9,999 rows/file) |
| **IOC_Weekly_Hash_PartN_.csv** | CSV (10 cols) | Every run | MD5/SHA-1/SHA-256 hashes (max 9,999 rows/file) |

### CSV Format (Sumo Logic STIX 2.1)

The first line below names the ten columns for reference; per the Key Features table, the exported files themselves carry no header row.

    id,indicator,type,source,validFrom,validUntil,confidence,threatType,actors,killChain
    0001,192.0.2.1,ipv4-addr,Feodo,2026-03-18T13:00:00.000Z,2027-03-18T13:00:00.000Z,95,malicious-activity,,command-and-control
    0002,evil.com,domain-name,PhishTank,2026-03-18T13:00:00.000Z,2027-03-18T13:00:00.000Z,90,malicious-activity,,delivery
    0003,https://malware.xyz/pay.html,url,URLhaus,2026-03-18T13:00:00.000Z,2027-03-18T13:00:00.000Z,85,malicious-activity,,initial-access

## 🔄 5-Week Rolling Window Architecture

The master file automatically maintains a rolling window:

| Week | Master Contains | Archive Action |
|------|-----------------|----------------|
| 1 | W01 | none |
| 2 | W01–W02 | none |
| 3 | W01–W03 | none |
| 4 | W01–W04 | none |
| 5 | W01–W05 | none |
| 6 | W02–W06 | **W01 → Archive** |
| 7 | W03–W07 | **W02 → Archive** |

Each IOC gets a `WeekTag` (e.g., `2026-W11`) so eviction is deterministic. **Re-seen IOCs get a timestamp refresh** to prevent expiration in Sumo Logic.

## ⏱️ Scheduled Execution

### Windows (Task Scheduler)

    Program: C:\Python310\python.exe
    Arguments: C:\IOC_Scripts\final_ioc_weekly_split.py
    Start in: C:\IOC_Scripts
    Trigger: Weekly, Monday 07:00

### Linux (Cron)

    crontab -e
    # Add line:
    0 7 * * 1 cd /path/to/IOC_Scripts && python3 final_ioc_weekly_split.py >> ~/IOC_Scripts/cron.log 2>&1

## 📤 Uploading to Sumo Logic

1. Run the aggregator → generates CSV files in `IOC_Output/`
2. Log in to **Sumo Logic** → **Security** → **Threat Intelligence**
3. Click **Add Source** → **Manual Upload** → **CSV**
4. Upload each `IOC_Weekly__Part.csv` file
5. Sumo Logic auto-detects indicators and creates detection rules

**Verify:** Row count in Sumo Logic matches your CSV file row count.
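The part-file arithmetic and the row-count check can be reproduced locally before an upload. The sketch below is a minimal illustration, not the script's code: `split_rows`, `write_parts`, and `count_rows` are hypothetical helpers, the `<prefix>_Part<N>.csv` naming is an assumption, and the only figures taken from this README are the 9,999-row cap and the 20,912 URL rows from the sample log.

```python
import csv
import math
import tempfile
from pathlib import Path

MAX_ROWS = 9_999  # per-file cap stated in this README


def split_rows(rows, max_rows=MAX_ROWS):
    """Yield consecutive chunks of at most max_rows rows."""
    for start in range(0, len(rows), max_rows):
        yield rows[start:start + max_rows]


def write_parts(rows, out_dir, prefix):
    """Write each chunk to <prefix>_Part<N>.csv (assumed naming), no header."""
    paths = []
    for n, chunk in enumerate(split_rows(rows), start=1):
        path = Path(out_dir) / f"{prefix}_Part{n}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            csv.writer(fh).writerows(chunk)
        paths.append(path)
    return paths


def count_rows(path):
    """Count CSV records, to compare with the count Sumo Logic reports."""
    with Path(path).open(newline="", encoding="utf-8") as fh:
        return sum(1 for _ in csv.reader(fh))


# 20,912 placeholder URL rows, matching the URL count in the sample log.
rows = [[f"{i:05d}", f"https://example.invalid/{i}", "url"] for i in range(20_912)]
expected_parts = math.ceil(len(rows) / MAX_ROWS)

with tempfile.TemporaryDirectory() as tmp:
    parts = write_parts(rows, tmp, "IOC_Weekly_URL")
    counts = [count_rows(p) for p in parts]

print(expected_parts, len(parts), counts)  # 3 3 [9999, 9999, 914]
```

The same ceiling division reproduces the other part counts in the sample log: 39,379 IP rows give 4 files, 273,135 domain rows give 28, and 9,923 hash rows give 1.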
## 📖 Complete Documentation

For detailed setup, troubleshooting, and operational procedures, see:

**👉 [SOP_IOC_ThreatIntelligence_Aggregator.md](./SOP_IOC_ThreatIntelligence_Aggregator.md)**

Covers:

- System overview & architecture
- Prerequisites & dependencies
- Initial setup procedure
- Running & scheduling (Windows + Linux)
- Monitoring & log interpretation
- 29 threat source details
- Master file rolling window
- Sumo Logic upload steps
- Troubleshooting guide

## ⚠️ Limitations

- **Internet Dependent:** Requires connectivity to all 29 source URLs
- **Rate Limits:** Free API tiers have rate limits; a throttled source may be skipped for that run
- **Batch Only:** Weekly batch aggregation, not real-time feed updates
- **Manual Upload:** Sumo Logic upload requires manual CSV import (can be automated with the API)
- **API Keys:** OTX, Pulsedive, Hybrid Analysis, and CAPE require free registration for full functionality

## 🔮 Future Improvements

- [ ] **GitHub Actions** – Scheduled weekly runs with auto-upload
- [ ] **AbuseIPDB Integration** – Add IP reputation scoring
- [ ] **Sumo Logic API** – Automated upload via API (no manual CSV import)
- [ ] **Slack Alerts** – Notify SOC on completion with summary stats
- [ ] **Database Backend** – PostgreSQL for historical queries
- [ ] **Web Dashboard** – Real-time feed status & IOC analytics
- [ ] **Elasticsearch Export** – Alternative to Sumo Logic

## 🛠️ Troubleshooting

### Issue: `FileNotFoundError: config.ini not found`

**Solution:** Copy `config.ini.example` to `config.ini` and place it in the same directory as the script.

### Issue: `ModuleNotFoundError: No module named 'requests'`

**Solution:** Run `pip install -r requirements.txt`.

### Issue: `[-]  failed: Connection error`

**Solution:** Check the internet connection. Source failures are non-fatal; the other sources continue.

### Issue: `No IOCs fetched — aborting`

**Solution:** Check the internet connection and verify that the firewall allows outbound HTTPS.
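The `config.ini` issue above, and sources skipped because a key is unset, can both be caught before a scheduled run. The sketch below is an illustration under assumptions: the four key names come from the Installation steps, but the `[api_keys]` section name and the `preflight` helper are hypothetical, and the real `config.ini` layout may differ (check `config.ini.example`).

```python
import configparser
import tempfile
from pathlib import Path

# Key names listed in the Installation steps; the section name is assumed.
API_KEYS = ("otx_api_key", "pulsedive_api_key",
            "hybrid_analysis_api_key", "cape_api_token")
SECTION = "api_keys"  # hypothetical; check config.ini.example for the real one


def preflight(config_path):
    """Return the API keys that are missing or blank in config_path.

    Raises FileNotFoundError if the file does not exist.
    """
    path = Path(config_path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; copy config.ini.example first")
    parser = configparser.ConfigParser()
    parser.read(path, encoding="utf-8")
    return [key for key in API_KEYS
            if not parser.get(SECTION, key, fallback="").strip()]


# Demo with a throwaway config that sets only one key.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "config.ini"
    cfg.write_text("[api_keys]\notx_api_key = abc123\ncape_api_token =\n",
                   encoding="utf-8")
    missing = preflight(cfg)

print("Keys not set (their sources would be skipped):", missing)
```

Since the keys are described as optional, this sketch reports missing keys rather than failing on them; only a missing file raises.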
For more troubleshooting, see **[SOP_IOC_ThreatIntelligence_Aggregator.md#11-troubleshooting](./SOP_IOC_ThreatIntelligence_Aggregator.md#11-troubleshooting)**

## 📊 Typical Weekly Statistics

    Total IOCs fetched: 438,977
      - New IOCs: 343,349
      - Re-seen IOCs: 95,628
      - Unique IOCs: 438,977 (after dedup)

    Breakdown by type:
      - IP addresses: 39,379 (4 part files)
      - Domains: 273,135 (28 part files)
      - URLs: 20,912 (3 part files)
      - Hashes: 9,923 (1 part file)

    Confidence distribution:
      - 95 (Critical): 12,450 (Feodo, C2Intel)
      - 85-90 (High): 187,234 (PhishTank, ThreatFox, URLhaus)
      - 70-82 (Medium): 239,293 (OTX, Hybrid Analysis, others)

    Processing time: ~2-3 minutes
    Archive size: ~2.5 GB (cumulative)

## 📝 License

This project is licensed under the **MIT License**; see [LICENSE](./LICENSE) for details.

## 👤 Author

**Sivamuthu Selvadurai M**

- GitHub: [@siva404e](https://github.com/siva404e)
- Repository: [IOC_AUTOMATION](https://github.com/siva404e/IOC_AUTOMATION)

## 🙏 Acknowledgments

- **Threat Feeds:** AlienVault OTX, abuse.ch, MalwareBazaar, Spamhaus, PhishTank, and 23+ open-source feeds
- **Libraries:** requests, pandas, beautifulsoup4, python-whois
- **SIEM Integration:** Sumo Logic STIX 2.1 format compliance

**Last Updated:** May 2026

**Version:** 1.0

**Status:** ✅ Production Ready