# IOC Threat Intelligence Aggregator



**Automated IOC threat intelligence aggregator** that collects, normalizes, deduplicates, and exports indicators of compromise from **29 open-source threat feeds** into Sumo Logic SIEM-compatible STIX 2.1 CSV format for enterprise threat detection.
**Aggregates from:** AlienVault OTX, PhishTank, ThreatFox, Feodo, Spamhaus, URLhaus, MalwareBazaar, Pulsedive, Hybrid Analysis, CAPE Sandbox, OpenPhish, FireHOL, Botvrij, ThreatView, C2IntelFeeds, C2Tracker, DataPlane, AbuseSSL, IPsum, CINScore, PhishingArmy, VXVault, URLAbuse, TweetFeed, VirusShare, MISP CERT-FR, Blocklist.de, EmergingThreats, and more.
## 📊 SOC Context
A threat intelligence analyst runs this aggregator **weekly** to pull fresh IOCs from the 29 feeds and upload them to Sumo Logic, so that L1 analysts receive **automatic alerts** when those indicators appear in live logs. This shortens the delay between an indicator appearing in a feed and the SIEM being able to detect it, which supports faster response to known malicious infrastructure.
## 🎯 Key Features
| Feature | Details |
|---------|---------|
| **Parallel Fetching** | 29 threat intelligence sources with 10 concurrent threads for speed |
| **Confidence Scoring** | Per-source trust scores (range 65–95) based on feed reputation |
| **IOC Deduplication** | Automatic deduplication keeping highest-confidence threat type |
| **5-Week Rolling Window** | Master CSV maintains rolling history; auto-archives oldest week |
| **Timestamp Refresh** | Re-seen IOCs get validity extended so they don't expire in Sumo Logic |
| **STIX 2.1 Export** | Sumo Logic-compatible 10-column CSV format, no header |
| **Smart Splitting** | Split output files at 9,999 rows per file for SIEM compatibility |
| **Auto-Archiving** | Week 6+ IOCs move to permanent Archive_IOC.csv |
| **Cross-Platform** | Runs on Windows (Task Scheduler) and Linux (Cron) |
| **Multi-IOC Type** | IP addresses, domains, URLs, MD5/SHA-1/SHA-256 hashes |
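The parallel fetching described in the table, combined with the non-fatal handling of source failures, could look roughly like this. It is a minimal sketch and not the script's actual code: `fetch_all`, the fetcher callables, and the feed names `FeedA`/`FeedB` are hypothetical. Only the 10-thread limit comes from the table.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

MAX_WORKERS = 10  # thread count stated in the feature table

Fetcher = Callable[[], list[str]]


def fetch_all(fetchers: dict[str, Fetcher]) -> tuple[dict[str, list[str]], dict[str, str]]:
    """Run every fetcher in a thread pool; return (results, errors) keyed by source."""
    results: dict[str, list[str]] = {}
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = {pool.submit(fn): name for name, fn in fetchers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # one failing feed must not stop the others
                errors[name] = str(exc)
    return results, errors


def _broken_feed() -> list[str]:
    """Stand-in for a feed that cannot be reached."""
    raise ConnectionError("Connection error")


# Hypothetical stand-in fetchers; real ones would download and parse a feed.
demo_results, demo_errors = fetch_all({
    "FeedA": lambda: ["192.0.2.1", "192.0.2.2"],
    "FeedB": _broken_feed,
})
```

Collecting errors per source instead of raising lets the run continue and report skipped feeds at the end.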
## 🗺️ MITRE ATT&CK Coverage
This aggregator enables detection across multiple attack phases:
| Threat Category | Sources | MITRE ID | Tactic | Use Case |
|-----------------|---------|----------|--------|----------|
| **C2 Infrastructure** | Feodo, C2IntelFeeds, C2Tracker | T1071 | Command & Control | Block outbound comms to known C2 servers |
| **Phishing URLs** | PhishTank, OpenPhish, PhishingArmy | T1566.002 | Initial Access | Alert on phishing landing page visits |
| **Malware Hashes** | MalwareBazaar, Hybrid Analysis, CAPE | T1204.002 | Execution | Detect malware execution by file hash |
| **Botnet IPs** | Spamhaus, FireHOL, DataPlane | T1583.005 | Resource Development | Block traffic from botnet source IPs |
| **Malware Domains** | URLhaus, Botvrij, ThreatView | T1566.002 | Initial Access | Block DNS requests to malware domains |
## 📦 Threat Intelligence Sources
| # | Source | IOC Types | Threat Category | Confidence | API Key? |
|---|--------|-----------|-----------------|------------|----------|
| 1 | **Feodo** | IP | C2 | 95 | No |
| 2 | **PhishTank** | URL / Domain | Phishing | 90 | No |
| 3 | **C2IntelFeeds** | IP | C2 | 90 | No |
| 4 | **OpenPhish** | URL / Domain | Phishing | 88 | No |
| 5 | **MalwareBazaar** | SHA-256 Hash | Malware | 88 | No |
| 6 | **ThreatFox** | IP / Domain / URL / Hash | Malware | 85 | No |
| 7 | **URLhaus** | URL / Domain | Malware | 85 | No |
| 8 | **AbuseSSL** | IP / Hash | Malware | 83 | No |
| 9 | **MISP CERT-FR** | Hash | Malware | 83 | No |
| 10 | **Spamhaus** | IP | Botnet | 80 | No |
| 11 | **OTX (AlienVault)** | IP / Domain / URL / Hash | Malware | 70 | **Yes** |
| 12 | **Pulsedive** | IP / Domain / URL | Malware | 65 | **Yes** |
| 13 | **Hybrid Analysis** | Hash | Malware | 82 | **Yes** |
| 14 | **CAPE Sandbox** | Hash | Malware | 80 | **Yes** |
| 15 | **Botvrij** | IP / Domain / URL / Hash | Malware | 70 | No |
**+ 14 more sources** including FireHOL, Blocklist.de, C2Tracker, DataPlane, ThreatView (5 feeds), Bazaar, IPsum, CINScore, PhishingArmy, VXVault, URLAbuse, TweetFeed, VirusShare, Botvrij Hashes
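To illustrate how the per-source confidence values can drive deduplication, here is a minimal sketch that keeps one record per indicator and prefers the highest confidence. The `Ioc` record and the `dedupe` function are hypothetical names, and trimming whitespace is the only normalization assumed here; the real script may normalize differently.

```python
from typing import NamedTuple


class Ioc(NamedTuple):
    indicator: str
    source: str
    confidence: int
    threat_type: str


def dedupe(iocs: list[Ioc]) -> list[Ioc]:
    """Keep one record per indicator, preferring the highest confidence."""
    best: dict[str, Ioc] = {}
    for ioc in iocs:
        key = ioc.indicator.strip()  # assumed normalization: whitespace only
        current = best.get(key)
        if current is None or ioc.confidence > current.confidence:
            best[key] = ioc
    return list(best.values())


sample = [
    Ioc("192.0.2.1", "Spamhaus", 80, "Botnet"),
    Ioc("192.0.2.1", "Feodo", 95, "C2"),
    Ioc("evil.com", "PhishTank", 90, "Phishing"),
]
unique = dedupe(sample)
```

In this example the address reported by both Spamhaus (80) and Feodo (95) survives once, carrying the Feodo record and its C2 threat category.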
## 🚀 Quick Start
### Prerequisites
- **Python:** 3.10 or higher
- **OS:** Windows 10+ or Ubuntu 20.04+
- **Internet:** Required (fetches from 29 external sources)
- **Disk:** ~500 MB for master + archive + weekly CSVs
### Installation
```bash
# 1. Clone the repository
git clone https://github.com/siva404e/IOC_AUTOMATION.git
cd IOC_AUTOMATION

# 2. Install Python dependencies
pip install -r requirements.txt

# 3. Create config file from template
cp config.ini.example config.ini

# 4. Add your API keys (optional but recommended)
# Edit config.ini and fill in:
# - otx_api_key (AlienVault OTX)
# - pulsedive_api_key (Pulsedive)
# - hybrid_analysis_api_key (Hybrid Analysis)
# - cape_api_token (CAPE Sandbox)

# 5. Run the aggregator
python final_ioc_weekly_split.py
```
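Because the API keys are optional, it can help to check which ones are present before a run. The sketch below reads the four key names listed in step 4 with the standard `configparser` module. The `[api_keys]` section name and the `load_api_keys` helper are assumptions made for illustration; check `config.ini.example` for the actual layout.

```python
import configparser

# Key names taken from the installation steps.
API_KEY_NAMES = [
    "otx_api_key",
    "pulsedive_api_key",
    "hybrid_analysis_api_key",
    "cape_api_token",
]


def load_api_keys(text: str, section: str = "api_keys") -> dict[str, str]:
    """Return the non-empty API keys found in the given INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    keys = {}
    for name in API_KEY_NAMES:
        # fallback="" covers both a missing section and a missing option
        value = parser.get(section, name, fallback="").strip()
        if value:
            keys[name] = value
    return keys


# Hypothetical config contents: one key filled in, one left blank.
example_ini = """
[api_keys]
otx_api_key = abc123
pulsedive_api_key =
"""
keys = load_api_keys(example_ini)
missing = [name for name in API_KEY_NAMES if name not in keys]
```

Sources whose key ends up in `missing` would be the ones reported as skipped in the run log.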
### First Run Output
```text
19:12:40 INFO Config loaded from : /IOC_Scripts/config.ini
19:12:40 INFO Output directory : /home/analyst/IOC_Output
19:12:40 INFO Starting IOC aggregator — 29 sources configured
19:12:47 INFO [+] ThreatFox fetched
19:12:50 INFO [+] PhishTank fetched (56294 rows)
19:12:52 WARNING [-] CAPE skipped — CAPE_API_TOKEN not set
19:13:28 INFO Total unique IOCs fetched: 438977
19:13:31 INFO Master updated — 343349 new | 95628 re-seen | 438977 total
19:13:34 INFO IP 39379 rows → 4 part file(s)
19:13:34 INFO Domain 273135 rows → 28 part file(s)
19:13:34 INFO URL 20912 rows → 3 part file(s)
19:13:34 INFO Hash 9923 rows → 1 part file(s)
```
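The summary lines in this output can be sanity-checked automatically: new plus re-seen should equal the total, and each part count should equal the row count divided by 9,999, rounded up. The following is a minimal sketch under the assumption that the log lines keep the wording shown in this README; `check_log` and both regular expressions are hypothetical helpers, not part of the script.

```python
import math
import re

SAMPLE_LOG = """\
19:13:31 INFO Master updated — 343349 new | 95628 re-seen | 438977 total
19:13:34 INFO IP 39379 rows → 4 part file(s)
19:13:34 INFO Domain 273135 rows → 28 part file(s)
"""

# \D+ skips the separator characters between the words and the numbers.
MASTER_RE = re.compile(r"Master updated\D+(\d+) new \| (\d+) re-seen \| (\d+) total")
PARTS_RE = re.compile(r"INFO (\w+)\s+(\d+) rows\D+(\d+) part file")


def check_log(text: str, rows_per_file: int = 9_999) -> list[str]:
    """Return a list of inconsistencies found in the run summary lines."""
    problems = []
    for new, reseen, total in MASTER_RE.findall(text):
        if int(new) + int(reseen) != int(total):
            problems.append("master totals do not add up")
    for kind, rows, parts in PARTS_RE.findall(text):
        if math.ceil(int(rows) / rows_per_file) != int(parts):
            problems.append(f"{kind}: unexpected part count")
    return problems


problems = check_log(SAMPLE_LOG)
```

An empty `problems` list means the summary is internally consistent, as it is for the sample lines.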
## 📂 Project Structure
```text
IOC_AUTOMATION/
├── final_ioc_weekly_split.py                  ← Main aggregator script (1200+ lines)
├── config.ini                                 ← Your API keys (git ignored)
├── config.ini.example                         ← Configuration template
├── requirements.txt                           ← Python dependencies
├── README.md                                  ← This file
├── SOP_IOC_ThreatIntelligence_Aggregator.md   ← Complete operational guide
├── .gitignore                                 ← Prevents API key leaks
└── LICENSE                                    ← MIT License
```
## 📊 Output Files
The aggregator produces Sumo Logic-compatible STIX 2.1 CSV files:
| File | Format | Updated | Purpose |
|------|--------|---------|---------|
| **Master_IOC.csv** | CSV (8 cols) | Every run | Rolling 5-week IOC history with WeekTag |
| **Archive_IOC.csv** | CSV (8 cols) | Week 6+ | Permanent archive of evicted batches |
| **IOC_Weekly_IP_PartN_.csv** | CSV (10 cols) | Every run | IPv4 addresses (max 9,999 rows/file) |
| **IOC_Weekly_Domain_PartN_.csv** | CSV (10 cols) | Every run | Domain names (max 9,999 rows/file) |
| **IOC_Weekly_URL_PartN_.csv** | CSV (10 cols) | Every run | Malware/phishing URLs (max 9,999 rows/file) |
| **IOC_Weekly_Hash_PartN_.csv** | CSV (10 cols) | Every run | MD5/SHA-1/SHA-256 hashes (max 9,999 rows/file) |
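The 9,999-row limit per part file amounts to slicing each IOC type's rows into consecutive chunks. A minimal sketch follows; `split_rows` is a hypothetical helper name, and only the limit itself is taken from the table.

```python
from typing import Iterator, Sequence

MAX_ROWS_PER_FILE = 9_999  # limit stated in the Output Files table


def split_rows(rows: Sequence, limit: int = MAX_ROWS_PER_FILE) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `limit` rows."""
    for start in range(0, len(rows), limit):
        yield rows[start:start + limit]


# 39,379 is the IP row count from the First Run Output log.
parts = list(split_rows(range(39_379)))
```

For 39,379 rows this yields four parts (three full ones and a final part of 9,382 rows), which agrees with the "4 part file(s)" line in the sample log.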
### CSV Format (Sumo Logic STIX 2.1)
The first line names the 10 columns for reference; per the Key Features table, the exported files carry no header row.

```text
id,indicator,type,source,validFrom,validUntil,confidence,threatType,actors,killChain
0001,192.0.2.1,ipv4-addr,Feodo,2026-03-18T13:00:00.000Z,2027-03-18T13:00:00.000Z,95,malicious-activity,,command-and-control
0002,evil.com,domain-name,PhishTank,2026-03-18T13:00:00.000Z,2027-03-18T13:00:00.000Z,90,malicious-activity,,delivery
0003,https://malware.xyz/pay.html,url,URLhaus,2026-03-18T13:00:00.000Z,2027-03-18T13:00:00.000Z,85,malicious-activity,,initial-access
```
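A row in this layout can be produced with the standard `csv` module. The sketch below is illustrative only: `build_row` is a hypothetical helper, and the one-year validity period is an assumption inferred from the sample rows, where `validUntil` is exactly one year after `validFrom`.

```python
import csv
import io
from datetime import datetime, timezone

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.000Z"  # matches the timestamps in the sample rows


def build_row(row_id, indicator, ioc_type, source, confidence, kill_chain, seen_at):
    """Return one 10-column row in the column order of the sample rows."""
    valid_from = seen_at.astimezone(timezone.utc)
    # Assumption: validity lasts one calendar year, as in the sample rows.
    try:
        valid_until = valid_from.replace(year=valid_from.year + 1)
    except ValueError:  # 29 February has no counterpart in the following year
        valid_until = valid_from.replace(year=valid_from.year + 1, day=28)
    return [
        f"{row_id:04d}", indicator, ioc_type, source,
        valid_from.strftime(TS_FORMAT), valid_until.strftime(TS_FORMAT),
        str(confidence), "malicious-activity", "", kill_chain,
    ]


buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(build_row(
    1, "192.0.2.1", "ipv4-addr", "Feodo", 95, "command-and-control",
    datetime(2026, 3, 18, 13, 0, tzinfo=timezone.utc),
))
line = buffer.getvalue().strip()
```

With these inputs, `line` reproduces the first sample row. Using `csv.writer` rather than string joins keeps indicators that contain commas, such as some URLs, correctly quoted.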
## 🔄 5-Week Rolling Window Architecture
The master file automatically maintains a rolling five-week window:
| Week | Master Contains | Archive Action |
|------|-----------------|-----------------|
| 1 | W01 | — |
| 2 | W01–W02 | — |
| 3 | W01–W03 | — |
| 4 | W01–W04 | — |
| 5 | W01–W05 | — |
| 6 | W02–W06 | **W01 → Archive** |
| 7 | W03–W07 | **W02 → Archive** |
Each IOC gets a `WeekTag` (e.g., `2026-W11`) so eviction is deterministic. **Re-seen IOCs get timestamp refresh** to prevent expiration in Sumo Logic.
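The tagging and eviction logic can be sketched as follows. This is a minimal illustration, not the script's code: `week_tag` and `roll_window` are hypothetical names, the tag is assumed to follow ISO week numbering, and the window is assumed to count the five newest tags present in the master file rather than calendar weeks.

```python
from datetime import date

WINDOW_WEEKS = 5  # size of the rolling window


def week_tag(day: date) -> str:
    """Format a date as an ISO week tag such as 2026-W11."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def roll_window(master: dict[str, str], window: int = WINDOW_WEEKS):
    """Split {indicator: week_tag} into (kept, archived) using the newest tags."""
    # Zero-padded tags sort lexicographically in chronological order.
    keep_tags = set(sorted(set(master.values()))[-window:])
    kept = {ioc: tag for ioc, tag in master.items() if tag in keep_tags}
    archived = {ioc: tag for ioc, tag in master.items() if tag not in keep_tags}
    return kept, archived


# Six weekly batches, W01 to W06, one placeholder indicator each.
master = {f"ioc-{n}": f"2026-W{n:02d}" for n in range(1, 7)}
kept, archived = roll_window(master)
```

With six batches present, W02 to W06 stay in the master set and W01 moves to the archive, matching the week 6 row of the table.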
## ⏱️ Scheduled Execution
### Windows (Task Scheduler)
```text
Program:   C:\Python310\python.exe
Arguments: C:\IOC_Scripts\final_ioc_weekly_split.py
Start in:  C:\IOC_Scripts
Trigger:   Weekly, Monday 07:00
```
### Linux (Cron)
```bash
crontab -e
# Add line:
0 7 * * 1 cd /path/to/IOC_Scripts && python3 final_ioc_weekly_split.py >> ~/IOC_Scripts/cron.log 2>&1
```
## 📤 Uploading to Sumo Logic
1. Run the aggregator → generates CSV files in `IOC_Output/`
2. Log in to **Sumo Logic** → **Security** → **Threat Intelligence**
3. Click **Add Source** → **Manual Upload** → **CSV**
4. Upload each weekly part file (the `IOC_Weekly_*.csv` files listed under Output Files)
5. Sumo Logic auto-detects indicators and creates detection rules
**Verify:** Row count in Sumo Logic matches your CSV file row count.
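To get the local side of that comparison, the rows in each part file can be counted before upload. This is a minimal sketch: `count_rows` is a hypothetical helper, the glob pattern assumes the `IOC_Weekly_` prefix from the Output Files table, and the demo file name is made up. Every row is counted because the exported files have no header.

```python
import csv
import tempfile
from pathlib import Path


def count_rows(folder: Path, pattern: str = "IOC_Weekly_*.csv") -> dict[str, int]:
    """Count CSV rows in every file matching `pattern` inside `folder`."""
    counts = {}
    for path in sorted(folder.glob(pattern)):
        with path.open(newline="", encoding="utf-8") as handle:
            counts[path.name] = sum(1 for _ in csv.reader(handle))
    return counts


# Demo against a temporary folder with one made-up part file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "IOC_Weekly_IP_Part1.csv").write_text("a,b\nc,d\n", encoding="utf-8")
    (demo / "notes.txt").write_text("ignored\n", encoding="utf-8")
    demo_counts = count_rows(demo)
```

In practice the function would be pointed at the `IOC_Output/` directory and the per-file counts compared with what Sumo Logic reports after import.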
## 📖 Complete Documentation
For detailed setup, troubleshooting, and operational procedures, see:
**👉 [SOP_IOC_ThreatIntelligence_Aggregator.md](./SOP_IOC_ThreatIntelligence_Aggregator.md)**
Covers:
- System overview & architecture
- Pre-requisites & dependencies
- Initial setup procedure
- Running & scheduling (Windows + Linux)
- Monitoring & log interpretation
- 29 threat source details
- Master file rolling window
- Sumo Logic upload steps
- Troubleshooting guide
## ⚠️ Limitations
- **Internet Dependent:** Requires connectivity to all 29 source URLs
- **Rate Limits:** Free API tiers are rate limited; a throttled source may be skipped for that run
- **Batch Only:** Weekly batch aggregation, not real-time feed updates
- **Manual Upload:** Sumo Logic upload requires manual CSV import (could be automated through the Sumo Logic API)
- **API Keys:** OTX, Pulsedive, Hybrid Analysis, CAPE require free registration for full functionality
## 🔮 Future Improvements
- [ ] **GitHub Actions** – Scheduled weekly runs with auto-upload
- [ ] **AbuseIPDB Integration** – Add IP reputation scoring
- [ ] **Sumo Logic API** – Automated upload via API (no manual CSV import)
- [ ] **Slack Alerts** – Notify SOC on completion with summary stats
- [ ] **Database Backend** – PostgreSQL for historical queries
- [ ] **Web Dashboard** – Real-time feed status & IOC analytics
- [ ] **Elasticsearch Export** – Alternative to Sumo Logic
## 🛠️ Troubleshooting
### Issue: `FileNotFoundError: config.ini not found`
**Solution:** Copy `config.ini.example` to `config.ini` and place it in the same directory as the script.
### Issue: `ModuleNotFoundError: No module named 'requests'`
**Solution:** Run `pip install -r requirements.txt`
### Issue: `[-] failed: Connection error`
**Solution:** Check internet connection. Source failures are non-fatal; other sources continue.
### Issue: `No IOCs fetched — aborting`
**Solution:** Check internet connection and verify firewall allows HTTPS outbound.
For more troubleshooting, see **[SOP_IOC_ThreatIntelligence_Aggregator.md#11-troubleshooting](./SOP_IOC_ThreatIntelligence_Aggregator.md#11-troubleshooting)**
## 📊 Typical Weekly Statistics
```text
Total IOCs fetched: 438,977
- New IOCs: 343,349
- Re-seen IOCs: 95,628
- Unique IOCs: 438,977 (after dedup)

Breakdown by type (new IOCs):
- IP addresses: 39,379 (4 part files)
- Domains: 273,135 (28 part files)
- URLs: 20,912 (3 part files)
- Hashes: 9,923 (1 part file)

Confidence distribution:
- 95 (Critical): 12,450 (Feodo)
- 85-90 (High): 187,234 (PhishTank, C2IntelFeeds, ThreatFox, URLhaus)
- 70-82 (Medium): 239,293 (OTX, Hybrid Analysis, others)

Processing time: ~2-3 minutes
Archive size: ~2.5 GB (cumulative)
```
## 📝 License
This project is licensed under the **MIT License** — see [LICENSE](./LICENSE) for details.
## 👤 Author
**Sivamuthu Selvadurai M**
- GitHub: [@siva404e](https://github.com/siva404e)
- Repository: [IOC_AUTOMATION](https://github.com/siva404e/IOC_AUTOMATION)
## 🙏 Acknowledgments
- **Threat Feeds:** AlienVault OTX, abuse.ch, MalwareBazaar, Spamhaus, PhishTank, and 23+ open-source feeds
- **Libraries:** requests, pandas, beautifulsoup4, python-whois
- **SIEM Integration:** Sumo Logic STIX 2.1 format compliance
**Last Updated:** May 2026
**Version:** 1.0
**Status:** ✅ Production Ready