diljotkaur05/SOC-THREAT-INTELLIGENCE-

# 🛡️ Log Anomaly Detector

A cybersecurity tool that automatically parses log files, detects threats using Machine Learning and rule-based analysis, and presents findings in an interactive dashboard.

## 📁 Project Structure

    log-anomaly-detector/
    ├── app.py                      ← Streamlit dashboard (run this)
    ├── requirements.txt            ← Python dependencies
    ├── README.md                   ← This file
    └── src/
        ├── parser/
        │   ├── log_parser.py       ← Parses 5 log formats using regex
        │   └── normalizer.py       ← Cleans timestamps, validates IPs
        ├── features/
        │   └── engineer.py         ← Builds numeric feature vectors for ML
        ├── models/
        │   ├── isolation_forest.py ← ML: detects global anomalies
        │   ├── lof.py              ← ML: detects local anomalies
        │   └── rule_engine.py      ← 10 hardcoded security rules
        ├── scorer/
        │   └── risk_scorer.py      ← Combines all signals → final severity
        └── report/
            └── pdf_export.py       ← Generates PDF security report

## 🚀 Quick Start

### 1. Install dependencies

    pip install -r requirements.txt

### 2. Run the dashboard

    streamlit run app.py

### 3. Open in browser

    http://localhost:8501

### 4. Try the demo

Click **"Generate & Analyze Demo Logs"** in the sidebar to see the tool in action with simulated attacks.

## 🔍 Supported Log Formats

| Format | Example Source | Auto-Detected? |
|---|---|---|
| Apache / Nginx | `/var/log/apache2/access.log` | ✅ Yes |
| Syslog | `/var/log/syslog` | ✅ Yes |
| Auth.log | `/var/log/auth.log` | ✅ Yes |
| Firewall | iptables kernel logs | ✅ Yes |
| JSON | Application structured logs | ✅ Yes |

## 🚨 Threats Detected

| # | Threat | Method | Severity |
|---|---|---|---|
| 1 | Brute Force Login | Rule Engine | CRITICAL |
| 2 | SQL Injection | Rule Engine | CRITICAL |
| 3 | Port Scanning | Rule Engine | HIGH |
| 4 | Directory Traversal | Rule Engine | HIGH |
| 5 | Scanner Tool (Nikto etc.) | Rule Engine | HIGH |
| 6 | Root Login Attempt | Rule Engine | HIGH |
| 7 | Firewall Storm | Rule Engine | HIGH |
| 8 | Off-Hours Admin Access | Rule Engine | MEDIUM |
| 9 | High Error Rate | Rule Engine | MEDIUM |
| 10 | Sensitive Path Access | Rule Engine | MEDIUM |
| 11 | Statistical Anomalies | Isolation Forest | Varies |
| 12 | Local Outlier Behavior | LOF Model | Varies |

## 🧪 Running Tests

    # Test individual modules
    python src/parser/log_parser.py
    python src/parser/normalizer.py
    python src/features/engineer.py
    python src/models/isolation_forest.py
    python src/models/lof.py
    python src/models/rule_engine.py
    python src/scorer/risk_scorer.py

    # Run with PYTHONPATH set
    PYTHONPATH=. python src/report/pdf_export.py

## 🛠️ Technology Stack

| Component | Technology |
|---|---|
| Language | Python 3.10+ |
| Dashboard | Streamlit |
| ML Models | scikit-learn |
| Data Processing | pandas, numpy |
| Charts | Plotly |
| PDF Export | ReportLab |
| Log Parsing | re (regex) |

## 📊 How the Pipeline Works

    Log File (any format)
            ↓
    log_parser.py    → structured dicts
            ↓
    normalizer.py    → clean timestamps, validated IPs
            ↓
    engineer.py      → numeric feature vectors
            ↓
    ┌─────────────────────────────────────┐
    │ isolation_forest.py  (ML model 1)   │
    │ lof.py               (ML model 2)   │
    │ rule_engine.py       (10 rules)     │
    └─────────────────────────────────────┘
            ↓
    risk_scorer.py   → final severity score per IP
            ↓
    app.py           → interactive Streamlit dashboard
            ↓
    pdf_export.py    → downloadable PDF report

## ⚙️ Configuration

Tunable thresholds are at the top of each relevant file:

**rule_engine.py**

    BRUTE_FORCE_THRESHOLD  = 5     # failed logins before alert
    BRUTE_FORCE_WINDOW_SEC = 60    # time window in seconds
    PORT_SCAN_THRESHOLD    = 20    # unique ports before alert
    ERROR_RATE_THRESHOLD   = 0.80  # 80% error rate threshold

**isolation_forest.py**

    DEFAULT_CONTAMINATION = 0.05   # expected anomaly fraction (5%)
    N_ESTIMATORS          = 100    # number of isolation trees

**risk_scorer.py**

    RULE_WEIGHTS     = {"CRITICAL": 40, "HIGH": 30, "MEDIUM": 20}
    ML_WEIGHTS       = {"CRITICAL": 25, "HIGH": 20, "MEDIUM": 10}
    SCORE_THRESHOLDS = {"CRITICAL": 76, "HIGH": 51, "MEDIUM": 26}

## 📄 License

Academic / Educational Project, Cybersecurity 2026
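## 🧮 Worked Example: Reading the Risk Score Configuration

The `risk_scorer.py` values in the Configuration section list weights and thresholds but not how they are combined. The sketch below shows one plausible reading, for illustration only. It assumes that the weights of all triggered rule and ML findings for an IP are summed, that the sum is capped at 100, and that any score below the lowest threshold is labelled `LOW`. The function names `combine_score` and `severity_for`, the additive formula, the cap, and the `LOW` label are all hypothetical and are not taken from the project's code.

```python
# Values copied from the Configuration section of this README.
RULE_WEIGHTS = {"CRITICAL": 40, "HIGH": 30, "MEDIUM": 20}
ML_WEIGHTS = {"CRITICAL": 25, "HIGH": 20, "MEDIUM": 10}
SCORE_THRESHOLDS = {"CRITICAL": 76, "HIGH": 51, "MEDIUM": 26}


def combine_score(rule_severities, ml_severities):
    """Sum the weights of all findings for one IP.

    ASSUMPTION: additive scoring capped at 100. The real risk_scorer.py
    may combine signals differently.
    """
    raw = sum(RULE_WEIGHTS[s] for s in rule_severities)
    raw += sum(ML_WEIGHTS[s] for s in ml_severities)
    return min(raw, 100)


def severity_for(score):
    """Return the highest label whose threshold the score reaches.

    ASSUMPTION: scores below every threshold are labelled "LOW".
    """
    ordered = sorted(SCORE_THRESHOLDS.items(), key=lambda kv: kv[1], reverse=True)
    for label, minimum in ordered:
        if score >= minimum:
            return label
    return "LOW"


if __name__ == "__main__":
    # One CRITICAL rule hit (40) plus one HIGH ML flag (20) gives 60.
    score = combine_score(["CRITICAL"], ["HIGH"])
    print(score, severity_for(score))  # 60 HIGH
```

Under these assumptions a single rule hit can never reach `CRITICAL` on its own (the largest rule weight is 40, the threshold is 76), so the top severity requires several corroborating signals. Whether the project intends that behavior should be checked against `risk_scorer.py` itself.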