diljotkaur05/SOC-THREAT-INTELLIGENCE-
# 🛡️ Log Anomaly Detector
A cybersecurity tool that parses log files in five formats, detects threats with two scikit-learn anomaly models (Isolation Forest and Local Outlier Factor) plus ten rule-based checks, and presents the findings in an interactive Streamlit dashboard.
## 📁 Project Structure
log-anomaly-detector/
├── app.py ← Streamlit dashboard (run this)
├── requirements.txt ← Python dependencies
├── README.md ← This file
└── src/
    ├── parser/
    │   ├── log_parser.py ← Parses 5 log formats using regex
    │   └── normalizer.py ← Cleans timestamps, validates IPs
    ├── features/
    │   └── engineer.py ← Builds numeric feature vectors for ML
    ├── models/
    │   ├── isolation_forest.py ← ML: detects global anomalies
    │   ├── lof.py ← ML: detects local anomalies
    │   └── rule_engine.py ← 10 hardcoded security rules
    ├── scorer/
    │   └── risk_scorer.py ← Combines all signals → final severity
    └── report/
        └── pdf_export.py ← Generates PDF security report
## 🚀 Quick Start
### 1. Install dependencies
pip install -r requirements.txt
### 2. Run the dashboard
streamlit run app.py
### 3. Open in browser
http://localhost:8501
### 4. Try the demo
Click **"Generate & Analyze Demo Logs"** in the sidebar to see the tool in action with simulated attacks.
## 🔍 Supported Log Formats
| Format | Example Source | Auto-Detected? |
|---|---|---|
| Apache / Nginx | `/var/log/apache2/access.log` | ✅ Yes |
| Syslog | `/var/log/syslog` | ✅ Yes |
| Auth.log | `/var/log/auth.log` | ✅ Yes |
| Firewall | iptables kernel logs | ✅ Yes |
| JSON | Application structured logs | ✅ Yes |
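To illustrate how auto-detection across these five formats could work, here is a minimal sketch. The regexes, the program-name list, and the `detect_format` function are assumptions made for this example and are not taken from `log_parser.py`; the real parser may use different patterns.

```python
import json
import re

# Hypothetical patterns, for illustration only.
# Apache/Nginx combined or common log format: host ident user [time] "request" status size
APACHE_RE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[A-Z]+ \S+ [^"]*" \d{3} (?:\d+|-)')
# Classic syslog prefix: "Oct 10 13:55:36 host program[pid]:"
SYSLOG_RE = re.compile(
    r'^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} \S+ (?P<prog>[^\s:\[]+)(?:\[\d+\])?:'
)
# Programs that usually write to auth.log (assumed list).
AUTH_PROGRAMS = {"sshd", "sudo", "su", "login"}


def detect_format(line: str) -> str:
    """Guess the log format of a single line."""
    line = line.strip()
    if line.startswith("{"):
        try:
            if isinstance(json.loads(line), dict):
                return "json"
        except ValueError:
            pass  # not valid JSON, fall through to the other checks
    if APACHE_RE.match(line):
        return "apache"
    match = SYSLOG_RE.match(line)
    if match:
        prog = match.group("prog")
        # iptables messages arrive via the kernel and carry SRC=/DST= fields.
        if prog == "kernel" and "SRC=" in line and "DST=" in line:
            return "firewall"
        if prog in AUTH_PROGRAMS:
            return "auth"
        return "syslog"
    return "unknown"
```

Checking the most specific formats first (JSON, then Apache) keeps the broader syslog prefix from swallowing lines it only partially describes.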
## 🚨 Threats Detected
| # | Threat | Method | Severity |
|---|---|---|---|
| 1 | Brute Force Login | Rule Engine | CRITICAL |
| 2 | SQL Injection | Rule Engine | CRITICAL |
| 3 | Port Scanning | Rule Engine | HIGH |
| 4 | Directory Traversal | Rule Engine | HIGH |
| 5 | Scanner Tool (Nikto etc.) | Rule Engine | HIGH |
| 6 | Root Login Attempt | Rule Engine | HIGH |
| 7 | Firewall Storm | Rule Engine | HIGH |
| 8 | Off-Hours Admin Access | Rule Engine | MEDIUM |
| 9 | High Error Rate | Rule Engine | MEDIUM |
| 10 | Sensitive Path Access | Rule Engine | MEDIUM |
| 11 | Statistical Anomalies | Isolation Forest | Varies |
| 12 | Local Outlier Behavior | LOF Model | Varies |
## 🧪 Running Tests
Test individual modules by running each one directly:
python src/parser/log_parser.py
python src/parser/normalizer.py
python src/features/engineer.py
python src/models/isolation_forest.py
python src/models/lof.py
python src/models/rule_engine.py
python src/scorer/risk_scorer.py
Run the PDF export module with `PYTHONPATH` set to the project root:
PYTHONPATH=. python src/report/pdf_export.py
## 🛠️ Technology Stack
| Component | Technology |
|---|---|
| Language | Python 3.10+ |
| Dashboard | Streamlit |
| ML Models | scikit-learn |
| Data Processing | pandas, numpy |
| Charts | Plotly |
| PDF Export | ReportLab |
| Log Parsing | re (regex) |
## 📊 How the Pipeline Works
Log File (any format)
↓
log_parser.py → structured dicts
↓
normalizer.py → clean timestamps, validated IPs
↓
engineer.py → numeric feature vectors
↓
┌─────────────────────────────────────┐
│ isolation_forest.py (ML model 1) │
│ lof.py (ML model 2) │
│ rule_engine.py (10 rules) │
└─────────────────────────────────────┘
↓
risk_scorer.py → final severity score per IP
↓
app.py → interactive Streamlit dashboard
↓
pdf_export.py → downloadable PDF report
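The normalization and feature-engineering stages can be sketched roughly as follows. This is an illustrative example only: the record keys (`ip`, `timestamp`, `status`, `path`), the Apache-style timestamp format, and the three per-IP features are assumptions, not the actual contents of `normalizer.py` or `engineer.py`.

```python
import ipaddress
from collections import defaultdict
from datetime import datetime, timezone


def normalize_record(record: dict):
    """Return a cleaned copy of a parsed record, or None if its IP is invalid."""
    try:
        ip = str(ipaddress.ip_address(record["ip"].strip()))
    except (KeyError, ValueError):
        return None
    # Assumed input format: Apache-style "10/Oct/2025:13:55:36 +0200".
    ts = datetime.strptime(record["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    return {**record, "ip": ip, "timestamp": ts.astimezone(timezone.utc).isoformat()}


def build_features(records: list) -> dict:
    """Map each IP to [request_count, error_rate, unique_paths] (hypothetical features)."""
    stats = defaultdict(lambda: {"n": 0, "errors": 0, "paths": set()})
    for r in records:
        s = stats[r["ip"]]
        s["n"] += 1
        s["errors"] += int(r.get("status", 200) >= 400)
        s["paths"].add(r.get("path", ""))
    return {
        ip: [s["n"], s["errors"] / s["n"], len(s["paths"])]
        for ip, s in stats.items()
    }
```

Converting every timestamp to UTC before feature extraction means time-based signals such as off-hours access are compared on a single clock, regardless of the source server's timezone.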
## ⚙️ Configuration
Tunable thresholds are at the top of each relevant file:
**rule_engine.py**
BRUTE_FORCE_THRESHOLD = 5 # failed logins before alert
BRUTE_FORCE_WINDOW_SEC = 60 # time window in seconds
PORT_SCAN_THRESHOLD = 20 # unique ports before alert
ERROR_RATE_THRESHOLD = 0.80 # 80% error rate threshold
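As a rough illustration of how the two brute-force settings interact, the sketch below counts failed logins per IP inside a sliding time window. The `detect_brute_force` function and the event tuple layout are hypothetical, and the sketch assumes an alert fires once the count reaches the threshold; the actual rule in `rule_engine.py` may differ.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

BRUTE_FORCE_THRESHOLD = 5    # failed logins before alert
BRUTE_FORCE_WINDOW_SEC = 60  # time window in seconds


def detect_brute_force(events):
    """Return the set of IPs with enough failed logins inside the window.

    `events` is an iterable of (timestamp, ip, success) tuples sorted by
    timestamp (an assumed layout for this example).
    """
    window = timedelta(seconds=BRUTE_FORCE_WINDOW_SEC)
    recent = defaultdict(deque)  # ip -> timestamps of recent failures
    flagged = set()
    for ts, ip, success in events:
        if success:
            continue
        failures = recent[ip]
        failures.append(ts)
        # Drop failures that have fallen out of the window.
        while ts - failures[0] > window:
            failures.popleft()
        if len(failures) >= BRUTE_FORCE_THRESHOLD:
            flagged.add(ip)
    return flagged
```

Raising `BRUTE_FORCE_THRESHOLD` or shortening `BRUTE_FORCE_WINDOW_SEC` makes the rule less sensitive, which can help on hosts where users legitimately mistype passwords in bursts.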
**isolation_forest.py**
DEFAULT_CONTAMINATION = 0.05 # expected anomaly fraction (5%)
N_ESTIMATORS = 100 # number of isolation trees
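For context, these two constants correspond to the `contamination` and `n_estimators` parameters of scikit-learn's `IsolationForest`. The sketch below shows how they might be passed through; the synthetic two-column feature matrix (requests per minute, error rate) is invented for this example and does not reflect the features `engineer.py` actually produces.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

DEFAULT_CONTAMINATION = 0.05  # expected anomaly fraction (5%)
N_ESTIMATORS = 100            # number of isolation trees

# Synthetic data for illustration: 95 "normal" IPs and 5 obvious outliers.
rng = np.random.default_rng(42)
normal = rng.normal(loc=[20, 0.02], scale=[5, 0.01], size=(95, 2))
attackers = np.array([
    [400, 0.80], [450, 0.85], [500, 0.90], [550, 0.92], [600, 0.95],
])
X = np.vstack([normal, attackers])

model = IsolationForest(
    n_estimators=N_ESTIMATORS,
    contamination=DEFAULT_CONTAMINATION,
    random_state=42,  # fixed seed so the example is repeatable
)
labels = model.fit_predict(X)        # -1 = anomaly, 1 = normal
scores = model.decision_function(X)  # lower values are more anomalous
```

`contamination` sets the score cutoff so that roughly that fraction of the training rows is labeled anomalous; a value that is too high will flag ordinary traffic even when no attack is present.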
**risk_scorer.py**
RULE_WEIGHTS = {"CRITICAL": 40, "HIGH": 30, "MEDIUM": 20}
ML_WEIGHTS = {"CRITICAL": 25, "HIGH": 20, "MEDIUM": 10}
SCORE_THRESHOLDS = {"CRITICAL": 76, "HIGH": 51, "MEDIUM": 26}
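One way these three tables could combine is shown in the sketch below. It assumes the score is an additive sum of the weights for each rule alert and each ML flag, capped at 100, and that anything below the MEDIUM threshold is reported as `LOW`. The cap, the `LOW` label, and the function names are assumptions for illustration; `risk_scorer.py` may combine the signals differently.

```python
RULE_WEIGHTS = {"CRITICAL": 40, "HIGH": 30, "MEDIUM": 20}
ML_WEIGHTS = {"CRITICAL": 25, "HIGH": 20, "MEDIUM": 10}
SCORE_THRESHOLDS = {"CRITICAL": 76, "HIGH": 51, "MEDIUM": 26}


def severity_for(score: int) -> str:
    """Map a numeric score to a severity label using SCORE_THRESHOLDS."""
    for level in ("CRITICAL", "HIGH", "MEDIUM"):
        if score >= SCORE_THRESHOLDS[level]:
            return level
    return "LOW"  # assumed label for scores below the MEDIUM threshold


def score_ip(rule_severities, ml_severities):
    """Sum rule and ML weights for one IP (assumed additive, capped at 100)."""
    total = sum(RULE_WEIGHTS[s] for s in rule_severities)
    total += sum(ML_WEIGHTS[s] for s in ml_severities)
    total = min(total, 100)
    return total, severity_for(total)
```

Under these assumptions, a single CRITICAL rule hit (40 points) lands in the MEDIUM band on its own, and it takes corroborating signals to push an IP to HIGH or CRITICAL.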
## 📄 License
Academic / Educational Project — Cybersecurity 2026