siva404e/phishing-detector
GitHub: siva404e/phishing-detector
Stars: 0 | Forks: 0
# PhishGuard — URL Threat Intelligence Analyzer
A phishing detection and threat intelligence dashboard built with **Flask** and **Python**. Analyzes suspicious URLs using multi-layer detection: SSL/TLS validation, WHOIS domain age analysis, VirusTotal API integration, homograph spoofing detection, and risk-based scoring — all surfaced through an interactive web dashboard.
## Dashboard Preview
URL: http://paypa1-secure-login.tk/verify
┌─────────────────────────────────────────────┐
│ VERDICT: 🚨 CRITICAL │ SCORE: 87 / 100 │
├─────────────────────────────────────────────┤
│ WHOIS │ Domain 3 days old (+50) │
│ SSL │ Self-signed certificate (+20) │
│ VirusTotal│ 12 engines flagged (+50) │
│ Structure │ Suspicious TLD (.tk) (+20) │
│ │ Homograph char detected (+30) │
└─────────────────────────────────────────────┘
## Features
- **WHOIS Domain Age Analysis** — flags newly registered domains, a key phishing indicator
- **SSL/TLS Certificate Inspection** — validates issuer, expiry, and detects self-signed certificates
- **VirusTotal API Integration** — cross-references URLs against 70+ security engines
- **URL Structure Analysis** — detects IP-based URLs, suspicious TLDs (.tk, .ml, .xyz), excessive subdomains, and redirect tricks
- **Homograph / Unicode Spoofing Detection** — catches lookalike characters (е vs e, о vs o) used in domain impersonation
- **Keyword Analysis** — flags phishing-associated terms in the URL path
- **Risk Scoring Engine** — weighted scoring system producing SAFE / LOW / MODERATE / HIGH / CRITICAL verdicts
- **Scan History Dashboard** — tracks and displays last 50 scans with score bars and timestamps
- **CSV Export** — exports scan results for documentation and incident reporting
## MITRE ATT&CK Coverage
| Detection Technique | MITRE ID | Tactic |
|-----------------------------|------------|---------------------|
| Suspicious URL analysis | T1566.002 | Initial Access |
| Homograph domain spoofing | T1036.003 | Defense Evasion |
| Newly registered domain | T1583.001 | Resource Development|
| Invalid/self-signed SSL | T1566.002 | Initial Access |
| IP address in URL | T1036 | Defense Evasion |
## Tech Stack
| Component | Technology |
|------------------|-----------------------------------|
| Backend | Python 3.x, Flask |
| Threat Intel | VirusTotal API v3 |
| Domain Analysis | python-whois |
| SSL Inspection | Python ssl, socket (stdlib) |
| Frontend | HTML/CSS/JS (inline Flask template)|
| Config | python-dotenv (.env) |
## Setup & Installation
### 1. Clone the repository
git clone https://github.com/siva404e/phishing-detector.git
cd phishing-detector
### 2. Install dependencies
pip install -r requirements.txt
### 3. Configure API keys
cp .env.example .env
Edit `.env` and add your VirusTotal API key:
VIRUSTOTAL_API_KEY=your_api_key_here
Get a free API key at [virustotal.com](https://www.virustotal.com) — the free tier allows 4 requests/minute.
### 4. Run the dashboard
python dashboard.py
Open your browser at **http://127.0.0.1:5000**
## Project Structure
phishing-detector/
├── dashboard.py # Flask app — routes, analysis logic, HTML template
├── utils.py # Helper classes: URLValidator, RiskScorer, DomainAnalyzer, PatternDetector
├── config.py # Environment variable loading (API keys)
├── requirements.txt # Python dependencies
├── .env.example # Environment variable template
├── .gitignore # Excludes .env and sensitive files
└── LICENSE
## How the Risk Scoring Works
Each detection layer contributes a weighted score. Scores are capped at 100.
| Score Range | Verdict | Typical Indicators |
|-------------|--------------|--------------------------------------------------|
| 70 – 100 | 🚨 CRITICAL | Brand new domain + flagged by VirusTotal + no SSL|
| 45 – 69 | ⚠️ HIGH | Suspicious TLD + homograph + multiple keywords |
| 25 – 44 | 🔍 MODERATE | Young domain or expired cert |
| 10 – 24 | 🔎 LOW | Minor URL anomalies |
| 0 – 9 | ✅ SAFE | Passes all checks |
## Usage Examples
**Scan a known phishing pattern:**
Input: http://paypa1-secure-login.tk/verify/account
Result: CRITICAL (score: 87) — suspicious TLD, homograph 'l→1', HTTP only, 3-day-old domain
**Scan a legitimate site:**
Input: https://github.com
Result: SAFE (score: 2) — established domain, valid SSL, no threat indicators
## Limitations & Known Gaps
- VirusTotal free tier is rate-limited to 4 requests/minute; scans may take 15–30 seconds
- WHOIS data can be incomplete or unavailable for some TLDs
- Scan history resets on server restart (file-based persistence planned)
- Not a substitute for full sandbox analysis (e.g., ANY.RUN, Hybrid Analysis)
## Future Improvements
- [ ] Persistent scan history (JSON / SQLite)
- [ ] Bulk URL scanning from CSV input
- [ ] WHOIS registrant abuse contact lookup
- [ ] Integration with AbuseIPDB for IP reputation
- [ ] Dockerised deployment
## Author
**Sivamuthu Selvadurai M**
Cybersecurity enthusiast focused on SOC operations, threat intelligence, and blue team tooling.
## License
MIT License — see [LICENSE](LICENSE) for details.