Waqarahmd222/phishguard-ai
# 🛡️ PhishGuard AI
**AI-Powered Phishing Detection & Threat Intelligence Platform**
A standalone phishing detection platform that analyzes URLs in real time using multiple detection engines: URL heuristics, content analysis, ML-based scoring, threat intelligence lookups, and custom detection rules. Built as a single Python file with a full web dashboard.

## Quick Start
```bash
python phishguard.py
```
That's it. Opens automatically at **http://localhost:5000**. No Docker, no database setup, no config files — everything is self-contained.
## How It Works
Paste any URL into the scanner. PhishGuard runs it through **4 analysis engines simultaneously**:
### 1. URL Heuristic Analysis (15 checks)
- **Suspicious TLD detection** — flags `.tk`, `.ml`, `.ga`, `.cf`, `.gq` and 15+ high-risk TLDs
- **Brand impersonation** — catches typosquatting against 25+ brands (Google, PayPal, Microsoft, Amazon, etc.) with character substitution matching (`0→o`, `1→l`, `3→e`)
- **Homograph attacks** — detects Punycode/IDN domains and non-ASCII Unicode characters used to spoof legitimate domains
- **IP-based URLs** — flags raw IP addresses instead of domain names
- **URL shortener detection** — identifies bit.ly, tinyurl, t.co and other shortener services
- **Shannon entropy analysis** — measures randomness in URL paths to detect auto-generated phishing URLs
- **Suspicious keyword detection** — flags `login`, `verify`, `secure`, `account`, `password`, `banking` and more
- **@ sign redirect technique** — catches `https://google.com@evil.tk` style attacks
- **Excessive subdomains** — flags 3+ levels of subdomains
- **Missing HTTPS**, unusual ports, excessive hyphens, long URLs, hex encoding
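
To make a few of these checks concrete, here is a minimal sketch using only the standard library. It is not taken from `phishguard.py`: the function names, the short `BRANDS` list, and the `LOOKALIKES` map are assumptions chosen for illustration, and the real tool covers more brands, more substitutions, and more checks.

```python
import ipaddress
import math
from collections import Counter
from typing import List, Optional
from urllib.parse import urlsplit

# Hypothetical substitution map and brand list, for illustration only.
LOOKALIKES = str.maketrans({"0": "o", "1": "l", "3": "e"})
BRANDS = ("google", "paypal", "microsoft", "amazon")


def shannon_entropy(text: str) -> float:
    """Bits per character; higher values suggest random-looking strings."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_ip_host(host: str) -> bool:
    """True when the host is a raw IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def impersonated_brand(host: str) -> Optional[str]:
    """Return a brand that appears only after undoing digit substitutions."""
    original = host.lower()
    normalized = original.translate(LOOKALIKES)
    for brand in BRANDS:
        if brand in normalized and brand not in original:
            return brand
    return None


def quick_checks(url: str) -> List[str]:
    """Run a handful of URL-only checks and return the findings."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    findings = []
    if parts.username is not None:  # text before '@' is userinfo, not the host
        findings.append("at-sign redirect")
    if is_ip_host(host):
        findings.append("ip-based url")
    if parts.scheme != "https":
        findings.append("missing https")
    brand = impersonated_brand(host)
    if brand:
        findings.append(f"brand lookalike: {brand}")
    return findings
```

The `@` check relies on `urlsplit` treating everything before `@` as userinfo, so `https://google.com@evil.tk/phish` resolves to the host `evil.tk`. An entropy threshold is deliberately left out, since the README does not state the value the tool uses.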
### 2. Live Content Analysis
Actually fetches the target webpage and inspects the HTML:
- **Credential harvesting forms** — password fields submitting to external domains
- **Hidden iframes** — zero-size or display:none iframes (tracking/exploit delivery)
- **Suspicious JavaScript** — `eval()`, `atob()`, `document.cookie` access, auto-submit, redirects
- **Meta refresh redirects** — auto-redirect to malicious destinations
- **Right-click disabling** — anti-inspection technique used by phishing kits
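
The sketch below illustrates three of these checks on HTML that has already been fetched. PhishGuard itself uses BeautifulSoup4; this example swaps in the standard library's `html.parser` so it runs without extra packages. The `PageInspector` class and `inspect_html` function are hypothetical names, not part of the project.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class PageInspector(HTMLParser):
    """Collects a few phishing signals from raw HTML (illustrative, not exhaustive)."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlsplit(page_url).hostname
        self.findings = []
        self._form_action = None  # resolved action URL of the currently open form

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}  # valueless attributes arrive as None
        if tag == "form":
            self._form_action = urljoin(self.page_url, a.get("action", ""))
        elif tag == "input" and a.get("type", "").lower() == "password":
            if self._form_action is not None:
                target = urlsplit(self._form_action).hostname
                if target and target != self.page_host:
                    self.findings.append(f"password form posts to {target}")
        elif tag == "iframe":
            style = a.get("style", "").replace(" ", "").lower()
            if "display:none" in style or a.get("width") == "0" or a.get("height") == "0":
                self.findings.append("hidden iframe")
        elif tag == "meta" and a.get("http-equiv", "").lower() == "refresh":
            self.findings.append("meta refresh redirect")

    def handle_endtag(self, tag):
        if tag == "form":
            self._form_action = None


def inspect_html(page_url: str, html: str) -> list:
    """Parse the HTML of page_url and return the list of findings."""
    inspector = PageInspector(page_url)
    inspector.feed(html)
    inspector.close()
    return inspector.findings
```

Relative form actions are resolved against the page URL first, so a form posting to `/login` on the same host is not reported.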
### 3. ML-Based Scoring
Weighted ensemble scoring that combines signals from all engines with correlation bonuses when multiple engines flag the same URL.
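
As a rough illustration of weighted scoring with a correlation bonus, consider the sketch below. The README does not publish the actual weights, thresholds, or bonus size, so `ENGINE_WEIGHTS`, `FLAG_THRESHOLD`, and `CORRELATION_BONUS` are all assumed values, and `ensemble_score` is a hypothetical function name.

```python
# Assumed values for illustration; the real weights are not documented.
ENGINE_WEIGHTS = {"heuristic": 40, "content": 30, "threat_intel": 30}  # percent
FLAG_THRESHOLD = 50      # an engine "flags" a URL at or above this score
CORRELATION_BONUS = 10   # added for each additional engine that agrees


def ensemble_score(engine_scores: dict) -> float:
    """Combine per-engine scores (0-100) into a single 0-100 risk score."""
    weighted = sum(
        weight * engine_scores.get(name, 0) for name, weight in ENGINE_WEIGHTS.items()
    ) / 100
    flagged = sum(
        1 for name in ENGINE_WEIGHTS if engine_scores.get(name, 0) >= FLAG_THRESHOLD
    )
    bonus = CORRELATION_BONUS * max(0, flagged - 1)
    return min(100.0, weighted + bonus)
```

With these assumed numbers, a URL scoring 80 on heuristics and 60 on content gets a weighted base of 50 plus a bonus of 10 for the second agreeing engine.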
### 4. Threat Intelligence
Cross-references against the local IOC (Indicator of Compromise) database. Every scan automatically extracts and stores new IOCs — domains, IPs — building an ever-growing threat knowledge base.
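
A local IOC store of this kind can be sketched with `sqlite3` in a few lines. The table layout here is an assumption that keeps only a value, a type, and a hit count; the project's real `iocs` table also tracks severity and source, and `record_ioc` and `lookup_ioc` are hypothetical helper names.

```python
import sqlite3

# In-memory database for the example; the tool itself creates a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE iocs (
           value TEXT PRIMARY KEY,
           type  TEXT NOT NULL,
           hits  INTEGER NOT NULL DEFAULT 0
       )"""
)


def record_ioc(value: str, ioc_type: str) -> None:
    """Store a new indicator; do nothing if it is already known."""
    conn.execute(
        "INSERT OR IGNORE INTO iocs (value, type) VALUES (?, ?)", (value, ioc_type)
    )


def lookup_ioc(value: str) -> bool:
    """Return True and bump the hit count if the indicator is known."""
    cur = conn.execute("UPDATE iocs SET hits = hits + 1 WHERE value = ?", (value,))
    return cur.rowcount > 0
```

A scan would call `lookup_ioc` for the domain and IP it extracted, then `record_ioc` for anything new once the verdict is known.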
### Detection Rules Engine
8 pre-built YARA-style rules that automatically trigger during scans:
| Rule | Severity | What It Catches |
|------|----------|-----------------|
| Suspicious TLD | HIGH | `.tk`, `.ml`, `.ga`, `.cf`, `.gq`, `.top`, `.xyz` domains |
| Homograph Attack | CRITICAL | Unicode lookalike characters in domains |
| Credential Harvesting | CRITICAL | Login forms submitting to external endpoints |
| Brand Impersonation | HIGH | URLs mimicking Google, Microsoft, PayPal, Apple, Amazon |
| IP-Based URL | MEDIUM | Raw IP addresses instead of domains |
| URL Shortener Chain | MEDIUM | Multiple shortener redirects |
| Excessive Subdomains | MEDIUM | 4+ subdomain levels |
| Data Exfiltration JS | CRITICAL | JavaScript capturing/exfiltrating form data |
Rules can be toggled on/off from the dashboard. Trigger counts are tracked per rule.
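
One way to model toggleable rules with trigger counters is shown below. This is a sketch under assumptions, not the project's code: the `Rule` dataclass, the two predicate functions, and `run_rules` are hypothetical, and only two of the eight rules are included.

```python
from dataclasses import dataclass
from typing import Callable, List
from urllib.parse import urlsplit


@dataclass
class Rule:
    name: str
    severity: str
    matches: Callable[[str], bool]
    enabled: bool = True
    triggers: int = 0


def has_suspicious_tld(url: str) -> bool:
    host = urlsplit(url).hostname or ""
    return host.endswith((".tk", ".ml", ".ga", ".cf", ".gq", ".top", ".xyz"))


def has_many_subdomains(url: str) -> bool:
    # Naive label count; tldextract would handle suffixes like .co.uk correctly.
    host = urlsplit(url).hostname or ""
    return len(host.split(".")) - 2 >= 4


RULES = [
    Rule("Suspicious TLD", "HIGH", has_suspicious_tld),
    Rule("Excessive Subdomains", "MEDIUM", has_many_subdomains),
]


def run_rules(url: str) -> List[str]:
    """Evaluate enabled rules, count triggers, and return matched rule names."""
    matched = []
    for rule in RULES:
        if rule.enabled and rule.matches(url):
            rule.triggers += 1
            matched.append(rule.name)
    return matched
```

Toggling a rule from the dashboard would then amount to flipping `enabled`, which leaves the accumulated `triggers` count untouched.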
## Dashboard Features
| Page | Description |
|------|-------------|
| **Dashboard** | Live stats (total scans, threats, IOCs, rule triggers), scan trend chart, verdict distribution pie chart, recent detections, top threat indicators |
| **Scan** | URL input with real-time multi-engine analysis, detailed per-engine score breakdown, indicator list with severity levels, matched detection rules |
| **IOC Database** | All tracked indicators with type, severity, source, hit count. Searchable. Auto-populated from scans + pre-loaded seed data |
| **Rules** | 8 detection rules with toggle switches, descriptions, and trigger counters |
| **Alerts** | Auto-generated when scans score ≥45. Severity-tagged with acknowledge workflow |
## Try These Test URLs
```
# Safe: should score low
https://www.google.com

# Suspicious TLD + brand impersonation: should score high
https://secure-paypal-verify.tk/login.php

# IP-based URL: should flag
http://185.220.101.42/banking/login

# Multiple red flags: should score critical
https://login.secure.verify.micr0soft-account.gq/auth?token=abc123

# URL shortener
https://bit.ly/3xAbCdE

# @ sign redirect attack
https://google.com@evil.tk/phish
```
## Architecture
```
phishguard.py (single file)
├── Auto-dependency installer
├── SQLite database (auto-created)
│   ├── scans - scan history with full results
│   ├── iocs - indicators of compromise
│   ├── rules - detection rules + trigger counts
│   └── alerts - auto-generated threat alerts
├── Analysis Engines
│   ├── URL Heuristic Analyzer (15 checks)
│   ├── Content Analyzer (HTML/JS inspection)
│   ├── ML Scoring Engine (weighted ensemble)
│   └── Threat Intel (IOC cross-reference)
├── Detection Rules Engine (8 YARA-style rules)
├── REST API (14 endpoints)
│   ├── GET /api/stats
│   ├── POST /api/scan
│   ├── POST /api/scan/bulk
│   ├── GET /api/scans
│   ├── GET /api/iocs
│   ├── POST /api/iocs
│   ├── DELETE /api/iocs/:id
│   ├── GET /api/rules
│   ├── POST /api/rules/:id/toggle
│   ├── GET /api/alerts
│   ├── POST /api/alerts/:id/ack
│   ├── GET /api/top-threats
│   ├── GET /api/chart/trend
│   └── GET /api/chart/verdicts
└── Web Dashboard (embedded HTML/CSS/JS)
    ├── Dashboard - charts + stats
    ├── Scanner - URL analysis interface
    ├── IOC Database - searchable threat data
    ├── Rules Manager - toggle detection rules
    └── Alerts Feed - threat notifications
```
## Tech Stack
| Component | Technology |
|-----------|-----------|
| Language | Python 3.10+ (stdlib + 2 packages) |
| Web Server | `http.server` (stdlib) |
| Database | SQLite3 (stdlib, auto-created) |
| URL Parsing | `tldextract` |
| HTML Analysis | `BeautifulSoup4` |
| Frontend | Vanilla HTML/CSS/JS (embedded) |
| Charts | Canvas API |
| Fonts | DM Sans + DM Mono (Google Fonts) |
## API Usage
The REST API is fully functional — you can integrate it with other tools:
```bash
# Scan a URL
curl -X POST http://localhost:5000/api/scan \
  -H "Content-Type: application/json" \
  -d '{"url": "https://suspicious-site.tk/login"}'

# Bulk scan
curl -X POST http://localhost:5000/api/scan/bulk \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://google.com", "https://evil.tk/phish"]}'

# Get all IOCs
curl http://localhost:5000/api/iocs

# Search IOCs (quoted so the shell does not interpret the "?")
curl "http://localhost:5000/api/iocs?q=paypal"

# Get dashboard stats
curl http://localhost:5000/api/stats
```
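
The same scan call can be made from Python with the standard library. This is a minimal sketch: `build_scan_request` and `scan` are hypothetical helper names, and the code assumes the endpoint returns a JSON body, whose fields the README does not document.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"


def build_scan_request(url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST /api/scan request."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/scan",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scan(url: str) -> dict:
    """Send the request; requires a running PhishGuard instance."""
    with urllib.request.urlopen(build_scan_request(url), timeout=30) as resp:
        # Assumes a JSON response; the schema is not described in this README.
        return json.loads(resp.read().decode("utf-8"))
```

Splitting request construction from sending makes the payload easy to inspect or log before anything goes over the wire.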
## Requirements
- Python 3.10 or higher
- Internet connection (for fetching URL content and Google Fonts)
- That's it
## License
MIT License — see [LICENSE](LICENSE) for details.
PhishGuard AI — because every click counts.