Waqarahmd222/phishguard-ai

GitHub: Waqarahmd222/phishguard-ai


# 🛡️ PhishGuard AI

**AI-Powered Phishing Detection & Threat Intelligence Platform**

[![Python 3.10+](https://img.shields.io/badge/Python-3.10+-3776ab.svg)](https://python.org) [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE) [![Single File](https://img.shields.io/badge/Setup-Zero%20Config-22d3a7.svg)](#quick-start)

A standalone phishing detection platform that analyzes URLs in real-time using multiple detection engines: URL heuristics, content analysis, ML-based scoring, threat intelligence lookups, and custom detection rules. Built as a single Python file with a full web dashboard.

![Dashboard](https://static.pigsec.cn/wp-content/uploads/repos/2026/06/46845a30e6003458.png)

## Quick Start

```bash
python phishguard.py
```

That's it. Opens automatically at **http://localhost:5000**. No Docker, no database setup, no config files; everything is self-contained.

## How It Works

Paste any URL into the scanner. PhishGuard runs it through **4 analysis engines simultaneously**:

### 1. URL Heuristic Analysis (15 checks)

- **Suspicious TLD detection**: flags `.tk`, `.ml`, `.ga`, `.cf`, `.gq` and 15+ high-risk TLDs
- **Brand impersonation**: catches typosquatting against 25+ brands (Google, PayPal, Microsoft, Amazon, etc.) with character substitution matching (`0→o`, `1→l`, `3→e`)
- **Homograph attacks**: detects Punycode/IDN domains and non-ASCII Unicode characters used to spoof legitimate domains
- **IP-based URLs**: flags raw IP addresses instead of domain names
- **URL shortener detection**: identifies bit.ly, tinyurl, t.co and other shortener services
- **Shannon entropy analysis**: measures randomness in URL paths to detect auto-generated phishing URLs
- **Suspicious keyword detection**: flags `login`, `verify`, `secure`, `account`, `password`, `banking` and more
- **@ sign redirect technique**: catches `https://google.com@evil.tk` style attacks
- **Excessive subdomains**: flags 3+ levels of subdomains
- **Missing HTTPS**, unusual ports, excessive hyphens, long URLs, hex encoding

### 2. Live Content Analysis

Actually fetches the target webpage and inspects the HTML:

- **Credential harvesting forms**: password fields submitting to external domains
- **Hidden iframes**: zero-size or display:none iframes (tracking/exploit delivery)
- **Suspicious JavaScript**: `eval()`, `atob()`, `document.cookie` access, auto-submit, redirects
- **Meta refresh redirects**: auto-redirect to malicious destinations
- **Right-click disabling**: anti-inspection technique used by phishing kits

### 3. ML-Based Scoring

Weighted ensemble scoring that combines signals from all engines with correlation bonuses when multiple engines flag the same URL.

### 4. Threat Intelligence

Cross-references against the local IOC (Indicator of Compromise) database. Every scan automatically extracts and stores new IOCs (domains, IPs), building an ever-growing threat knowledge base.

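Two of the URL heuristics from section 1, Shannon entropy and the @ sign redirect check, can be sketched with the standard library alone. This is an illustrative sketch, not code taken from `phishguard.py`: the function names `shannon_entropy`, `has_userinfo_trick`, and `real_host` are hypothetical, and the README does not state which thresholds the scanner applies.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # Sum p * log2(1/p) over the distinct characters in the string.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def has_userinfo_trick(url: str) -> bool:
    """True when the URL has a userinfo part before '@' in its authority."""
    # In https://google.com@evil.tk, "google.com" is parsed as a username.
    return urlsplit(url).username is not None


def real_host(url: str) -> str:
    """Return the host the client would actually connect to (lowercased)."""
    return urlsplit(url).hostname or ""


if __name__ == "__main__":
    url = "https://google.com@evil.tk/phish"
    print(has_userinfo_trick(url), real_host(url))
    print(shannon_entropy(urlsplit(url).path))
```

A random-looking path such as `/x9f3kq7zt2b` scores higher than `/login` under this measure. Where to draw the line between a normal and an auto-generated path is a tuning decision this sketch leaves open.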
### Detection Rules Engine

8 pre-built YARA-style rules that automatically trigger during scans:

| Rule | Severity | What It Catches |
|------|----------|-----------------|
| Suspicious TLD | HIGH | `.tk`, `.ml`, `.ga`, `.cf`, `.gq`, `.top`, `.xyz` domains |
| Homograph Attack | CRITICAL | Unicode lookalike characters in domains |
| Credential Harvesting | CRITICAL | Login forms submitting to external endpoints |
| Brand Impersonation | HIGH | URLs mimicking Google, Microsoft, PayPal, Apple, Amazon |
| IP-Based URL | MEDIUM | Raw IP addresses instead of domains |
| URL Shortener Chain | MEDIUM | Multiple shortener redirects |
| Excessive Subdomains | MEDIUM | 4+ subdomain levels |
| Data Exfiltration JS | CRITICAL | JavaScript capturing/exfiltrating form data |

Rules can be toggled on/off from the dashboard. Trigger counts are tracked per rule.

## Dashboard Features

| Page | Description |
|------|-------------|
| **Dashboard** | Live stats (total scans, threats, IOCs, rule triggers), scan trend chart, verdict distribution pie chart, recent detections, top threat indicators |
| **Scan** | URL input with real-time multi-engine analysis, detailed per-engine score breakdown, indicator list with severity levels, matched detection rules |
| **IOC Database** | All tracked indicators with type, severity, source, hit count. Searchable. Auto-populated from scans + pre-loaded seed data |
| **Rules** | 8 detection rules with toggle switches, descriptions, and trigger counters |
| **Alerts** | Auto-generated when scans score ≥45. Severity-tagged with acknowledge workflow |

## Try These Test URLs

```
# Safe: should score low
https://www.google.com

# Suspicious TLD + brand impersonation: should score high
https://secure-paypal-verify.tk/login.php

# IP-based URL: should flag
http://185.220.101.42/banking/login

# Multiple red flags: should score critical
https://login.secure.verify.micr0soft-account.gq/auth?token=abc123

# URL shortener
https://bit.ly/3xAbCdE

# @ sign redirect attack
https://google.com@evil.tk/phish
```

## Architecture

```
phishguard.py (single file)
├── Auto-dependency installer
├── SQLite database (auto-created)
│   ├── scans: scan history with full results
│   ├── iocs: indicators of compromise
│   ├── rules: detection rules + trigger counts
│   └── alerts: auto-generated threat alerts
├── Analysis Engines
│   ├── URL Heuristic Analyzer (15 checks)
│   ├── Content Analyzer (HTML/JS inspection)
│   ├── ML Scoring Engine (weighted ensemble)
│   └── Threat Intel (IOC cross-reference)
├── Detection Rules Engine (8 YARA-style rules)
├── REST API (14 endpoints)
│   ├── GET    /api/stats
│   ├── POST   /api/scan
│   ├── POST   /api/scan/bulk
│   ├── GET    /api/scans
│   ├── GET    /api/iocs
│   ├── POST   /api/iocs
│   ├── DELETE /api/iocs/:id
│   ├── GET    /api/rules
│   ├── POST   /api/rules/:id/toggle
│   ├── GET    /api/alerts
│   ├── POST   /api/alerts/:id/ack
│   ├── GET    /api/top-threats
│   ├── GET    /api/chart/trend
│   └── GET    /api/chart/verdicts
└── Web Dashboard (embedded HTML/CSS/JS)
    ├── Dashboard: charts + stats
    ├── Scanner: URL analysis interface
    ├── IOC Database: searchable threat data
    ├── Rules Manager: toggle detection rules
    └── Alerts Feed: threat notifications
```

## Tech Stack

| Component | Technology |
|-----------|-----------|
| Language | Python 3.10+ (stdlib + 2 packages) |
| Web Server | `http.server` (stdlib) |
| Database | SQLite3 (stdlib, auto-created) |
| URL Parsing | `tldextract` |
| HTML Analysis | `BeautifulSoup4` |
| Frontend | Vanilla HTML/CSS/JS (embedded) |
| Charts | Canvas API |
| Fonts | DM Sans + DM Mono (Google Fonts) |

## API Usage

The REST API is fully functional, so you can integrate it with other tools:

```bash
# Scan a URL
curl -X POST http://localhost:5000/api/scan \
  -H "Content-Type: application/json" \
  -d '{"url": "https://suspicious-site.tk/login"}'

# Bulk scan
curl -X POST http://localhost:5000/api/scan/bulk \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://google.com", "https://evil.tk/phish"]}'

# Get all IOCs
curl http://localhost:5000/api/iocs

# Search IOCs (quoted so the shell does not interpret the "?")
curl "http://localhost:5000/api/iocs?q=paypal"

# Get dashboard stats
curl http://localhost:5000/api/stats
```

## Requirements

- Python 3.10 or higher
- Internet connection (for fetching URL content and Google Fonts)
- That's it

## License

MIT License. See [LICENSE](LICENSE) for details.

PhishGuard AI — because every click counts.