Ramanpreet21/phishguard
# 🛡️ Phishing Detection System
**Group 32 · Dept. of AI & Emerging Technologies**
Six-model ensemble (3 classical ML + 3 deep learning) served via FastAPI,
packaged in Docker, with a Chrome extension for real-time tab analysis.
## Architecture
    ┌─────────────────────────────────────────────────────────┐
    │                    Chrome Extension                     │
    │       popup.html / popup.js → background.js (SW)        │
    └────────────────────┬────────────────────────────────────┘
                         │ POST /predict
    ┌────────────────────▼────────────────────────────────────┐
    │                    FastAPI (api.py)                     │
    │   Latency middleware · Request/Prediction/Error logs    │
    └────────────────────┬────────────────────────────────────┘
                         │
    ┌────────────────────▼────────────────────────────────────┐
    │             PhishingPredictor (predict.py)              │
    │                                                         │
    │  ┌─────────────────────┐  ┌──────────────────────────┐  │
    │  │  Structured ML      │  │  Deep Learning           │  │
    │  │  (ARFF features)    │  │  (URL char sequences)    │  │
    │  │  ── Random Forest   │  │  ── LSTM (BiDir)         │  │
    │  │  ── XGBoost         │  │  ── Character CNN        │  │
    │  │  ── SVM (RBF)       │  │  ── Transformer encoder  │  │
    │  └──────────┬──────────┘  └────────────┬─────────────┘  │
    │             └──────────────┬───────────┘                │
    │                     Weighted Fusion                     │
    │                    (F1-proportional)                    │
    │                            │                            │
    │                   SHAP Explainability                   │
    │                  Top-N Feature Report                   │
    └─────────────────────────────────────────────────────────┘
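The "Weighted Fusion (F1-proportional)" stage can be read as a weighted average of the six per-model phishing probabilities, where each weight is that model's F1 score divided by the sum of all F1 scores. The sketch below illustrates that idea only; the function names, the 0.5 threshold, and the renormalization over available models are assumptions, not taken from `predict.py`.

```python
from typing import Dict, Tuple


def f1_proportional_weights(f1_scores: Dict[str, float]) -> Dict[str, float]:
    """Scale per-model F1 scores so that the resulting weights sum to 1."""
    total = sum(f1_scores.values())
    if total <= 0:
        raise ValueError("at least one model needs a positive F1 score")
    return {name: score / total for name, score in f1_scores.items()}


def fuse(
    probabilities: Dict[str, float],
    weights: Dict[str, float],
    threshold: float = 0.5,  # assumed decision threshold
) -> Tuple[float, bool]:
    """Return (fused phishing probability, is_phishing)."""
    # Use only models that produced a probability and have a weight,
    # then renormalize so a missing model does not shrink the score.
    available = probabilities.keys() & weights.keys()
    norm = sum(weights[name] for name in available)
    if norm <= 0:
        raise ValueError("no weighted model produced a probability")
    fused = sum(weights[name] * probabilities[name] for name in available) / norm
    return fused, fused >= threshold
```

In use, the weights would be computed once from validation F1 scores after training, and `fuse` would be called per request with the six probabilities keyed as `rf`, `xgb`, `svm`, `lstm`, `cnn`, and `transformer`.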
## Repo layout
    phishing-detector/
    ├── train.py               ← train all 6 models
    ├── predict.py             ← inference engine (importable)
    ├── api.py                 ← FastAPI app
    ├── benchmark.py           ← latency benchmark suite
    ├── requirements.txt
    ├── Dockerfile
    ├── docker-compose.yml
    ├── .gitignore / .dockerignore
    │
    ├── src/
    │   ├── features.py        ← URL / WHOIS / DNS / SSL / HTML features
    │   └── models/
    │       ├── dl_models.py   ← LSTM · CNN · Transformer (PyTorch)
    │       └── artifacts/     ← saved .pkl / .pt (git-ignored)
    │
    ├── data/                  ← put your CSV + ARFF here (git-ignored)
    │   ├── phishing_site_urls.csv
    │   └── Training_Dataset.arff
    │
    ├── logs/                  ← JSONL request / prediction / error logs
    │   ├── requests.jsonl
    │   ├── predictions.jsonl
    │   └── errors.jsonl
    │
    └── extension/             ← Chrome / Edge extension (MV3)
        ├── manifest.json
        ├── background.js
        ├── popup.html
        ├── popup.js
        └── icons/
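To give a feel for the URL-level features that `src/features.py` is described as producing, here is a minimal sketch of purely lexical features computed from the URL string. Only the name `has_suspicious_words` appears in this README (in the sample response); the other feature names and the keyword list are hypothetical, and the WHOIS, DNS, SSL, and HTML features are not covered.

```python
from urllib.parse import urlsplit

# Assumed keyword list, for illustration only.
SUSPICIOUS_WORDS = ("login", "verify", "update", "secure", "account")


def lexical_features(url: str) -> dict:
    """Compute a few string-based features without any network access."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    lowered = url.lower()
    return {
        "url_length": float(len(url)),
        "num_dots": float(host.count(".")),
        "num_hyphens": float(host.count("-")),
        "uses_https": float(parts.scheme == "https"),
        "has_at_symbol": float("@" in url),
        "has_suspicious_words": float(
            any(word in lowered for word in SUSPICIOUS_WORDS)
        ),
    }
```

Features are returned as floats so they can be placed directly into a numeric feature vector for the tree-based models and the SVM.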
## Quick start
### 1 · Install dependencies
    python -m venv .venv && source .venv/bin/activate
    pip install -r requirements.txt
### 2 · Place datasets
    data/phishing_site_urls.csv   (549k URLs, columns: URL, Label)
    data/Training_Dataset.arff    (11k samples, 30 features + Result)
### 3 · Train all models
    python train.py \
        --csv data/phishing_site_urls.csv \
        --arff data/Training_Dataset.arff \
        --sample 50000 \
        --epochs 10 \
        --device cpu
Artifacts saved to `src/models/artifacts/`.
### 4 · Run the API
    uvicorn api:app --host 0.0.0.0 --port 8000 --reload
### 5 · Test a prediction
    curl -s -X POST http://localhost:8000/predict \
        -H "Content-Type: application/json" \
        -d '{"url":"http://login-paypal-verify.com/update?account=true"}' \
        | python -m json.tool
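The same request can be sent from Python using only the standard library. This is a sketch of a client, assuming the API is reachable at `http://localhost:8000` as in the curl example; the helper names `build_predict_request` and `predict` are hypothetical, and the `include_shap` and `fetch_html` values shown are simply those from the request-body example in this README.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # matches the uvicorn command above


def build_predict_request(url, include_shap=True, fetch_html=False,
                          api_base=API_BASE):
    """Build (but do not send) a POST /predict request."""
    payload = {"url": url, "include_shap": include_shap, "fetch_html": fetch_html}
    return urllib.request.Request(
        f"{api_base}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(url, timeout=10.0, **options):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_predict_request(url, **options)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `predict("http://login-paypal-verify.com/update?account=true")` should return a dict shaped like the response schema documented in the API reference.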
### 6 · Run benchmarks
    python benchmark.py --requests 200 --concurrency 8
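For readers curious what a latency benchmark with `--requests` and `--concurrency` options typically measures, the sketch below times a callable from a thread pool and reports percentiles. It is not the code in `benchmark.py`; the function names and the reported fields are assumptions.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(fn):
    """Run fn once and return its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0


def run_benchmark(fn, requests=200, concurrency=8):
    """Call fn `requests` times from `concurrency` threads and summarize."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: timed_call(fn), range(requests)))
    # 99 cut points; index 94 is the 95th percentile, index 98 the 99th.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return {
        "count": len(latencies),
        "mean_ms": statistics.fmean(latencies),
        "p50_ms": statistics.median(latencies),
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "max_ms": max(latencies),
    }
```

To benchmark the API itself, `fn` would be a zero-argument function that sends one `POST /predict` request and waits for the response.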
## Docker
### Build & run
    # Build
    docker build -t phishing-detector:latest .
    # Run (paths are quoted so directories containing spaces still mount)
    docker run -d -p 8000:8000 \
        -v "$(pwd)/src/models/artifacts:/app/src/models/artifacts:ro" \
        -v "$(pwd)/logs:/app/logs" \
        --name phishing-api \
        phishing-detector:latest
### With Compose
    docker compose up -d
    docker compose logs -f api
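After starting the container, it can be useful to wait until the API answers before sending traffic. The sketch below polls the `/health` endpoint listed in the API reference; the function names, retry count, and delay are assumptions, and it treats any HTTP 200 as healthy without inspecting the response body.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or non-2xx status: not ready yet.
        return False


def wait_until_healthy(url="http://localhost:8000/health",
                       attempts=30, delay=1.0, probe=http_ok):
    """Poll the health endpoint until it responds or attempts run out."""
    for _ in range(attempts):
        if probe(url):
            return True
        time.sleep(delay)
    return False
```

The `probe` parameter exists so the polling loop can be exercised without a running server.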
## API reference
| Method | Endpoint | Description |
|--------|------------|--------------------------------------|
| POST | `/predict` | Classify a URL (full ensemble) |
| GET | `/health` | Liveness check |
| GET | `/metrics` | Aggregate latency stats (last 1000) |
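The "last 1000" window reported by `/metrics` can be modeled as a fixed-size buffer that drops the oldest sample when a new one arrives. The sketch below shows one way to do that with `collections.deque`; the class name, the field names in the summary, and the nearest-rank percentile are assumptions rather than the behavior of `api.py`.

```python
from collections import deque
from statistics import fmean


class LatencyWindow:
    """Keep the most recent `size` latency samples and summarize them."""

    def __init__(self, size=1000):
        # deque with maxlen discards the oldest entry automatically.
        self._samples = deque(maxlen=size)

    def record(self, latency_ms):
        self._samples.append(float(latency_ms))

    def summary(self):
        if not self._samples:
            return {"count": 0}
        ordered = sorted(self._samples)
        n = len(ordered)
        # Nearest-rank 95th percentile: rank = ceil(0.95 * n), 1-based.
        rank = (95 * n + 99) // 100
        return {
            "count": n,
            "mean_ms": fmean(ordered),
            "min_ms": ordered[0],
            "p95_ms": ordered[rank - 1],
            "max_ms": ordered[-1],
        }
```

A latency middleware would call `record` once per request and the `/metrics` handler would return `summary()`.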
### `POST /predict` — request body
    {
      "url": "https://example.com",
      "include_shap": true,
      "fetch_html": false
    }
### Response schema
    {
      "url": "...",
      "label": "phishing | safe",
      "is_phishing": true,
      "confidence": 0.87,
      "model_votes": {
        "rf": {"label": "phishing", "confidence": 0.91},
        "xgb": {"label": "phishing", "confidence": 0.85},
        "svm": {"label": "phishing", "confidence": 0.79},
        "lstm": {"label": "phishing", "confidence": 0.88},
        "cnn": {"label": "phishing", "confidence": 0.86},
        "transformer": {"label": "phishing", "confidence": 0.92}
      },
      "top_features": [
        {"feature": "has_suspicious_words", "value": 1.0, "importance": 0.23}
      ],
      "shap_values": {"has_suspicious_words": 0.18, "...": "..."},
      "metadata": {
        "domain": "login-paypal-verify.com",
        "domain_age_days": 12,
        "ssl_valid": false,
        "has_mx": false
      },
      "latency_ms": 14.3,
      "request_id": "a1b2c3d4"
    }
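A client will often want to know how strongly the six models agree, not just the final label. The sketch below summarizes the `model_votes` field of a response with the shape shown above; the function name and the returned keys are hypothetical.

```python
from collections import Counter


def summarize_votes(response):
    """Tally per-model labels and list models that disagree with the verdict."""
    votes = response["model_votes"]
    tally = Counter(vote["label"] for vote in votes.values())
    dissenters = sorted(
        name for name, vote in votes.items()
        if vote["label"] != response["label"]
    )
    most_confident = max(votes, key=lambda name: votes[name]["confidence"])
    return {
        "tally": dict(tally),
        "dissenters": dissenters,
        "most_confident": most_confident,
    }
```

An empty `dissenters` list means the ensemble was unanimous; a long one suggests the fused `confidence` deserves a closer look.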
## Chrome extension
1. Add icons to `extension/icons/` (`icon16.png`, `icon48.png`, `icon128.png`)
2. Set `API_BASE` in `background.js` to your server's address
3. Open Chrome → `chrome://extensions`
4. Enable **Developer mode**
5. Click **Load unpacked** → select the `extension/` folder
Live feed alternatives: [OpenPhish](https://openphish.com) · [PhishTank](https://phishtank.org)