Ramanpreet21/Phishguard

GitHub: Ramanpreet21/Phishguard

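A real-time phishing URL detection system built on a six-model ensemble that combines classical machine learning with deep learning, delivering sub-second threat identification at the browser edge through a Chrome extension.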

Stars: 0 | Forks: 0

# 🛡️ Phishing Detection System

**Department of Artificial Intelligence and Emerging Technologies**

A six-model ensemble (3 classical ML + 3 deep learning) served through FastAPI, packaged in Docker, and shipped with a Chrome extension for real-time tab analysis.

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                    Chrome Extension                     │
│  popup.html / popup.js  →  background.js (SW)           │
└────────────────────┬────────────────────────────────────┘
                     │ POST /predict
┌────────────────────▼────────────────────────────────────┐
│                    FastAPI (api.py)                     │
│  Latency middleware · Request/Prediction/Error logs     │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│             PhishingPredictor (predict.py)              │
│                                                         │
│  ┌─────────────────────┐  ┌──────────────────────────┐  │
│  │ Structured ML       │  │ Deep Learning            │  │
│  │ (ARFF features)     │  │ (URL char sequences)     │  │
│  │ ── Random Forest    │  │ ── LSTM (BiDir)          │  │
│  │ ── XGBoost          │  │ ── Character CNN         │  │
│  │ ── SVM (RBF)        │  │ ── Transformer encoder   │  │
│  └──────────┬──────────┘  └────────────┬─────────────┘  │
│             └──────────────┬───────────┘                │
│                     Weighted Fusion                     │
│                    (F1-proportional)                    │
│                            │                            │
│                   SHAP Explainability                   │
│                  Top-N Feature Report                   │
└─────────────────────────────────────────────────────────┘
```

## Repository Structure

```
phishing-detector/
├── train.py              ← train all 6 models
├── predict.py            ← inference engine (importable)
├── api.py                ← FastAPI app
├── benchmark.py          ← latency benchmark suite
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
├── .gitignore / .dockerignore
│
├── src/
│   ├── features.py       ← URL / WHOIS / DNS / SSL / HTML features
│   └── models/
│       ├── dl_models.py  ← LSTM · CNN · Transformer (PyTorch)
│       └── artifacts/    ← saved .pkl / .pt (git-ignored)
│
├── data/                 ← put your CSV + ARFF here (git-ignored)
│   ├── phishing_site_urls.csv
│   └── Training_Dataset.arff
│
├── logs/                 ← JSONL request / prediction / error logs
│   ├── requests.jsonl
│   ├── predictions.jsonl
│   └── errors.jsonl
│
└── extension/            ← Chrome / Edge extension (MV3)
    ├── manifest.json
    ├── background.js
    ├── popup.html
    ├── popup.js
    └── icons/
```

## Quick Start

### 1 · Install dependencies

```
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```

### 2 · Place the datasets

```
data/phishing_site_urls.csv   (549k URLs, columns: URL, Label)
data/Training_Dataset.arff    (11k samples, 30 features + Result)
```

### 3 · Train all models

```
python train.py \
  --csv data/phishing_site_urls.csv \
  --arff data/Training_Dataset.arff \
  --sample 50000 \
  --epochs 10 \
  --device cpu
```

Artifacts are saved to `src/models/artifacts/`.

### 4 · Run the API

```
uvicorn api:app --host 0.0.0.0 --port 8000 --reload
```

### 5 · Test a prediction

```
curl -s -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"url":"http://login-paypal-verify.com/update?account=true"}' \
  | python -m json.tool
```

### 6 · Run the benchmark

```
python benchmark.py --requests 200 --concurrency 8
```

## Docker

### Build and run

```
# Build
docker build -t phishing-detector:latest .

# Run
docker run -d -p 8000:8000 \
  -v "$(pwd)/src/models/artifacts:/app/src/models/artifacts:ro" \
  -v "$(pwd)/logs:/app/logs" \
  --name phishing-api \
  phishing-detector:latest
```

### Using Compose

```
docker compose up -d
docker compose logs -f api
```

## API Reference

| Method | Endpoint   | Description                                       |
|--------|------------|---------------------------------------------------|
| POST   | `/predict` | Classify a URL (full ensemble)                    |
| GET    | `/health`  | Liveness check                                    |
| GET    | `/metrics` | Aggregate latency statistics (last 1000 requests) |

### `POST /predict` request body

```
{
  "url": "https://example.com",
  "include_shap": true,
  "fetch_html": false
}
```

### Response structure

```
{
  "url": "...",
  "label": "phishing | safe",
  "is_phishing": true,
  "confidence": 0.87,
  "model_votes": {
    "rf": {"label":"phishing","confidence":0.91},
    "xgb": {"label":"phishing","confidence":0.85},
    "svm": {"label":"phishing","confidence":0.79},
    "lstm": {"label":"phishing","confidence":0.88},
    "cnn": {"label":"phishing","confidence":0.86},
    "transformer": {"label":"phishing","confidence":0.92}
  },
  "top_features": [
    {"feature":"has_suspicious_words","value":1.0,"importance":0.23}
  ],
  "shap_values": {"has_suspicious_words": 0.18, "...": "..."},
  "metadata": {
    "domain": "login-paypal-verify.com",
    "domain_age_days": 12,
    "ssl_valid": false,
    "has_mx": false
  },
  "latency_ms": 14.3,
  "request_id": "a1b2c3d4"
}
```

[Image: response]
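
As a minimal sketch of consuming `POST /predict` from Python, the snippet below builds the documented request body and condenses a response into a one-line verdict. The `API_BASE` value assumes the uvicorn command from the quick start, and `build_request`, `predict`, and `summarize` are hypothetical helper names that do not come from the repository. The demo at the bottom runs offline on values copied from the documented response, so no network call happens unless `predict` is invoked.

```python
import json
import urllib.request

# Assumption: the API listens where the quick-start uvicorn command puts it.
API_BASE = "http://localhost:8000"


def build_request(url, include_shap=True, fetch_html=False, api_base=API_BASE):
    """Build (but do not send) a POST /predict request for the given URL."""
    payload = {"url": url, "include_shap": include_shap, "fetch_html": fetch_html}
    return urllib.request.Request(
        f"{api_base.rstrip('/')}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(url, timeout=10.0, **kwargs):
    """Send the request to a running server and return the decoded JSON body."""
    with urllib.request.urlopen(build_request(url, **kwargs), timeout=timeout) as resp:
        return json.load(resp)


def summarize(response):
    """Condense a /predict response into a one-line verdict."""
    votes = response["model_votes"]
    phishing_votes = sum(1 for v in votes.values() if v["label"] == "phishing")
    return (
        f"{response['label']} (confidence {response['confidence']:.2f}, "
        f"{phishing_votes}/{len(votes)} models voted phishing)"
    )


# Offline demo: a trimmed copy of the documented response structure.
SAMPLE_RESPONSE = {
    "label": "phishing",
    "confidence": 0.87,
    "model_votes": {
        "rf": {"label": "phishing", "confidence": 0.91},
        "xgb": {"label": "phishing", "confidence": 0.85},
        "svm": {"label": "phishing", "confidence": 0.79},
        "lstm": {"label": "phishing", "confidence": 0.88},
        "cnn": {"label": "phishing", "confidence": 0.86},
        "transformer": {"label": "phishing", "confidence": 0.92},
    },
}
print(summarize(SAMPLE_RESPONSE))
```

With the API running, `summarize(predict("http://login-paypal-verify.com/update?account=true"))` would exercise the same path against the live server.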

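The architecture diagram describes the fusion step only as "Weighted Fusion (F1-proportional)". One plausible reading, sketched here under explicit assumptions, is that each model's phishing probability is weighted by its validation F1 score after the scores are normalized to sum to 1. The F1 values, the 0.5 decision threshold, the treatment of each `model_votes` confidence as a phishing probability, and the function names are all illustrative; the actual logic in `predict.py` may differ.

```python
def f1_weights(f1_scores):
    """Normalize per-model F1 scores so the resulting weights sum to 1."""
    total = sum(f1_scores.values())
    if total <= 0:
        raise ValueError("at least one model needs a positive F1 score")
    return {name: score / total for name, score in f1_scores.items()}


def fuse(phishing_probs, f1_scores, threshold=0.5):
    """Combine per-model phishing probabilities into one label and score.

    Only models present in `phishing_probs` contribute, so a model that
    failed to load is simply left out and the weights are renormalized.
    """
    weights = f1_weights({name: f1_scores[name] for name in phishing_probs})
    score = sum(weights[name] * prob for name, prob in phishing_probs.items())
    label = "phishing" if score >= threshold else "safe"
    return label, score


# Hypothetical validation F1 scores; real values would come from training.
F1_SCORES = {"rf": 0.95, "xgb": 0.96, "svm": 0.93, "lstm": 0.94, "cnn": 0.94, "transformer": 0.95}

# Per-model values taken from the documented response, read here as
# phishing probabilities (an assumption).
PROBS = {"rf": 0.91, "xgb": 0.85, "svm": 0.79, "lstm": 0.88, "cnn": 0.86, "transformer": 0.92}

label, score = fuse(PROBS, F1_SCORES)
print(label, round(score, 3))
```

Weighting by F1 lets a stronger model pull the ensemble score toward its own prediction without silencing the weaker ones, which is a common alternative to a plain majority vote.
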
## Chrome Extension

1. Open Chrome → `chrome://extensions`
2. Enable **Developer mode**
3. Click **Load unpacked** → select the `extension/` folder
4. Add icons to `extension/icons/` (icon16/48/128.png)
5. Change `API_BASE` in `background.js` to match your server

## Chrome Extension UI

[Image: phishguard_extention]

Alternative live data sources: [OpenPhish](https://openphish.com) · [PhishTank](https://phishtank.org)

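Tags: Apex, AV bypass, FastAPI, credential scanning, threat intelligence, developer tools, machine learning, deep learning, request interception, reverse-engineering tools, phishing detection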