Vibhav-j/phishguard

GitHub: Vibhav-j/phishguard

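A phishing URL detection system built on XGBoost machine learning and real-time threat intelligence. It provides real-time protection through a browser extension and focuses on identifying newly registered malicious domains that traditional blacklists miss.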


```
██████╗ ██╗  ██╗██╗███████╗██╗  ██╗ ██████╗ ██╗   ██╗ █████╗ ██████╗ ██████╗ 
██╔══██╗██║  ██║██║██╔════╝██║  ██║██╔════╝ ██║   ██║██╔══██╗██╔══██╗██╔══██╗
██████╔╝███████║██║███████╗███████║██║  ███╗██║   ██║███████║██████╔╝██║  ██║
██╔═══╝ ██╔══██║██║╚════██║██╔══██║██║   ██║██║   ██║██╔══██║██╔══██╗██║  ██║
██║     ██║  ██║██║███████║██║  ██║╚██████╔╝╚██████╔╝██║  ██║██║  ██║██████╔╝
╚═╝     ╚═╝  ╚═╝╚═╝╚══════╝╚═╝  ╚═╝ ╚═════╝  ╚═════╝ ╚═╝  ╚═╝╚═╝  ╚═╝╚═════╝ 
```

### Phishing URL Detection with Machine Learning and Cyber Threat Intelligence

[![Python](https://img.shields.io/badge/Python-3.10+-blue?style=flat-square&logo=python)](https://python.org)
[![FastAPI](https://img.shields.io/badge/FastAPI-0.100+-green?style=flat-square&logo=fastapi)](https://fastapi.tiangolo.com)
[![XGBoost](https://img.shields.io/badge/XGBoost-ML_Engine-orange?style=flat-square)](https://xgboost.readthedocs.io)
[![React](https://img.shields.io/badge/React-Dashboard-61DAFB?style=flat-square&logo=react)](https://react.dev)
[![License](https://img.shields.io/badge/License-MIT-purple?style=flat-square)](LICENSE)
[![Status](https://img.shields.io/badge/Status-Active_Development-yellow?style=flat-square)]()

## What is PhishGuard?

Phishing attacks bypass traditional blacklists by using **newly registered domains** that have not yet been flagged. PhishGuard addresses this by combining:

- 🤖 **Statistical ML classification**: XGBoost trained on 20+ URL features
- 🔍 **Live cyber threat intelligence**: real-time queries to VirusTotal and URLhaus
- 🕵️ **DNS/WHOIS forensics**: domain age, registrar, and passive DNS analysis
- 🔗 **Typosquatting detection**: catches look-alike domains
- 🧩 **Chrome extension**: real-time protection while you browse
- 📊 **Analyst dashboard**: SOC-style scan history and threat reports

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      Chrome Extension                       │
│               (Intercepts URLs in real-time)                │
└─────────────────────────┬───────────────────────────────────┘
                          │  POST /analyze
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                       FastAPI Backend                       │
│  ┌─────────────┐  ┌──────────────┐  ┌───────────────────┐   │
│  │  Feature    │  │  XGBoost ML  │  │  Threat Intel     │   │
│  │  Extractor  │→ │  Classifier  │  │  (VT + URLhaus)   │   │
│  └─────────────┘  └──────────────┘  └───────────────────┘   │
│  ┌─────────────────────────────────────────────────────┐    │
│  │            WHOIS / DNS Forensics Engine             │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────┬───────────────────────────────────┘
                          │
          ┌───────────────┴───────────────┐
          ▼                               ▼
┌──────────────────┐           ┌──────────────────────┐
│  Unified Threat  │           │  React Analyst       │
│  Response (JSON) │           │  Dashboard           │
└──────────────────┘           └──────────────────────┘
```

## Features

| Feature | Description |
|---|---|
| 🎯 ML detection | XGBoost with 20+ URL features; target accuracy ≥95% |
| ⚡ Real-time response | Response time under 500 ms for browser integration |
| 🌐 Live CTI | VirusTotal + URLhaus API integration |
| 🔎 WHOIS forensics | Extracts domain age, registrar, and country/region |
| 🛡️ Chrome extension | Passive real-time URL scanning |
| 📈 Dashboard | Scan history, risk distribution charts, IoC export |
| 📄 Threat reports | Downloadable structured IoC reports |

## Tech Stack

```
ML Pipeline   → Python, scikit-learn, XGBoost, Pandas, NumPy
Backend API   → Python, FastAPI, Uvicorn
Threat Intel  → VirusTotal API, URLhaus API
Forensics     → python-whois, dnspython, urllib
Frontend UI   → React, Tailwind CSS, Recharts
Extension     → JavaScript, HTML, CSS
Datasets      → PhishTank, URLhaus, Tranco Top 1M, ISCX URL 2016
```

## Datasets

| Dataset | Purpose |
|---|---|
| [PhishTank](https://phishtank.org) | Verified phishing URLs, updated daily |
| [URLhaus](https://urlhaus.abuse.ch) | Live malicious URL database (API) |
| [Tranco Top 1M](https://tranco-list.eu) | Research-grade list of legitimate domains |
| [ISCX URL 2016](https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset) | Kaggle benchmark dataset |

## Getting Started

### Prerequisites

```
Python 3.10+
Node.js 18+
Git
```

### Installation

```
# Clone the repo
git clone https://github.com/Vibhav-j/phishguard.git
cd phishguard

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

### Environment Setup

```
# Copy the example env file
cp .env.example .env

# Add your API keys to .env
VIRUSTOTAL_API_KEY=your_key_here
URLHAUS_API_KEY=your_key_here
```

### Run the Backend

```
cd backend
uvicorn main:app --reload
# API runs at http://localhost:8000
# Docs are at http://localhost:8000/docs
```

### Run the Frontend

```
cd frontend
npm install
npm run dev
# Dashboard is at http://localhost:5173
```

## API Reference

### `GET /health`

```
{ "status": "ok" }
```

### `POST /analyze`

**Request:**

```
{ "url": "http://suspicious-site.com/login" }
```

**Response:**

```
{
  "url": "http://suspicious-site.com/login",
  "prediction": "phishing",
  "confidence": 0.94,
  "features": {
    "url_length": 38,
    "has_ip": false,
    "entropy": 3.8,
    "...": "..."
  },
  "threat_intel": {
    "virustotal_score": "12/90 engines flagged",
    "urlhaus_status": "malicious"
  },
  "whois": {
    "domain_age_days": 3,
    "registrar": "NameCheap",
    "country": "US"
  },
  "risk_level": "HIGH"
}
```

## Roadmap

- [x] Project structure + environment setup
- [x] Feature engineering pipeline (20+ features)
- [ ] XGBoost model training + evaluation
- [ ] FastAPI backend with `/analyze` route
- [ ] VirusTotal + URLhaus integration
- [ ] WHOIS/DNS forensics module
- [ ] React analyst dashboard
- [ ] Chrome extension
- [ ] Deployment (Render + HuggingFace Spaces)
- [ ] Exportable IoC reports

## Project Structure

```
phishguard/
├── data/
│   ├── raw/                  # Original datasets
│   └── processed/            # Feature matrices
├── ml/
│   ├── feature_extractor.py
│   ├── threat_intel.py
│   ├── train.py
│   └── models/               # Saved model files
├── backend/
│   └── main.py               # FastAPI app
├── frontend/                 # React dashboard
├── extension/                # Chrome extension
├── docs/                     # Architecture, API spec, notes
├── requirements.txt
├── .env.example
└── README.md
```

## Contributing

This is an active personal project. If you spot an issue or have an idea:

1. Fork this repository
2. Create a branch: `git checkout -b feat/your-feature`
3. Commit your changes
4. Open a Pull Request
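
## Example: Sketching URL Features

The `features` object in the sample `/analyze` response lists `url_length`, `has_ip`, and `entropy`. The snippet below is a minimal sketch of how three such features could be computed with the Python standard library. It is not the project's actual `ml/feature_extractor.py`: the function names are hypothetical, and it assumes `entropy` means character-level Shannon entropy of the full URL and `has_ip` means the host is a literal IP address.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def has_ip_host(url: str) -> bool:
    """True when the URL's host is a literal IPv4 or IPv6 address."""
    host = urlsplit(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(url: str) -> dict:
    """Hypothetical three-feature subset; the real pipeline targets 20+ features."""
    return {
        "url_length": len(url),
        "has_ip": has_ip_host(url),
        "entropy": round(shannon_entropy(url), 2),
    }


if __name__ == "__main__":
    print(extract_features("http://192.168.0.1/login"))
```

Features like these are cheap to compute from the URL string alone, so they can be evaluated before any network lookup (WHOIS, VirusTotal, URLhaus) is attempted.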

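
## Example: Sketching Typosquatting Detection

The overview lists typosquatting detection for look-alike domains but does not say how it is done. One common approach, shown here purely as an assumption, is to compare a domain against a list of protected brand domains using Levenshtein edit distance. The brand list, the `max_distance` threshold, and both function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


# Hypothetical allow-list of brand domains to protect.
KNOWN_BRANDS = ("paypal.com", "google.com", "microsoft.com")


def closest_brand(domain, brands=KNOWN_BRANDS, max_distance=2):
    """Return (brand, distance) for the nearest look-alike, or None.

    An exact match (distance 0) is the brand itself, so it is not flagged.
    """
    domain = domain.lower()
    best = None
    for brand in brands:
        d = edit_distance(domain, brand)
        if 0 < d <= max_distance and (best is None or d < best[1]):
            best = (brand, d)
    return best


if __name__ == "__main__":
    print(closest_brand("paypa1.com"))
```

Plain edit distance misses homoglyph tricks that use non-ASCII look-alike characters, so a production check would likely need additional normalization.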
Built with 🛡️ by [Srivibhav](https://github.com/Vibhav-j), CSE, IIT Jodhpur
