amritanshuroy28/phishguard

GitHub: amritanshuroy28/phishguard

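PhishGuard is a phishing URL detection system that combines machine-learning classification with threat intelligence, providing real-time protection and forensic investigation through a Chrome extension and an analyst dashboard.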

Stars: 1 | Forks: 0

# PhishGuard

## Overview

PhishGuard is a comprehensive phishing detection system that combines statistical machine-learning classification with proactive cyber threat intelligence (CTI). It runs as a real-time Chrome browser extension that analyzes URLs as they are visited and returns structured threat reports.

### Key Features

- **ML-based detection**: XGBoost classifier trained on more than 35 URL features
- **Real-time protection**: Chrome extension with live URL monitoring
- **Threat intelligence**: VirusTotal and URLhaus integration
- **Forensic investigation**: DNS/WHOIS lookups, typosquatting detection
- **Analyst dashboard**: React-based UI with charts and IoC export

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                     PhishGuard Architecture                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐         ┌──────────────┐                      │
│  │   Chrome     │         │    React     │                      │
│  │  Extension   │────────▶│  Dashboard   │                      │
│  └──────┬───────┘         └──────┬───────┘                      │
│         │                        │                              │
│         │ REST API               │ REST API                     │
│         ▼                        ▼                              │
│  ┌─────────────────────────────────────────┐                    │
│  │             FastAPI Backend             │                    │
│  │  ┌─────────┐  ┌──────────┐  ┌─────────┐ │                    │
│  │  │Feature  │  │    ML    │  │  CTI    │ │                    │
│  │  │Extractor│  │Classifier│  │ Service │ │                    │
│  │  └─────────┘  └──────────┘  └─────────┘ │                    │
│  └─────────────────────────────────────────┘                    │
│         │                        │                              │
│         ▼                        ▼                              │
│  ┌──────────────┐         ┌──────────────┐                      │
│  │    Local     │         │   External   │                      │
│  │    Model     │         │  APIs (VT,   │                      │
│  │  (XGBoost)   │         │   URLhaus)   │                      │
│  └──────────────┘         └──────────────┘                      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## Quick Start

### Prerequisites

- Python 3.9+
- Node.js 18+
- npm or yarn

### 1. Train the ML Model (optional: the model is already pre-trained)

```
cd ml_pipeline
pip install -r requirements.txt
python train_model.py
```

### 2. Start the Backend API

```
cd backend
pip install -r requirements.txt

# Set environment variables (optional)
export VIRUSTOTAL_API_KEY="your-api-key"

# Start the server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

API: http://localhost:8000

API documentation: http://localhost:8000/docs

### 3. Start the React Dashboard

```
cd frontend
npm install
npm run dev
```

Dashboard: http://localhost:5173

### 4. Load the Chrome Extension

1. Open Chrome and navigate to `chrome://extensions/`
2. Enable "Developer mode" (the toggle in the top-right corner)
3. Click "Load unpacked"
4. Select the `chrome_extension` folder

## Project Structure

```
phishguard/
├── ml_pipeline/            # ML training pipeline
│   ├── features/           # Feature extraction code
│   ├── train_model.py      # Model training script
│   └── dataset_loader.py   # Data loading utilities
├── backend/                # FastAPI backend
│   ├── services/           # ML and CTI services
│   ├── routers/            # API endpoints
│   ├── schemas/            # Pydantic models
│   └── main.py             # Application entry point
├── chrome_extension/       # Browser extension
│   ├── background.js       # Service worker
│   ├── popup.html/js       # Popup UI
│   └── manifest.json       # Extension manifest
├── frontend/               # React dashboard
│   └── src/
│       ├── pages/          # Dashboard pages
│       ├── components/     # Shared components
│       └── utils/          # API client, helpers
└── docs/                   # Documentation
```

## API Reference

### POST /api/v1/analyze

Analyzes a single URL.

```
curl -X POST http://localhost:8000/api/v1/analyze \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
```

**Response:**

```
{
  "id": "uuid",
  "url": "https://example.com",
  "risk_level": "safe",
  "risk_score": 15.2,
  "is_malicious": false,
  "ml_confidence": 0.95,
  "threats": [],
  "ctis": [...],
  "processing_time_ms": 142.5
}
```

### GET /api/v1/history

Retrieves the scan history.

```
# Quote the URL so the shell does not treat "?" as a glob character
curl "http://localhost:8000/api/v1/history?limit=50"
```

### GET /api/v1/iocs/export

Exports IoCs in CSV or JSON format.

```
# Quote the URL so the shell does not treat "?" as a glob character
curl -O "http://localhost:8000/api/v1/iocs/export?format=csv"
```

## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `VIRUSTOTAL_API_KEY` | - | VirusTotal API key used for threat intelligence |
| `VIRUSTOTAL_API_URL` | https://www.virustotal.com | VirusTotal API endpoint |
| `DEBUG` | false | Enables debug logging |
| `HOST` | 0.0.0.0 | Host the server binds to |
| `PORT` | 8000 | Server port |

### Feature Extraction

The extracted features include:

- URL length, path depth, entropy
- Obfuscation indicators (hex encoding, IP addresses)
- Suspicious path patterns
- Typosquatting detection (Levenshtein distance)
- Domain age indicators

## Testing

### Backend

```
cd backend
pytest
```

### Chrome Extension

1. Open `chrome://extensions/`
2. Find PhishGuard and click "Errors"
3. Alternatively, open the DevTools console for the background page

## Performance

- API response time: < 500 ms (target)
- Feature extraction: about 5 ms per URL
- ML inference: < 10 ms
- CTI lookups: configurable timeout (default 2 s)

## Deployment

### Backend (production)

```
cd backend
pip install -r requirements.txt
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000
```

### Frontend (production)

```
cd frontend
npm run build
# Serve dist/ with nginx or a similar tool
```

## Acknowledgements

- PhishTank for phishing URL data
- URLhaus for the malware URL feed
- Tranco for the list of legitimate domains
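
## Example: URL Feature Sketch

The feature extraction list in the Configuration section names URL length, path depth, entropy, hex encoding, IP-address hosts, and Levenshtein-based typosquatting detection. The following is a minimal sketch of how a few of those features could be computed with the Python standard library. It is not PhishGuard's actual extractor: the function names, the `KNOWN_BRANDS` list, and the naive way the domain label is picked are all assumptions made for illustration.

```python
import ipaddress
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical brand list, used only for this illustration.
KNOWN_BRANDS = ("paypal", "google", "microsoft")


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the standard two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def host_is_ip(host: str) -> bool:
    """True when the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(url: str) -> dict:
    """Compute a small, illustrative subset of URL features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    is_ip = host_is_ip(host)
    labels = host.split(".")
    # Assumption: take the second-to-last label as the domain name.
    # This ignores multi-part suffixes such as .co.uk.
    name = labels[-2] if len(labels) >= 2 and not is_ip else host
    return {
        "url_length": len(url),
        "path_depth": len([part for part in parsed.path.split("/") if part]),
        "entropy": shannon_entropy(url),
        "has_hex_encoding": bool(re.search(r"%[0-9a-fA-F]{2}", url)),
        "host_is_ip": is_ip,
        "min_brand_distance": min(levenshtein(name, brand) for brand in KNOWN_BRANDS),
    }
```

Calling `extract_features("http://paypa1.com/login/verify")` yields a `min_brand_distance` of 1, because `paypa1` is one substitution away from `paypal`. A small non-zero distance to a known brand is the kind of signal a typosquatting check would flag, while a distance of 0 means the label matches the brand itself.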

Tags: Apex, MITM proxy, XGBoost, threat intelligence, developer tools, machine learning, reverse-engineering tools, phishing detection