nishan-dhakal/FORENSIX

GitHub: nishan-dhakal/FORENSIX

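A CLI tool for detecting phishing URLs and emails, built on a pure-Python heuristic rule engine. It combines multiple weighted features into a risk score and outputs a risk level and a JSON forensic report.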

Stars: 0 | Forks: 0

# 🛡️ AI-Powered Phishing Detection System

![Python](https://img.shields.io/badge/Python-3.10+-blue?style=for-the-badge&logo=python) ![Security](https://img.shields.io/badge/Security-Cybersecurity-red?style=for-the-badge&logo=shield) ![License](https://img.shields.io/badge/License-MIT-green?style=for-the-badge) ![Level](https://img.shields.io/badge/Level-Master's%20Research-purple?style=for-the-badge) ![Tests](https://img.shields.io/badge/Tests-Passing-brightgreen?style=for-the-badge)

## 📸 Demo

```
██████╗ ██╗ ██╗██╗███████╗██╗ ██╗ ██████╗ ███████╗████████╗███████╗ ██████╗████████╗
██╔══██╗██║ ██║██║██╔════╝██║ ██║ ██╔══██╗██╔════╝╚══██╔══╝██╔════╝██╔════╝╚══██╔══╝
...
════════════════════════════════════════════════════════════════════════
  ANALYSIS RESULT — URL
════════════════════════════════════════════════════════════════════════
Target:     http://paypal-account-verify.tk/login?confirm=true
Hash:       a3f9c2d1...
Risk Score: 87.5/100 [████████████████████████████████░░░░░░░░]
Risk Level: 🔴 CRITICAL
Confidence: 96.0%

THREAT INDICATORS DETECTED
  ⚠ Suspicious TLD: .tk (+20)
  ⚠ Brand name used in subdomain (impersonation) (+30)
  ⚠ No HTTPS encryption (+10)
  ⚠ Multiple suspicious keywords (3) (+15)
  ⚠ Unusually long URL (52 chars) (+15)

🚨 PHISHING DETECTED — DO NOT CLICK / OPEN THIS
```

## 🚀 Features

| Feature | Description |
|---|---|
| 🔗 **URL analysis** | Domain entropy, TLD scoring, brand impersonation, IP detection |
| 📧 **Email analysis** | Header parsing, sender spoofing, urgent-language detection |
| 🤖 **AI scoring engine** | Weighted multi-feature risk scoring (0-100) |
| 📦 **Batch scanning** | Analyze hundreds of URLs from a file |
| 💾 **JSON reports** | A forensic report with a SHA-256 hash for every scan |
| 🎨 **Rich CLI** | Colored terminal interface with live progress |
| 🧪 **Full test suite** | 20+ unit tests written with pytest |

## 🔬 How It Works

The tool implements a **5-stage detection pipeline**:

```
Input URL/Email
      │
      ▼
[1] Feature Extraction
    ├── URL parsing (domain, TLD, path, subdomain depth)
    ├── Entropy calculation (detects randomized domains)
    ├── Keyword scanning (paypal, verify, urgent, etc.)
    └── Brand impersonation check
      │
      ▼
[2] Heuristic Scoring Engine
    ├── 15+ weighted risk rules
    ├── Score range: 0-100
    └── Confidence estimation
      │
      ▼
[3] Risk Classification
    ├── LOW (0-24)
    ├── MEDIUM (25-49)
    ├── HIGH (50-74)
    └── CRITICAL (75-100)
      │
      ▼
[4] Indicator Reporting
    └── Human-readable explanation of each threat signal
      │
      ▼
[5] JSON Report Generation
    └── SHA-256 hashed, timestamped forensic output
```

### Risk Scoring Rules

| Indicator | Weight |
|---|---|
| IP address used as domain | +25 |
| Brand name in subdomain | +30 |
| Suspicious TLD (.tk, .ml, .xyz…) | +20 |
| @ symbol in URL | +20 |
| Spoofed sender (email) | +35 |
| URL shortener detected | +15 |
| High domain entropy | +12 |
| No HTTPS | +10 |
| URL length > 100 characters | +15 |
| Urgent language (email) | +25 |

## 🛠️ Installation

```
# Clone the repository
git clone https://github.com/YOUR_USERNAME/phishing-detector.git
cd phishing-detector

# Install dependencies
pip install -r requirements.txt

# Run the tool
python main.py
```

**Requirements:** Python 3.10+, colorama, pytest

## 📖 Usage

### Interactive mode (recommended)

```
python main.py
```

### Analyze a single URL

```
python main.py --url "http://paypal-verify.tk/login"
```

### JSON output (for scripts and integrations)

```
python main.py --url "http://suspicious-site.tk" --json
```

### Batch scan (one URL per line in a file)

```
python main.py --batch data/sample_urls.txt
```

### Run the demo samples

```
python main.py --demo
```

### Analyze an email (interactive mode → option 2)

Paste the full email, including headers:

```
From: security@paypa1-alert.com
Subject: URGENT: Verify your account now!

Your account will be suspended.
Click: http://phish.tk/login
END
```

## 🧪 Running Tests

```
# Run all tests
pytest tests/ -v

# Run with short tracebacks
pytest tests/ -v --tb=short
```

**Test coverage:**

- URL feature extraction (10 tests)
- Email analysis (5 tests)
- Feature extractor internals (5 tests)
- Result structure and bounds (4 tests)

## 📁 Project Structure

```
phishing-detector/
├── main.py                  # CLI entry point
├── src/
│   └── detector.py          # Core detection engine
├── tests/
│   └── test_detector.py     # Full test suite
├── data/
│   └── sample_urls.txt      # Sample URLs for batch testing
├── reports/                 # Auto-generated JSON reports
├── requirements.txt
└── README.md
```

## 🔭 Planned Enhancements

- [ ] Train a proper machine learning model (Random Forest / XGBoost) on the PhishTank dataset
- [ ] WHOIS domain-age lookup via API
- [ ] VirusTotal API integration
- [ ] Real-time browser extension (Chrome/Firefox)
- [ ] Web dashboard (Flask/FastAPI)
- [ ] BERT-based email body classifier
- [ ] MITRE ATT&CK T1566 technique mapping

## 📚 Research References

- APWG eCrime research datasets
- PhishTank open community dataset
- MITRE ATT&CK: Phishing (T1566)
- ISCX URL 2016 dataset
- "Phishing Detection Using Machine Learning Techniques", IEEE 2020

## ⚠️ Disclaimer

## 📄 License

MIT License. See [LICENSE](LICENSE) for details.

## 👤 Author

**Master's student in Cybersecurity**
Building a GitHub portfolio | LinkedIn: [Your Profile]

*"Security is a process, not a product."* (Bruce Schneier)
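
The weighted scoring and risk classification stages of the pipeline can be illustrated with a minimal sketch. This is not the code in `src/detector.py`. The weights and the LOW/MEDIUM/HIGH/CRITICAL thresholds are taken from this README. The function names, the rule keys, the entropy cutoff of 4.0 bits, and the cap at 100 are assumptions made for illustration only.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import urlparse

# Weights copied from the risk scoring table; the rule keys are hypothetical.
WEIGHTS = {
    "ip_as_domain": 25,
    "suspicious_tld": 20,
    "at_symbol": 20,
    "high_entropy": 12,
    "no_https": 10,
    "long_url": 15,
}
SUSPICIOUS_TLDS = {"tk", "ml", "xyz"}  # only the TLDs the README names
ENTROPY_THRESHOLD = 4.0  # assumed cutoff in bits per character


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def is_ip(host: str) -> bool:
    """True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def score_url(url: str) -> tuple[int, list[str]]:
    """Return (score, triggered rule keys) for a subset of the URL rules."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    hits = []
    if is_ip(host):
        hits.append("ip_as_domain")
    elif host.rsplit(".", 1)[-1] in SUSPICIOUS_TLDS:
        hits.append("suspicious_tld")
    if "@" in url:
        hits.append("at_symbol")
    if shannon_entropy(host) > ENTROPY_THRESHOLD:
        hits.append("high_entropy")
    if parsed.scheme != "https":
        hits.append("no_https")
    if len(url) > 100:
        hits.append("long_url")
    # Assumption: the sum of weights is capped at the documented maximum of 100.
    return min(100, sum(WEIGHTS[h] for h in hits)), hits


def classify(score: float) -> str:
    """Map a 0-100 score onto the four risk levels listed in the pipeline."""
    if score >= 75:
        return "CRITICAL"
    if score >= 50:
        return "HIGH"
    if score >= 25:
        return "MEDIUM"
    return "LOW"


if __name__ == "__main__":
    score, hits = score_url("http://paypal-verify.tk/login")
    print(score, classify(score), hits)
```

The sketch leaves out keyword scanning, brand impersonation, URL shorteners, and the email rules, so its scores will be lower than the tool's real output for the same input.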

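Tags: AI-driven, DAST, DeepSeek, DNS information, DNS brute forcing, Flask, IOC extraction, Python, SHA-256, URL analysis, YARA rules, metadata analysis, unit testing, forensic timeline, threat intelligence, security rule engine, developer tools, malware analysis, search queries (dorks), digital forensics, file inspector, backdoor-free, serverless architecture, fraud detection, entropy analysis, email security, cybersecurity, phishing detection, automation scripts, reverse engineering tools, phishing defense, privacy protection, risk scoring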