FARLEY-PIEDRAHITA-OROZCO/phishguard-analyzer
GitHub: FARLEY-PIEDRAHITA-OROZCO/phishguard-analyzer
A professional-grade email phishing analysis platform for automatically detecting and assessing phishing threats in email.
# PhishGuard Analyzer





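## Overview
PhishGuard Analyzer is a **professional-grade cybersecurity platform** for analyzing suspicious emails, detecting phishing attempts, and providing comprehensive threat intelligence. Built on enterprise architecture principles, it performs in-depth analysis of email headers, URLs, DNS authentication records, and sender reputation, scans attachments, and enriches results with threat intelligence from multiple providers.
### Core Capabilities
- **Header analysis**: SPF, DKIM, and DMARC verification, with forgery detection and authentication-chain analysis
- **URL intelligence**: extracts, normalizes, and analyzes URLs for phishing indicators (punycode, homoglyphs, typosquatting, URL shorteners)
- **DNS validation**: full DNS record checks via `aiodns` (SPF, DKIM with selector support, DMARC, MX)
- **Sender analysis**: known/blocked sender management, domain reputation tracking, impersonation detection, typosquat detection
- **Threat scoring**: weight-based risk model with 75+ finding types, configurable rules, and severity thresholds (safe/low/medium/high/critical)
- **Threat intelligence**: modular multi-provider system covering VirusTotal, AbuseIPDB, AlienVault OTX, Spamhaus DNSBL, Phishtank, Google Safe Browsing, and URLScan.io
- **Attachment analysis**: suspicious file extension detection and MIME type mismatch analysis
- **Asynchronous processing**: background task execution with Celery, using Redis as the message broker
- **Reporting**: JSON export with a comprehensive IOC summary and risk analysis
- **Audit logging**: complete audit trail of all platform operations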
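The URL checks named above can be approximated with the Python standard library. The sketch below is not the project's `URLAnalyzer`: the indicator names, the shortener and protected-domain lists, and the 0.9 similarity cutoff are hypothetical, and real homoglyph detection needs a confusables table rather than the simple non-ASCII test used here.

```python
from __future__ import annotations

import ipaddress
from difflib import SequenceMatcher
from urllib.parse import urlsplit

# Hypothetical sample lists; the analyzer's real lists are not documented here.
SHORTENER_DOMAINS = {"bit.ly", "tinyurl.com", "t.co", "goo.gl"}
PROTECTED_DOMAINS = {"paypal.com", "microsoft.com"}


def url_indicators(url: str) -> list[str]:
    """Return hypothetical phishing-indicator names found in a single URL."""
    host = (urlsplit(url).hostname or "").lower()
    found: list[str] = []

    # Punycode: IDNA-encoded labels carry the ACE prefix "xn--".
    if any(label.startswith("xn--") for label in host.split(".")):
        found.append("punycode_domain")

    # Unencoded non-ASCII hosts may hide homoglyphs (e.g. Cyrillic "а" for Latin "a").
    if not host.isascii():
        found.append("non_ascii_host")

    # A raw IP address in place of a domain name.
    try:
        ipaddress.ip_address(host)
        found.append("ip_address_url")
    except ValueError:
        pass

    if host in SHORTENER_DOMAINS:
        found.append("shortened_url")

    # Typosquatting: a host very similar to, but not equal to, a protected domain.
    if any(
        host != brand and SequenceMatcher(None, host, brand).ratio() >= 0.9
        for brand in PROTECTED_DOMAINS
    ):
        found.append("possible_typosquat")
    return found
```

For example, `url_indicators("https://paypa1.com/")` flags `possible_typosquat`. A production analyzer would also expand shortened links and normalize URLs before running checks like these.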
## Architecture
```
┌──────────────────────────────────────────────────────────────────┐
│ Presentation Layer (Frontend) │
│ Next.js 14 · App Router · React 18 · TailwindCSS · Framer Motion│
│ Zustand (State) · TanStack Query (API) · Recharts (Charts) │
└───────────────────────────┬──────────────────────────────────────┘
│ REST API (JSON)
┌───────────────────────────▼──────────────────────────────────────┐
│ API Layer (FastAPI) │
│ /api/v1/analyses · /api/v1/senders · /health │
│ Pydantic v2 Validation · Rate Limiting · Security Headers │
└───────┬───────────────────────────────┬──────────────────────────┘
│ │
┌───────▼───────────────────┐ ┌───────▼──────────────────────────┐
│ Analysis Engine Layer │ │ Threat Intelligence Layer │
│ · HeaderAnalyzer │ │ · VirusTotal │
│ · URLAnalyzer │ │ · AbuseIPDB │
│ · DNSValidator (aiodns) │ │ · AlienVault OTX │
│ · SenderAnalyzer │ │ · Spamhaus DNSBL │
│ · AttachmentAnalyzer │ │ · Phishtank │
│ · ScoringEngine │ │ · Google Safe Browsing │
└───────┬───────────────────┘ │ · URLScan.io │
│ └───────┬──────────────────────────┘
│ │
┌───────▼──────────────────────────────▼──────────────────────────┐
│ Persistence Layer │
│ PostgreSQL 16 · SQLAlchemy 2.0 (async) · Alembic Migrations │
│ Redis 7 (Cache + Celery Broker) · Celery Workers │
└──────────────────────────────────────────────────────────────────┘
```
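### Design Principles
- **Clean Architecture**: dependencies point inward; domain logic is isolated
- **SOLID**: single responsibility, open/closed, Liskov substitution, interface segregation, dependency inversion
- **Secure by design**: input validation, rate limiting, security headers, and parameterized queries at every layer
- **Testability**: mock-friendly interfaces, an in-memory SQLite test database, and a comprehensive test suite
- **Async-native**: fully asynchronous across the stack (FastAPI, SQLAlchemy, aiodns, aiohttp)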
## Tech Stack
| Layer | Technology |
|-------|-----------|
| **Frontend** | Next.js 14 (App Router), React 18, TypeScript, TailwindCSS, Framer Motion, Zustand, TanStack Query, Recharts, Radix UI, Zod, Sonner |
| **Backend** | Python 3.11+, FastAPI, Pydantic v2, SQLAlchemy 2.0 (async), Alembic, Celery, aiodns, aiohttp |
| **Database** | PostgreSQL 16 |
| **Cache & Queue** | Redis 7, Celery |
| **Security** | Input validation, IP-based rate limiting, security headers (CSP, HSTS, X-Frame-Options), CORS, TrustedHost |
| **Infrastructure** | Docker, Docker Compose, Nginx (reverse proxy + TLS), GitHub Actions CI/CD |
## Quick Start
### Prerequisites
- [Docker](https://docs.docker.com/get-docker/) & [Docker Compose](https://docs.docker.com/compose/install/)
- Python 3.11+ (for local development)
- Node.js 20+ (for local development)
### Using Docker (Recommended)
```
# Clone the repository
git clone https://github.com/your-org/phishguard-analyzer.git
cd phishguard-analyzer

# Copy the environment configuration, then edit .env to add your API keys
# (optional but recommended)
cp .env.example .env

# Start all services
docker compose up -d

# Access the platform:
#   Frontend: http://localhost:3000
#   API:      http://localhost:8000
#   API docs: http://localhost:8000/docs
```
### Local Development
#### Backend
```
cd backend

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate        # Linux/macOS
# .\venv\Scripts\activate       # Windows

# Install dependencies
pip install -r requirements.txt

# Start PostgreSQL and Redis (Docker)
docker compose up -d postgres redis

# Run migrations
alembic upgrade head

# Start the development server
uvicorn app.main:app --reload --port 8000
```
#### Frontend
```
cd frontend
npm install     # Install dependencies
npm run dev     # Start the development server (http://localhost:3000)
```
#### Celery Worker (optional, for asynchronous processing)
```
cd backend
celery -A app.core.celery_app worker --loglevel=info --concurrency=4
```
## Project Structure
```
phishguard-analyzer/
├── frontend/ # Next.js 14 application
│ ├── src/
│ │ ├── app/ # App Router pages
│ │ │ ├── dashboard/ # Dashboard, stats, history
│ │ │ ├── analysis/[id]/ # Analysis detail view
│ │ │ ├── analysis/upload/ # File upload page
│ │ │ ├── analysis/headers/ # Raw headers paste page
│ │ │ ├── analysis/urls/ # URL analysis page
│ │ │ ├── senders/ # Sender management
│ │ │ └── (auth)/ # Auth route group (login, register stubs)
│ │ ├── components/
│ │ │ ├── layout/ # Sidebar, Header, Providers
│ │ │ ├── analysis/ # UploadForm, HeadersForm
│ │ │ ├── dashboard/ # QuickStats, RiskScoreCard, FindingsList,
│ │ │ │ # DnsPanel, UrlList, ThreatIntelPanel,
│ │ │ │ # HelpHub, ProviderCard, RemediationPlan
│ │ │ ├── ui/ # Badge, Button, Card, Dialog, Progress,
│ │ │ │ # Skeleton, Toast, Tooltip
│ │ │ ├── charts/ # (empty placeholder)
│ │ │ ├── common/ # (empty placeholder)
│ │ │ └── reports/ # (empty placeholder)
│ │ ├── hooks/ # useAnalysis (TanStack Query)
│ │ ├── lib/ # ApiClient, utils
│ │ ├── store/ # Zustand analysis store
│ │ ├── types/ # TypeScript types (analysis, sender)
│ │ ├── styles/ # Tailwind globals, dark theme, animations
│ │ ├── modules/ # (empty stubs for future feature modules)
│ │ ├── services/ # (empty placeholder)
│ │ └── tests/ # (empty stubs: e2e, integration, unit)
│ ├── tailwind.config.ts # Custom theme, threat colors, animations
│ ├── next.config.mjs # API rewrites, security headers
│ └── package.json
│
├── backend/ # FastAPI application
│ ├── app/
│ │ ├── main.py # FastAPI entry point with lifespan
│ │ ├── api/
│ │ │ ├── v1/
│ │ │ │ ├── router.py # API router aggregation
│ │ │ │ └── endpoints/
│ │ │ │ ├── analysis.py # Analysis CRUD + report download
│ │ │ │ ├── senders.py # Sender management endpoints
│ │ │ │ └── findings.py # Finding feedback + provider stats
│ │ │ └── deps/ # Dependencies (DB, rate limiting)
│ │ ├── analyzers/
│ │ │ ├── header/ # HeaderAnalyzer (SPF/DKIM/DMARC)
│ │ │ ├── url/ # URLAnalyzer (punycode, homoglyph, typosquat)
│ │ │ ├── dns/ # DNSValidator (SPF, DKIM, DMARC, MX)
│ │ │ ├── sender/ # SenderAnalyzer (reputation, impersonation)
│ │ │ └── attachment/ # AttachmentAnalyzer (extensions, MIME)
│ │ ├── scoring/ # ScoringEngine (75+ weighted finding types)
│ │ ├── services/ # AnalysisService, SenderService,
│ │ │ # ReportService, ThreatIntelService
│ │ ├── models/ # SQLAlchemy models (analysis, sender, base)
│ │ ├── schemas/ # Pydantic v2 schemas (analysis, sender, api)
│ │ ├── parsers/ # EmailParser (.eml raw content parsing)
│ │ ├── middleware/ # SecurityHeaders, ExceptionHandler
│ │ ├── security/ # InputValidator, InMemoryRateLimiter
│ │ ├── core/ # Config, Database, Redis, Celery,
│ │ │ # Logging, Exceptions
│ │ ├── workers/tasks/ # Celery async tasks (analysis, threat intel)
│ │ └── tests/ # Unit + integration tests (+ conftest, fixtures)
│ ├── migrations/ # Alembic database migrations
│ ├── Dockerfile / Dockerfile.celery # Multi-stage Docker builds
│ ├── pytest.ini # Pytest async loop scope config
│ └── requirements.txt
│
├── infrastructure/
│ ├── nginx/ # Nginx reverse proxy + TLS config
│ │ ├── nginx.conf # Gzip, logging, timeouts
│ │ └── sites/default.conf # HTTP→HTTPS redirect, API proxy
│ ├── monitoring/ # (empty placeholder)
│ └── scripts/ # (empty placeholder)
├── docker/ # (empty placeholder dirs: backend, frontend, nginx)
├── docker-compose.yml # 6-service orchestration
├── .github/
│ ├── workflows/
│ │ ├── ci.yml # Lint, test, build on push/PR
│ │ └── deploy.yml # Docker build+push + SSH deploy on tag
│ ├── ISSUE_TEMPLATE/ # Bug report + feature request templates
│ └── pull_request_template.md
├── docs/
│ ├── architecture/overview.md # 6-layer architecture, data flow, decisions
│ ├── api/endpoints.md # Complete API reference (719 lines)
│ ├── deployment/guide.md # Docker/manual deployment, SSL, CI/CD, scaling
│ └── security/ # (empty placeholder)
├── scripts/ # (empty placeholder)
├── .env / .env.example # Environment configuration
├── .pre-commit-config.yaml # Black, Ruff, isort, mypy, Prettier
├── .dockerignore / .gitignore
├── *.eml / *.pdf # Sample email files (root)
├── CHANGELOG.md
├── CONTRIBUTING.md
├── SECURITY.md
└── README.md
```
## API Documentation
### Base URL
- **Development:** `http://localhost:8000/api/v1`
- **Production:** `https://your-domain.com/api/v1`
- **Interactive docs:** `http://localhost:8000/docs` (Swagger UI, development only)
### System Endpoints
| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/api/v1/health` | Health check (status, version, uptime, environment) |
| `GET` | `/api/v1/` | Root information (app name, version, docs link) |
### Analysis Endpoints (`/api/v1/analyses`)
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/upload` | Upload an .eml or .txt file for full analysis |
| `POST` | `/headers` | Analyze pasted raw email headers |
| `POST` | `/text` | Analyze plain-text content for suspicious URLs |
| `GET` | `/{analysis_id}` | Get the full analysis result, including all findings |
| `GET` | `/` | List analyses (paginated, filterable by status) |
| `DELETE` | `/{analysis_id}` | Soft-delete an analysis |
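A minimal sketch of how a client might address the analysis endpoints using only `urllib`. The base URL comes from this section, but the `page`, `page_size`, and `status` query parameter names are assumptions (the first two mirror the fields of the paginated response), and the helper names are hypothetical.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000/api/v1"  # development base URL from this README


def list_analyses_url(page: int = 1, page_size: int = 20, status: str | None = None) -> str:
    """URL for GET /analyses/ (query parameter names are assumptions)."""
    params: dict[str, str | int] = {"page": page, "page_size": page_size}
    if status is not None:
        params["status"] = status
    return f"{BASE_URL}/analyses/?{urlencode(params)}"


def analysis_url(analysis_id: str) -> str:
    """URL for GET or DELETE /analyses/{analysis_id}."""
    return f"{BASE_URL}/analyses/{quote(analysis_id, safe='')}"


def delete_analysis_request(analysis_id: str) -> Request:
    """Build (but do not send) the soft-delete request; send it with urllib.request.urlopen."""
    return Request(analysis_url(analysis_id), method="DELETE")
```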
### Sender Endpoints (`/api/v1/senders`)
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/reputation/{email}` | Check sender reputation (known, blocked, domain, history) |
| `POST` | `/known` | Add a trusted/known sender |
| `GET` | `/known` | List known senders (paginated, searchable) |
| `DELETE` | `/known/{sender_id}` | Remove a known sender |
| `POST` | `/known/{sender_id}/confirm` | Confirm/verify a known sender |
| `POST` | `/blocked` | Block a sender (exact, domain, or wildcard pattern) |
| `GET` | `/blocked` | List blocked senders (paginated, filterable) |
| `DELETE` | `/blocked/{sender_id}` | Deactivate a blocked sender |
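The three block modes (exact, domain, wildcard) and the deactivate-instead-of-delete behavior could be modeled as below. `BlockRule`, its fields, and the use of shell-style wildcards via `fnmatch` are assumptions for illustration, not the service's documented matching rules.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Literal

MatchType = Literal["exact", "domain", "wildcard"]


@dataclass(frozen=True)
class BlockRule:
    pattern: str
    match_type: MatchType
    active: bool = True  # DELETE /blocked/{sender_id} deactivates rather than removes


def is_blocked(sender: str, rules: list[BlockRule]) -> bool:
    """Return True if any active rule matches the sender address (case-insensitive)."""
    sender = sender.strip().lower()
    domain = sender.rpartition("@")[2]
    for rule in rules:
        if not rule.active:
            continue
        pattern = rule.pattern.lower()
        if rule.match_type == "exact" and sender == pattern:
            return True
        if rule.match_type == "domain" and domain == pattern:
            return True
        if rule.match_type == "wildcard" and fnmatchcase(sender, pattern):
            return True
    return False
```

With a rule such as `BlockRule("*@*.example.ru", "wildcard")`, any address on a subdomain of `example.ru` is blocked, while a deactivated rule is skipped entirely.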
### Findings Endpoints (`/api/v1/findings`)
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/{finding_id}/feedback` | Submit positive/negative feedback on a finding |
| `GET` | `/feedback/provider-stats` | Get provider reliability statistics based on feedback |
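One plausible way to turn finding feedback into provider reliability figures is the share of positive feedback per provider. The formula, `ProviderStats`, and the `(provider, is_positive)` input shape are hypothetical; the endpoint's actual statistics are not documented here.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ProviderStats:
    positive: int = 0
    negative: int = 0

    @property
    def total(self) -> int:
        return self.positive + self.negative

    @property
    def reliability(self) -> float | None:
        # Share of feedback that agreed with the provider; None until feedback exists.
        return self.positive / self.total if self.total else None


def provider_stats(feedback: list[tuple[str, bool]]) -> dict[str, ProviderStats]:
    """Aggregate (provider, is_positive) feedback pairs per provider."""
    stats: dict[str, ProviderStats] = defaultdict(ProviderStats)
    for provider, is_positive in feedback:
        if is_positive:
            stats[provider].positive += 1
        else:
            stats[provider].negative += 1
    return dict(stats)
```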
### Response Format
All endpoints return a consistent envelope:
```
{
"success": true,
"data": { ... },
"error": null,
"error_code": null
}
```
Paginated endpoints return:
```
{
"success": true,
"data": [ ... ],
"total": 42,
"page": 1,
"page_size": 20,
"total_pages": 3
}
```
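A client can unwrap both envelope shapes with a small helper. This sketch assumes that `success: false` responses carry a message in `error` and a code in `error_code`, and that `total_pages` is the ceiling of `total / page_size` (consistent with 42 items at 20 per page giving 3 pages); `ApiError` and the function names are hypothetical.

```python
from __future__ import annotations

import json
import math
from typing import Any


class ApiError(Exception):
    """Raised when an envelope reports success == false."""

    def __init__(self, message: str | None, error_code: str | None) -> None:
        super().__init__(message or "unknown error")
        self.error_code = error_code


def unwrap(body: str) -> Any:
    """Return the `data` field of a response envelope, or raise ApiError."""
    envelope = json.loads(body)
    if not envelope.get("success"):
        raise ApiError(envelope.get("error"), envelope.get("error_code"))
    return envelope["data"]


def total_pages(total: int, page_size: int) -> int:
    """Pages needed to show `total` items, `page_size` per page."""
    return math.ceil(total / page_size) if page_size > 0 else 0
```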
### Example Analysis Response
```
{
"success": true,
"data": {
"id": "a1b2c3d4-...",
"status": "completed",
"source_type": "eml",
"original_filename": "phishing.eml",
"risk_score": 85.5,
"risk_level": "high",
"confidence_score": 0.88,
"summary": { "risk_score": 85.5, "risk_level": "high", ... },
"findings": [
{
"id": "...",
"category": "header",
"finding_type": "dmarc_fail",
"severity": "high",
"title": "DMARC Authentication Failed",
"description": "Email failed DMARC authentication check",
"recommendation": "...",
"score_impact": 30.0,
"is_ioc": false
}
],
"urls": [ ... ],
"dns_validations": [ ... ],
"threat_intel": [ ... ],
"email_headers": [ ... ],
"created_at": "2026-05-11T12:00:00Z",
"completed_at": "2026-05-11T12:00:01Z",
"analysis_duration_ms": 1234
}
}
```
## Analysis Pipeline
When an email is submitted, the platform runs it through the following analysis pipeline:
```
Email/Pasted Headers/Text
│
▼
┌─────────────────┐
│ EmailParser │──► Extract headers, body, attachments, URLs
└────────┬────────┘
│
▼
┌──────────────────┐
│ HeaderAnalyzer │──► SPF/DKIM/DMARC results, forgery detection,
│ │ Reply-To / Return-Path mismatch, Received chain
└────────┬─────────┘
│
▼
┌──────────────────┐
│ URLAnalyzer │──► Punycode, homoglyph, typosquatting detection,
│ │ shortened URL expansion, suspicious TLDs, IP URLs
└────────┬─────────┘
│
▼
┌──────────────────┐
│ SenderAnalyzer │──► Known/blocked sender check, domain reputation,
│ │ impersonation detection, typosquat detection
└────────┬─────────┘
│
▼
┌──────────────────┐
│ DNSValidator │──► Live SPF/DKIM/DMARC/MX lookups via aiodns
└────────┬─────────┘
│
▼
┌──────────────────┐
│ AttachmentAnalyzer│──► Suspicious extensions, MIME type mismatches
└────────┬─────────┘
│
▼
┌──────────────────┐
│ ThreatIntelService│──► Multi-provider parallel indicator checks
│ (7 providers) │ (IP, domain, URL, hash)
└────────┬─────────┘
│
▼
┌──────────────────┐
│ ScoringEngine │──► 75+ weighted finding types
│ │ Aggregated risk score (0-100)
│ │ Risk level & confidence calculation
└────────┬─────────┘
│
▼
Results persisted to PostgreSQL
│
▼
Response returned to frontend
```
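The first header-stage checks in this pipeline can be sketched with Python's standard `email` package. This is not the project's `HeaderAnalyzer`: it assumes the receiving mail server has already written an `Authentication-Results` header (RFC 8601) and only reads the recorded verdicts, without live DNS lookups. The function name and result keys are hypothetical.

```python
from __future__ import annotations

import re
from email import message_from_string, policy
from email.utils import parseaddr

# Matches method=result pairs such as "spf=pass" or "dmarc=fail" (RFC 8601 syntax).
AUTH_RESULT_RE = re.compile(r"\b(spf|dkim|dmarc)=([a-z]+)", re.IGNORECASE)


def _domain(address_header: object) -> str:
    """Lower-cased domain of the address in a From/Reply-To style header."""
    return parseaddr(str(address_header))[1].rpartition("@")[2].lower()


def header_checks(raw_email: str) -> dict[str, object]:
    msg = message_from_string(raw_email, policy=policy.default)

    # Read verdicts recorded by the receiving server; the first result per method wins.
    auth: dict[str, str] = {}
    for header in msg.get_all("Authentication-Results", []):
        for method, result in AUTH_RESULT_RE.findall(str(header)):
            auth.setdefault(method.lower(), result.lower())

    # A Reply-To on a different domain than From is a classic phishing signal.
    reply_to = msg.get("Reply-To")
    mismatch = reply_to is not None and _domain(reply_to) != _domain(msg.get("From", ""))
    return {"auth": auth, "reply_to_mismatch": mismatch}
```

Because these verdicts come from whichever server added the header, a real analyzer should trust only `Authentication-Results` headers added by its own infrastructure.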
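## Security Features
| Category | Protection |
|----------|-------------|
| **Input validation** | Strict validation of all user input (file uploads, email headers, text) |
| **Rate limiting** | IP-based: 60 requests/minute, 1,000 requests/hour (configurable) |
| **File security** | MIME type checks, 10 MB size limit, encoding validation, suspicious pattern detection |
| **XSS protection** | Content sanitization, CSP headers |
| **SSRF protection** | URL validation, restricted outbound network access |
| **Security headers** | CSP, HSTS, X-Frame-Options, X-Content-Type-Options, X-XSS-Protection |
| **SQL injection** | Parameterized queries via SQLAlchemy |
| **CSRF** | Token-based protection for state-changing operations |
| **Logging** | Structured JSON logs that exclude sensitive information |
| **Secrets management** | Environment-based configuration; secrets are never hard-coded |
| **Audit trail** | Complete audit log of all operations (CRUD, analysis) |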
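The per-IP limits in the table (60/minute and 1,000/hour) can be enforced with a sliding-window log. This is a sketch, not the project's `InMemoryRateLimiter`: the class name, the injectable clock, and the choice not to count rejected requests are assumptions. Like any in-memory limiter, its state lives in one process, so multiple API workers would each keep separate counts.

```python
from __future__ import annotations

import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowRateLimiter:
    """Per-IP limiter enforcing several (max_requests, window_seconds) limits at once."""

    def __init__(
        self,
        limits: tuple[tuple[int, float], ...] = ((60, 60.0), (1000, 3600.0)),
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limits = limits
        self.clock = clock
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        hits = self._hits[ip]
        longest = max(window for _, window in self.limits)
        # Drop timestamps older than the longest window; no limit can count them any more.
        while hits and now - hits[0] >= longest:
            hits.popleft()
        for max_requests, window in self.limits:
            recent = sum(1 for t in hits if now - t < window)
            if recent >= max_requests:
                return False  # rejected requests are not recorded
        hits.append(now)
        return True
```

Counting timestamps on every call is linear in the window size (at most 1,000 entries per IP here), which keeps the sketch simple; a production limiter would more likely use Redis counters shared by all workers.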
## Threat Intelligence Providers
| Provider | Indicator Types | API Key Required |
|----------|----------------|:----------------:|
| **VirusTotal** | IP, domain, URL, file hash | Yes |
| **AbuseIPDB** | IP | Yes |
| **AlienVault OTX** | IP, domain, URL, file hash | Yes |
| **Spamhaus DNSBL** | IP, domain (DNS-based) | No |
| **Phishtank** | URL | No |
| **Google Safe Browsing** | URL | Yes |
| **URLScan.io** | URL | Yes |
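The pipeline runs indicator checks against these providers in parallel. A minimal `asyncio` sketch of that fan-out, with a per-provider timeout so one slow or failing provider cannot hold up the rest, might look like this. The provider callables, `ProviderResult`, and the 5-second default timeout are assumptions; real clients (HTTP calls, API keys, provider rate limits) are left out.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

# A provider check takes an indicator (IP, domain, URL or hash) and returns a verdict.
ProviderCheck = Callable[[str], Awaitable[str]]


@dataclass
class ProviderResult:
    provider: str
    verdict: str | None
    error: str | None = None


async def check_indicator(
    indicator: str, providers: dict[str, ProviderCheck], timeout: float = 5.0
) -> list[ProviderResult]:
    async def run(name: str, check: ProviderCheck) -> ProviderResult:
        try:
            verdict = await asyncio.wait_for(check(indicator), timeout)
            return ProviderResult(name, verdict)
        except asyncio.TimeoutError:
            return ProviderResult(name, None, "timeout")
        except Exception as exc:  # one failing provider must not sink the others
            return ProviderResult(name, None, type(exc).__name__)

    # All providers run concurrently; results come back in the providers' order.
    return list(await asyncio.gather(*(run(n, c) for n, c in providers.items())))
```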
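## Scoring Engine
The scoring engine uses a weight-based risk model with:
- **75+ finding types**, each with a custom weight ranging from -15 (safe signal) to +40 (critical threat)
- **5 risk levels**: safe (0-9), low (10-24), medium (25-49), high (50-79), critical (80-100)
- **Per-finding-type confidence scores** (0.0-1.0)
- **Severity-based base scores**: critical (40), high (20), medium (10), low (3), info (0)
- **Custom weights** loadable from a JSON file
- **Dynamic weight updates** at runtime
### Example Weights
| Finding Type | Weight |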
|-------------|:------:|
| `dmarc_fail` | +30 |
| `missing_from` | +35 |
| `known_malicious_ip` | +40 |
| `sender_in_blocklist` | +40 |
| `homoglyph_detected` | +20 |
| `dmarc_pass` | -15 |
| `spf_pass` | -10 |
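The weights, base scores, and level thresholds above can be combined as in the sketch below. It assumes the simplest aggregation (sum the weights, clamp to 0-100, then map to a level) and assumes that finding types without a custom weight fall back to their severity base score; the real engine also computes confidence and may combine scores differently. `WEIGHTS` holds only the example values from the table, and `score` is a hypothetical name.

```python
from __future__ import annotations

# Example weights from the table above (the engine has 75+ types).
WEIGHTS = {
    "dmarc_fail": 30, "missing_from": 35, "known_malicious_ip": 40,
    "sender_in_blocklist": 40, "homoglyph_detected": 20,
    "dmarc_pass": -15, "spf_pass": -10,
}
SEVERITY_BASE = {"critical": 40, "high": 20, "medium": 10, "low": 3, "info": 0}
# (lower bound, level) pairs, checked from highest to lowest.
LEVELS = [(80, "critical"), (50, "high"), (25, "medium"), (10, "low"), (0, "safe")]


def score(findings: list[tuple[str, str]]) -> tuple[float, str]:
    """Map (finding_type, severity) pairs to (risk_score, risk_level)."""
    # Assumption: types without a custom weight use their severity base score.
    raw = sum(WEIGHTS.get(ftype, SEVERITY_BASE.get(sev, 0)) for ftype, sev in findings)
    risk = float(min(max(raw, 0), 100))  # clamp to the 0-100 scale
    level = next(name for bound, name in LEVELS if risk >= bound)
    return risk, level
```

Under these assumptions, a DMARC failure alone scores 30 (medium), adding a homoglyph finding raises it to 50 (high), and passing SPF and DMARC with no other findings stays at 0 (safe) because negative totals are clamped.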
## Testing
### Backend Tests
```
cd backend
# Full suite with a terminal coverage report
pytest app/tests/ -v --cov=app/ --cov-report=term
# Unit tests only
pytest app/tests/unit/ -v
# Integration tests only
pytest app/tests/integration/ -v
# A single test module
pytest app/tests/unit/test_header_analyzer.py -v
# Fail the run if total coverage is below 85%
pytest --cov=app/ --cov-fail-under=85
```
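### Backend Test Coverage
| Test Suite | Tests | Description |
|-----------|:-----:|-------------|
| Header analyzer | 8 | SPF/DKIM/DMARC, forgery, edge cases |
| URL analyzer | 10 | Punycode, homoglyphs, typosquatting, URL shorteners, edge cases |
| Scoring engine | 6 | Empty findings, severity, thresholds, analysis |
| API integration | 16 | Full API flow, EML upload, error cases |
### Frontend Tests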
```
cd frontend
npm run test         # Run the tests
npm run test:watch   # Re-run tests on file changes
npm run test:e2e     # End-to-end tests
npm run coverage     # Tests with a coverage report
```
## Linting and Type Checking
### Backend
```
cd backend
ruff check app/ # Lint
mypy app/ # Type check
black --check app/ # Format check
```
### Frontend
```
cd frontend
npm run lint # ESLint + Prettier check
npm run format # Prettier format
npm run typecheck # TypeScript check (tsc --noEmit)
```
## Docker Services
| Service | Port | Depends On |
|---------|:----:|--------------|
| **PostgreSQL** | 5432 | — |
| **Redis** | 6379 | — |
| **Backend API** | 8000 | PostgreSQL, Redis |
| **Celery Worker** | — | PostgreSQL, Redis, Backend |
| **Frontend** | 3000 | Backend |
| **Nginx** | 80/443 | Backend, Frontend |
## CI/CD Pipeline
### Continuous Integration (`.github/workflows/ci.yml`)
Triggered on pushes to `main`/`develop` and on pull requests targeting `main`:
| Job | Tools |
|-----|-------|
| **backend-lint** | Ruff, mypy, Black --check |
| **backend-tests** | pytest with coverage, PostgreSQL 16 + Redis 7 |
| **frontend-lint** | ESLint, tsc --noEmit |
| **frontend-tests** | Vitest with coverage |
| **docker-build** | docker compose build --parallel |
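### Deployment (`.github/workflows/deploy.yml`)
Triggered by tags matching `v*`:
1. Build Docker images and push them to Docker Hub
2. Connect to the production server over SSH
3. Pull the new images and run `docker compose up -d`
4. Prune old Docker images
## Security
To report a security vulnerability, please read [SECURITY.md](SECURITY.md) and send your report to `security@phishguard.dev`.
## Changelog
See [CHANGELOG.md](CHANGELOG.md) for the version history.
## License
This project is licensed under the MIT License.
**Making email security accessible to everyone.**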