Pingu314/email_header_analyzer

GitHub: Pingu314/email_header_analyzer

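An in-depth email header analysis tool for SOC workflows. It identifies phishing emails using 28 detection signals and automatically maps findings to MITRE ATT&CK techniques.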


# email-header-analyzer

![CI](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/8c35501a6b072910.svg) ![Coverage](https://img.shields.io/badge/coverage-93%25-brightgreen) ![Python](https://img.shields.io/badge/python-3.9%2B-blue) ![License](https://img.shields.io/badge/license-MIT-green)

## Overview

Parses `.eml` files and runs them through a multi-stage detection pipeline: authentication (SPF/DKIM/DMARC), routing analysis, structural header checks, MIME/encoding anomaly detection, and link analysis. Each triggered signal contributes to a normalized risk score, which produces a verdict of `CLEAN`, `SUSPICIOUS`, or `PHISHING`. Findings are mapped to MITRE ATT&CK sub-techniques and can be exported as JSON, CSV, or PDF.

**Scope:** The tool targets email spoofing and infrastructure abuse, such as DMARC misalignment, fabricated headers, cloud-hosted phishing pages, and spam-evasion techniques. Commercial spam sent from a correctly configured ESP that passes every authentication check is out of scope; use a content-based filter (e.g. SpamAssassin, rspamd) for that.

## Detection signals (28)

| Category | Signals |
|---|---|
| **Authentication** | `dmarc_fail`, `dmarc_policy_none`, `spf_fail`, `dkim_fail`, `dkim_sha1`, `no_auth_results` |
| **Routing** | `no_received_headers`, `delivery_delta_high`, `high_hop_count`, `private_ip_in_chain`, `via_bulk_esp` |
| **Headers** | `return_path_mismatch`, `to_delivered_mismatch`, `sender_from_mismatch`, `fabricated_x_header`, `duplicate_critical_headers`, `bulk_precedence_on_alert` |
| **MIME / Encoding** | `encoding_mismatch`, `non_standard_encoding`, `anomalous_content_type`, `html_only_body`, `subject_obfuscation`, `labeled_negative_content`, `nonsense_html_tags`, `mixedcase_html_tags`, `tokenization_breakers` |
| **Links** | `cloud_storage_link`, `link_domain_mismatch`, `url_shortener`, `misleading_link_text` |

Signal weights are configurable in `settings.yaml`. Thresholds: `< 0.30` → CLEAN, `0.30–0.60` → SUSPICIOUS, `≥ 0.60` → PHISHING.

## MITRE ATT&CK coverage

| Technique | Name | Confidence trigger |
|---|---|---|
| T1566.001 | Phishing: Spearphishing Attachment | `dmarc_fail` + `return_path_mismatch` |
| T1566.002 | Phishing: Spearphishing Link | `cloud_storage_link` + `dmarc_fail` |
| T1583.006 | Acquire Infrastructure: Web Services | `cloud_storage_link` |
| T1598.003 | Phishing for Information: Spearphishing Link | `link_domain_mismatch` + `dmarc_fail` |
| T1036.005 | Masquerading: Match Legitimate Name or Location | `fabricated_x_header` |
| T1027 | Obfuscated Files or Information | `encoding_mismatch` + evasion signals |
| T1027.001 | Obfuscated Files or Information: Binary Padding | `tokenization_breakers` + `labeled_negative_content` |
| T1600 | Weaken Encryption | `dkim_sha1` |

## Pipeline

```
.eml file / raw string
        │
        ▼
parse_email()      -> RFC 5322 parsing, header extraction, body decoding
        │
        ▼
run_all_checks()   -> auth - routing - headers - mime - links (independent modules)
        │
        ▼
score()            -> normalized risk score, verdict
        │
        ▼
enrich()           -> IP enrichment (ipinfo.io) + static domain analysis
        │
        ▼
map_techniques()   -> MITRE ATT&CK mapping with confidence levels
        │
        ▼
build_report()     -> EmailReport dataclass
        │
   ┌────┴────┬──────────┐
   ▼         ▼          ▼
  JSON      CSV        PDF
```

## Quick start

```
git clone https://github.com/Pingu314/email_header_analyzer.git
cd email_header_analyzer
pip install -e .

# Analyze a single email
eha analyze email.eml

# JSON output
eha analyze email.eml --json

# Save the report + IOC CSV
eha analyze email.eml --save --ioc-csv
```

Copy `settings.yaml` and add your [ipinfo.io](https://ipinfo.io) token to enable IP enrichment (free tier: 50k requests per month). All checks still run without a token; only the IP geolocation/ASN lookups are skipped.

## CLI

```
eha analyze       Analyze a single .eml file
  --json          Output raw JSON to stdout
  --save          Write report to reports/
  --ioc-csv       Also write IOC CSV
  --no-color      Disable colored output

eha bulk          Analyze all .eml files in a folder
  --workers N     Thread count (default: 4)
  --recursive     Recurse into subdirectories
  --save          Write individual JSON reports
  --ioc-csv       Write combined bulk IOC CSV
  --json          Print JSON summary to stdout

eha serve         Start Flask dashboard (default: http://127.0.0.1:5000)
  --host HOST
  --port PORT
  --debug

eha version       Print version
```

### Sample output

```
────────────────────────────────────────────────────────────
EMAIL HEADER ANALYZER - SUSPICIOUS
────────────────────────────────────────────────────────────
File    : tests/samples/suspicious_unknown_sender.eml
Score   : 34.4%
Analyzed: 2026-05-11 10:05 UTC
────────────────────────────────────────────────────────────
AUTHENTICATION
  SPF  : pass     Return-gpdyrty@dkio509.ulasatimu.web.id
  DKIM : pass     @kiefbyevgjn.dkio509.ulasatimu.web.id [rsa-sha1]
  DMARC: unknown  p=
────────────────────────────────────────────────────────────
TRIGGERED SIGNALS
  [0.35] dmarc_fail                DMARC unknown on domain ''. Sender domain alignment check failed
  [0.25] cloud_storage_link        Links to cloud storage detected: [storage.googleapis.com/...]
  [0.20] bulk_precedence_on_alert  Precedence: bulk on security-themed subject
  [0.20] nonsense_html_tags        Non-standard HTML tags: ['f3z709nplm', 'ed7afhsvvf', ...]
  [0.15] dkim_sha1                 DKIM signature uses rsa-sha1 (deprecated since RFC 8301, 2018)
  [0.15] to_delivered_mismatch     To: 'me@aol.com' does not match Delivered-To: 'michael...@gmail.com'
  [0.15] anomalous_content_type    Content-Type 'multipart/digest' is anomalous for a security alert
  [0.15] mixedcase_html_tags       Mixed-case HTML tags: ['ObjecT']. Evades case-sensitive signatures.
  [0.08] html_only_body            HTML body with no plain text alternative. Typical of bulk campaigns.
────────────────────────────────────────────────────────────
MITRE ATT&CK
  [HIGH  ] T1566.002  Phishing: Spearphishing Link
  [HIGH  ] T1583.006  Acquire Infrastructure: Web Services
  [HIGH  ] T1600      Weaken Encryption
  [MEDIUM] T1027      Obfuscated Files or Information
  [MEDIUM] T1027.001  Obfuscated Files or Information: Binary Padding
  [LOW   ] T1036.005  Masquerading: Match Legitimate Name or Location
  [LOW   ] T1566.001  Phishing: Spearphishing Attachment
────────────────────────────────────────────────────────────
DOMAIN ANALYSIS
  storage.googleapis.com
    ↳ cloud_storage_host
    ↳ lookalike_brand:g[o0]{2}g[l1]e
────────────────────────────────────────────────────────────
IOCs
  IPS              216.244.76.116 / 98.126.16.142
  DOMAINS          storage.googleapis.com
                   ptupqjolqeb.dkio509.ulasatimu.web.id
  URLS             https://storage.googleapis.com/oliiiseur/ozeutizeptzir.html#...
  EMAIL_ADDRESSES  ujaizae@ptupqjolqeb.dkio509.ulasatimu.web.id
                   return-gpdyrty@dkio509.ulasatimu.web.id
────────────────────────────────────────────────────────────
```

## Dashboard

```
eha serve
# Then open http://127.0.0.1:5000
```

- Drag-and-drop upload of one or more `.eml` files
- Single-report view: authentication results, routing chain, triggered signals, MITRE mapping table, domain analysis, IOC list
- Bulk view: verdict summary for a batch run
- History: all cached reports (held in memory and reset on restart)
- JSON API: `GET /api/report/`

## Export formats

| Format | Command | Contents |
|---|---|---|
| JSON | `--save` or `/api/report/` | Full report: all signals, IOCs, MITRE mappings, domain analysis |
| CSV | `--ioc-csv` | IOC rows: IPs, domains, URLs, email addresses, MITRE techniques, domain risk signals |
| PDF | Dashboard results page | Dark-themed report with all sections, suitable for incident documentation |

CSV columns: `type`, `value`, `verdict`, `score`, `source`, `analyzed_at`. The format is compatible with Splunk ES notable events and TheHive observables.

## Project structure

```
email_header_analyzer/
├── cli.py              Entry point (eha)
├── settings.yaml       All 28 signal weights + thresholds
├── src/
│   ├── checks/         Detection modules (auth, routing, headers, mime, links)
│   ├── dashboard/      Flask app + templates + in-memory cache
│   ├── export/         JSON / CSV / PDF writers
│   ├── pipeline.py     Orchestrates the full analysis chain
│   ├── scorer.py       Normalized scoring + verdict
│   ├── mitre_mapper.py ATT&CK technique mapping with confidence
│   └── enrichment.py   IP (ipinfo.io) + static domain analysis
└── tests/              383 tests, 93% coverage
    └── samples/        Real-world .eml fixtures (legit + suspicious)
```

## Development setup

```
pip install -e ".[dev]"
pytest                          # 383 tests, ~78s
ruff check src/ tests/ cli.py
```

CI runs on GitHub Actions against Python 3.9 through 3.14.

## Portfolio context

This project is part of a three-project SOC analyst portfolio:

| Project | Description |
|---|---|
| [soc_threat_analyzer](https://github.com/Pingu314/soc_threat_analyzer) | Log-based threat detection pipeline: SIGMA rules, MITRE ATT&CK, pytest |
| [phishing_url_analyzer](https://github.com/Pingu314/phishing_url_analyzer) | URL reputation and structural analysis |
| **email_header_analyzer** | This project |

## License

MIT
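The verdict thresholds listed under the detection signals (`< 0.30` CLEAN, `0.30–0.60` SUSPICIOUS, `≥ 0.60` PHISHING) can be illustrated with a minimal Python sketch. It assumes the normalized score is the sum of the triggered signal weights divided by the sum of all configured weights; the real formula lives in `src/scorer.py` and may differ. The weights are an illustrative subset taken from the bracketed values in the sample output, not the contents of `settings.yaml`, and the names `normalized_score`, `verdict`, and `WEIGHTS` are hypothetical.

```python
"""Hypothetical sketch of weight-based scoring and verdict thresholds."""
from typing import Dict, Iterable

# Illustrative subset of signal weights (bracketed values from the sample output).
# The real tool reads all of its weights from settings.yaml; this is NOT that file.
WEIGHTS: Dict[str, float] = {
    "dmarc_fail": 0.35,
    "cloud_storage_link": 0.25,
    "bulk_precedence_on_alert": 0.20,
    "nonsense_html_tags": 0.20,
    "dkim_sha1": 0.15,
    "to_delivered_mismatch": 0.15,
    "anomalous_content_type": 0.15,
    "mixedcase_html_tags": 0.15,
    "html_only_body": 0.08,
}

# Documented thresholds: < 0.30 CLEAN, 0.30-0.60 SUSPICIOUS, >= 0.60 PHISHING.
SUSPICIOUS_AT = 0.30
PHISHING_AT = 0.60


def normalized_score(triggered: Iterable[str], weights: Dict[str, float]) -> float:
    """Assumed normalization: triggered weight / total configured weight.

    The result lies in [0, 1]. Unknown signal names are ignored and a
    signal that fires more than once is counted only once.
    """
    total = sum(weights.values())
    if total <= 0:
        return 0.0
    hit = sum(weights[name] for name in set(triggered) if name in weights)
    return hit / total


def verdict(score: float) -> str:
    """Map a normalized score onto the three documented verdicts."""
    if score >= PHISHING_AT:
        return "PHISHING"
    if score >= SUSPICIOUS_AT:
        return "SUSPICIOUS"
    return "CLEAN"


if __name__ == "__main__":
    s = normalized_score(["dmarc_fail", "cloud_storage_link"], WEIGHTS)
    print(f"score={s:.3f} verdict={verdict(s)}")
```

With this illustrative subset, `dmarc_fail` plus `cloud_storage_link` yields a SUSPICIOUS verdict. Dividing by the total weight keeps the score in [0, 1] regardless of how many signals are configured, which is what allows fixed thresholds to work when individual weights are tuned.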
