thunderstornX/credential-leak-scanner

GitHub: thunderstornX/credential-leak-scanner

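A multi-source passive credential-exposure scanner that aggregates data sources such as HIBP and GitHub into a defensive security report, and uses graceful degradation so that an unavailable source is never silently omitted.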


```
 ██████╗██████╗ ███████╗██████╗   ██╗     ███████╗ █████╗ ██╗  ██╗
██╔════╝██╔══██╗██╔════╝██╔══██╗  ██║     ██╔════╝██╔══██╗██║ ██╔╝
██║     ██████╔╝█████╗  ██║  ██║  ██║     █████╗  ███████║█████╔╝
██║     ██╔══██╗██╔══╝  ██║  ██║  ██║     ██╔══╝  ██╔══██║██╔═██╗
╚██████╗██║  ██║███████╗██████╔╝  ███████╗███████╗██║  ██║██║  ██╗
 ╚═════╝╚═╝  ╚═╝╚══════╝╚═════╝   ╚══════╝╚══════╝╚═╝  ╚═╝╚═╝  ╚═╝
        ─── multi-source passive credential exposure ───
```

[![tests](https://static.pigsec.cn/wp-content/uploads/repos/2026/06/142b30a90f000611.svg)](https://github.com/thunderstornX/credential-leak-scanner/actions/workflows/tests.yml)
[![Bandit](https://img.shields.io/badge/bandit-0%20issues-brightgreen)](results/security_scan.md)
[![pip-audit](https://img.shields.io/badge/pip--audit-0%20vulns-brightgreen)](results/security_scan.md)
[![Semgrep](https://img.shields.io/badge/semgrep-0%20findings-brightgreen)](results/security_scan.md)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.20480452.svg)](https://doi.org/10.5281/zenodo.20480452)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue)](LICENSE)

`credential-leak-scanner` is a small Python pipeline that combines four structurally different passive data sources into a single defensive credential-exposure report:

1. **HIBP Pwned Passwords** k-anonymity endpoint (no API key required). The only data that leaves the machine is the first 5 hex characters of the password's SHA-1 hash.
2. **HIBP Breached Accounts** v3 endpoint (key optional; skipped gracefully when absent).
3. GitHub code-search dorks against the target domain, searching for `.env`, `password`, `api_key`, and `secret` (token optional).
4. A local synthetic breach CSV for cross-referencing (ships with 100 rows of fake data, all at `@example.invalid` per RFC 6761).

When a key is missing, the affected module writes an explicit `SOURCE_UNAVAILABLE` status into the report instead of exiting silently, so defenders can see exactly which sources contributed and which did not. We call this property *graceful API degradation*; the [paper](paper/paper.tex) describes the design explicitly.

## Quick start

```
git clone https://github.com/thunderstornX/credential-leak-scanner.git
cd credential-leak-scanner
python -m venv .venv
.venv/bin/pip install -r requirements.txt

# Bare run (no keys at all). Pwned Passwords endpoint runs; HIBP
# breached-accounts and GitHub dork modules report `no_key`.
.venv/bin/python -m cli.main \
    --domain example.com \
    --password 'CorrectHorseBatteryStaple' \
    --password 'hunter2' \
    --output report.json

# With every key set: copy .env.example to .env and fill it in.
.venv/bin/python -m cli.main \
    --domain my-org.com \
    --account 'soc@my-org.com' \
    --account 'webmaster@my-org.com' \
    --password 'leaked-on-stage-2018' \
    --output report.json --ai-summary
```

The CLI prints one status line per data source, so coverage gaps are visible before you open the JSON:

```
[*] credential-scan starting for 'example.com' at 2026-05-06T12:34:56+00:00
[*] hibp_passwords: checking 2 candidate password(s)
[*] hibp_accounts: no --account flags given; skipping module
[*] github_dorks: --skip-github set; skipping
[*] source status:
    [+] hibp_passwords  ok      hits=2
    [○] hibp_accounts   no_key  (skipped: HIBP_API_KEY not set …)
[+] wrote report to /tmp/report.json
[+] headline severity: critical
```

## Architecture

```
┌─────────────────────┐
│ hibp_passwords      │  k-anonymity, no key required
├─────────────────────┤
│ hibp_accounts       │  keyed (optional, graceful skip)
├─────────────────────┤ ──► aggregator ──► reporter ──► report.json
│ github_dorks        │  PAT (optional, graceful skip)
├─────────────────────┤
│ breach_csv          │  local synthetic fixture
└─────────────────────┘
```

Each scanner module returns a `SourceReport` carrying one of six `SourceStatus` values:

| Status                | Meaning                                             |
|-----------------------|-----------------------------------------------------|
| `ok`                  | Ran to completion and returned at least one finding |
| `not_found`           | Ran to completion and returned a clean result       |
| `no_key`              | Skipped because the relevant API key is not set     |
| `local_file_missing`  | Skipped because the local breach CSV is missing     |
| `http_error`          | Upstream returned 4xx/5xx                           |
| `network_error`       | DNS / TLS / timeout failure                         |

Collapsing any of the four skip or error statuses (`no_key`, `local_file_missing`, `http_error`, `network_error`) into an empty findings list is exactly the kind of silent failure a defender cannot afford. Every test in `tests/` asserts against one of these statuses explicitly.

## Reproducing the evaluation

```
.venv/bin/python eval/run_eval.py
```

The harness queries the live HIBP Pwned Passwords endpoint with 25 labeled candidate strings (15 known-breached "common" passwords and 10 synthetic "unique" strings) and writes:

* `results/eval_summary.json`: accuracy, precision, recall, F1, latency mean / median / p95 / max, and total wall-clock time.
* `results/eval_raw.csv`: per-call observations.

**Latest measurements** (2026-05-06, residential network):

| Metric          |      Value |
|-----------------|-----------:|
| n               |         25 |
| accuracy        |     1.0000 |
| precision       |     1.0000 |
| recall          |     1.0000 |
| F1              |     1.0000 |
| latency mean    |  286.06 ms |
| latency p95     |  281.72 ms |
| latency max     |  786.32 ms |
| wall-clock      |    12.34 s |
| upstream errors |          0 |

Parsed breach counts for the "common" candidates range from 1,406,394 (`letmein`) to 209,972,844 (`123456`). Every "unique" candidate resolved to a breach count of 0. For the full per-call CSV and an explicit list of what this evaluation does *not* cover, see [results/README.md](results/README.md).

## Tests

```
.venv/bin/pytest -q
```

40 tests cover the four scanner modules, the aggregator, and the reporter. HTTP requests are mocked with [`respx`](https://lundberg.github.io/respx/), which for httpx is more precise than `unittest.mock.patch`: each test still assembles the real URL and real headers, and parses a real response body.

| Module               | Tests |
|----------------------|------:|
| `hibp_passwords.py`  |    16 |
| `hibp_accounts.py`   |     5 |
| `github_dorks.py`    |     5 |
| `breach_csv.py`      |     6 |
| `aggregator.py`      |     3 |
| `reporter.py`        |     5 |
| **Total**            | **40** |

(`hibp_passwords` includes 7 parameterized severity test cases.)

## Security posture

| Gate       | Findings | Notes                                            |
|-----------:|:--------:|--------------------------------------------------|
| Bandit     |    0     | 1 documented waiver (B324, HIBP protocol)        |
| pip-audit  |    0     | -                                                |
| Semgrep    |    0     | 1 documented waiver (sha1, HIBP protocol)        |

For the full report, see [results/security_scan.md](results/security_scan.md). The only waiver covers the SHA-1 call at `scanner/hibp_passwords.py:46`. SHA-1 is required by the HIBP Pwned Passwords k-anonymity protocol and is not used as a password-storage primitive anywhere in this codebase.

## Ethical use

See [ETHICAL_USE.md](ETHICAL_USE.md). In short: only scan domains that you own or have written authorization to assess. The tool is strictly passive: it performs no authentication probing and no credential stuffing, does not scrape paste sites or dark-web markets, and does not create accounts.

## Citation

If you use this software in academic work, please cite the record in [CITATION.cff](CITATION.cff). The companion [IEEE paper](paper/paper.tex) describes the design and reports the measured results.

## License

MIT. See [LICENSE](LICENSE).
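To make the k-anonymity exchange from the data-source list concrete, here is a minimal sketch of a Pwned Passwords range query. It is not the code in `scanner/hibp_passwords.py`: the function names are hypothetical, and it uses the standard library's `urllib` rather than httpx so that it runs without dependencies. It relies only on the public range endpoint, which accepts the first 5 hex characters of the SHA-1 digest and returns `SUFFIX:COUNT` lines.

```python
import hashlib
import urllib.request

# Public HIBP range endpoint; only the 5-character prefix is ever sent.
RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"


def split_sha1(password: str) -> tuple[str, str]:
    """Return (5-char prefix, 35-char suffix) of the uppercase SHA-1 hex digest."""
    # SHA-1 is mandated by the HIBP protocol; it is not used for storage here.
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_body(body: str, suffix: str) -> int:
    """Scan 'SUFFIX:COUNT' lines locally and return the count for our suffix."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0  # suffix absent from the range: no known exposure


def pwned_count(password: str, timeout: float = 10.0) -> int:
    """Query the live endpoint (hypothetical helper; performs a network call)."""
    prefix, suffix = split_sha1(password)
    request = urllib.request.Request(
        RANGE_URL.format(prefix=prefix),
        headers={"User-Agent": "k-anonymity-sketch"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return count_in_range_body(response.read().decode("utf-8"), suffix)
```

The match against the 35-character suffix happens entirely on the local machine, which is why the service never learns the full hash. A real module would additionally map `urllib.error.HTTPError` and timeouts to the `http_error` and `network_error` statuses rather than letting them propagate.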

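The status table in the architecture section can be sketched as an enum plus an aggregation step that keeps skipped and failed sources visible. This is an illustration under assumptions, not the project's `aggregator.py`: the six status strings come from the README, while the `SourceReport` fields, the `DEGRADED` set, and the `summarize` function are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class SourceStatus(str, Enum):
    OK = "ok"
    NOT_FOUND = "not_found"
    NO_KEY = "no_key"
    LOCAL_FILE_MISSING = "local_file_missing"
    HTTP_ERROR = "http_error"
    NETWORK_ERROR = "network_error"


# The four statuses that mean "this source did not actually contribute".
DEGRADED = frozenset({
    SourceStatus.NO_KEY,
    SourceStatus.LOCAL_FILE_MISSING,
    SourceStatus.HTTP_ERROR,
    SourceStatus.NETWORK_ERROR,
})


@dataclass
class SourceReport:
    # Field names are assumptions made for this sketch.
    source: str
    status: SourceStatus
    findings: list = field(default_factory=list)
    detail: str = ""


def summarize(reports: list[SourceReport]) -> dict:
    """Merge findings while recording every source's status explicitly."""
    return {
        "findings": [f for r in reports for f in r.findings],
        "sources": {r.source: r.status.value for r in reports},
        # A skipped source is listed here instead of vanishing into [].
        "coverage_gaps": sorted(r.source for r in reports if r.status in DEGRADED),
    }
```

With this shape, a run without `HIBP_API_KEY` yields a summary whose `coverage_gaps` contains `hibp_accounts`, so an empty findings list can be told apart from a source that never ran.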