noptrix/httpgrep

GitHub: noptrix/httpgrep

A fast asynchronous HTTP(S) scanner built on Python asyncio that matches response headers and bodies against strings or regexes across large host ranges.
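The core idea, grepping a response's headers and body with a string or regex, can be sketched in a few lines of Python. This is a minimal illustration and not httpgrep's actual code: the function name `grep_response`, its parameters, and the way the context window is split around a body match are all assumptions made for the example.

```python
import re


def grep_response(headers, body, pattern, ignore_case=False, context=64):
    """Return (type, match) tuples for regex hits in headers and body.

    Hypothetical helper: header hits are reported as 'name: value',
    body hits as a repr() window of roughly `context` characters.
    """
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    hits = []

    # Header hits: search each 'name: value' line as a whole.
    for name, value in headers.items():
        line = f"{name}: {value}"
        if rx.search(line):
            hits.append(("header", line))

    # Body hits: decode leniently and show some context around each match.
    text = body.decode("utf-8", errors="replace")
    for m in rx.finditer(text):
        start = max(0, m.start() - context // 2)
        end = min(len(text), m.end() + context // 2)
        hits.append(("body", repr(text[start:end])))
    return hits


if __name__ == "__main__":
    demo_headers = {"Server": "Apache/2.4"}
    demo_body = b"<h1>Powered by apache</h1>"
    for kind, match in grep_response(demo_headers, demo_body, "apache", ignore_case=True):
        print(kind, "|", match)
```

In a real scanner this function would be applied to each response as it arrives, so that matches can be streamed out while other requests are still in flight.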

Stars: 36 | Forks: 7

# Description

A fast, asynchronous Python tool that scans HTTP(S) servers and greps for strings or regex patterns in HTTP response headers and bodies. It accepts single hosts, URLs, CIDR ranges, IP ranges, or files as input; scans multiple ports per target (single, comma-separated list, or range, with TLS or plaintext auto-detected per port); can pull name-based (v)hosts straight from TLS certificates and scan them; streams matches live to the terminal; and can log results to text, CSV, or JSONL files.

Built for large-scale scans: the async core drives thousands of concurrent connections, a TCP pre-check cheaply skips unresponsive ports, per-host and global timeouts prevent hangs on slow or unresponsive targets, and interrupted runs can be resumed.

# Requirements

- Python 3.11+ on a POSIX system (Linux, \*BSD, macOS; uses `termios` and asyncio's Unix signal handling)
- [httpx](https://pypi.org/project/httpx/) - `pip install -r requirements.txt` (or `pip install httpx`)
- Optional, used automatically when present: `uvloop` (faster event loop), `aiodns` (non-blocking DNS for `-r`), `httpx[socks]` / socksio (SOCKS proxies)

httpgrep is a standalone, self-contained script - just run `./httpgrep.py`.

# Usage

```
$ httpgrep -H

--== [ by nullsecurity.net ] ==--

usage

  httpgrep -h -s [opts] |

target options

  -h        - single host/url or host-/cidr-range or file containing hosts
              or file containing URLs, e.g.: foobar.net,
              192.168.0.1-192.168.0.254, 192.168.0.0/24, /tmp/hosts.txt
              NOTE: hosts can also contain ':' on cmdline or in file,
              where is a single port, comma-list or range, e.g.:
              foo.net:8080, foo.net:80,443, 10.0.0.1:1-1024
  -p        - port(s) to connect to: single port, comma-separated list,
              range, or a file with one spec per line, e.g.: 80,
              80,443,8080, 8000-8100, /tmp/ports.txt
              (default: 80, or 443 when -t is given)
  -t        - force TLS/SSL on all ports. by default the scheme is
              auto-detected per port (plain http, switching to TLS if the
              port speaks it)
  -u        - URI or comma-separated URIs or file with URIs (one per line)
              to search given strings in, e.g.: /foobar/, /foo.html,
              /admin,/login, /tmp/paths.txt (default: /)
  -r        - perform reverse dns lookup for given IPv4 addresses
              (resolved concurrently before scanning)

http options

  -X        - specify HTTP request method to use (default: get).
              use '?' to list available methods.
  -a        - http auth credentials (format: 'user:pass')
  -U        - set custom User-Agent (default: latest ms edge, windows)
  -A        - use random user-agent per request
  -R        - set custom headers (format: 'foo=bar;lol=lulz;...')
  -C        - set cookies (format: 'foo=bar;lol=lulz;...')
  -F        - don't follow HTTP redirects
  -L        - max redirects to follow (default: 10; ignored with -F)
  -E        - verify TLS/SSL certificates (default: no verification)
  -P        - use proxy (format: '[http|https|socks4|socks5]://host:port')
              (socks needs the 'httpx[socks]' / socksio package)
  -f        - only report responses with given HTTP status codes, e.g.:
              '200', '200,301,302'
  -e        - exclude responses with given HTTP status codes, e.g.:
              '404', '403,404,500'

search options

  -s        - a single string/regex or multiple strings/regex in a file to
              find in given URIs and HTTP response headers, e.g.:
              'tomcat 8', '/tmp/igot0daysforthese.txt'
  -S        - invert (grep -v): drop ALL matches of a response if this
              string/regex (or file) appears anywhere in its body or
              headers, e.g. to filter out dynamic error / 404 pages
  -w        - search strings in given places (default: headers,body)
  -b        - num bytes of context to show from a body match (default: 64)
  -m        - max body to read + search; suffix b/kb/mb, no suffix = kb,
              e.g.: 512, 1mb, 262144b (default: 256kb)
  -i        - use case-insensitive search
  -I        - use case-insensitive invert (for -S)

scan options

  -x        - max concurrent connections (async; default: 1000).
              raise ulimit -n accordingly for very high values
  -c        - per-host connect + read timeout in seconds, also caps body
              read time (default: 3.0)
  -G        - global timeout: hard-stop the whole scan after N seconds
              (safety net against any hang; default: none)
  -1        - once a host has a match, skip its not-yet-started probes
              (best-effort; in-flight requests still finish, so under
              high -x you may still see a few matches per host)
  -z        - scan targets in random order within a memory-bounded window
              of ram (suffix b/kb/mb/gb), e.g.: -z 1gb.
              keeps huge ranges/files from exhausting memory
  -W        - save/resume: on ctrl+c write progress to httpgrep.session;
              rerun with -W to resume from it (else start fresh)
  -T <0|1>  - pull (v)hosts from the TLS cert (CN + SAN) and scan them.
              0 = in-scope only (via host header on the scanned IP);
              1 = also scan each vhost by name (dns-resolved, MAY LEAVE
              the scanned scope). needs TLS (-t or a *443 port).

output options

  -l        - log found matches to . per chosen -O format
              (e.g. -l out -O csv,jsonl => out.csv, out.jsonl)
  -O        - log file format(s), comma-list of: txt, csv, jsonl
              (default: txt; use '?' to list). terminal output always
              stays human-readable.
  -v        - verbose: print each url as it gets scanned

misc options

  -H        - print help
  -V        - print version information

examples

  # grep for 'apache' in headers and body of a single host
  $ httpgrep -h foobar.net -s apache

  # scan a CIDR range on port 8080, search for 'tomcat' in body only
  $ httpgrep -h 192.168.0.0/24 -p 8080 -s tomcat -w body

  # scan a host across multiple ports and a port range for 'jenkins'
  $ httpgrep -h 192.168.0.10 -p 80,443,8080,8000-8100 -s jenkins -i

  # scan host list, search string file, log matches (-> /tmp/out.txt)
  $ httpgrep -h /tmp/hosts.txt -s /tmp/strings.txt -x 200 -l /tmp/out

  # grep for 'admin' case-insensitively across multiple URIs via TLS
  $ httpgrep -h foobar.net -t -u /admin,/login,/dashboard -s admin -i

  # scan IP range, reverse DNS, only report 200 responses
  $ httpgrep -h 10.0.0.1-10.0.0.254 -s 'powered by' -r -f 200

  # search headers only, don't follow redirects, verbose output
  $ httpgrep -h foobar.net -s 'X-Powered-By' -w headers -F -v

  # grep for 'admin', but drop dynamic error pages (invert, case-insensitive)
  $ httpgrep -h 192.168.0.0/24 -s admin -i -S 'error|not found' -I

  # route through proxy, custom UA, search for version strings
  $ httpgrep -h /tmp/hosts.txt -s 'nginx/1\.' -P http://127.0.0.1:8080 -U 'curl/8.0'

  # big resumable scan: ctrl+c saves state, rerun with -W to continue; also
  # cap the whole run at 1 hour as a hang safety net
  $ httpgrep -h 10.0.0.0/16 -p 80,443 -s admin -W -G 3600
```

# Output

Matches are printed line by line in real time:

```
[*] | [vhost] | |
```

The fields, in order, correspond to the CSV columns `url,vhost,type,match`:

- `url` - the scanned URL (`scheme://host:port/uri`).
- `vhost` - only present with `-T`: the certificate (v)host that was tried via the `Host` header (empty for a direct scan).
- `type` - `body` or `header`.
- `match` - for a body hit, a short repr window around the match (`-b` bytes); for a header hit, `name: value`.

The terminal always shows this human-readable format. With `-l`, the same matches are mirrored to one log file per `-O` format (for example, `-l out -O csv,jsonl` writes `out.csv` and `out.jsonl`): `txt` (these text lines), `csv` (`url,vhost,type,match` rows with a header line), `jsonl` (one JSON object per match).

# Author

noptrix

# Notes

- quick'n'dirty code
- httpgrep is already packaged and available for [BlackArch Linux](https://www.blackarch.org/)
- My master branches are always stable; dev branches are created for current work.
- All of my public stuff you find is officially announced and published via [nullsecurity.net](https://www.nullsecurity.net).

# License

Check docs/LICENSE.

# Disclaimer

We hereby emphasize that the hacking-related stuff found on [nullsecurity.net](http://nullsecurity.net) is for educational purposes only. We are not responsible for any damages. You are responsible for your own actions.
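As a follow-up to the Output section: the CSV log uses the columns `url,vhost,type,match`, so it can be post-processed with Python's standard `csv` module. The sketch below groups matches by URL; the helper name `group_matches`, the file name `out.csv`, and the sample rows are hypothetical and only illustrate the column layout described above.

```python
import csv
import io
from collections import defaultdict


def group_matches(csv_file):
    """Group (type, match) pairs by URL from a url,vhost,type,match CSV."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[row["url"]].append((row["type"], row["match"]))
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical sample rows; for a real log, pass
    # open("out.csv", newline="") instead of the StringIO object.
    sample = io.StringIO(
        "url,vhost,type,match\n"
        "http://192.168.0.10:8080/,,header,Server: Apache-Coyote/1.1\n"
        "http://192.168.0.10:8080/,,body,'<title>Apache Tomcat/8.5</title>'\n"
    )
    for url, hits in group_matches(sample).items():
        print(url, len(hits))
```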
