timokoessler/easy-waf-data

GitHub: timokoessler/easy-waf-data

Provides the crawler IP whitelist data used by EasyWAF, supporting fake-crawler detection and better performance.

Stars: 3 | Forks: 0

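# Easy WAF Data

This repository provides data used by the web application firewall [EasyWAF](https://github.com/timokoessler/easy-waf).

Every 12 hours, a GitHub Action fetches a whitelist of IP address ranges belonging to search engine crawlers and crawlers of other large platforms, and stores it in this repository. The "Fake Crawler" module of Easy WAF uses this list to block fake crawlers. For more information about the WAF, see the [EasyWAF repository](https://github.com/timokoessler/easy-waf).

The authenticity of most crawlers can be verified with a reverse DNS lookup, but an additional IP whitelist improves performance. In addition, the authenticity of some crawlers, such as the Facebook crawler, can only be determined by IP address.

**But why is the list of IP ranges not generated locally by EasyWAF?**

The main reason is that downloading the BGP Routing Table Analysis takes some time, especially on a poor network connection. The effect is amplified if the application is started several times in parallel, for example in Node.js cluster mode. Moreover, changes to the data sources can be handled more quickly, without requiring an EasyWAF update.

**Why is this not a security problem?**

The data is used only for the whitelist of the fake crawler module, so adding malicious IPs would not let those IP addresses bypass the WAF. At present, an outage or failure of this data source would only cause problems with the Facebook crawler and slightly reduce EasyWAF's performance.

## Data Sources

### Google

- Documentation: [Verifying Googlebot and other Google crawlers](https://support.google.com/webmasters/answer/80553)
- Direct JSON link: [Google IP ranges](https://www.gstatic.com/ipranges/goog.json)

### Bing

- Direct JSON link: [Bing IP ranges](https://www.bing.com/toolbox/bingbot.json)

### Facebook

- Documentation: [Facebook Crawler](https://developers.facebook.com/docs/sharing/webmasters/crawler/)
- IP ranges are extracted from the BGP Routing Table Analysis

### Twitter

- Documentation: [Twitterbot](https://developer.twitter.com/en/docs/twitter-for-websites/cards/guides/troubleshooting-cards)
- IP ranges are extracted from the BGP Routing Table Analysis

### DuckDuckGo

- Documentation: [Is DuckDuckBot related to DuckDuckGo?](https://raw.githubusercontent.com/duckduckgo/duckduckgo-help-pages/master/_docs/results/duckduckbot.md)

### Pinterest

- Documentation: [Pinterest crawler](https://help.pinterest.com/en/business/article/pinterest-crawler)

### BGP Routing Table Analysis

- Website: [BGP Routing Table Analysis](https://thyme.apnic.net/)
- Direct link to IPv4 prefixes: [IPv4 prefixes](https://thyme.apnic.net/current/data-raw-table)
- Direct link to IPv6 prefixes: [IPv6 prefixes](https://thyme.apnic.net/current/ipv6-raw-table)

## Contact

If a public GitHub issue or discussion is not the right place for your concern, you can contact me directly:

- Email: [inf***@timokoessler.de](mailto:info@timokoessler.de)
- My website: [timokoessler.de](https://timokoessler.de)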
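
To illustrate the whitelist lookup that the introduction describes, here is a minimal TypeScript sketch that checks an IPv4 address against a list of CIDR prefixes. It is not EasyWAF's actual implementation: the function names `ipv4ToInt`, `isInCidr`, and `isWhitelisted` are hypothetical, the prefixes are RFC 5737 documentation ranges rather than real crawler ranges, and IPv6 is left out for brevity.

```typescript
// Hypothetical helpers for illustration only; not part of EasyWAF's API.

/** Convert a dotted-quad IPv4 address to an unsigned 32-bit integer. */
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  let result = 0;
  for (const part of parts) {
    const octet = Number(part);
    if (!/^\d{1,3}$/.test(part) || octet > 255) {
      throw new Error(`Invalid IPv4 address: ${ip}`);
    }
    // Multiply instead of bit-shifting to stay clear of signed 32-bit overflow.
    result = result * 256 + octet;
  }
  return result;
}

/** Return true if `ip` lies inside the IPv4 prefix `cidr` (e.g. "192.0.2.0/24"). */
function isInCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  const bitsText = slash === -1 ? "" : cidr.slice(slash + 1);
  const bits = Number(bitsText);
  if (!/^\d{1,2}$/.test(bitsText) || bits > 32) {
    throw new Error(`Invalid CIDR prefix: ${cidr}`);
  }
  const size = 2 ** (32 - bits); // number of addresses in the prefix
  const base = ipv4ToInt(cidr.slice(0, slash));
  const start = Math.floor(base / size) * size; // align base to the prefix boundary
  const addr = ipv4ToInt(ip);
  return addr >= start && addr < start + size;
}

/** Return true if `ip` falls inside any prefix of the whitelist. */
function isWhitelisted(ip: string, prefixes: string[]): boolean {
  return prefixes.some((prefix) => isInCidr(ip, prefix));
}

// Placeholder prefixes (RFC 5737 documentation ranges), not real crawler data.
const examplePrefixes = ["192.0.2.0/24", "198.51.100.0/24"];

console.log(isWhitelisted("192.0.2.17", examplePrefixes)); // true
console.log(isWhitelisted("203.0.113.5", examplePrefixes)); // false
```

A lookup like this needs no network round trip, which is why a prefix list can be cheaper than a reverse DNS query per request. For crawlers that can only be identified by IP, it is the only available check.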

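Tags: AppImage, BGP routing analysis, Bingbot, EasyWAF, Facebook crawler, GitHub Action, Googlebot, Homebrew installation, IP whitelist, IP ranges, WAF, web application firewall, reverse DNS, security compliance, scheduled tasks, open-source library, performance optimization, search engine crawlers, data updates, data sources, detection bypass, crawler identification, network proxy, cyber threat intelligence, automated attacks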