soxoj/marple

GitHub: soxoj/marple

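An open-source intelligence (OSINT) tool that searches for a target username across multiple search engines, then collects and analyzes the profile links it finds.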

Stars: 315 | Forks: 24

# Marple

**Collect profile links by username through 10+ search engines ([see the full list below](#supported-sources)).**

*Idea by [Cyber Detective](https://cybdetective.com/)*

Features:

- Multiple search engines
- Proxy support
- CSV file export
- Plugins
- PDF metadata extraction
- Social media info [extraction](socid_extractor)

## Quick start

```
./marple.py soxoj
```

### Results

```
https://t.me/soxoj
Contact @soxoj - Telegram

https://github.com/soxoj
soxoj - GitHub

https://coder.social/soxoj
soxoj - Coder Social

https://gitmemory.com/soxoj
soxoj

...
PDF files
https://codeby.net/attachments/v-0-0-1-social-osint-fundamentals-pdf.45770
Social OSINT fundamentals - Codeby.net
/Creator: Google

...
Links: total collected 111 / unique with username in URL 97 / reliable 38 / documents 3
```

Advanced usage:

```
./marple.py soxoj --plugins metadata

./marple.py smirnov --engines google baidu -v
```

## Installation

You only need Python 3 and pip, plus the dependencies:

```
pip3 install -r requirements.txt
```

Some search engines require API keys (see the requirements in [Supported sources](#supported-sources)). Export the keys as environment variables:

```
export YANDEX_KEY=key
```

## Options

Use the `-t` or `--threshold` option (default 300) to set the "junk threshold" and get more or less reliable results. The junk score is a sum based on the length of the link URL and on the symbols next to the username inside the URL.

Use the `--results-count` option (default 1000) to increase the number of results returned by the search engines. Currently this limit applies only to Google.

Other options:

```
  -h, --help            show this help message and exit
  -t THRESHOLD, --threshold THRESHOLD
                        Threshold to discard junk search results
  --results-count RESULTS_COUNT
                        Count of results parsed from each search engine
  --no-url-filter       Disable filtering results by usernames in URLs
  --engines {baidu,dogpile,google,bing,ask,aol,torch,yandex,naver,paginated,yahoo,startpage,duckduckgo,qwant}
                        Engines to run (you can choose more than one)
  --plugins {socid_extractor,metadata,maigret} [{socid_extractor,metadata,maigret} ...]
                        Additional plugins to analyze links
  -v, --verbose         Display junk score for each result
  -d, --debug           Display all the results from sources and debug messages
  -l, --list            Display only list of all the URLs
  --proxy PROXY         Proxy string (e.g. https://user:pass@1.2.3.4:8080)
  --csv CSV             Save results to the CSV file
```

## Supported sources

| Name | Method | Requirements |
| ------------------- | --------------------------------------| ----------------- |
| [Google](http://google.com/) | scraping | None, works out of the box; captchas are frequent |
| [DuckDuckGo](https://duckduckgo.com/) | scraping | None, works out of the box |
| [Yandex](https://yandex.ru/) | XML API | [Register and get YANDEX_USER/YANDEX_KEY tokens](https://github.com/fluquid/yandex-search) |
| [Naver](https://www.naver.com/) | SerpApi | [Register and get SERPAPI_KEY token](https://serpapi.com/) |
| [Baidu](https://www.baidu.com/) | SerpApi | [Register and get SERPAPI_KEY token](https://serpapi.com/) |
| [Aol](https://search.aol.com/) | scraping | None; scrapes with pagination |
| [Ask](https://www.ask.com/) | scraping | None; scrapes with pagination |
| [Bing](https://www.bing.com/) | scraping | None; scrapes with pagination |
| [Startpage](https://www.startpage.com/) | scraping | None; scrapes with pagination |
| [Yahoo](https://yahoo.com/) | scraping | None; scrapes with pagination |
| [Mojeek](https://www.mojeek.com) | scraping | None; scrapes with pagination |
| [Dogpile](https://www.dogpile.com/) | scraping | None; scrapes with pagination |
| [Torch](http://torchdeedp3i2jigzjdmfpn5ttjhthh5wbmda2rr3jvqjg5p77c54dqd.onion) | scraping | Tor proxy (socks5://localhost:9050 by default); scrapes with pagination |
| [Qwant](https://www.qwant.com/) | Qwant API | Check that search [is supported](https://www.qwant.com/) for your exit IP country; scrapes with pagination |

## Development & testing

```
$ python3 -m pytest tests
```

## TODO

- [x] Proxy support
- [ ] Choose engines through arguments
- [ ] Exact search filter
- [ ] Engine-specific filters
- [ ] "Username in title" check

## Mentions and articles

- [Sector035 - Week in OSINT #2021-50](https://sector035.nl/articles/2021-50)
- [OS2INT - MARPLE: IDENTIFYING AND EXTRACTING SOCIAL MEDIA USER LINKS](https://os2int.com/toolbox/identifying-and-extracting-social-media-user-links-with-marple/)
- [Cyber Detective - X](https://threadreaderapp.com/thread/1532094437027102721.html)
- [OSINT Ambition - X post](https://twitter.com/osintambition/status/1725011306947006797)
- [Offensive Security Cheatsheet - Usernames](https://cheatsheet.haax.fr/open-source-intelligence-osint/human-recon/username/)
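
To make the junk score from the Options section more concrete, here is a minimal sketch of how such a score could be computed. It is not Marple's actual implementation: the function names `junk_score` and `is_reliable`, the `NEIGHBOR_PENALTY` weight, and the rule "penalize letters or digits directly touching the username" are all assumptions made for illustration. Only the general idea (URL length plus the symbols next to the username, compared against a threshold of 300) comes from this README.

```python
# Hypothetical weight: the README does not document Marple's real constants.
NEIGHBOR_PENALTY = 150


def junk_score(url: str, username: str) -> int:
    """Toy junk score: URL length plus a penalty for each letter or digit
    that directly touches the username inside the URL."""
    score = len(url)
    lowered = url.lower()
    name = username.lower()

    start = lowered.find(name)
    if start == -1:
        # Username is not in the URL; in this sketch only the length counts.
        return score

    end = start + len(name)
    # Treat the string boundaries as clean delimiters.
    before = lowered[start - 1] if start > 0 else "/"
    after = lowered[end] if end < len(lowered) else "/"

    for neighbor in (before, after):
        if neighbor.isalnum():
            # The username is glued to other characters, so it is probably
            # part of a longer word rather than a profile handle.
            score += NEIGHBOR_PENALTY
    return score


def is_reliable(url: str, username: str, threshold: int = 300) -> bool:
    """A link counts as reliable when its score stays below the threshold."""
    return junk_score(url, username) < threshold


if __name__ == "__main__":
    for link in ("https://github.com/soxoj", "https://example.com/xsoxoj1"):
        print(link, junk_score(link, "soxoj"), is_reliable(link, "soxoj"))
```

Under these assumed weights, a short profile URL such as `https://github.com/soxoj` stays well below the default threshold, while a URL where the username is embedded in a longer token is pushed above it. Raising `-t` would let more of the second kind through.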
