smixs/osint-skill

GitHub: smixs/osint-skill

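An open-source intelligence (OSINT) skill module for AI coding assistants. Starting from a name or account handle, it automatically builds a full investigation dossier with a psychological profile, career path, and confidence scores.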

Stars: 60 | Forks: 11

# OSINT Skill

[![Early Beta](https://img.shields.io/badge/status-early%20beta-orange)](https://github.com/smixs/osint-skill/releases) [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE) [![Apify Actors](https://img.shields.io/badge/Apify%20actors-55%2B-green)](https://apify.com)

Systematic intelligence gathering on individuals. From a name or handle to a scored dossier with a psychological profile, career path, and entry points.

### Compatible Agents

Works with any AI coding agent that supports the SKILL.md format:

| Agent | Installation |
|-------|---------|
| **[Claude Code](https://claude.com/product/claude-code)** | `cp -r osint/ ~/.claude/skills/osint/` |
| **[OpenClaw](https://github.com/openclaw/openclaw)** | Copy into the workspace `skills/` directory |
| **[Codex](https://developers.openai.com/codex/app)** | Point to `osint/SKILL.md` in the agent configuration |
| **[OpenCode](https://github.com/anomalyco/opencode)** | Copy into the project's `skills/` directory |
| **Any SKILL.md agent** | Place the `osint/` folder wherever your agent reads skills from |

The skill uses standard tools (`bash`, `curl`, `node`, `python3`) and its instructions are written in Markdown, so there is no vendor lock-in.

## Features

- **Phased pipeline** (0 → 1 → 1.5 → 2 → 3 → 4 → 5 → 6): from quick search to deep research
- **Swarm Mode**: coordinates 3-5 parallel sub-agents on Sonnet for speed
- **55+ embedded Apify actors**: Instagram (12), Facebook (14), TikTok (14), YouTube (5), Google Maps (4), LinkedIn, and more
- **Psychological profiling**: MBTI / Big Five based on content analysis (YouTube subtitles, Telegram messages, blogs)
- **Confidence scoring**: every fact is graded A/B/C/D according to the number of independent confirmations
- **Internal intelligence**: checks Telegram history, email, and vault contacts before turning to external sources
- **Research escalation**: 4 levels, from free to $0.50, from seconds to minutes
- **Budget tracking**: spending up to $0.50 needs no confirmation; anything above that requires permission

## Quick Start

```
# Clone the repo
git clone https://github.com/smixs/osint-skill.git
cd osint-skill

# Copy into your agent's skills directory (Claude Code example)
cp -r osint/ ~/.claude/skills/osint/

# Run self-diagnostics to see what is available
bash osint/scripts/diagnose.sh
```

## Requirements

### Required

| Tool | Purpose | Installation |
|------|---------|---------|
| **curl** | HTTP requests to APIs | Preinstalled on macOS/Linux |
| **python3** | JSON parsing, MCP client | Preinstalled on macOS/Linux |
| **jq** | JSON processing | `brew install jq` / `apt install jq` |

### For Apify actors (55+ platforms)

| Tool | Purpose | Installation |
|------|---------|---------|
| **Node.js 18+** | Runs `run_actor.js` (the embedded Apify runner) | [nodejs.org](https://nodejs.org) |

### Optional

| Tool | Purpose | Installation |
|------|---------|---------|
| **mcpc** | Dynamic actor discovery in the Apify Store | `npm install -g @apify/mcpc` |

## API Keys & Services

The skill uses **graceful degradation**: the more API keys you provide, the deeper it can dig. You need **at least one** search API to get started.

### Free Tier

| Service | Environment variable | What it does | How to get it |
|---------|-------------|--------------|--------|
| **Brave Search** | _(built into Claude Code)_ | 2,000 queries/month, basic web search | Built in, no setup needed |
| **Jina AI** | `JINA_API_KEY` | URL → Markdown reader, search, deepsearch | [jina.ai/api-key](https://jina.ai/api-key) |
| **Apify** | `APIFY_API_TOKEN` | Instagram, TikTok, YouTube, LinkedIn scraping. Free credit of about $5/month | [console.apify.com](https://console.apify.com/account/integrations) |
| **Parallel AI** | `PARALLEL_API_KEY` | AI-driven search with reasoning and citations | [platform.parallel.ai](https://platform.parallel.ai) |

### Paid (Recommended)

| Service | Environment variable | What it does | Cost | How to get it |
|---------|-------------|--------------|------|--------|
| **Perplexity API** | `PERPLEXITY_API_KEY` | Sonar (fast AI answers), Deep Research | ~$5/month | [perplexity.ai/settings/api](https://www.perplexity.ai/settings/api) |
| **Exa AI** | `EXA_API_KEY` | Semantic search, people/company research | ~$5/month | [dashboard.exa.ai](https://dashboard.exa.ai) |
| **Tavily** | `TAVILY_API_KEY` | Agent-optimized search, $0.005 per basic request | ~$5/month | [app.tavily.com](https://app.tavily.com/home) |

### Premium

| Service | Environment variable | What it does | Cost | How to get it |
|---------|-------------|--------------|------|--------|
| **Bright Data** | `BRIGHTDATA_MCP_URL` | CAPTCHA bypass, authwall bypass, Facebook scraping, Yandex search | ~$10/month+ | [brightdata.com/mcp](https://brightdata.com/products/web-scraper/mcp) |

### Setting Up Keys

**Option 1: Environment variables (recommended):**

```
export PERPLEXITY_API_KEY="pplx-..."
export EXA_API_KEY="exa-..."
export APIFY_API_TOKEN="apify_api_..."
export JINA_API_KEY="jina_..."
export TAVILY_API_KEY="tvly-..."
export PARALLEL_API_KEY="..."
export BRIGHTDATA_MCP_URL="https://mcp.brightdata.com/..."
```

**Option 2: File fallback** (supported by some scripts):

```
/scripts/apify-api-token.txt
/scripts/jina-api-key.txt
/scripts/parallel-api-key.txt
/scripts/brightdata-mcp-url.txt
```

## How It Works

### Research Phases

```
Phase 0:   Tooling Self-Check     → diagnose.sh, check available tools
Phase 1:   Seed Collection        → parallel search across all engines
Phase 1.5: Internal Intelligence  → Telegram, email, vault (BEFORE external sources)
Phase 2:   Platform Extraction    → LinkedIn, Instagram, Facebook, TikTok, YouTube...
Phase 3:   Cross-Reference        → facts verified, graded A/B/C/D
Phase 4:   Psychoprofile          → MBTI, Big Five, communication style
Phase 5:   Completeness Check     → 9 mandatory checks + Depth Score 1-10
Phase 6:   Dossier Output         → formatted dossier from template
```

### Research Escalation (Cheap → Expensive)

```
Level 1: Quick Answers        → Perplexity Sonar, Brave, Tavily, Exa (~$0.00)
Level 2: Source Verification  → Jina read, Parallel extract (~$0.01)
Level 3: Social Media         → Apify scrapers, Bright Data (~$0.01-0.10)
Level 4: Deep Research        → Perplexity Deep, Exa Deep, Jina Deep (~$0.05-0.50)
```

### Embedded Scripts

| Script | Purpose |
|--------|---------|
| `diagnose.sh` | Self-check diagnostics for all tools and APIs |
| `perplexity.sh` | search / sonar / deep research |
| `tavily.sh` | search / deep / extract |
| `exa.sh` | search / company / people / crawl / deep |
| `first-volley.sh` | Parallel search across all engines |
| `merge-volley.sh` | Deduplicates and groups search results |
| `apify.sh` | LinkedIn / Instagram / any actor / store search |
| `run-actor.sh` | Universal Apify runner (55+ actors, polling, CSV/JSON export) |
| `run_actor.js` | Node.js engine that powers run-actor.sh |
| `jina.sh` | read URL / search / deepsearch |
| `parallel.sh` | search / extract |
| `brightdata.sh` | scrape / search / search-geo / search-yandex |
| `mcp-client.py` | Lightweight MCP client for Bright Data (standard library only) |

## Project Structure

```
osint/
├── SKILL.md                      # Main skill file (452 lines)
├── references/
│   ├── tools.md                  # Full catalog of 55+ Apify actors + all tools
│   ├── platforms.md              # Platform-specific extraction guide
│   ├── content-extraction.md     # YouTube/podcast/blog extraction
│   └── psychoprofile.md          # MBTI/Big Five methodology
├── assets/
│   └── dossier-template.md       # Output dossier template
└── scripts/
    ├── diagnose.sh               # Self-check
    ├── run-actor.sh              # Universal Apify runner (bash wrapper)
    ├── run_actor.js              # Apify runner engine (Node.js, embedded)
    ├── package.json              # ESM support for run_actor.js
    ├── apify.sh                  # Apify shortcuts
    ├── perplexity.sh             # Perplexity API
    ├── tavily.sh                 # Tavily API
    ├── exa.sh                    # Exa AI API
    ├── jina.sh                   # Jina AI API
    ├── parallel.sh               # Parallel AI API
    ├── brightdata.sh             # Bright Data MCP
    ├── mcp-client.py             # MCP client (Python, stdlib only)
    ├── first-volley.sh           # Parallel first search
    └── merge-volley.sh           # Result merging
```

## Known Issues (Beta)

- **Shell injection**: user input is interpolated into JSON without `jq` escaping. Do not run the scripts with untrusted input.
- **macOS**: `first-volley.sh` uses `tail --pid` (Linux only). Parallel search works on macOS, but the timeout logic may not fire.
- **Apify actors**: actor IDs may change or be removed without notice. Use `apify.sh store-search` to find alternatives.
- **Inconsistent key loading**: Perplexity, Tavily, and Exa load their keys only from environment variables (no file fallback, unlike Apify/Jina/Parallel).

## Credits

- **Apify Actor Runner** (`run_actor.js`) is embedded from [apify/agent-skills](https://github.com/apify/agent-skills) (MIT License)
- The actor catalog is based on [apify-ultimate-scraper](https://github.com/apify/agent-skills/tree/main/skills/apify-ultimate-scraper) v1.3.0

## License

[MIT](LICENSE)
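
## Sketch: key loading with a file fallback

The README notes that only some scripts fall back to a key file when the environment variable is unset. The helper below is a minimal sketch of that lookup order, not the skill's actual code: the function name `load_key` and its arguments are hypothetical, and the fallback file path is passed in by the caller rather than assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the skill): resolve an API key from an
# environment variable first, then from a fallback file if the variable is empty.
load_key() {
  local var_name="$1"        # e.g. APIFY_API_TOKEN
  local fallback_file="$2"   # caller-supplied path to a one-line key file
  local value="${!var_name:-}"   # indirect expansion: value of the named variable

  if [ -z "$value" ] && [ -r "$fallback_file" ]; then
    # Read only the first line; tolerate a missing trailing newline.
    IFS= read -r value < "$fallback_file" || true
    value="${value%$'\r'}"   # strip a stray carriage return from CRLF files
  fi

  # Fail if neither source produced a key, so callers can skip the service.
  [ -n "$value" ] || return 1
  printf '%s\n' "$value"
}
```

A script could call it as `token="$(load_key APIFY_API_TOKEN "$key_file")" || echo "Apify disabled"`, which fits the graceful-degradation idea: a missing key disables one service instead of aborting the run.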

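
## Sketch: budget gate

The budget rule in the feature list (spend up to $0.50 without asking, ask for permission beyond that) can be sketched as a simple check before each paid call. This is an illustration under stated assumptions, not the skill's implementation: the function name `within_budget` is hypothetical, spend is assumed to be tracked cumulatively per investigation, and amounts are kept in whole cents to avoid floating-point arithmetic in bash.

```shell
#!/usr/bin/env bash
# Budget gate sketch. Assumptions (not taken from the skill's source):
# spend is cumulative for one investigation and is stored in whole cents.
BUDGET_LIMIT_CENTS=50   # the $0.50 threshold stated in the README

# Returns 0 when the next call fits within the limit (no need to ask),
# and 1 when the agent should ask the user for permission first.
within_budget() {
  local spent_cents="$1"
  local next_cost_cents="$2"
  [ $((spent_cents + next_cost_cents)) -le "$BUDGET_LIMIT_CENTS" ]
}
```

For example, after $0.10 of Level 3 scraping, `within_budget 10 50` fails, so a Level 4 deep-research call at the upper end of its estimated range would prompt for permission first.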
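Tags: Apify, Claude Code, ESC4, ESC8, Facebook scraping, GitHub, GNU General Public License, Instagram scraping, LinkedIn analysis, MBTI analysis, MITM proxy, Node.js, OpenAI Codex, OSINT, Python, SKILL.md, Swarm Mode, Telegram monitoring, URL scraping, binary releases, people reconnaissance, people investigation, persona profiling, memory-scan bypass, distributed collection, LLM tools, threat intelligence, real-time processing, password management, application security, developer tools, open-source tools, open-source intelligence, psychological profiling, search engine APIs, digital forensics, data scraping, no backdoors, social media monitoring, social engineering, cybersecurity, career mapping, background checks, automated intelligence gathering, automation scripts, privacy protection, privacy mining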