hashbulla/deep-research

GitHub: hashbulla/deep-research

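A human-gated, multi-source deep-research skill for Claude Code. It uses a seven-phase pipeline, source tiering, and deterministic quality gates to produce research reports in which every claim is traceable to its source and labeled with a confidence level.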

Stars: 1 | Forks: 0

# Deep Research

Intelligence-grade multi-source research as a Claude Code skill: source-graded, human-gated, evidence-anchored.

Point it at a research question. It decomposes the question into orthogonal sub-questions, retrieves through Tavily across a tiered domain registry, grades every source against the NATO Admiralty 2×6 matrix, and hands you a report in which every claim traces back to a specific URL and an explicit confidence label. Exhaustive runs reach **100+ sources**. Quality is held to absolute gates (groundedness, source quality, corroboration, and a decorrelated entailment judge) that are verified on every run rather than benchmarked against an external product.

## What you get

Four artifacts land in your invocation directory, written atomically at the end of the run.

```
research-plan.md         # approved by you at the human gate
research-report.md       # final synthesis, inline citations, confidence tags
research-sources.json    # every cited source, Admiralty-graded
research-evidence.json   # claim → source mapping, credibility 1–6
```

Exactly four, always. The optional [`--suggest-tooling`](#companion-skill-tooling-recommender) flag adds a fifth file (`research-toolbox.md`), written by a *separate* companion skill rather than by the engine itself; the four-artifact contract is unchanged.

### `research-report.md` excerpt

```
# Impact of the EU AI Act on open-source model providers in 2026

> Research date: 2026-04-17 · Length: short
> Source count: 16/28 · Tier 1/2 share: 94% · Median date: 2025-09-08

## Executive Summary

- GPAI obligations under Articles 53–55 entered application on 2 August 2025, with systemic-risk provisions applying above the 10²⁵ FLOPs threshold.[^1][^2]
- The open-source exemption (Article 2(5g)) excludes free and open-source GPAI models from several transparency obligations unless they meet the systemic-risk threshold.[^1][^3] [CONFIRMED]

## 1. GPAI provisions in force in 2026

Under Article 53 of Regulation (EU) 2024/1689 [...].[^1][^4] The European AI Office published its Code of Practice on 2025-07-10 [...].[^5] [CONFIRMED]

## Contradictions and Open Debates

The scope of "sufficiently detailed summary" of training data remains disputed. The Commission's July 2025 template[^5] is interpreted by Meta[^6] as [...], while Mozilla[^7] argues [...]. [POSSIBLY TRUE — contested]

## Needs Verification

- Claim that compliance costs exceed €1M for small open-source providers — rests on a single trade-press source[^12] without regulatory corroboration.

## Sources

[^1]: Regulation (EU) 2024/1689 — eur-lex.europa.eu — Tier 1, Admiralty A1
[^2]: European AI Office GPAI guidance — digital-strategy.ec.europa.eu — Tier 1, A1
[...]
```

### `research-evidence.json` schema

```json
{
  "claim_id": "C001",
  "claim_text": "GPAI obligations under Articles 53–55 entered application on 2 August 2025.",
  "supporting_source_ids": ["S001", "S002"],
  "contradicting_source_ids": [],
  "admiralty_credibility": 1,
  "label": "CONFIRMED",
  "corroboration_count": 2,
  "independent_tier12_count": 2,
  "primary_source_present": true
}
```

Every claim in the report has a matching record. No URL is fabricated; no claim is silent about its source.

## Quick start

### Install

```bash
gh repo clone hashbulla/deep-research ~/.claude/skills/deep-research
```

Claude Code discovers the skill automatically. No restart is needed.

### Run

```
# Standard run, language inferred from the question
/deep-research impact of EU AI Act on open-source model providers in 2026

# Exhaustive run in French, with recency and custom domains
/deep-research --length exhaustive --lang fr \
  --since 2025 --domains anthropic.com,mistral.ai \
  comparaison LangGraph / CrewAI / AutoGen / Claude Agent SDK

# Narrow fact with a recency gate
/deep-research --since 2025 prompt caching cost-performance tradeoffs
```

### Prerequisites

| Requirement | Why | Check |
|:------------|:----|:------|
| ![Claude Code](https://img.shields.io/badge/Claude_Code-required-7C3AED?style=flat-square) | Runtime environment for the skill | `claude --version` |
| ![Opus](https://img.shields.io/badge/Opus/Sonnet_4.6%2B-recommended-E04E2A?style=flat-square) | Frontier reasoning improves synthesis quality and Admiralty rigor | `/model opus` |
| ![Tavily MCP](https://img.shields.io/badge/Tavily_MCP-required-1F2328?style=flat-square) | Used for every retrieval call. `WebSearch` is a fallback only. | Visible in `/mcp` |
| ![gh CLI](https://img.shields.io/badge/gh_CLI-optional-6B7280?style=flat-square) | Installing from this repo, plus GitHub deep research (SOTA repo discovery). If missing → graceful degradation to Tavily | `gh auth status` |
| ![python3](https://img.shields.io/badge/Python_3.10%2B-required-3776AB?style=flat-square) | Runs `scripts/verify_gates.py` (stdlib-only, zero network) for deterministic gate verification | `python3 --version` |
| ![Context7 MCP](https://img.shields.io/badge/Context7_MCP-optional-6B7280?style=flat-square) | Fetches current library documentation in technical runs that name dependencies. If missing → graceful degradation to Tavily | Visible in `/mcp` |
| ![Newsletter corpus](https://img.shields.io/badge/Newsletter_corpus-optional-6B7280?style=flat-square) | Folds the maintainer's curated daily briefs in as a routing signal on work-relevant runs. If missing → graceful degradation to Tavily | `ls ~/.claude/deep-research/newsletter-corpus/` |

### Invocation flags

| Flag | Values | Default | Effect |
|------|--------|---------|--------|
| `--length` | `short` \| `standard` \| `exhaustive` | `standard` | Calibrates sub-question count, recall breadth, and source target (15–25 / 35–60 / **100+**) |
| `--lang` | ISO 639-1 | inferred | Output language of `research-report.md` |
| `--since` | `YYYY` or `YYYY-MM-DD` | inferred | Lower bound on source publication date |
| `--domains` | comma-separated list | tier profile | Extra allowlist appended to the tier profile |
| `--exclude` | comma-separated list | tier blocklist | Extra blocklist |
| `--profile` | `academic` \| `technical` \| `current-affairs` \| `mixed` | inferred | Selects the `include_domains` baseline from the tier registry |
| `--min-corroboration` | integer ≥ 1 | `2` | Minimum number of independent Tier 1/2 sources required to label a claim CONFIRMED |
| `--model` | `opus` \| `fable` | `opus` | Synthesis tier: Claude-Code native (session model + sub-agent override, zero API keys). Fable 5 is opt-in at roughly 2× the cost ([details](references/model-tiers.md)) |
| `--confidential` | flag | off | Confidential-path run: sub-agents receive only neutral references, and rigor is raised ([details](references/model-tiers.md)) |
| `--rigor` | `standard` \| `critical` | `standard` (`--confidential` implies `critical`) | Verification depth: entailment-judge scope, refuse-without-source, forced anchors, sycophancy probe ([details](references/quality-gate.md)) |
| `--suggest-tooling` | flag | off | After Phase 6 completes, delegates the finished run to the `suggest-tooling` sibling skill, which proposes work-relevant Claude Code skills, plugins, and MCP servers and writes `research-toolbox.md`. Off by default; runs without it are byte-identical. The engine still emits exactly four artifacts; `suggest-tooling` is a separate skill that writes the fifth file and never auto-installs anything. |

## Pipeline

```mermaid
flowchart TD
    Start(["/deep-research <question>"]) --> P0

    subgraph P0["Phase 0 — Query Architect"]
        B1[Parse question & flags] --> B2[Classify: academic/technical/current/mixed]
        B2 --> B3[Decompose into sub-questions<br/>factual · contextual · contradictory · recency]
        B3 --> B4[Assemble tier profile<br/>+ include_domains preview]
        B4 --> B5[Write research-plan.md]
    end

    P0 --> G1
    G1{{"HUMAN GATE\n\nreview the plan\napprove / edit / cancel\nno Tavily call before approval"}}
    G1 -- "approve" --> P1

    subgraph P1["Phase 1 — Broad Retrieval"]
        direction LR
        R1["tavily_search<br/>search_depth=advanced"] --> R2["tavily_map<br/>for domain discovery"]
        R2 --> R3["Paced under 20 req/min"]
        R3 --> R4["Conditional sources<br/>declared in the plan:<br/>GitHub · academic · Context7 · newsletter-signal<br/>→ graceful Tavily degradation"]
    end

    P1 --> P2

    subgraph P2["Phase 2 — Source Grading"]
        direction LR
        S1["score > 0.7"] --> S2["Tier 1–4 classification"]
        S2 --> S2b["MBFC credibility overlay<br/>flag / downgrade at the margin<br/>(optional, user-scope dataset)"]
        S2b --> S3["CRAAP Currency + Authority"]
        S3 --> S4["Admiralty A–F"]
        S4 --> S5["Dedupe canonical URL<br/>punycode check"]
    end

    P2 --> P3

    subgraph P3["Phase 3 — Precision Rerank"]
        direction LR
        J1["LLM-as-judge pointwise<br/>≤10 docs per sub-question"] --> J2["Primary vs secondary"]
        J2 --> J3["Top 5–7 selected"]
    end

    P3 --> P4

    subgraph P4["Phase 4 — Deep Extract & Synthesis"]
        direction LR
        E1["tavily_research<br/>mini / pro"] --> E2["tavily_extract<br/>extract_depth=advanced"]
        E2 --> E3["Write research-report.md<br/>surgical quotes only"]
    end

    P4 --> P5

    subgraph P5["Phase 5 — Grounding Validation (CRAG)"]
        direction LR
        C0["Decorrelated entailment judge<br/>different Claude model, claim + span only<br/>scope by rigor profile"] --> C1["groundedness ≥ 0.95?<br/>corroboration ≥ 0.80?"]
        C1 --> C2{gates pass}
        C2 -- "no, <2 iters" --> Req["rewrite query →<br/>tavily_search supplement"]
        Req --> C1
        C2 -- "yes, or 2 iters done" --> C3["move failing claims<br/>to Needs Verification"]
    end

    P5 --> P6

    subgraph P6["Phase 6 — Confidence Annotation"]
        direction LR
        Z1["Admiralty credibility 1–6<br/>per claim"] --> Z2["CONFIRMED / PROBABLY TRUE /<br/>POSSIBLY TRUE → main body"]
        Z2 --> Z3["DOUBTFUL / IMPROBABLE /<br/>UNVERIFIED → Needs Verification"]
    end

    P6 --> Done(["4 artifacts written atomically"])

    style G1 fill:#FEF3C7,stroke:#D97706,color:#92400E
    style P1 fill:#DBEAFE,stroke:#3B82F6
    style P2 fill:#FEF9C3,stroke:#CA8A04
    style P3 fill:#FEF9C3,stroke:#CA8A04
    style P4 fill:#DCFCE7,stroke:#059669
    style P5 fill:#FEE2E2,stroke:#EF4444
    style P6 fill:#F3F4F6,stroke:#6B7280
    style Start fill:#7C3AED,stroke:#7C3AED,color:#fff
    style Done fill:#059669,stroke:#059669,color:#fff
```

### The human gate

It costs you three minutes of attention and one decision. You review the plan (classification, sub-question decomposition, domain-allowlist preview, estimated Tavily call count, stop conditions), then approve or edit. **No retrieval call fires before approval.** This is non-negotiable: it is the single most effective intervention against wasted runs and drift from the research goal.

| Gate | Purpose | Time |
|:-----|:--------|:-----|
| ![Gate](https://img.shields.io/badge/Phase_0-Plan_Approval-D97706?style=flat-square) | Catches a wrong classification, a wrong tier profile, or a missing sub-question category before any Tavily credit is spent | ~2–3 minutes |

## Source grading

Three overlapping rubrics, applied deterministically in Phase 2 and probabilistically in Phase 3.

### 1. Tier registry

Every domain is classified before anything enters the synthesis prompt. This follows Anthropic's internal finding that unconstrained agents drift toward SEO content.

| Tier | Examples | Admiralty reliability | Usage |
|:----:|:---------|:---------------------:|:------|
| **1** | `arxiv.org`, `pubmed.ncbi.nlm.nih.gov`, `nature.com`, `*.gov`, `*.europa.eu`, `who.int` | **A** | Primary sources; preferred in `include_domains` |
| **2** | `anthropic.com`, `openai.com`, `reuters.com`, `ft.com`, `docs.python.org`, `gartner.com` | **B** | Default retrieval baseline (Tier 1+2 union) |
| **3** | `techcrunch.com`, `wired.com`, `arstechnica.com`, `substack.com` (institutionally affiliated) | **C** | Acceptable only with Tier 1/2 corroboration |
| **4** | `reddit.com`, `x.com`, `linkedin.com`, `medium.com` | **D–F** | Never a primary source. Social-signal pointers only, placed in a labeled `Signals` subsection |

Full list in [`references/methodology.md §6`](references/methodology.md).

### 2. NATO Admiralty 2×6 matrix

Two orthogonal axes: the **reliability** of the source and the **credibility** of the information after corroboration. Every cited source carries a reliability letter; every claim in `research-evidence.json` carries a credibility number.

```mermaid
flowchart LR
    Tier["Domain tier<br/>(1–4)"] --> Rel["Reliability<br/>A / B / C / D / E / F"]
    Rel --> Pair["Source record<br/>e.g. A1, B2, C3"]
    Corr["Independent<br/>corroborating sources"] --> Cred["Credibility<br/>1 / 2 / 3 / 4 / 5 / 6"]
    Contra["Contradicting<br/>sources"] --> Cred
    Cred --> Pair
    Pair --> Label["Claim label<br/>CONFIRMED / PROBABLY TRUE /<br/>POSSIBLY TRUE / …"]

    style Tier fill:#DBEAFE,stroke:#3B82F6
    style Rel fill:#DBEAFE,stroke:#3B82F6
    style Corr fill:#FEF3C7,stroke:#D97706
    style Cred fill:#FEF3C7,stroke:#D97706
    style Contra fill:#FEE2E2,stroke:#EF4444
    style Pair fill:#DCFCE7,stroke:#059669
    style Label fill:#059669,stroke:#059669,color:#fff
```

Credibility rules are deterministic, with no guessing from LLM fluency. The table shows the canonical priority cascade from [`references/methodology.md §4.1`](references/methodology.md) (the first matching row wins; on any divergence, the cascade governs):

| Credibility | Condition (first match wins) | Label |
|:-----------:|:----------|:------|
| 1 | ≥2 independent Tier 1/2 sources agree, with no Tier 1/2 contradicting source | **CONFIRMED** |
| 2 | ≥1 Tier 1 source and no contradicting source; or ≥2 Tier 1/2 sources and only 1 contradicting source | **PROBABLY TRUE** |
| 3 | A single Tier 1/2 source, no contradicting source (Tier 3 corroboration does not raise the grade) | **POSSIBLY TRUE** |
| 4 | ≥1 Tier 1/2 supporting source and ≥1 equally authoritative contradicting source | **DOUBTFUL** |
| 5 | Refuted by ≥2 Tier 1/2 sources | **IMPROBABLE** |
| 6 | Only Tier 3/4 supporting sources, or zero supporting sources | **UNVERIFIED** |

Labels 4/5/6 never appear in the report body; they are routed to the **Needs Verification** section with an explicit reason.

### 3. Quality gates

Applied after synthesis. A failed gate triggers the CRAG re-query loop (at most 2 iterations per sub-question) before any claim is emitted.

| Gate | Threshold | Action on failure |
|------|:---------:|:---------------|
| Groundedness | ≥ 0.95 | CRAG re-query for unsupported claims |
| Source quality | ≥ 0.80 Tier 1/2 | Expand the allowlist, re-retrieve |
| Coverage | ≥ 0.90 sub-questions | Add follow-up sub-questions |
| Freshness | Median within the `--since` window | Add recency sub-questions |
| Corroboration rate | ≥ 0.80 | Re-query; otherwise → Needs Verification |

Full thresholds in [`references/quality-gate.md`](references/quality-gate.md).

## Architecture

A seven-phase orchestrator with a single `SKILL.md` entry point. The methodology is externalized into reference files that are loaded on demand.

```mermaid
graph LR
    O["SKILL.md<br/>Orchestrator (7 phases)"]
    O -. Phase 0 reads .-> M["references/methodology.md<br/>Operational spec (report distillation)"]
    O -. Phase 0 reads .-> T["references/tool-routing.md<br/>Tavily binding table"]
    O -. Phase 0 reads .-> P["references/research-plan-template.md<br/>Phase 0 scaffold"]
    O -. Phase 4 reads .-> S["references/report-structure.md<br/>Output + JSON schemas"]
    O -. Phase 5 reads .-> Q["references/quality-gate.md<br/>Deterministic thresholds"]
    O -. any phase .-> A["references/anti-patterns.md<br/>Forbidden behaviors"]
    O -- calls --> TVS(["mcp__tavily__tavily_search"])
    O -- calls --> TVR(["mcp__tavily__tavily_research"])
    O -- calls --> TVE(["mcp__tavily__tavily_extract"])
    O -- calls --> TVM(["mcp__tavily__tavily_map"])
    O -- writes --> plan[("research-plan.md")]
    O -- writes --> report[("research-report.md")]
    O -- writes --> sources[("research-sources.json")]
    O -- writes --> evidence[("research-evidence.json")]
```

### File structure

```
~/.claude/skills/deep-research/
├── .claude/CLAUDE.md               # Maintainer spec anchor — invariants, gotchas, conventions
├── SKILL.md                        # Orchestrator — 7 phases, human gate, provenance block
├── deep-research-report.md         # Methodology source of truth (cited below)
├── scripts/
│   ├── verify_gates.py             # Deterministic gate verification (stdlib-only, zero network)
│   ├── github_rank.py              # Composite GitHub-repo ranking (scoring only, zero network)
│   └── academic_graph.py           # Dual-track paper ranking + BibTeX/RIS export (zero network)
└── references/
    ├── methodology.md              # Full distillation — tier registry, Admiralty, CRAAP, CRAG
    ├── tool-routing.md             # Tavily MCP tool selection per intent
    ├── report-structure.md         # research-report.md structure + JSON schemas
    ├── quality-gate.md             # Deterministic thresholds, CRAG triggers
    ├── anti-patterns.md            # Non-negotiables (no fabricated URLs, no WebSearch, etc.)
    ├── research-plan-template.md   # Phase 0 scaffold
    ├── model-tiers.md              # Model-tier policy (opus default, fable opt-in)
    ├── github-research.md          # GitHub SOTA-repo discovery (sharding, expert prior, fake-star gate)
    ├── academic-research.md        # Scholarly pipeline (open graph, dual-track, OA-only ingestion)
    ├── newsletter-signal.md        # Curated-feed routing source (local FTS5 corpus, never cited)
    └── examples.md                 # Worked examples (read on demand)
```

### Design decisions

| Decision | Choice | Rationale |
|:---------|:-------|:----------|
| Primary retrieval | `tavily_search search_depth=advanced` | Phase-by-phase control; Tavily already reranks semantic chunks |
| Synthesis for narrow sub-questions | `tavily_research model=mini\|pro` | Delegates to the internal Perplexity-style loop where that is useful |
| Phase 3 rerank | LLM-as-judge (≤10 docs) | Cohere Rerank and cross-encoders are not available in MCP; the judge approximates cross-encoder accuracy |
| Pre-context filtering | Inline Claude reasoning applied to Tavily results | Anthropic's dynamic filtering is available only on the API side; inline filtering achieves equivalent rigor |
| Source grading | NATO Admiralty 2×6 | Intelligence-grade provenance, usable by humans, deterministic |
| Contradiction handling | Dedicated report section | Report §1 Phase 4: never silent, never auto-resolved between equally authoritative sources |
| Gate verification | `scripts/verify_gates.py` (stdlib-only, zero network) | Counts, ratios, medians, cascade conformance, and the CWD-report SHA-256 are verified by script at run time; metrics self-reported by the LLM cannot serve as gates |
| Conditional retrieval sources | GitHub / academic / Context7 / newsletter-signal, gated at Phase 0 | Declared in the plan and fired only when relevant; any missing MCP, CLI, credential, or corpus degrades gracefully to Tavily-only and is recorded in the Methodology note |
| Newsletter-signal grading | Routing signal only, never a citable record | Briefs act as retrieval seeds; the URLs they point to are graded through the normal flow and tagged `surfaced via newsletter-signal corpus `. This avoids the circular "my own digest says so" kind of authority |
| Rigor profiles | `standard` by default · `critical` (implied by `--confidential`) | Everyday tooling stays fast; high-stakes runs escalate to per-claim entailment verification, refuse-without-source, forced anchors, and the Phase 0 sycophancy probe |
| Model selection | Claude-Code native (session model + sub-agent override), zero API keys | Consumers are keyless Claude Max users; Opus 4.8 by default, Fable 5 opt-in via `--model`, no SDK client |
| Faithfulness judge | Decorrelated sub-agent running on a *different* Claude model, claim + span only | Breaks the same-model self-evaluation loop without external keys; external Gemini/GPT judges were evaluated and dropped (zero-key contract) |

## Companion skill: tooling recommender

`suggest-tooling` is a sibling skill in this repository. It consumes a finished `/deep-research` run and proposes Claude Code **skills, plugins, and MCP servers** worth adopting to act on what the research found, ranked by relevance to your work and **trust-graded** for supply-chain safety. It never installs anything.

| Property | Detail |
|:---------|:-------|
| **Invocation** | `/suggest-tooling `, or set `--suggest-tooling` on a deep-research run to delegate automatically at the end of Phase 6 |
| **Six discovery channels** | GitHub, MCP Registry, the Claude Code marketplace, Vercel skills, Smithery, and awesome-* lists (seeds only), each independently degradable |
| **Trust tiers** | `VERIFIED` · `MAINTAINED` · `COMMUNITY` · `CAUTION`, derived deterministically from official/verified namespace status, signed provenance, recent maintenance, and the fake-signal divergence gate; kept **orthogonal** to the relevance rank |
| **Never auto-installs** | Install commands are shown as text only and never executed. Every listing and README is treated as untrusted data (anti-pattern A6): parsed for identifiers only, never obeyed |
| **Output** | `research-toolbox.md` + `research-toolbox.json` sidecar (the fifth artifact) |
| **Determinism** | Ranking and tiering run entirely in `suggest-tooling/scripts/marketplace_rank.py` (stdlib-only, zero network, zero LLM), reusing the fake-star divergence gate extracted from `scripts/github_rank.py` |

The skill was itself designed by dogfooding: a `/deep-research` run over the discovery landscape. That graded report is kept under [`docs/superpowers/`](docs/superpowers/) as the evidence base for the design.

## Troubleshooting

### Tavily MCP not registered

If no `mcp__tavily__*` tool is visible at Phase 0, the skill stops. Register the Tavily remote MCP server at user scope:

```
claude mcp add --scope user tavily --transport http https://mcp.tavily.com/mcp/
```

Then invoke the skill again.

### Tavily rate limit (429)

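The Research endpoint is capped at 20 requests/minute. The skill backs off at intervals of 30 s → 60 s → 120 s. On persistent 429 errors, the affected sub-question degrades automatically to `tavily_search` plus manual decomposition. This is recorded in the Methodology note of the final report.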
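The backoff schedule above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual code: `RateLimited` and `call_with_backoff` are hypothetical names, and the choice of one initial attempt followed by one retry per delay is an assumption.

```python
import time

# Delays quoted in the text: 30 s, then 60 s, then 120 s.
BACKOFF_SECONDS = (30, 60, 120)


class RateLimited(Exception):
    """Hypothetical signal that the wrapped call received an HTTP 429."""


def call_with_backoff(call, delays=BACKOFF_SECONDS, sleep=time.sleep):
    """Run `call`; on RateLimited, wait through each delay and retry.

    Returns (result, waited) on success. Returns (None, waited) when every
    attempt was rate limited, which is the caller's cue to degrade to a
    cheaper path (in the README's terms: tavily_search + manual decomposition).
    """
    waited = []
    for delay in (0, *delays):  # the first attempt has no wait
        if delay:
            sleep(delay)
            waited.append(delay)
        try:
            return call(), waited
        except RateLimited:
            continue
    return None, waited
```

Passing a custom `sleep` keeps the helper testable without real waiting; a caller that gets `None` back would note the degradation in the Methodology note.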

### Exhaustive run returns fewer than 100 sources

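Before entering Phase 4, the skill runs one expansion round: it widens the allowlist to the Tier 1+2 union and adds 2–4 contextual or recency sub-questions. If the count is still short after that round, the run proceeds anyway and the Methodology note records why (for example, a narrow topic or a paywall-dominated field). The 100+ target is a quality calibration, not a hard contract.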

### A claim I expected ended up in Needs Verification

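The corroboration threshold defaults to `--min-corroboration 2`. A single Tier 1 source with no second independent corroborator gets credibility 2 (PROBABLY TRUE); a single Tier 2 source gets credibility 3. Both stay in the body with an inline label. Credibility 4–6 (refuted, Tier 3/4 support only, and so on) is routed to Needs Verification. For a stricter run, set `--min-corroboration 3`, or inspect the exact support graph in `research-evidence.json`.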
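To see why a given claim received its label, it can help to replay the first-match cascade by hand. The sketch below is an approximation written from the README's credibility table, not the canonical rules in `references/methodology.md §4.1`. The `Evidence` type and `credibility` function are hypothetical, only Tier 1/2 contradictions are counted, and every Tier 1/2 contradiction is assumed to be "equally authoritative".

```python
from dataclasses import dataclass

LABELS = {
    1: "CONFIRMED", 2: "PROBABLY TRUE", 3: "POSSIBLY TRUE",
    4: "DOUBTFUL", 5: "IMPROBABLE", 6: "UNVERIFIED",
}


@dataclass(frozen=True)
class Evidence:
    """Hypothetical per-claim counts of independent sources."""
    tier1_support: int   # independent Tier 1 sources supporting the claim
    tier2_support: int   # independent Tier 2 sources supporting the claim
    tier12_contra: int   # Tier 1/2 sources contradicting the claim


def credibility(ev: Evidence, min_corroboration: int = 2) -> int:
    """Return the Admiralty credibility (1-6); the first matching rule wins."""
    support = ev.tier1_support + ev.tier2_support
    contra = ev.tier12_contra
    if support >= min_corroboration and contra == 0:
        return 1
    if (ev.tier1_support >= 1 and contra == 0) or (support >= 2 and contra == 1):
        return 2
    # Assumption: any remaining uncontradicted Tier 1/2 support lands here,
    # which equals "a single Tier 1/2 source" at the default threshold of 2.
    if support >= 1 and contra == 0:
        return 3
    if support >= 1 and contra >= 1:
        return 4
    if contra >= 2:
        return 5
    return 6  # only Tier 3/4 support, or none at all


def section(ev: Evidence, min_corroboration: int = 2) -> str:
    """Credibility 1-3 stays in the report body; 4-6 is routed out."""
    return "body" if credibility(ev, min_corroboration) <= 3 else "Needs Verification"
```

For example, `credibility(Evidence(1, 0, 0))` evaluates to 2 and `credibility(Evidence(0, 1, 0))` to 3, matching the two single-source cases described above. With `min_corroboration=3`, a claim backed by one Tier 1 and one Tier 2 source drops from 1 to 2.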

### I want to override the tier profile

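Use `--profile academic|technical|current-affairs|mixed`, or pass `--domains` directly. User-supplied domains are unioned with the tier profile; they are never silently dropped. Any `--domains` entry below Tier 2 is flagged in `research-plan.md` so that you can confirm it at the human gate.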
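The union-and-flag behaviour can be pictured with a small Python sketch. The function name `merge_allowlist` and the `tier_of` lookup are hypothetical, and flagging domains that are missing from the registry is an assumption rather than documented behaviour.

```python
def merge_allowlist(profile_domains, user_domains, tier_of):
    """Union the tier profile with user domains and flag risky user entries.

    `tier_of` maps a domain to its tier (1-4). User domains are always kept;
    those below Tier 2 (tier 3 or 4), or unknown to the registry (assumption),
    are returned separately so the plan can surface them for confirmation.
    """
    # dict.fromkeys removes duplicates while preserving first-seen order.
    merged = list(dict.fromkeys([*profile_domains, *user_domains]))
    flagged = [
        domain
        for domain in dict.fromkeys(user_domains)
        if tier_of.get(domain) is None or tier_of[domain] > 2
    ]
    return merged, flagged
```

Called with a profile of `["arxiv.org", "reuters.com"]` and user domains `["reuters.com", "medium.com"]`, the merged list keeps all three domains and only `medium.com` (Tier 4 in the registry table) is flagged.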

### Output language mismatch

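`--lang` takes precedence over the language of the question. In the translated report the skill keeps proper nouns in their original language (legislation names, institution names, paper titles) so that citations stay traceable.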

## Extending

| Goal | How |
|:-----|:----|
| **Adjust the tier registry** | Edit [`references/methodology.md §6`](references/methodology.md). Add domains to Tier 1/2, then rebuild the include_domains preview at the top of the plan template. |
| **Adjust the quality gates** | Edit [`references/quality-gate.md`](references/quality-gate.md). Thresholds are deterministic; raising groundedness to 0.98 simply triggers more CRAG iterations. |
| **Add a sub-question category** | Edit `SKILL.md` Phase 0 step 4 and update [`references/research-plan-template.md`](references/research-plan-template.md) to match. |
| **Change the default length calibration** | Edit the "Length calibration" table in [`references/methodology.md`](references/methodology.md). |
| **Swap Tavily for another MCP** | Edit [`references/tool-routing.md`](references/tool-routing.md) and the Phase 1 / Phase 4 call templates in `SKILL.md`. Leave the grading phases unchanged; they are MCP-agnostic. |
| **Supply a newsletter-signal corpus** | Drop redacted `briefs/YYYY-MM.jsonl` files (schema in [`tests/schema/newsletter-corpus-record.schema.json`](tests/schema/newsletter-corpus-record.schema.json)) into `~/.claude/deep-research/newsletter-corpus/`. This directory is user-scope, outside this repository, alongside `experts.yaml`. Producers (for example, a newsletter agent) commit the files through the GitHub Contents API; the skill reads the local clone. See [`references/newsletter-signal.md`](references/newsletter-signal.md). |
| **Tune the tooling recommender** | Edit `suggest-tooling/references/tooling-categories.md` (closed category taxonomy) and the weights in `~/.claude/deep-research/tooling-hats.json` (user-scope; a template ships as `suggest-tooling/tooling-hats.json.example`). CI checks the category set for consistency against the ranker. |

## Roadmap

### Shipped

| Status | Feature |
|:------:|:--------|
| ![Done](https://img.shields.io/badge/-Done-059669?style=flat-square) | 7-phase pipeline with a human gate |
| ![Done](https://img.shields.io/badge/-Done-059669?style=flat-square) | NATO Admiralty 2×6 grading with deterministic credibility assignment |
| ![Done](https://img.shields.io/badge/-Done-059669?style=flat-square) | CRAG grounding loop (at most 2 iterations, graceful fallback to Needs-Verification) |
| ![Done](https://img.shields.io/badge/-Done-059669?style=flat-square) | Unicode homoglyph defense (punycode normalization of every domain) |
| ![Done](https://img.shields.io/badge/-Done-059669?style=flat-square) | Exhaustive calibration at 100+ sources |
| ![0.3.0](https://img.shields.io/badge/-0.3.0-059669?style=flat-square) | Conditional retrieval sources: GitHub SOTA repo discovery, the academic open-graph pipeline (OpenAlex / arXiv / Semantic Scholar), Context7 documentation |
| ![0.4.0](https://img.shields.io/badge/-0.4.0-059669?style=flat-square) | Newsletter-signal conditional source: folds the maintainer's curated daily briefs into work-relevant runs through a local FTS5 corpus (routing signal only, never cited) |
| ![0.3.0](https://img.shields.io/badge/-0.3.0-059669?style=flat-square) | Claude-Code native model tiers (Opus by default, Fable 5 opt-in) + `critical` rigor profile |
| ![0.3.0](https://img.shields.io/badge/-0.3.0-059669?style=flat-square) | Decorrelated entailment judge (a different Claude model), which breaks the LLM-as-judge loop without external keys |
| ![0.3.0](https://img.shields.io/badge/-0.3.0-059669?style=flat-square) | MBFC credibility overlay on the tier registry (flag / downgrade at the margin) |
| ![0.3.0](https://img.shields.io/badge/-0.3.0-059669?style=flat-square) | Citation-graph export: BibTeX / RIS for academic handoff |
| ![0.3.0](https://img.shields.io/badge/-0.3.0-059669?style=flat-square) | Five-layer eval harness + frozen benchmark test set, run by a secret-gated CI judge |
| ![Done](https://img.shields.io/badge/-Done-059669?style=flat-square) | [`suggest-tooling`](#companion-skill-tooling-recommender) companion skill + `--suggest-tooling` delegation flag: a trust-graded skill/plugin/MCP recommender across six discovery channels that never auto-installs |

### Considered / deferred

| Status | Item | Reason |
|:------:|:-----|:----|
| ![Deferred](https://img.shields.io/badge/-Deferred-D97706?style=flat-square) | Exa `findSimilar` / Valyu full-text academic index | Evaluated during the 0.3.0 academic pipeline work; the open-graph pipeline (OpenAlex / arXiv / Semantic Scholar) shipped instead. To be revisited only if recall benchmarks show a material gap, and only as an optional, key-gated source |
| ![Dropped](https://img.shields.io/badge/-Dropped-6B7280?style=flat-square) | External model judge (Gemini / GPT-4) | Superseded by the decorrelated Claude judge above; the zero-API-key contract (consumers are Claude Max users) rules out external providers |
| ![Dropped](https://img.shields.io/badge/-Dropped-6B7280?style=flat-square) | Quality benchmark *versus* Perplexity Deep Research | The test suite ships absolute quality gates (groundedness, source quality, corroboration, entailment) tracked on every run through the frozen test set. Perplexity is no longer the reference comparator |

## Research foundation

The full methodology lives in [`deep-research-report.md`](deep-research-report.md), a standalone intelligence brief on state-of-the-art web search techniques for AI agents. `references/methodology.md` is a near-verbatim distillation of that report, and every skill rule cites back to its source section (`[R§n]`). The report synthesizes the sources below, each tied directly to a specific part of the skill:

| Source | Used for | Skill location |
|:-------|:---------|:---------------|
| [Perplexity Deep Research]() | 5-stage retrieve-reason-refine loop, tiered source preference | SKILL.md Phase 0–6 structure |
| [Anthropic · Multi-agent research system](https://www.anthropic.com/engineering/built-multi-agent-research-system) | Orchestrator-worker pattern, "start wide, then narrow", the SEO-farm failure mode | Phase 1 parallel retrieval, prompt structure |
| [Anthropic · Web search tool (domain controls)](https://docs.claude.com/en/docs/build-with-claude/tool-use/web-search-tool) | `allowed_domains` / `blocked_domains`, Unicode homoglyph defense | Phase 2 domain normalization |
| [Anthropic · Building effective agents](https://www.anthropic.com/research/building-effective-agents) | Extended/interleaved thinking, source quality as a first-class metric | Phase 0 decomposition, Phase 3 rerank |
| [Anthropic · Agent evaluation guide (2026)](https://www.anthropic.com/engineering/claude-evals) | The groundedness + coverage + source quality triad | Phase 5 CRAG gates |
| [Tavily API documentation](https://docs.tavily.com/) | `search_depth=advanced`, `include_domains`, the `score` filter, the Research endpoint | Phase 1, Phase 4 tool routing |
| [Corrective RAG (CRAG)](https://arxiv.org/abs/2401.15884) | Post-synthesis claim grading and the re-query loop | Phase 5 |
| [LevelRAG / PRISM query decomposition](https://arxiv.org/abs/2502.18139) | CoT decomposition patterns for multi-hop queries | Phase 0 |
| [NATO Admiralty source grading](https://en.wikipedia.org/wiki/Admiralty_code) | The A–F × 1–6 matrix for intelligence-grade provenance | Phase 2, Phase 6 |
| [CRAAP Test](https://library.csuchico.edu/craap-test) | Currency · Relevance · Authority · Accuracy · Purpose automation | Phase 2 filter gates |
| [OSINT five-step validation](https://www.osintframework.com/) | Independent corroboration, primary-source chain, confidence-trap defense | Phase 5, anti-patterns |

The curated domain tier registry (about 60 Tier 1 domains + about 40 Tier 2 domains) is embedded in [`references/methodology.md §6`](references/methodology.md). The full bibliography and methodological rationale are in [`deep-research-report.md`](deep-research-report.md).

## Repository layout

```
deep-research/
├── .claude/CLAUDE.md                 # maintainer spec anchor
├── README.md                         # this file
├── LICENSE                           # MIT
├── SKILL.md                          # skill entry point
├── deep-research-report.md           # methodology source of truth
├── scripts/
│   ├── verify_gates.py               # deterministic gate verification (stdlib-only, zero network)
│   ├── github_rank.py                # composite GitHub-repo ranking + exported fake_star_gate() (zero network)
│   ├── academic_graph.py             # dual-track paper ranking + BibTeX/RIS export (zero network)
│   ├── newsletter_search.py          # newsletter-signal corpus search (in-memory FTS5, zero network)
│   └── eval_harness/                 # 5-layer verification harness (judge prompts + secret-gated CI runner)
├── experts.yaml.example              # anonymous template — the real seed lives user-scope, outside the repo
├── references/
│   ├── methodology.md
│   ├── tool-routing.md
│   ├── report-structure.md
│   ├── quality-gate.md
│   ├── anti-patterns.md
│   ├── research-plan-template.md
│   ├── model-tiers.md                # model-tier policy (opus default, fable opt-in)
│   ├── github-research.md            # GitHub SOTA-repo discovery
│   ├── academic-research.md          # scholarly open-graph pipeline
│   ├── newsletter-signal.md          # curated-feed routing source (local FTS5 corpus)
│   └── examples.md                   # worked examples (read on demand)
├── suggest-tooling/                  # companion skill — trust-graded tooling recommender
│   ├── SKILL.md                      # entry point, 6-connector orchestration, no-auto-install
│   ├── references/                   # tooling-discovery / tooling-categories / toolbox-output
│   ├── scripts/marketplace_rank.py   # composite rank + trust tiers (stdlib-only, zero network)
│   ├── tooling-hats.json.example     # hat-weight template (real file lives user-scope)
│   └── evals/                        # loading + e2e fixtures for the five failure modes
├── docs/superpowers/                 # design specs, plans, and the dogfood research run
├── examples/eu-ai-act-2026/          # end-to-end fixture (4 artifacts, gate-conformant)
├── evals/                            # loading / progressive / e2e + sycophancy-probes + benchmark-testset + rubric
├── CHANGELOG.md                      # semver release history (append-only)
├── gotchas-log.md                    # maintainer traps + perishable-asset cadences
├── tests/                            # cross-reference / provenance / schema / invariant checks
│   ├── check-cross-references.sh
│   ├── check-provenance.sh
│   ├── check-schema.sh
│   ├── check-example-invariants.sh
│   ├── check-newsletter-search.sh
│   ├── check-marketplace-rank.sh     # suggest-tooling ranker unit + regression checks
│   ├── schema/{research-sources,research-evidence}.schema.json
│   └── fixtures/ → examples/eu-ai-act-2026/*.json
└── .github/workflows/validate.yml    # CI — runs the check suite on push + PR
```

### Sync model

This repository is the canonical copy. If you symlink instead of cloning (for example, `ln -s "$PWD" ~/.claude/skills/deep-research` from the working directory that holds the repository), edits propagate to Claude Code immediately. The default `gh repo clone hashbulla/deep-research ~/.claude/skills/deep-research` install creates a regular clone; commit upstream from that directory to publish changes.

Tags: AI agents, Claude Code, Tavily, information retrieval, literature evaluation, deep research, automated reporting, reverse-engineering tools