MiroMindAI/MiroThinker

GitHub: MiroMindAI/MiroThinker

一个专为复杂研究与预测任务优化的开源深度研究智能体，支持长上下文和多轮工具交互。

Stars: 6715 | Forks: 506

[![博客](https://img.shields.io/badge/Blog-4285F4?style=for-the-badge&logo=google-chrome&logoColor=white)](https://miromind.ai/#blog) [![数据](https://img.shields.io/badge/Data-0040A1?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor)](https://huggingface.co/datasets/miromind-ai/MiroVerse-v0.1) [![GITHUB](https://img.shields.io/badge/Github-24292F?style=for-the-badge&logo=github&logoColor=white)](https://github.com/MiroMindAI) [![网站](https://img.shields.io/badge/Website-4285F4?style=for-the-badge&logo=google-chrome&logoColor=white)](https://miromind.ai/) [![DISCORD](https://img.shields.io/badge/Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.com/invite/GPqEnkzQZd)

### 🚀 [试用 MiroThinker！](https://dr.miromind.ai/)

**MiroThinker**：一个专为研究和预测优化的深度研究 Agent。它在极具挑战性的 BrowseComp 基准测试中取得了 88.2 的成绩。参见[快速开始](#-quick-start)。 ## 📋 目录 - 📰 [新闻与更新](#-news--updates) - 📝 [简介](#-introduction) - ✨ [核心特性](#-key-features) - 📈 [基准测试表现](#-performance-on-benchmarks) - 🚀 [快速开始](#-quick-start) - 📊 [基准评测](#-benchmark-evaluation) - 🔬 [轨迹收集](#-trace-collection) - ❓ [常见问题与故障排除](#-faq--troubleshooting) - 📄 [许可证](#-license) - 🙏 [致谢](#-acknowledgments) ## 📰 新闻与更新 - **[2026-03-11]** 🎉🎉🎉 推出 [MiroThinker-1.7](https://huggingface.co/collections/miromind-ai/mirothinker-17)，包括 [MiroThinker-1.7-mini](https://huggingface.co/miromind-ai/MiroThinker-1.7-mini) 和 [MiroThinker-1.7](https://huggingface.co/miromind-ai/MiroThinker-1.7)。MiroThinker-1.7-mini 在 BrowseComp-ZH 上取得 72.3 的成绩，仅使用 30B 参数即在开源模型中创下新的 SOTA。我们的 proprietary agent MiroThinker-H1 在开源和商业模型中的 BrowseComp 和 BrowseComp-ZH 上取得了领先表现。 - **\[2026-01-23\]** 🎉 我们为 [MiroThinker 在线版](http://dr.miromind.ai)带来了两项重要更新：核心研究报告生成：深度研究在线报告现已支持生成、预览和分享。扩展文档上传类型：现支持多种文件格式的上传，例如 `.pdf`、`.doc`、`.ppt`、`.xls`、`.jpg`。欢迎试用！MiroThinker 将持续维护和迭代升级，目标是成为您用过的最好的 Research Agent！ - **\[2026-01-05\]** 🎉🎉 我们发布 [MiroThinker-v1.5](https://huggingface.co/collections/miromind-ai/mirothinker-v15)，这是一系列专为金融预测优化的开源深度研究 Agent。[MiroThinker-v1.5-30B](https://huggingface.co/miromind-ai/MiroThinker-v1.5-30B) 以低得多的成本在 BrowseComp-ZH 上超越了 Kimi-K2-Thinking，仅使用了 1/30 的参数。[MiroThinker-v1.5-235B](https://huggingface.co/miromind-ai/MiroThinker-v1.5-235B) 在 HLE-Text 上得分 39.2%，BrowseComp 69.8%，BrowseComp-ZH 71.5%，GAIA-Val-165 80.8%，在搜索 Agent 中创下了新的 state-of-the-art。

📜 点击展开过往更新

- **\[2025-11-13\]** 🎉 [MiroThinker-v1.0](https://huggingface.co/collections/miromind-ai/mirothinker-v10) 现已发布！引入**交互式扩展**作为性能提升的第三维度，MiroThinker v1.0 支持 256K 上下文窗口，每个任务最多 600 次 tool calls。提供 8B、30B 和 72B 参数规模，在 HLE-Text、BrowseComp、BrowseComp-ZH 和 GAIA-Text-103 上分别达到 37.7%、47.1%、55.6% 和 81.9%。详见[技术报告](https://arxiv.org/abs/2511.11793)。 - **\[2025-09-11\]** MiroThinker-72B-Preview 在本周的 FutureX 基准测试中排名第 4。参见 [FutureX](https://futurex-ai.github.io/)。 - **\[2025-09-08\]** [MiroThinker-v0.2](https://huggingface.co/collections/miromind-ai/mirothinker-v02) 现已发布，在多个基准测试中取得了开源 SOTA 表现，包括 HLE (17.8%)、HLE-Text-Only (19.1%)、BrowseComp-EN (17.2%)、BrowseComp-ZH (29.4%)、XBench-DeepSearch (56.0%) 和 Frames (74.8%)。 - **\[2025-09-07\]** 我们支持了更多基准测试，包括 [BrowseComp-ZH](https://arxiv.org/abs/2504.19314)、[XBench-DeepSearch](https://xbench.org/agi/aisearch) 和 [FutureX](https://futurex-ai.github.io/)。我们计划在未来增加更多基准测试。 - **\[2025-08-22\]** 推出 MiroThinker 的精简部署选项，优化了资源使用和启动时间。体验交互式演示：[🚀 试用 Gradio Demo](apps/gradio-demo) - **\[2025-08-08\]** [MiroThinker-v0.1](https://huggingface.co/collections/miromind-ai/mirothinker-v01-689301b6d0563321862d44a1) 发布。

## 📝 简介 ### MiroThinker-1.7 我们新的 MiroThinker 系列在构建用于长链条任务的可靠 Agent 方面实现了重大飞跃。通过增强的 post-training pipeline 打造，我们的 MiroThinker-1.7 系列在开源模型的深度研究任务中达到了 SOTA 表现。 **核心特性** - 🚀 MiroThinker-1.7 支持 256K 上下文窗口、长程推理和深度多步分析。 - 🔧 每个任务最多处理 300 次 tool 交互，现已具备更准确的逐步推理和决策能力。 - 📦 提供 30B 和 235B 参数规模，并附带一套完整的工具和工作流，灵活支持各种研究场景和计算预算。 - 我们的 proprietary agent MiroThinker-H1 为长链条可验证推理提供了有力证据——即逐步可验证且全局可验证的推理过程，从而提升了复杂 agentic workflows 的表现。

| 模型名称 | 参数量 | 最大上下文 | 最大 Tool 调用次数 | HF 链接 | |:---------------------:|:-----------------------------:|:-----------:|:--------------:|:------------------------------------------------------------------:| | MiroThinker-1.7-mini | 30B | 256K | 300 | [🤗 链接](https://huggingface.co/miromind-ai/MiroThinker-1.7-mini) | | MiroThinker-1.7 | 235B | 256K | 300 | [🤗 链接](https://huggingface.co/miromind-ai/MiroThinker-1.7) |

MiroThinker-1.7 在广泛的基准测试中展现了强大的通用研究能力，在 BrowseComp、BrowseComp-ZH、GAIA-Val-165 和 HLE-Text 上分别取得了 74.0%、75.3%、82.7% 和 42.9% 的成绩。MiroThinker-1.7 在 BrowseComp-ZH 上取得了 SOTA 表现。 ![image](/assets/1.7_main_results.png) ### MiroThinker-v1.5

📦 点击展开 MiroThinker-v1.5 详情

MiroThinker v1.5 是世界领先的开源搜索 Agent，它通过**交互式扩展**推进了工具增强推理——将 Agent 训练为能够处理更深层次和更频繁的 agent-environment 交互，作为模型大小和上下文长度之外的第三个性能提升维度。 ![image](https://huggingface.co/datasets/miromind-ai/MiroFlow-Benchmarks/resolve/main/assets/mirothinker_v1.5_framework.png) **核心特性** - 🚀 MiroThinker v1.5 支持 256K 上下文窗口、长程推理和深度多步分析。 - 🔧 每个任务最多处理 400 次 tool 调用——相比之前的开源研究 Agent 有显著提升。 - 📦 提供 30B 和 235B 参数规模，并附带一套完整的工具和工作流，灵活支持各种研究场景和计算预算。

| Agent 名称 | 基础 Agent | 最大上下文 | 最大 Tool 调用次数 | HF 链接 | |:---------------------:|:-----------------------------:|:-----------:|:--------------:|:------------------------------------------------------------------:| | MiroThinker-v1.5-30B | Qwen3-30B-A3B-Thinking-2507 | 256K | 400 | [🤗 链接](https://huggingface.co/miromind-ai/MiroThinker-v1.5-30B) | | MiroThinker-v1.5-235B | Qwen3-235B-A22B-Thinking-2507 | 256K | 400 | [🤗 链接](https://huggingface.co/miromind-ai/MiroThinker-v1.5-235B) |

MiroThinker v1.5 在广泛的基准测试中展现了强大的通用研究能力，在 HLE-Text、BrowseComp、BrowseComp-ZH 和 GAIA-Val-165 上分别取得了 39.2%、69.8%、71.5% 和 80.8% 的成绩。这些结果超越了之前的开源 Agent，并创下了新的世界领先 BrowseComp 表现。 ![image](https://huggingface.co/datasets/miromind-ai/MiroFlow-Benchmarks/resolve/main/assets/mirothinker_v1.5_browsecomp.png)

### MiroThinker-v1.0

📦 点击展开 MiroThinker-v1.0 详情

与以往仅扩展模型大小或上下文长度的 Agent 不同，MiroThinker v1.0 在 Agent 层面引入了**交互式扩展**，系统地训练 Agent 以更深层次和更频繁地处理 agent–environment 交互，作为性能提升的第三维度。交互式扩展利用环境反馈和外部信息获取来纠正错误并优化轨迹。 ![image](https://huggingface.co/datasets/miromind-ai/MiroFlow-Benchmarks/resolve/main/assets/MiroThinker_v1.0_Overall.png) ### ✨ 核心特性 - 🚀 **256K 上下文窗口**：支持长程推理和深度多步分析 - 🔧 **600 次 Tool 调用**：每个任务最多处理 600 次 tool 调用——相比之前的开源研究 Agent 有显著提升 - 📦 **多种规模**：提供 8B、30B 和 72B 参数规模，并附带一套完整的工具和工作流，灵活支持各种研究场景和计算预算

| Agent 名称 | 基础 Agent | 最大上下文 | 最大 Tool 调用次数 | HF 链接 | |:--------------------:|:---------------------------:|:-----------:|:--------------:|:------------------------------------------------------------------:| | MiroThinker-v1.0-8B | Qwen3-8B | 256K | 600 | [🤗 链接](https://huggingface.co/miromind-ai/MiroThinker-v1.0-8B) | | MiroThinker-v1.0-30B | Qwen3-30B-A3B-Thinking-2507 | 256K | 600 | [🤗 链接](https://huggingface.co/miromind-ai/MiroThinker-v1.0-30B) | | MiroThinker-v1.0-72B | Qwen2.5-72B-Instruct | 256K | 600 | [🤗 链接](https://huggingface.co/miromind-ai/MiroThinker-v1.0-72B) |

MiroThinker v1.0 在广泛的基准测试中展现了强大的通用研究能力，在 HLE-Text、BrowseComp、BrowseComp-ZH 和 GAIA-Text-103 上分别取得了 **37.7%**、**47.1%**、**55.6%** 和 **81.9%** 的成绩。这些结果超越了之前的开源 Agent，并缩小了与商业同类产品（如 **GPT-5-high**）的差距。

### MiroThinker-v0.2

📦 点击展开 MiroThinker-v0.2 详情

在这个新版本中，我们引入了三项关键改进： - 📚 **更丰富的训练数据**，来自英文和中文来源，在基准测试表现和泛化能力上取得了显著提升 - 🎯 **统一的 DPO 训练**，所有 Agent 共用一个偏好数据集 - 📏 **扩展的上下文长度**，从 40k 增加到 64k，以应对更具挑战性的多轮 tool-use 任务与 v0.1 相比，MiroThinker v0.2 在各基准测试中均有一致的提升。例如，在 **GAIA-Text-103** 上的分数从 **57.3 提升至 64.1**，在 **BrowseComp-ZH** 上从 **17.0 提升至29.4**，反映了模型通用研究 Agent 能力的显著进步。

### MiroThinker-v0.1

📦 点击展开 MiroThinker-v0.1 详情

Performance of Open-Source Agents on GAIA-Validation Benchmark.

我们发布了 **MiroThinker v0.1** 系列，包括 **8B**、**14B** 和 **32B** 参数规模的 SFT 和 DPO 版本。值得注意的是，MiroThinker v0.1 在 [GAIA benchmark](https://huggingface.co/datasets/gaia-benchmark/GAIA)（一个用于评估高级 agentic 能力的严格测试套件）上的开源模型中取得了 **state-of-the-art** 表现，展示了其在长上下文、高决策密度和真实世界任务场景中的实力。

## ✨ 核心特性 ### 🤖 **MiroThinker 优化框架** - 🔓 **完全开源的 Agent 框架**：框架和 Agent 完全公开，实现完全透明 - 🔗 **工具集成**：与外部工具和 API 无缝集成 - 📝 **轨迹收集**：全面记录和分析 Agent 交互，并显示经过的时间和预计完成时间（以分钟为单位）。可用于 SFT 和 DPO - 📊 **基准评测**：在多个基准数据集上进行了广泛测试 ### 📊 **全面的基准测试套件**

📋 点击展开基准测试列表

- **GAIA Validation**：通用 AI 助手的基准测试。([论文](https://arxiv.org/abs/2311.12983)) - **GAIA-Text-103**：GAIA Validation 中纯文本任务的子集。([论文](https://arxiv.org/abs/2505.22648)) - **HLE**：人类最后一场考试。([论文](https://arxiv.org/abs/2501.14249)) - **HLE-Text-2158**：HLE 中纯文本任务的子集。([论文](https://arxiv.org/abs/2501.14249)) - **HLE-Text-500**：HLE 中纯文本任务的子集，由 [WebThinker](https://arxiv.org/pdf/2504.21776) 创建。([论文](https://arxiv.org/pdf/2504.21776)) - **BrowseComp-EN**：网页浏览和理解任务。([论文](https://arxiv.org/abs/2504.12516)) - **BrowseComp-ZH**：BrowseComp 的中文版本。([论文](https://arxiv.org/abs/2504.19314)) - **WebWalkerQA**：网页导航和问答。([论文](https://arxiv.org/abs/2501.07572)) - **Frames**：事实性、检索和推理测量集。([论文](https://arxiv.org/abs/2409.12941)) - **XBench-DeepSearch**：用于深度研究 Agent 的基准测试。([网站](https://xbench.org/agi/aisearch)) - **FutureX**：旨在预测未知未来的实时基准测试。([网站](https://futurex-ai.github.io/)) - **SEAL-0**：用于评估 LLM 在冲突证据网页问题上表现的基准测试。([论文](https://arxiv.org/abs/2506.01062)) - **AIME2025**：美国邀请数学考试 2025。([网站](https://artificialanalysis.ai/evaluations/aime-2025)) - **DeepSearchQA**：Google 深度搜索问答基准测试。([论文](https://arxiv.org/abs/2505.20827))

## 📈 基准测试表现 ### MiroThinker-1.7

### MiroThinker-v1.5

📦 点击展开 MiroThinker-v1.5 详情

### MiroThinker-v1.0

📦 点击展开 MiroThinker-v1.0 详情

### MiroThinker-v0.2

📦 点击展开 MiroThinker-v0.2 详情

#### 与 SOTA 研究 Agent 的对比

#### GAIA Benchmark

### MiroThinker-v0.1

📦 点击展开 MiroThinker-v0.1 详情

#### GAIA Benchmark

| **方法** | Text-103
Best Pass@1 | Text-103
Pass@1 (Avg@8) | Val-165
Best Pass@1 | Val-165
Pass@1 (Avg@8) | |------------------------------|:-----------------------:|:--------------------------:|:----------------------:|:-------------------------:| | **🔹—— 7B/8B Agents ——** | | | | | | Search-o1-7B | 17.5 | - | - | - | | R1-Searcher-7B | 20.4 | - | - | - | | WebDancer-7B | 31.0 | - | - | - | | WebSailor-7B | 37.9 | - | - | - | | CK-Pro-8B | 40.3 | - | 32.7 | - | | **MiroThinker-8B-SFT-v0.1** | 44.7 | 40.1 | 34.6 | 31.8 | | + Commercial Tools | 46.6 | 42.1 | 37.6 | 33.9 | | **MiroThinker-8B-DPO-v0.1** | 46.6 | 44.8 | 37.0 | 35.4 | | + Commercial Tools | **50.5** | **46.7** | **38.2** | **35.9** | | **🔹—— 14B Agents ——** | | | | | | **MiroThinker-14B-SFT-v0.1** | 47.6 | 44.4 | 37.0 | 34.4 | | + Commercial Tools | 49.5 | 47.5 | 41.8 | 39.8 | | **MiroThinker-14B-DPO-v0.1** | 48.5 | 46.6 | 42.4 | 39.2 | | + Commercial Tools | **52.4** | **48.5** | **45.5** | **42.0** | | **🔹—— 32B Agents ——** | | | | | | Qwen3-32B | 31.1 | 26.7 | 29.7 | 26.4 | | Search-o1-32B | 28.2 | - | - | - | | WebThinker-32B-RL | 48.5 | - | - | - | | WebDancer-QwQ-32B | 51.5 | - | - | - | | WebSailor-32B | 53.2 | - | - | - | | WebShaper-QwQ-32B | 53.3 | - | - | - | | **MiroThinker-32B-SFT-v0.1** | 55.3 | 51.3 | 44.9 | 42.7 | | + Commercial Tools | 58.3 | 54.2 | 48.5 | 45.8 | | **MiroThinker-32B-DPO-v0.1** | 57.3 | 54.1 | 48.5 | 45.9 | | + Commercial Tools | **60.2** | **57.9** | **50.9** | **48.9** |

1. 遵循 WebThinker、WebAgents 和 CognitiveKernel 的惯例，我们报告 Best Pass@1，即三次运行中的最高分，这通常反映更强的性能，尽管可能表现出一定的波动性。为了提供更稳定的度量，我们额外报告 Pass@1 (Avg@8)，它以略低的分数为代价提供更高的一致性。 2. 为了与之前的开源工作保持一致，我们使用 WebAgents LLM-as-a-Judge 模板评估 GAIA-Text-103，并使用官方 GAIA scorer 脚本报告 GAIA-Val-165 的结果。 3. 默认情况下，我们尽可能使用开源工具，但 code tool [E2B](https://github.com/e2b-dev/E2B) 和 Google search tool [Serper](https://serper.dev/) 除外。我们在实现中使用了 [Whisper](https://huggingface.co/openai/whisper-large-v3-turbo)、[Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) 和 [Qwen3-235B-A22B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507)。该框架可以轻松扩展到您选择的其他开源工具。 4. 用商业替代品替换这些开源工具可以带来性能提升。商业工具主要用于多模态能力和某些复杂的推理子任务。大多数任务，包括规划、浏览、优化、导航等，均由我们的 Agent 处理。 #### 更多基准测试

| 方法 | HLE
Pass@1 | Frames
Pass@1 | BrowseComp
Pass@1 | BrowseComp-ZH
Pass@1 | WebWalkerQA
Pass@1 | |------------------------------|:-------------:|:----------------:|:--------------------:|:-----------------------:|:---------------------:| | OpenAI Deep Research | 26.6 | - | 51.5 | 42.9 | - | | Gemini Deep Research | 26.9 | - | - | - | - | | Kimi-Researcher | 26.9 | 78.8 | - | - | - | | | | | | | | | WebDancer-7B | - | - | - | - | 36.0 | | WebSailor-7B | - | - | 6.7 | 14.2 | - | | **MiroThinker-8B-SFT-v0.1** | - | 58.0 | 5.5 | 9.3 | 41.3 | | **MiroThinker-8B-DPO-v0.1** | - | 64.4 | 8.7 | 13.6 | 45.7 | | | | | | | | | WebThinker-32B-RL | - | - | - | - | 46.5 | | WebDancer-QwQ-32B | - | - | 3.8 | 18.0 | 47.9 | | WebSailor-32B | - | - | 10.5 | 25.5 | - | | WebShaper-32B | - | - | - | - | 51.4 | | **MiroThinker-32B-SFT-v0.1** | 10.2 | 70.4 | 10.6 | 13.8 | 45.7 | | **MiroThinker-32B-DPO-v0.1** | 11.8 | 71.7 | 13.0 | 17.0 | 49.3 |

1. MiroThinker 的性能通过此代码库和开源工具进行测试；其他 Agent 的结果来自其论文和官方网站。 2. 由于 [MiroVerse-v0.1](https://huggingface.co/datasets/miromind-ai/MiroVerse-v0.1) 主要包含英文数据，Agent 的中文能力有限。我们计划在下一版本中添加更多中文数据以提升性能。

## 🚀 快速开始 ### 前置条件 - 🐍 **Python 3.10+** - 📦 **uv 包管理器** ([安装指南](https://github.com/astral-sh/uv)) - 🔑 **必需的 API keys**（参见下方的配置部分） ### 安装 ``` # 克隆仓库 git clone https://github.com/MiroMindAI/MiroThinker cd MiroThinker # 设置环境 cd apps/miroflow-agent uv sync # 配置 API Keys cp .env.example .env # 使用您的 API Keys (SERPER_API_KEY, JINA_API_KEY, E2B_API_KEY 等) 编辑 .env ``` ### 工具配置 #### MiroThinker-1.7 最小配置 | Server | 描述 | 提供的工具 | 必需的环境变量 | |:-------|:------------|:---------------|:-------------------------------| | **`tool-python`** | 执行环境和文件管理 (E2B sandbox) | `create_sandbox`, `run_command`, `run_python_code`, `upload_file_from_local_to_sandbox`, `download_file_from_sandbox_to_local`, `download_file_from_internet_to_sandbox` | `E2B_API_KEY` | | **`search_and_scrape_webpage`** | 通过 Serper API 进行 Google 搜索 | `google_search` | `SERPER_API_KEY`, `SERPER_BASE_URL` | | **`jina_scrape_llm_summary`** | 网页抓取及基于 LLM 的信息提取 | `scrape_and_extract_info` | `JINA_API_KEY`, `JINA_BASE_URL`, `SUMMARY_LLM_BASE_URL`, `SUMMARY_LLM_MODEL_NAME`, `SUMMARY_LLM_API_KEY` | **最小 `.env` 配置示例：** ``` # MiroThinker v1.5 和 v1.0 必需（最小设置） SERPER_API_KEY=your_serper_key SERPER_BASE_URL="https://google.serper.dev" JINA_API_KEY=your_jina_key JINA_BASE_URL="https://r.jina.ai" E2B_API_KEY=your_e2b_key # jina_scrape_llm_summary 必需 # 注意：Summary LLM 可以是一个小模型（例如 Qwen3-14B 或 GPT-5-Nano） # 该选择对性能影响极小，请使用最方便的 SUMMARY_LLM_BASE_URL="https://your_summary_llm_base_url/v1/chat/completions" SUMMARY_LLM_MODEL_NAME=your_llm_model_name # e.g., "Qwen/Qwen3-14B" or "gpt-5-nano" SUMMARY_LLM_API_KEY=your_llm_api_key # Optional, depends on LLM provider # Benchmark 评估必需 OPENAI_API_KEY=your_openai_key # Required for running benchmark evaluations OPENAI_BASE_URL="https://api.openai.com/v1" # Optional, defaults to OpenAI's API ```

🔧 点击展开其他可用工具

以下可选工具可用，但未用于 MiroThinker v1.0-1.7 评测： | Server Name | Type | Description | |:---------------------|:-------------|:--------------------------------------------| | `tool-vqa` | Commercial | Vision processing using Claude | | `tool-vqa-os` | Open-Source | Vision processing (open-source alternative) | | `tool-transcribe` | Commercial | Audio transcription using OpenAI | | `tool-transcribe-os` | Open-Source | Audio transcription using Whisper | | `tool-reasoning` | Commercial | Reasoning engine using Claude | | `tool-reasoning-os` | Open-Source | Reasoning engine (open-source alternative) | | `tool-reading` | Open-Source | Document reading using MarkItDown | | `tool-google-search` | Commercial | Web search using Google + scraping | | `tool-sogou-search` | Commercial | Web search using Sogou (Chinese) | 有关所有可用工具的完整文档，请参见 [MiroFlow Tools README](libs/miroflow-tools/README.md)。

#### 预配置 Agent 设置 `apps/miroflow-agent/conf/agent/` 目录包含几个预配置的 Agent 设置。每个配置使用不同的工具，并需要在您的 `.env` 文件中设置相应的环境变量。 | 配置 | 描述 | 最大轮数 | 上下文保留 | 必需的环境变量 | 推荐用于 | |:---------------------------------------|:------------|:----------|:------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------| | **`mirothinker_1.7_keep5_max200`** ⭐ | 具有上下文管理的单 Agent | 200 | 保留最近 5 个 | `SERPER_API_KEY`, `SERPER_BASE_URL`, `JINA_API_KEY`, `JINA_BASE_URL`, `E2B_API_KEY`, `SUMMARY_LLM_BASE_URL`, `SUMMARY_LLM_MODEL_NAME`, `SUMMARY_LLM_API_KEY` | **1.7 (推荐用于大多数任务)** | | **`mirothinker_1.7_keep5_max300`** ⭐ | 具有上下文管理的单 Agent | 300 | 保留最近 5 个 | 同上 | **1.7 (用于 BrowseComp & BrowseComp-ZH)** |

📦 点击展开旧版配置 (v0.1/v0.2)

| 配置 | 描述 | 最大轮数 | 上下文保留 | 必需的环境变量 | 推荐用于 | |:-------------------------|:------------|:----------|:------------------|:-------------------------------|:----------------| | **`mirothinker_v1.5_keep5_max200`** | 具有上下文管理的单 Agent | 200 | 保留最近 5 个 | `SERPER_API_KEY`, `SERPER_BASE_URL`, `JINA_API_KEY`, `JINA_BASE_URL`, `E2B_API_KEY`, `SUMMARY_LLM_BASE_URL`, `SUMMARY_LLM_MODEL_NAME`, `SUMMARY_LLM_API_KEY` | **v1.5 (推荐用于大多数任务)** | | **`mirothinker_v1.5_keep5_max400`** | 具有上下文管理的单 Agent | 400 | 保留最近 5 个 | 同上 | **v1.5 (用于 BrowseComp & BrowseComp-ZH)** | | **`mirothinker_v1.5`** | 用于 MiroThinker v1.5 的单 Agent | 600 | 保留所有结果 | 同上 | **v1.5** | | **`mirothinker_v1.0_keep5`** | 具有上下文管理的单 Agent | 600 | 保留最近 5 个 | 同上 | **v1.0** | | **`mirothinker_v1.0`** | 用于 MiroThinker v1.0 的单 Agent | 600 | 保留所有结果 | 同上 | **v1.0** | | **`multi_agent`** | 具有商业工具的多 Agent (v0.1/v0.2) | 50 | 保留所有结果 | `E2B_API_KEY`, `ANTHROPIC_API_KEY`, `ANTHROPIC_BASE_URL`, `OPENAI_API_KEY`, `OPENAI_BASE_URL`, `SERPER_API_KEY`, `SERPER_BASE_URL`, `JINA_API_KEY`, `JINA_BASE_URL` | v0.1/v0.2 | | **`multi_agent_os`** | 具有开源工具的多 Agent (v0.1/v0.2) | 50 | 保留所有结果 | `E2B_API_KEY`, `VISION_API_KEY`, `VISION_BASE_URL`, `VISION_MODEL_NAME`, `WHISPER_API_KEY`, `WHISPER_BASE_URL`, `WHISPER_MODEL_NAME`, `REASONING_API_KEY`, `REASONING_BASE_URL`, `REASONING_MODEL_NAME`, `SERPER_API_KEY`, `SERPER_BASE_URL`, `JINA_API_KEY`, `JINA_BASE_URL` | v0.1/v0.2 |

#### 创建自定义工具配置

🔧 点击展开自定义工具配置指南

您可以创建自己的 YAML 配置文件来自由组合 MCP servers。方法如下： 1. **在 `apps/miroflow-agent/conf/agent/` 中创建新的 YAML 文件**： ``` # conf/agent/my_custom_config.yaml defaults: - default - _self_ main_agent: tools: - tool-python # Execution environment - search_and_scrape_webpage # Google search - jina_scrape_llm_summary # Web scraping with LLM - tool-vqa # Vision processing (optional) - tool-transcribe # Audio processing (optional) - tool-reasoning # Reasoning engine (optional) - tool-reading # Document reading (optional) max_turns: 300 # Maximum number of turns sub_agents: agent-browsing: # Optional sub-agent tools: - tool-google-search - tool-vqa - tool-reading - tool-python max_turns: 50 keep_tool_result: -1 # Context retention budget: -1 keeps all tool results, or specify K to keep only the K most recent tool responses ``` 2. **在运行评估时使用您的自定义配置**： ``` cd apps/miroflow-agent uv run main.py llm=qwen-3 agent=my_custom_config llm.base_url=https://your_base_url/v1 ``` 3. **根据您使用的工具在 `.env` 中配置环境变量**。所有可用的环境变量都列在 `apps/miroflow-agent/.env.example` 中。将其复制到 `.env` 并根据您选择的配置进行设置： cd apps/miroflow-agent cp .env.example .env # 使用您的实际 API keys 编辑 .env **对于 MiroThinker v1.5**（`mirothinker_v1.5_keep5_max200.yaml`、`mirothinker_v1.5_keep5_max400.yaml` 或 `mirothinker_v1.5.yaml`）和 **v1.0**（`mirothinker_v1.0_keep5.yaml` 或 `mirothinker_v1.0.yaml`），请参阅上文的[最小配置](#minimal-configuration-for-mirothinker-v15-and-v10)部分以获取完整的配置示例。 **对于其他配置**，请参阅上文的[预配置 Agent 设置](#pre-configured-agent-settings)表格，了解需要哪些环境变量。

🔑 点击展开可选 API keys

``` # LLM-as-a-Judge 的 API（用于 Benchmark 测试，Benchmark 评估必需） OPENAI_API_KEY=your_openai_key OPENAI_BASE_URL="https://api.openai.com/v1" # Optional, defaults to OpenAI's API # 开源 Audio Transcription Tool 的 API（用于 Benchmark 测试，可选） WHISPER_MODEL_NAME="openai/whisper-large-v3-turbo" WHISPER_API_KEY=your_whisper_key WHISPER_BASE_URL="https://your_whisper_base_url/v1" # 开源 VQA Tool 的 API（用于 Benchmark 测试，可选） VISION_MODEL_NAME="Qwen/Qwen2.5-VL-72B-Instruct" VISION_API_KEY=your_vision_key VISION_BASE_URL="https://your_vision_base_url/v1/chat/completions" # 开源 Reasoning Tool 的 API（用于 Benchmark 测试，可选） REASONING_MODEL_NAME="Qwen/Qwen3-235B-A22B-Thinking-2507" REASONING_API_KEY=your_reasoning_key REASONING_BASE_URL="https://your_reasoning_base_url/v1/chat/completions" # Claude Sonnet 3.7 作为商业工具的 API（可选） ANTHROPIC_API_KEY=your_anthropic_key # Sogou Search 的 API（可选） TENCENTCLOUD_SECRET_ID=your_tencent_cloud_secret_id TENCENTCLOUD_SECRET_KEY=your_tencent_cloud_secret_key # Summary LLM 的 API（可以使用小模型，如 Qwen3-14B 或 GPT-5-Nano） SUMMARY_LLM_BASE_URL="https://your_summary_llm_base_url/v1/chat/completions" SUMMARY_LLM_MODEL_NAME=your_summary_llm_model_name # e.g., "Qwen/Qwen3-14B" or "gpt-5-nano" SUMMARY_LLM_API_KEY=your_summary_llm_api_key ```

### 提供 MiroThinker Agent 服务 #### 选项 1（推荐）：使用 SGLang 或 vLLM 提供服务使用 SGLang 在端口 61002 上提供 MiroThinker 模型服务： ``` NUM_GPUS=4 PORT=61002 # 从 HF 下载 Agent AGENT_PATH=miromind-ai/MiroThinker-1.7-mini python3 -m sglang.launch_server \ --model-path $AGENT_PATH \ --tp $NUM_GPUS \ --dp 1 \ --host 0.0.0.0 \ --port $PORT \ --trust-remote-code ``` #### 选项 2：量化轻量级选项我们还提供了使用 CPU 优化和 GPU 加速量化技术来提供 MiroThinker agent 服务的综合指南，以及使用 llama.cpp、Ollama、SGLang 和其他推理框架进行部署的详细分析和指导。 ### 运行您的第一个任务设置环境并启动服务器后，运行 `main.py` 以使用默认问题进行测试：*"What is the title of today's arxiv paper in computer science?"*（今天计算机科学领域的 arxiv 论文标题是什么？） ``` cd apps/miroflow-agent # 使用 MiroThinker Agents（需要您自己的服务器） uv run python main.py llm=qwen-3 agent=mirothinker_1.7_keep5_max200 llm.base_url=http://localhost:61002/v1 # 或使用 Claude（需要在 .env 中设置 ANTHROPIC_API_KEY） uv run python main.py llm=claude-3-7 agent=single_agent_keep5 # 或使用 GPT-5（需要在 .env 中设置 OPENAI_API_KEY） uv run python main.py llm=gpt-5 agent=single_agent_keep5 ``` **要自定义您的问题**，请编辑 `main.py` 第 32 行： ``` task_description = "Your custom question here" ``` Agent 将搜索网络、根据需要执行代码，并提供带有来源的回答。 ## 📊 基准评测 ### 下载基准数据 ``` cd MiroThinker # Back to project root wget https://huggingface.co/datasets/miromind-ai/MiroFlow-Benchmarks/resolve/main/data_20251115_password_protected.zip unzip data_20251115_password_protected.zip # 密码：pf4* rm data_20251115_password_protected.zip ``` ### 运行基准评测 **可用参数：** 您可以在运行脚本之前通过设置以下环境变量来自定义评估： | 参数 | 默认值 | 描述 | |:----------|:--------|:------------| | `LLM_MODEL` | `"MiroThinker-Agents"` | Agent 名称标识符 | | `BASE_URL` | `"https://your-api.com/v1"` | 您的服务器 Base URL | | `NUM_RUNS` | 因基准而异 | 评估运行次数（大多数基准为 3 次，GAIA/XBench/FutureX/SEAL-0 为 8 次，AIME2025 为 32 次） | | `LLM_PROVIDER` | `"qwen"` | LLM 提供商（例如 `qwen`、`openai`、`anthropic`） | | `AGENT_SET` | `"mirothinker_1.7_keep5_max200"` | Agent 配置（例如 `mirothinker_1.7_keep5_max200`、`mirothinker_1.7_keep5_max300`。） | | `MAX_CONTEXT_LENGTH` | `262144` | 最大上下文长度 (256K) | | `MAX_CONCURRENT` | `10` | 最大并发任务数 | | `PASS_AT_K` `1` | Pass@K 评估指标 | | `TEMPERATURE` | `1.0` | 采样温度 | | `API_KEY` | `"xxx"` | 服务器的 API key | **使用示例：** ``` # 首先导航到 miroflow-agent 目录 cd apps/miroflow-agent # v1.5 基本用法（推荐） NUM_RUNS=8 LLM_MODEL="MiroThinker-1.7-mini" BASE_URL="https://your-api.com/v1" bash scripts/run_evaluate_multiple_runs_gaia-validation-text-103.sh # 或使用 v1.0 # NUM_RUNS=8 LLM_MODEL="MiroThinker-v1.0-30B" BASE_URL="https://your-api.com/v1" bash scripts/run_evaluate_multiple_runs_gaia-validation-text-103.sh # 自定义运行次数和 Agent 配置（带上下文管理的 v1.5） LLM_MODEL="MiroThinker-1.7-mini" \ BASE_URL="https://your-api.com/v1" \ NUM_RUNS=8 \ AGENT_SET="mirothinker_1.7_keep5_max200" \ bash scripts/run_evaluate_multiple_runs_gaia-validation-text-103.sh ```

📋 点击展开所有基准测试命令

``` # 首先导航到 miroflow-agent 目录 cd apps/miroflow-agent # HLE NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_1.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_hle.sh # HLE-Text-2158 NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_1.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_hle-text-2158.sh # HLE-Text-500 NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_1.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_hle-text-500.sh # GAIA-Text-103 NUM_RUNS=8 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_1.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_gaia-validation-text-103.sh # GAIA-Validation (GAIA-Val-165) NUM_RUNS=8 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_1.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_gaia-validation.sh # BrowseComp-EN (⚠️ 使用 max300) NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_1.7_keep5_max300" bash scripts/run_evaluate_multiple_runs_browsecomp.sh # BrowseComp-ZH (⚠️ 使用 max300) NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_1.7_keep5_max300" bash scripts/run_evaluate_multiple_runs_browsecomp_zh.sh # WebWalkerQA NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_1.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_webwalkerqa.sh # XBench-DeepSearch NUM_RUNS=8 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_1.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_xbench_deepsearch.sh # FRAMES NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_1.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_frames.sh # SEAL-0 NUM_RUNS=8 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_1.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_seal-0.sh # FutureX NUM_RUNS=8 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_1.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_futurex.sh # AIME2025 NUM_RUNS=32 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_1.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_aime2025.sh # DeepSearchQA NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="mirothinker_1.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_deepsearchqa.sh ```

#### 3. **监控评估进度**

📊 点击展开进度监控命令

``` # 首先导航到 miroflow-agent 目录 cd apps/miroflow-agent # 针对 HLE python benchmarks/check_progress/check_progress_hle.py /path/to/evaluation/logs # 针对 HLE-Text-2158 python benchmarks/check_progress/check_progress_hle-text-2158.py /path/to/evaluation/logs # 针对 HLE-Text-500 python benchmarks/check_progress/check_progress_hle-text-500.py /path/to/evaluation/logs # 针对 BrowseComp-EN python benchmarks/check_progress/check_progress_browsecomp.py /path/to/evaluation/logs # 针对 BrowseComp-ZH python benchmarks/check_progress/check_progress_browsecomp_zh.py /path/to/evaluation/logs # 针对 GAIA-Validation python benchmarks/check_progress/check_progress_gaia-validation.py /path/to/evaluation/logs # 针对 GAIA-Text-103 python benchmarks/check_progress/check_progress_gaia-validation-text-103.py /path/to/evaluation/logs # 针对 WebWalkerQA python benchmarks/check_progress/check_progress_webwalkerqa.py /path/to/evaluation/logs # 针对 Frames python benchmarks/check_progress/check_progress_frames.py /path/to/evaluation/logs # 针对 XBench-DeepSearch python benchmarks/check_progress/check_progress_xbench_deepsearch.py /path/to/evaluation/logs # 针对 SEAL-0 python benchmarks/check_progress/check_progress_seal-0.py /path/to/evaluation/logs # 针对 AIME2025 python benchmarks/check_progress/check_progress_aime2025.py /path/to/evaluation/logs # 针对 DeepSearchQA python benchmarks/check_progress/check_progress_deepsearchqa.py /path/to/evaluation/logs ```

## 🔬 轨迹收集

📋 点击展开轨迹收集命令

``` cd apps/collect-trace # 收集用于 SFT 的 Traces bash scripts/collect_trace_claude37.sh bash scripts/collect_trace_gpt5.sh # 收集用于 DPO 的 Traces bash scripts/collect_trace_qwen3.sh ```

## ❓ 常见问题与故障排除 ### 常见问题

🔧 点击展开故障排除指南

#### **问：我应该使用哪个版本？** **答：** 我们推荐使用 **MiroThinker-1.7** ⭐ 配合最小配置： - **v1.7** ⭐：具有 256K 上下文的最新版本，世界领先的性能。使用配置（带上下文管理）： - `mirothinker_1.7_keep5_max200`（最多 200 轮，推荐用于大多数任务） - `mirothinker_1.7_keep5_max300`（最多 300 轮，仅用于 BrowseComp 和 BrowseComp-ZH） #### **问：如何获取 API keys？** **答：** 最小设置需要这些 keys： - **SERPER_API_KEY**：从 [Serper.dev](https://serper.dev/) 获取（Google search API） - **JINA_API_KEY**：从 [Jina.ai](https://jina.ai/) 获取（网页抓取） - **E2B_API_KEY**：从 [E2B.dev](https://e2b.dev/) 获取（代码执行沙箱） - **SUMMARY_LLM_API_KEY**：您的 LLM API 凭证（用于内容摘要）。可以是小模型，如 Qwen3-14B 或 GPT-5-Nano——选择对性能影响很小。 - **OPENAI_API_KEY**：从 [OpenAI](https://platform.openai.com/) 获取（基准评测需要，用于 LLM-as-a-Judge） - **OPENAI_BASE_URL**：可选，默认为 `https://api.openai.com/v1`。可更改为使用兼容 OpenAI 的 API。 #### **问：Agent 服务器连接错误** **答：** 常见问题： - **检查 base URL 格式**：应以 `/v1` 结尾（例如 `https://your-api.com/v1`） - **验证 API key**：确保在环境或脚本中正确设置了 `API_KEY` - **检查服务器状态**：确保您的服务器正在运行且可访问 - **网络问题**：验证防火墙/网络设置是否允许连接 #### **问：评估脚本无法运行** **答：** 故障排除步骤： 1. **检查工作目录**：确保您在 `apps/miroflow-agent` 目录中 2. **验证环境**：运行 `uv sync` 确保已安装依赖 3. **检查 .env 文件**：确保设置了所有必需的环境变量 4. **查看日志**：检查 `logs/` 目录以获取详细的错误消息 5. **验证数据路径**：确保基准数据已下载且位置正确 #### **问：内存不足错误** **答：** 解决方案： - **减少上下文长度**：将 `MAX_CONTEXT_LENGTH` 设置为较小的值（例如 131072 代表 128K） - **使用具有较少轮数的上下文管理**： - 对于 v1.5：使用 `mirothinker_1.7_keep5_max200` 或 `mirothinker_1.7_keep5_max300`（带上下文管理） - **减少并发任务**：将 `MAX_CONCURRENT` 设置为较小的数字（例如 5） - **使用较小的 Agent**： - 对于 v1.5：尝试 30B 而不是 235B - 对于 v1.0：尝试 8B 或 30B 而不是 72B #### **问：工具执行错误** **答：** 常见修复方法： - **E2B 错误**：验证 `E2B_API_KEY` 有效且帐户有额度 - **Serper 错误**：检查 `SERPER_API_KEY` 和速率限制 - **Jina 错误**：验证 `JINA_API_KEY` 和 `JINA_BASE_URL` 正确 - **LLM 摘要错误**：检查 `SUMMARY_LLM_*` 变量和 Agent 可用性 #### **问：如何监控长时间运行的评估？** **答：** 使用进度监控脚本： ``` cd apps/miroflow-agent python benchmarks/check_progress/check_progress_.py /path/to/logs ``` 脚本会显示完成状态、已用时间和预计剩余时间。

### 获取帮助 - 📖 **文档**：查看 [MiroFlow Tools README](libs/miroflow-tools/README.md) 了解工具详情 - 💬 **Discord**：加入我们的 [Discord 社区](https://discord.com/invite/GPqEnkzQZd) - 🐛 **问题**：在 [GitHub Issues](https://github.com/MiroMindAI/MiroThinker/issues) 上报告 bug - 📧 **联系**：访问 [我们的网站](https://miromind.ai/) 了解更多信息 ## 📄 许可证本项目采用 Apache 2.0 许可证授权 - 详情请参见 [LICENSE](LICENSE) 文件。 ## 🙏 致谢我们要向以下方面表示诚挚的感谢： - 🏆 **基准测试贡献者**，感谢他们提供的全面评估数据集 - 🌍 **开源社区**，感谢那些使之成为可能的工具和库 - 👥 **所有贡献者**，感谢他们帮助完善 MiroThinker

加入我们的社区，共同构建 AI agents 的未来！ ### 参考文献如果您发现此项目对您的研究有用，请考虑引用： ``` @article{miromind2025mirothinker, title={MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling}, author={MiroMind Team and Bai, Song and Bing, Lidong and Chen, Carson and Chen, Guanzheng and Chen, Yuntao and Chen, Zhe and Chen, Ziyi and Dong, Xuan and others}, journal={arXiv preprint arXiv:2511.11793}, year={2025} } ``` [![Star History Chart](https://api.star-history.com/svg?repos=MiroMindAI/MiroThinker&type=Date)](https://star-history.com/#MiroMindAI/MiroThinker&Date)

标签：Apex, BrowseComp, DLL 劫持, Hugging Face, IaC 扫描, MiroMind, MiroThinker, RAG, SOTA, 人工智能, 复杂任务处理, 大语言模型, 开源模型, 机器学习, 浏览基准测试, 深度学习, 深度研究智能体, 用户模式Hook绕过, 自动推理, 逆向工具, 预测模型