# **🔥 Firecrawl**
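**Turn websites into LLM-ready data.**

[**Firecrawl**](https://firecrawl.dev/?ref=github) is an API service that scrapes, crawls, and extracts structured data from any website, giving AI agents and applications real-time context from the web.

Looking for our MCP? See the repository [here](https://github.com/firecrawl/firecrawl-mcp-server).

*This repository is still in development, and we are integrating custom modules into the monorepo. It is not yet fully ready for self-hosted deployment, but you can run it locally.*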
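## Why Firecrawl?
- **LLM-ready output**: clean markdown, structured JSON, screenshots, HTML, and more
- **Industry-leading reliability**: >80% coverage in [benchmark evaluations](https://www.firecrawl.dev/blog/the-worlds-best-web-data-api-v25), outperforming every other provider tested
- **Handles the hard stuff**: proxies, JavaScript rendering, and dynamic content that breaks other scrapers
- **Highly customizable**: exclude tags, crawl pages behind a login, set a maximum depth, and more
- **Media parsing**: automatically extracts text from PDFs, DOCX files, and images
- **Actions**: click, scroll, type, wait, and more before extraction
- **Batch processing**: scrape thousands of URLs asynchronously
- **Change tracking**: monitor website content changes over time
## Quick Start
Sign up at [firecrawl.dev](https://firecrawl.dev) to get your API key and start extracting data in seconds. Try the [playground](https://firecrawl.dev/playground) to test it out.
### Make Your First API Request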
```
curl -X POST 'https://api.firecrawl.dev/v2/scrape' \
-H 'Authorization: Bearer fc-YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{"url": "https://example.com"}'
```
Response:
```
{
  "success": true,
  "data": {
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "metadata": {
      "title": "Example Domain",
      "sourceURL": "https://example.com"
    }
  }
}
```
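The same request can be assembled in Python with only the standard library. This is a minimal sketch that assumes the endpoint, headers, and body shown in the curl command above; `build_scrape_request` is a hypothetical helper name, not part of the Firecrawl SDK, and the request is built but not sent.

```python
import json
import urllib.request

# Endpoint taken from the curl example above
API_URL = "https://api.firecrawl.dev/v2/scrape"


def build_scrape_request(url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl call."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_scrape_request("https://example.com", "fc-YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

To actually send it, pass `req` to `urllib.request.urlopen` and decode the body with `json.load`. For anything beyond a quick test, the official SDKs described later are likely the more convenient choice.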
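## Feature Overview
| Feature | Description |
|---------|-------------|
| [**Scrape**](#scraping) | Convert any URL to markdown, HTML, screenshots, or structured JSON |
| [**Search**](#search) | Search the web and get full page content from the results |
| [**Map**](#map) | Discover all URLs on a website instantly |
| [**Crawl**](#crawling) | Scrape all URLs of a website with a single request |
| [**Agent**](#agent) | Automated data gathering: just describe what you need |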
## Scraping
Convert any URL to clean markdown, HTML, or structured data.
```
curl -X POST 'https://api.firecrawl.dev/v2/scrape' \
-H 'Authorization: Bearer fc-YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://docs.firecrawl.dev",
"formats": ["markdown", "html"]
}'
```
Response:
```
{
  "success": true,
  "data": {
    "markdown": "# Firecrawl Docs\n\nTurn websites into LLM-ready data...",
    "html": "...",
    "metadata": {
      "title": "Quickstart | Firecrawl",
      "description": "Firecrawl allows you to turn entire websites into LLM-ready markdown",
      "sourceURL": "https://docs.firecrawl.dev",
      "statusCode": 200
    }
  }
}
```
### Extract Structured Data (JSON Mode)
Extract structured data with a schema:
```
from firecrawl import Firecrawl
from pydantic import BaseModel

app = Firecrawl(api_key="fc-YOUR_API_KEY")


class CompanyInfo(BaseModel):
    company_mission: str
    is_open_source: bool
    is_in_yc: bool


# Request JSON output that conforms to the schema generated from the model
result = app.scrape(
    'https://firecrawl.dev',
    formats=[{"type": "json", "schema": CompanyInfo.model_json_schema()}]
)

print(result.json)
```
```
{"company_mission": "Turn websites into LLM-ready data", "is_open_source": true, "is_in_yc": true}
```
Or extract with just a prompt (no schema):
```
result = app.scrape(
    'https://firecrawl.dev',
    formats=[{"type": "json", "prompt": "Extract the company mission"}]
)
```
### Scrape Formats
Available formats: `markdown`, `html`, `rawHtml`, `screenshot`, `links`, `json`, `branding`

**Get a screenshot**
```
doc = app.scrape("https://firecrawl.dev", formats=["screenshot"])
print(doc.screenshot) # Base64 encoded image
```
**Extract brand identity (colors, fonts, typography)**
```
doc = app.scrape("https://firecrawl.dev", formats=["branding"])
print(doc.branding) # {"colors": {...}, "fonts": [...], "typography": {...}}
```
### Actions (Interact Before Scraping)
Click, type, scroll, and more before extraction:
```
doc = app.scrape(
    url="https://example.com/login",
    formats=["markdown"],
    actions=[
        {"type": "write", "text": "user@example.com"},
        {"type": "press", "key": "Tab"},
        {"type": "write", "text": "password"},
        {"type": "click", "selector": 'button[type="submit"]'},
        {"type": "wait", "milliseconds": 2000},
        {"type": "screenshot"}
    ]
)
```
## Search
Search the web and optionally scrape the results.
```
curl -X POST 'https://api.firecrawl.dev/v2/search' \
-H 'Authorization: Bearer fc-YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"query": "firecrawl web scraping",
"limit": 5
}'
```
Response:
```
{
  "success": true,
  "data": {
    "web": [
      {
        "url": "https://www.firecrawl.dev/",
        "title": "Firecrawl - The Web Data API for AI",
        "description": "The web crawling, scraping, and search API for AI.",
        "position": 1
      }
    ],
    "images": [...],
    "news": [...]
  }
}
```
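If you call the endpoint over raw HTTP, the web results can be pulled out of the response with a few lines of Python. This sketch assumes the response shape shown above (`success`, then `data.web` entries with `position`, `title`, and `url`); `web_results` is a hypothetical helper, and the sample string is trimmed to the `web` list because the `[...]` placeholders above are not valid JSON.

```python
import json
from typing import List, Tuple


def web_results(response_text: str) -> List[Tuple[int, str, str]]:
    """Return (position, title, url) for each web hit, ordered by position."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        return []
    hits = payload.get("data", {}).get("web", [])
    return sorted((h["position"], h["title"], h["url"]) for h in hits)


# Sample copied from the response above, reduced to the "web" list
sample = """
{
  "success": true,
  "data": {
    "web": [
      {
        "url": "https://www.firecrawl.dev/",
        "title": "Firecrawl - The Web Data API for AI",
        "description": "The web crawling, scraping, and search API for AI.",
        "position": 1
      }
    ]
  }
}
"""

for position, title, url in web_results(sample):
    print(position, title, url)
```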
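### Search with Content Scraping
Get the full content of the search results:
```
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")

# scrape_options asks Firecrawl to also scrape each search result
results = firecrawl.search(
    "firecrawl web scraping",
    limit=3,
    scrape_options={
        "formats": ["markdown", "links"]
    }
)
```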
## Agent
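**The easiest way to get data from the web.** Describe what you need, and our AI agent searches, navigates, and extracts the data for you. No URLs required.

Agent is the evolution of our `/extract` endpoint: faster, more reliable, and with no need to know the URLs upfront.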
```
curl -X POST 'https://api.firecrawl.dev/v2/agent' \
-H 'Authorization: Bearer fc-YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"prompt": "Find the pricing plans for Notion"
}'
```
Response:
```
{
  "success": true,
  "data": {
    "result": "Notion offers the following pricing plans:\n\n1. Free - $0/month...\n2. Plus - $10/seat/month...\n3. Business - $18/seat/month...",
    "sources": ["https://www.notion.so/pricing"]
  }
}
```
### Agent with Structured Output
Use a schema to get structured data:
```
from firecrawl import Firecrawl
from pydantic import BaseModel, Field
from typing import List, Optional

app = Firecrawl(api_key="fc-YOUR_API_KEY")


class Founder(BaseModel):
    name: str = Field(description="Full name of the founder")
    role: Optional[str] = Field(None, description="Role or position")


class FoundersSchema(BaseModel):
    founders: List[Founder] = Field(description="List of founders")


result = app.agent(
    prompt="Find the founders of Firecrawl",
    schema=FoundersSchema
)

print(result.data)
```
```
{
  "founders": [
    {"name": "Eric Ciarla", "role": "Co-founder"},
    {"name": "Nicolas Camara", "role": "Co-founder"},
    {"name": "Caleb Peffer", "role": "Co-founder"}
  ]
}
```
### Agent with URLs (Optional)
Focus the agent on specific pages:
```
result = app.agent(
    urls=["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"],
    prompt="Compare the features and pricing information"
)
```
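### Model Selection
Choose between two models based on your needs:

| Model | Cost | Best For |
|-------|------|----------|
| `spark-1-mini` (default) | 60% cheaper | Most tasks |
| `spark-1-pro` | Standard | Complex research, critical extraction |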
```
result = app.agent(
    prompt="Compare enterprise features across Firecrawl, Apify, and ScrapingBee",
    model="spark-1-pro"
)
```
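**When to use Pro:**
- Comparing data across multiple websites
- Extracting from sites with complex navigation or authentication
- Research tasks where the agent needs to explore multiple paths
- Critical data where accuracy is paramount

Learn more about the Spark models in our [Agent documentation](https://docs.firecrawl.dev/features/agent).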
### Using Firecrawl with AI Agents
Install the Firecrawl skill so that AI agents such as Claude Code, Codex, and OpenCode can use Firecrawl automatically:
```
npx skills add firecrawl/cli
```
Restart your agent after installing. See the [Skill + CLI documentation](https://docs.firecrawl.dev/sdks/cli) for the full setup.
## Crawling
Crawl an entire website and get the content of every page.
```
curl -X POST 'https://api.firecrawl.dev/v2/crawl' \
-H 'Authorization: Bearer fc-YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"url": "https://docs.firecrawl.dev",
"limit": 100,
"scrapeOptions": {
"formats": ["markdown"]
}
}'
```
Returns a job ID:
```
{
  "success": true,
  "id": "123-456-789",
  "url": "https://api.firecrawl.dev/v2/crawl/123-456-789"
}
```
### Check Crawl Status
```
curl -X GET 'https://api.firecrawl.dev/v2/crawl/123-456-789' \
-H 'Authorization: Bearer fc-YOUR_API_KEY'
```
```
{
  "status": "completed",
  "total": 50,
  "completed": 50,
  "creditsUsed": 50,
  "data": [
    {
      "markdown": "# Page Title\n\nContent...",
      "metadata": {"title": "Page Title", "sourceURL": "https://..."}
    }
  ]
}
```
**Note:** The [SDKs](#sdks) handle polling automatically for a better developer experience.
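If you call the REST API directly instead of using an SDK, you have to poll the status URL yourself. The loop below is a minimal sketch of that idea, not the SDK's implementation. It assumes only what the response above shows, namely that a finished job reports `"status": "completed"`, and it treats every other status as still in progress. `wait_for_crawl`, `fetch_status`, and the default intervals are hypothetical names and values chosen for illustration.

```python
import time
from typing import Any, Callable, Dict


def wait_for_crawl(
    fetch_status: Callable[[], Dict[str, Any]],
    poll_interval: float = 2.0,
    max_polls: int = 150,
) -> Dict[str, Any]:
    """Call fetch_status until the job reports "completed" or polls run out."""
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("status") == "completed":
            return status
        # Assumption: any other status means the job is still running
        time.sleep(poll_interval)
    raise TimeoutError("crawl did not complete within the polling budget")


# Demo with canned responses instead of real HTTP calls;
# "in_progress" is a placeholder value, not a documented status.
canned = iter([
    {"status": "in_progress", "total": 50, "completed": 10},
    {"status": "completed", "total": 50, "completed": 50, "data": []},
])
result = wait_for_crawl(lambda: next(canned), poll_interval=0)
print(result["status"], result["completed"])
```

In real use, `fetch_status` would be a function that sends the GET request shown above with your API key and returns the decoded JSON body. Injecting it as a callable keeps the loop testable without network access.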
## Map
Discover all URLs on a website instantly.
```
curl -X POST 'https://api.firecrawl.dev/v2/map' \
-H 'Authorization: Bearer fc-YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{"url": "https://firecrawl.dev"}'
```
Response:
```
{
  "success": true,
  "links": [
    {"url": "https://firecrawl.dev", "title": "Firecrawl", "description": "Turn websites into LLM-ready data"},
    {"url": "https://firecrawl.dev/pricing", "title": "Pricing", "description": "Firecrawl pricing plans"},
    {"url": "https://firecrawl.dev/blog", "title": "Blog", "description": "Firecrawl blog"}
  ]
}
```
### Map with Search
Find specific URLs within a site:
```
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

result = app.map("https://firecrawl.dev", search="pricing")
# Returns URLs related to "pricing"
```
## Batch Scraping
Scrape multiple URLs at once:
```
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

job = app.batch_scrape([
    "https://firecrawl.dev",
    "https://docs.firecrawl.dev",
    "https://firecrawl.dev/pricing"
], formats=["markdown"])

# Print the source URL of every scraped document
for doc in job.data:
    print(doc.metadata.source_url)
```
## SDKs
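Our SDKs provide a convenient way to interact with every Firecrawl feature, and they automatically handle polling for asynchronous operations such as crawling and batch scraping.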
### Python
Install the SDK:
```
pip install firecrawl-py
```
```
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

# Scrape a single URL
doc = app.scrape("https://firecrawl.dev", formats=["markdown"])
print(doc.markdown)

# Use the Agent for autonomous data gathering
result = app.agent(prompt="Find the founders of Stripe")
print(result.data)

# Crawl a website (automatically waits for completion)
docs = app.crawl("https://docs.firecrawl.dev", limit=50)
for doc in docs.data:
    print(doc.metadata.source_url, doc.markdown[:100])

# Search the web
results = app.search("best web scraping tools 2024", limit=10)
print(results)
```
### Node.js
Install the SDK:
```
npm install @mendable/firecrawl-js
```
```
import Firecrawl from '@mendable/firecrawl-js';

const app = new Firecrawl({ apiKey: 'fc-YOUR_API_KEY' });

// Scrape a single URL
const doc = await app.scrape('https://firecrawl.dev', { formats: ['markdown'] });
console.log(doc.markdown);

// Use the Agent for autonomous data gathering
const result = await app.agent({ prompt: 'Find the founders of Stripe' });
console.log(result.data);

// Crawl a website (automatically waits for completion)
const docs = await app.crawl('https://docs.firecrawl.dev', { limit: 50 });
docs.data.forEach(doc => {
  console.log(doc.metadata.sourceURL, doc.markdown.substring(0, 100));
});

// Search the web
const results = await app.search('best web scraping tools 2024', { limit: 10 });
results.data.web.forEach(result => {
  console.log(`${result.title}: ${result.url}`);
});
```
### Community SDKs
- [Go SDK](https://github.com/mendableai/firecrawl-go)
- [Rust SDK](https://docs.firecrawl.dev/sdks/rust)
## Integrations
**Agents & AI Tools**
- [Firecrawl Skill](https://docs.firecrawl.dev/sdks/cli)
- [Firecrawl MCP](https://github.com/mendableai/firecrawl-mcp-server)
**Platforms**
- [Lovable](https://docs.lovable.dev/integrations/firecrawl)
- [Zapier](https://zapier.com/apps/firecrawl/integrations)
- [n8n](https://n8n.io/integrations/firecrawl/)
[View all integrations →](https://www.firecrawl.dev/integrations)

**Don't see your favorite tool?** [Open an issue](https://github.com/mendableai/firecrawl/issues) and let us know!
## Resources
- [Documentation](https://docs.firecrawl.dev)
- [API Reference](https://docs.firecrawl.dev/api-reference/introduction)
- [Playground](https://firecrawl.dev/playground)
- [Changelog](https://firecrawl.dev/changelog)
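## Open Source vs. Cloud
Firecrawl is open source under the AGPL-3.0 license. The cloud version at [firecrawl.dev](https://firecrawl.dev) includes additional features.

To run Firecrawl locally, see the [Contributing Guide](https://github.com/firecrawl/firecrawl/blob/main/CONTRIBUTING.md). To self-host, see the [Self-Hosting Guide](https://docs.firecrawl.dev/contributing/self-host).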
## Contributing
We love contributions! Please read our [Contributing Guide](https://github.com/firecrawl/firecrawl/blob/main/CONTRIBUTING.md) before submitting a pull request.
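## License
This project is primarily licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). The SDKs and some UI components are licensed under the MIT License. See the LICENSE files in the specific directories for details.

**It is the sole responsibility of end users to respect websites' policies.** Users are advised to adhere to the applicable privacy policies and terms of use. By default, Firecrawl respects robots.txt directives. By using Firecrawl, you agree to comply with these conditions.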