vercel-labs/agent-browser

GitHub: vercel-labs/agent-browser

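A headless browser automation CLI designed for AI agents. Its snapshot-ref mechanism and JSON output let large language models drive a browser efficiently and safely through complex web interactions.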
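The snapshot-ref loop mentioned above can be sketched as a short shell script. This is only an illustration under stated assumptions: the `ab` wrapper and the `DRY_RUN` variable are hypothetical helpers introduced here (they are not part of the CLI), and the refs `@e1` and `@e2` stand in for whatever refs a real snapshot of the page reports. By default the script only prints the commands it would run, so it can be read and executed without a browser installed.

```shell
#!/bin/sh
# Hypothetical dry-run switch: 1 = print commands only, 0 = really run agent-browser.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper around the CLI; not provided by agent-browser itself.
ab() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "agent-browser $*"
  else
    agent-browser "$@"
  fi
}

ab open example.com      # 1. load the page
ab snapshot              # 2. read the accessibility tree and note the refs
ab click @e2             # 3. act on a ref taken from the snapshot (assumed ref)
ab get text @e1          # 4. read text back by ref (assumed ref)
ab close                 # 5. shut the browser down
```

Running it with `DRY_RUN=0` would invoke the real CLI, in which case the refs should be replaced with ones from an actual `snapshot` of the target page.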

Stars: 19638 | Forks: 1148

# agent-browser

Headless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.

## Installation

### Global installation (recommended)

Install the native Rust binary for the best performance:

```
npm install -g agent-browser
agent-browser install  # Download Chromium
```

This is the fastest option: commands run directly through the native Rust CLI, with sub-millisecond parsing overhead.

### Quick start (no install)

To try it without installing, run it directly with `npx`:

```
npx agent-browser install   # Download Chromium (first time only)
npx agent-browser open example.com
```

### Project installation (local dependency)

For projects that want to pin the version in `package.json`:

```
npm install agent-browser
npx agent-browser install
```

Then use it through `npx` or `package.json` scripts:

```
npx agent-browser open example.com
```

### Homebrew (macOS)

```
brew install agent-browser
agent-browser install  # Download Chromium
```

### From source

```
git clone https://github.com/vercel-labs/agent-browser
cd agent-browser
pnpm install
pnpm build
pnpm build:native   # Requires Rust (https://rustup.rs)
pnpm link --global  # Makes agent-browser available globally
agent-browser install
```

### Linux dependencies

On Linux, install the system dependencies:

```
agent-browser install --with-deps
# or manually: npx playwright install-deps chromium
```

## Quick start

```
agent-browser open example.com
agent-browser snapshot                    # Get accessibility tree with refs
agent-browser click @e2                   # Click by ref from snapshot
agent-browser fill @e3 "test@example.com" # Fill by ref
agent-browser get text @e1                # Get text by ref
agent-browser screenshot page.png
agent-browser close
```

### Traditional selectors (also supported)

```
agent-browser click "#submit"
agent-browser fill "#email" "test@example.com"
agent-browser find role button click --name "Submit"
```

## Commands

### Core commands

```
agent-browser open                    # Navigate to URL (aliases: goto, navigate)
agent-browser click                   # Click element (--new-tab to open in new tab)
agent-browser dblclick                # Double-click element
agent-browser focus                   # Focus element
agent-browser type                    # Type into element
agent-browser fill                    # Clear and fill
agent-browser press                   # Press key (Enter, Tab, Control+a) (alias: key)
agent-browser keyboard type           # Type with real keystrokes (no selector, current focus)
agent-browser keyboard inserttext     # Insert text without key events (no selector)
agent-browser keydown                 # Hold key down
agent-browser keyup                   # Release key
agent-browser hover                   # Hover element
agent-browser select                  # Select dropdown option
agent-browser check                   # Check checkbox
agent-browser uncheck                 # Uncheck checkbox
agent-browser scroll [px]             # Scroll (up/down/left/right, --selector )
agent-browser scrollintoview          # Scroll element into view (alias: scrollinto)
agent-browser drag                    # Drag and drop
agent-browser upload                  # Upload files
agent-browser screenshot [path]       # Take screenshot (--full for full page, saves to a temporary directory if no path)
agent-browser screenshot --annotate   # Annotated screenshot with numbered element labels
agent-browser pdf                     # Save as PDF
agent-browser snapshot                # Accessibility tree with refs (best for AI)
agent-browser eval                    # Run JavaScript (-b for base64, --stdin for piped input)
agent-browser connect                 # Connect to browser via CDP
agent-browser close                   # Close browser (aliases: quit, exit)
```

### Get info

```
agent-browser get text                # Get text content
agent-browser get html                # Get innerHTML
agent-browser get value               # Get input value
agent-browser get attr                # Get attribute
agent-browser get title               # Get page title
agent-browser get url                 # Get current URL
agent-browser get count               # Count matching elements
agent-browser get box                 # Get bounding box
agent-browser get styles              # Get computed styles
```

### Check state

```
agent-browser is visible              # Check if visible
agent-browser is enabled              # Check if enabled
agent-browser is checked              # Check if checked
```

### Find elements (semantic locators)

```
agent-browser find role [value]       # By ARIA role
agent-browser find text               # By text content
agent-browser find label
```