livekit/agents

GitHub: livekit/agents

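LiveKit Agents is an open-source framework for building server-side, realtime, multimodal voice AI agents. It addresses the complexity of integrating and orchestrating the STT, LLM, and TTS components of voice conversation applications.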

Stars: 11033 | Forks: 3239


![PyPI - Version](https://img.shields.io/pypi/v/livekit-agents) [![PyPI Downloads](https://static.pepy.tech/badge/livekit-agents/month)](https://pepy.tech/projects/livekit-agents) [![Slack community](https://img.shields.io/endpoint?url=https%3A%2F%2Flivekit.io%2Fbadges%2Fslack)](https://livekit.io/join-slack) [![Twitter Follow](https://img.shields.io/twitter/follow/livekit)](https://twitter.com/livekit) [![Ask DeepWiki for understanding the codebase](https://deepwiki.com/badge.svg)](https://deepwiki.com/livekit/agents) [![License](https://img.shields.io/github/license/livekit/livekit)](https://github.com/livekit/livekit/blob/master/LICENSE)
Looking for the JS/TS library? Check out [AgentsJS](https://github.com/livekit/agents-js)

## What is Agents?

The Agents framework is designed for building realtime, programmable participants that run on servers. Use it to create conversational, multimodal voice agents that can see, hear, and understand.

## Features

- **Flexible integrations**: A comprehensive ecosystem to mix and match the right STT, LLM, TTS, and Realtime API to suit your use case.
- **Integrated job scheduling**: Built-in task scheduling and distribution, with [dispatch APIs](https://docs.livekit.io/agents/build/dispatch/) to connect end users to agents.
- **Extensive WebRTC clients**: Build client applications using LiveKit's open-source SDK ecosystem, which supports all major platforms.
- **Telephony integration**: Works seamlessly with LiveKit's [telephony stack](https://docs.livekit.io/sip/), allowing your agent to make and receive phone calls.
- **Exchange data with clients**: Use [RPCs](https://docs.livekit.io/home/client/data/rpc/) and other [Data APIs](https://docs.livekit.io/home/client/data/) to exchange data with clients seamlessly.
- **Semantic turn detection**: Uses a transformer model to detect when a user has finished speaking, which helps reduce interruptions.
- **MCP support**: Native support for MCP. Integrate tools provided by MCP servers with a single line of code.
- **Built-in test framework**: Write tests and use judges to make sure your agent behaves as expected.
- **Open-source**: Fully open-source, so you can run the entire stack on your own servers, including [LiveKit server](https://github.com/livekit/livekit), one of the most widely used WebRTC media servers.

## Installation

To install the core Agents library along with plugins for popular model providers:

```
pip install "livekit-agents[openai,deepgram,cartesia]"
```

## Docs and guides

Documentation on the framework and how to use it can be found [here](https://docs.livekit.io/agents/).

### Building with AI coding agents

If you are using an AI coding assistant to build with LiveKit Agents, we recommend the following setup for the best results:

1. **Install the [LiveKit Docs MCP server](https://docs.livekit.io/mcp)**: Gives your coding agent access to up-to-date LiveKit documentation, code search across LiveKit repositories, and available examples.
2. **Install the [LiveKit Agent Skill](https://github.com/livekit/agent-skills)**: Provides your coding agent with architectural guidance and best practices for building voice AI applications, including workflow design, handoffs, tasks, and testing patterns.

   `npx skills add livekit/agent-skills --skill livekit-agents`

The Agent Skill works best together with the MCP server: the skill teaches your agent *how to approach* building LiveKit apps, while the MCP server supplies the *current API details* needed to implement them correctly.

## Core concepts

- Agent: An LLM-based application with predefined instructions.
- AgentSession: A container for agents that manages interactions with end users.
- entrypoint: The starting point for an interactive session, similar to a request handler in a web server.
- AgentServer: The main process that coordinates job scheduling and launches agents for user sessions.

## Usage

### Simple voice agent

```
from livekit.agents import (
    Agent,
    AgentServer,
    AgentSession,
    JobContext,
    RunContext,
    cli,
    function_tool,
    inference,
)


@function_tool
async def lookup_weather(
    context: RunContext,
    location: str,
):
    """Used to look up weather information."""

    return {"weather": "sunny", "temperature": 70}


server = AgentServer()


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    session = AgentSession(
        vad=inference.VAD(),
        # any combination of STT, LLM, TTS, or realtime API can be used
        # this example shows LiveKit Inference, a unified API to access different models via LiveKit Cloud
        # to use model provider keys directly, replace with the following:
        # from livekit.plugins import deepgram, openai, cartesia
        # stt=deepgram.STT(model="nova-3"),
        # llm=openai.LLM(model="gpt-4.1-mini"),
        # tts=cartesia.TTS(model="sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"),
        stt=inference.STT("deepgram/nova-3", language="multi"),
        llm=inference.LLM("openai/gpt-4.1-mini"),
        tts=inference.TTS("cartesia/sonic-3", voice="9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"),
    )

    agent = Agent(
        instructions="You are a friendly voice assistant built by LiveKit.",
        tools=[lookup_weather],
    )

    await session.start(agent=agent, room=ctx.room)
    await session.generate_reply(instructions="greet the user and ask about their day")


if __name__ == "__main__":
    cli.run_app(server)
```

This example requires the following environment variables:

- LIVEKIT_URL
- LIVEKIT_API_KEY
- LIVEKIT_API_SECRET

### Multi-agent handoff

This code snippet is abridged. For the full example, see [multi_agent.py](examples/voice_agents/multi_agent.py).

```
from typing import Optional

from livekit.agents.llm import ChatContext

...

class IntroAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a story teller. Your goal is to gather a few pieces of information from the user to make the story personalized and engaging. "
            "Ask the user for their name and where they are from"
        )

    async def on_enter(self):
        self.session.generate_reply(instructions="greet the user and gather information")

    @function_tool
    async def information_gathered(
        self,
        context: RunContext,
        name: str,
        location: str,
    ):
        """Called when the user has provided the information needed to make the story personalized and engaging.

        Args:
            name: The name of the user
            location: The location of the user
        """

        context.userdata.name = name
        context.userdata.location = location

        # returning (agent, message) hands the session off to StoryAgent;
        # it starts with a fresh chat context unless chat_ctx is passed
        story_agent = StoryAgent(name, location)
        return story_agent, "Let's start the story!"


class StoryAgent(Agent):
    def __init__(
        self, name: str, location: str, *, chat_ctx: Optional[ChatContext] = None
    ) -> None:
        super().__init__(
            instructions="You are a storyteller. Use the user's information in order to make the story personalized. "
            f"The user's name is {name}, from {location}",
            # override the default model, switching to Realtime API from standard LLMs
            llm=openai.realtime.RealtimeModel(voice="echo"),
            chat_ctx=chat_ctx,
        )

    async def on_enter(self):
        self.session.generate_reply()


@server.rtc_session()
async def entrypoint(ctx: JobContext):
    # StoryData holds the shared session state (defined in the full example)
    userdata = StoryData()
    session = AgentSession[StoryData](
        vad=inference.VAD(),
        stt="deepgram/nova-3",
        llm="openai/gpt-4.1-mini",
        tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        userdata=userdata,
    )

    await session.start(
        agent=IntroAgent(),
        room=ctx.room,
    )
...
```

### Testing

Automated tests are essential for building reliable agents, especially given the non-deterministic behavior of LLMs. LiveKit Agents includes native test integration to help you create dependable agents.

```
import pytest

from livekit.agents import AgentSession
from livekit.plugins import google


@pytest.mark.asyncio
async def test_no_availability() -> None:
    llm = google.LLM()
    # MyAgent is the Agent subclass under test, defined in your own code
    async with AgentSession(llm=llm) as sess:
        await sess.start(MyAgent())
        result = await sess.run(
            user_input="Hello, I need to place an order."
        )
        result.expect.skip_next_event_if(type="message", role="assistant")
        result.expect.next_event().is_function_call(name="start_order")
        result.expect.next_event().is_function_call_output()
        await (
            result.expect.next_event()
            .is_message(role="assistant")
            .judge(llm, intent="assistant should be asking the user what they would like")
        )
```

## Examples

For more examples and detailed setup instructions, see the [examples directory](examples/). For even more examples, see the [python-agents-examples](https://github.com/livekit-examples/python-agents-examples) repository.

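- 🎙️ **Starter Agent**: A starter agent optimized for voice conversations.
- 🔄 **Multi-user push to talk**: Responds to multiple users in the room via push-to-talk.
- 🎵 **Background audio**: Background ambient and thinking audio to improve realism.
- 🛠️ **Dynamic tool creation**: Creates function tools dynamically.
- ☎️ **Outbound caller**: An agent that makes outbound calls.
- 📋 **Structured output**: Uses structured output from the LLM to guide the TTS tone.
- 🔌 **MCP support**: Uses tools from MCP servers.
- 💬 **Text-only agent**: Skips voice entirely and uses the same code for text-only integrations.
- 📝 **Multi-user transcriber**: Produces transcriptions for all users in the room.
- 🎥 **Video avatars**: Adds an AI avatar with Tavus, Bithuman, LemonSlice, and more.
- 🍽️ **Restaurant ordering and reservations**: A full example of an agent that handles calls for a restaurant.
- 👁️ **Gemini Live vision**: A full example (including an iOS app) of a Gemini Live agent with vision capabilities.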
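The simple voice agent above requires LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET to be set. When trying several examples, a small check that reports every missing variable at once can be handy. The helper below is a hypothetical, stdlib-only sketch and is not part of livekit-agents; the URL in the demo call is a placeholder.

```python
import os
from typing import List, Mapping, Optional

# Hypothetical helper, not part of livekit-agents.
REQUIRED_VARS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def missing_livekit_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required LiveKit variables that are unset or empty, in order."""
    if env is None:
        env = os.environ  # read the real process environment by default
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping; the URL is a placeholder.
print(missing_livekit_env({"LIVEKIT_URL": "wss://my-project.livekit.cloud"}))
# → ['LIVEKIT_API_KEY', 'LIVEKIT_API_SECRET']
```

Calling `missing_livekit_env()` with no argument checks the real environment; treating empty strings as missing catches the common case of a `.env` entry left blank.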

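## Running your agent

### Testing in the terminal

```
python myagent.py console
```

Runs your agent in terminal mode, with local audio input and output for testing. This mode requires no external server or dependencies and is useful for quickly validating behavior.

### Developing with LiveKit clients

```
python myagent.py dev
```

Starts the agent server with hot reloading on file changes. This mode lets each process host multiple concurrent agents efficiently.

The agent connects to LiveKit Cloud or your self-hosted server. Set the following environment variables:

- LIVEKIT_URL
- LIVEKIT_API_KEY
- LIVEKIT_API_SECRET

You can connect using any LiveKit client SDK or telephony integration. To get started quickly, try the [Agents Playground](https://agents-playground.livekit.io/).

### Running in production

```
python myagent.py start
```

Runs the agent with production-ready optimizations.

## License

The Agents framework is licensed under [Apache-2.0](LICENSE). The LiveKit turn detection model is licensed under the [LiveKit Model License](MODEL_LICENSE).

### Development setup

This project uses [uv](https://docs.astral.sh/uv/) for package management. To install the development dependencies:

```
uv sync --all-extras --dev
```

### Examples

This project includes many examples in the [`examples`](examples/) directory. To run them, create an `examples/.env` file containing your LiveKit Server credentials and any required model provider credentials (see `examples/.env.example`), then run:

```
uv run examples/voice_agents/basic_agent.py dev
```

See the [examples README](examples/README.md) for more information.

### Testing

Unit tests live in the `tests` directory and can be run with:

```
uv run pytest --unit
```

Integration tests for each plugin require various API credentials and run automatically in GitHub CI on PRs submitted by project maintainers. See the [tests workflow](.github/workflows/tests.yml) for details.

### Formatting

This project uses [ruff](https://github.com/astral-sh/ruff) for formatting and linting:

```
uv run ruff format
uv run ruff check --fix
```

### Documentation

To generate the documentation locally with [pdoc](https://github.com/pdoc3/pdoc):

```
uv sync --all-extras --group docs
uv run --active pdoc --skip-errors --html --output-dir=docs livekit
```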

| LiveKit Ecosystem | |
|---|---|
| Agents SDKs | Python · Node.js |
| LiveKit SDKs | Browser · Swift · Android · Flutter · React Native · Rust · Node.js · Python · Unity · Unity (WebGL) · ESP32 · C++ |
| Starter Apps | Python Agent · TypeScript Agent · React App · SwiftUI App · Android App · Flutter App · React Native App · Web Embed |
| UI Components | React · Android Compose · SwiftUI · Flutter |
| Server APIs | Node.js · Golang · Ruby · Java/Kotlin · Python · Rust · PHP (community) · .NET (community) |
| Resources | Docs · Docs MCP Server · CLI · LiveKit Cloud |
| LiveKit Server OSS | LiveKit server · Egress · Ingress · SIP |
| Community | Developer Community · Slack · X · YouTube |
