NihalAhmadKhan/Voice-AI-Agent

GitHub: NihalAhmadKhan/Voice-AI-Agent

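A voice-controlled multi-agent Linux desktop assistant built on the Agno framework and Google Gemini. Voice commands are automatically routed to specialist agents that handle system operations, app management, web search, and file management.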


# 🎙️ Voice-Controlled Multi-Agent AI Assistant

Built with the **Agno framework** + **Google Gemini** models. Speak a command → the AI routes it to the right specialist agent → your Linux desktop carries it out → the assistant replies by voice.

## 🏗️ System Architecture

```
Your Voice
    │
    ▼
Speech Recognition (Google STT via microphone)
    │
    ▼
┌───────────────────────────────────────────────────┐
│         Orchestrator Agent (Gemini Flash)         │
│         Routes to the correct specialist          │
└──────┬─────────────┬───────────┬───────────┬──────┘
       │             │           │           │
  ┌────▼───┐  ┌──────▼──┐  ┌─────▼──┐  ┌─────▼────┐
  │  App   │  │ System  │  │  Web   │  │  File    │
  │ Agent  │  │ Agent   │  │ Agent  │  │ Agent    │
  │        │  │         │  │        │  │          │
  │ Open / │  │ Shutdown│  │ Open   │  │ Create / │
  │ Close  │  │ Restart │  │ sites  │  │ List /   │
  │ apps   │  │ Sleep   │  │ Search │  │ Rename / │
  │        │  │ Lock    │  │ Maps   │  │ Delete   │
  │        │  │ Volume  │  │        │  │ files    │
  └────────┘  └─────────┘  └────────┘  └──────────┘
    │
    ▼
Text-to-Speech (pyttsx3 → espeak on Linux)
    │
    ▼
🔊 You hear the reply
```

## ⚡ Quick Start (Linux)

### 1. Install system dependencies

**Ubuntu / Debian:**

```
sudo apt update
sudo apt install -y python3-pip python3-venv \
    python3-pyaudio portaudio19-dev \
    espeak espeak-ng \
    pulseaudio-utils
```

**Fedora:**

```
sudo dnf install -y python3-pip python3-virtualenv \
    portaudio-devel python3-pyaudio \
    espeak espeak-ng
```

**Arch / Manjaro:**

```
sudo pacman -S python-pip python-virtualenv \
    portaudio python-pyaudio \
    espeak-ng
```

### 2. Create a virtual environment

```
python3 -m venv venv
source venv/bin/activate
```

### 3. Install Python dependencies

```
pip install -r requirements.txt
```

### 4. Set your API key

```
cp .env.example .env
nano .env   # paste your GOOGLE_API_KEY
```

### 5. Run!

```
# Voice mode (live microphone)
python main.py

# Text mode (no microphone needed; good for testing)
python main.py --text
```

## 🗣️ Example Commands

### 📱 Open / Close Apps

| Say this | What happens |
|----------|-------------|
| "Open Firefox" | Launches Firefox |
| "Open VS Code" | Launches Visual Studio Code |
| "Open the terminal" | Opens GNOME Terminal |
| "Open calculator" | Opens GNOME Calculator |
| "Open file manager" | Opens Nautilus |
| "Close Spotify" | Kills the Spotify process |

### ⚡ System Control

| Say this | What happens |
|----------|-------------|
| "Shut down the computer" | Calls `systemctl poweroff` |
| "Restart my computer" | Calls `systemctl reboot` |
| "Put it to sleep" | Calls `systemctl suspend` |
| "Lock my screen" | Runs `loginctl lock-session` |
| "Set volume to 60" | Sets the PulseAudio/PipeWire volume to 60% |
| "Mute the sound" | Mutes the default audio output device |
| "What's my system info?" | Reads CPU, RAM, and disk usage |
| "How's my battery?" | Reports battery level and status |

### 🌐 Web & Search

| Say this | What happens |
|----------|-------------|
| "Open YouTube" | Opens youtube.com in the browser |
| "Search for Python tutorials" | Google search |
| "Play lo-fi music on YouTube" | YouTube search |
| "Open GitHub" | Opens github.com |
| "Find directions to Connaught Place" | Google Maps search |

### 📁 File Management

| Say this | What happens |
|----------|-------------|
| "Create a folder called Projects on Desktop" | Creates the folder |
| "Show me what's on my Desktop" | Lists files and folders |
| "Create a file called notes.txt" | Creates a text file |
| "Open my Documents folder" | Opens it in Nautilus |
| "Delete old_stuff from Downloads" | Deletes it |

## 🗂️ Project Structure

```
voice_assistant/
├── main.py              # Voice loop + TTS
├── requirements.txt
├── .env.example
├── README.md
└── agents/
    ├── __init__.py
    ├── orchestrator.py  # Routes to specialist agents
    ├── app_agent.py     # Open/close Linux apps
    ├── system_agent.py  # Power, volume, lock, info
    ├── web_agent.py     # Websites & search
    └── file_agent.py    # File & folder operations
```

## 🛠️ Troubleshooting

**Microphone not detected**

```
# Check whether the microphone is recognized
python3 -c "import speech_recognition as sr; print(sr.Microphone.list_microphone_names())"

# Or list devices with ALSA
arecord -l
```

**"systemctl poweroff" asks for a password**

```
# Allow passwordless power commands for your user
sudo visudo

# Add this line (adjust the path if `command -v systemctl` reports a different location):
yourusername ALL=(ALL) NOPASSWD: /bin/systemctl poweroff, /bin/systemctl reboot, /bin/systemctl suspend
```

**No TTS voice / no sound output**

```
sudo apt install espeak espeak-ng   # Ubuntu/Debian

# Test it:
espeak "hello world"
```

**Volume control not working**

```
# Check whether PipeWire or PulseAudio is running
pactl info

# Install if missing
sudo apt install pulseaudio pulseaudio-utils
```

**App not found**

The app must be on your `$PATH`. Test it:

```
which google-chrome   # or firefox, code, etc.
```

## 🔧 Customization

### Add your own apps

Edit `agents/app_agent.py` → `LINUX_APPS`:

```
LINUX_APPS = {
    ...
    "my app": "myapp",   # key = voice name, value = terminal command
}
```

### Add websites

Edit `agents/web_agent.py` → `POPULAR_SITES`:

```
POPULAR_SITES = {
    ...
    "my site": "https://mysite.com",
}
```

### Use a more capable Gemini model

In any agent file, change the `model=` argument to one of:

```
model=Gemini(id="gemini-2.0-flash-001")          # fast & free
model=Gemini(id="gemini-1.5-pro-001")            # smarter
model=Gemini(id="gemini-2.5-pro-preview-0325")   # best reasoning
```

## 📜 License

MIT. Free to use and modify.
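
## 🧩 Sketch: resolving a spoken app name

The customization and troubleshooting notes say that `LINUX_APPS` maps a voice name to a terminal command and that the command must be on `$PATH`. The sketch below illustrates one way such a lookup could work. It is not the project's actual code: the helper names `resolve_app_command` and `is_launchable` and the sample dictionary entries are assumptions made for illustration, and the real `agents/app_agent.py` may behave differently.

```python
import shutil
from typing import Dict, Optional

# Hypothetical sample entries; the real LINUX_APPS in agents/app_agent.py may differ.
LINUX_APPS: Dict[str, str] = {
    "firefox": "firefox",
    "vs code": "code",
    "my app": "myapp",  # key = voice name, value = terminal command
}


def resolve_app_command(spoken_name: str,
                        apps: Dict[str, str] = LINUX_APPS) -> Optional[str]:
    """Map a spoken app name to its terminal command, or None if it is unknown."""
    # Lowercase and collapse whitespace so "  VS   Code " matches "vs code".
    key = " ".join(spoken_name.lower().split())
    return apps.get(key)


def is_launchable(command: str) -> bool:
    """Return True if the command is found on $PATH (the same check `which` performs)."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    command = resolve_app_command("VS Code")
    if command is None:
        print("No entry for that app name")
    elif not is_launchable(command):
        print(f"'{command}' is not on $PATH")
    else:
        print(f"Would launch: {command}")
```

Checking `$PATH` before launching lets the assistant give a spoken "app not found" reply rather than failing inside a subprocess call.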
