cactus-compute/cactus

A low-latency AI inference engine for phones and wearables, offering cross-platform SDKs and an OpenAI-compatible API, with on-device support for LLMs, vision, and speech recognition.
# Cactus

[![Docs](https://img.shields.io/badge/Docs-555?style=for-the-badge&logo=readthedocs&logoColor=white)][docs-url] [![Website](https://img.shields.io/badge/Website-555?style=for-the-badge&logo=safari&logoColor=white)][website-url] [![GitHub](https://img.shields.io/badge/GitHub-555?style=for-the-badge&logo=github&logoColor=white)][github-url] [![HuggingFace](https://img.shields.io/badge/HuggingFace-555?style=for-the-badge&logo=huggingface&logoColor=white)][hf-url] [![Reddit](https://img.shields.io/badge/Reddit-555?style=for-the-badge&logo=reddit&logoColor=white)][reddit-url] [![Blog](https://img.shields.io/badge/Blog-555?style=for-the-badge&logo=hashnode&logoColor=white)][blog-url]

A hybrid, low-latency, energy-efficient AI engine for phones and wearables.

```
┌─────────────────┐
│  Cactus Engine  │ ←── OpenAI-compatible APIs for all major languages
└─────────────────┘     Chat, vision, STT, RAG, tool call, cloud handoff
         │
┌─────────────────┐
│  Cactus Graph   │ ←── Zero-copy computation graph (PyTorch for mobile)
└─────────────────┘     Custom models, optimised for RAM & quantisation
         │
┌─────────────────┐
│ Cactus Kernels  │ ←── ARM SIMD kernels (Apple, Snapdragon, Exynos, etc)
└─────────────────┘     Custom attention, KV-cache quant, chunked prefill
```

## Quick demo

- Step 1: `brew install cactus-compute/cactus/cactus`
- Step 2: `cactus transcribe` or `cactus run`

## Cactus Engine

```cpp
#include "cactus.h"

cactus_model_t model = cactus_init(
    "path/to/weight/folder",
    "path to txt or dir of txts for auto-rag"
);

const char* messages = R"([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "My name is Henry Ndubuaku"}
])";

const char* options = R"({
    "max_tokens": 50,
    "stop_sequences": ["<|im_end|>"]
})";

char response[4096];
int result = cactus_complete(
    model,             // model handle
    messages,          // JSON chat messages
    response,          // response buffer
    sizeof(response),  // buffer size
    options,           // generation options
    nullptr,           // tools JSON
    nullptr,           // streaming callback
    nullptr            // user data
);
```

Example response from Gemma3-270m:

```jsonc
{
    "success": true,                 // generation succeeded
    "error": null,                   // error details if failed
    "cloud_handoff": false,          // true if cloud model used
    "response": "Hi there!",
    "function_calls": [],            // parsed tool calls
    "confidence": 0.8193,            // model confidence
    "time_to_first_token_ms": 45.23,
    "total_time_ms": 163.67,
    "prefill_tps": 1621.89,
    "decode_tps": 168.42,
    "ram_usage_mb": 245.67,
    "prefill_tokens": 28,
    "decode_tokens": 50,
    "total_tokens": 78
}
```

## Cactus Graph

```cpp
#include "cactus.h"

CactusGraph graph;

auto a = graph.input({2, 3}, Precision::FP16);
auto b = graph.input({3, 4}, Precision::INT8);

auto x1 = graph.matmul(a, b, false);
auto x2 = graph.transpose(x1);
auto result = graph.matmul(b, x2, true);

float a_data[6] = {1.1f, 2.3f, 3.4f, 4.2f, 5.7f, 6.8f};
float b_data[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};

graph.set_input(a, a_data, Precision::FP16);
graph.set_input(b, b_data, Precision::INT8);
graph.execute();

void* output_data = graph.get_output(result);
graph.hard_reset();
```

## API & SDK reference

| Reference | Language | Platforms / description |
|-----------|----------|-------------------------|
| [Engine API](cactus_engine.md) | C | Chat completion, streaming, tool calling, transcription, embeddings, RAG, vision, VAD, vector index, cloud handoff |
| [Graph API](cactus_graph.md) | C++ | Tensor ops, matrix multiplication, attention, normalisation, activations |
| [Python SDK](/python/) | Python | Mac, Linux |
| [Swift SDK](/apple/) | Swift | iOS, macOS, tvOS, watchOS, Android |
| [Kotlin SDK](/android/) | Kotlin | Android, iOS (via KMP) |
| [Flutter SDK](/flutter/) | Dart | iOS, macOS, Android |
| [Rust SDK](/rust/) | Rust | Mac, Linux |
| [React Native](https://github.com/cactus-compute/cactus-react-native) | JavaScript | iOS, Android |

## Benchmarks

- All weights are INT4-quantised
- LFM: 1k-token prefill / 100-token decode; numbers are prefill tps / decode tps
- LFM-VL: 256px input; numbers are latency / decode tps
- Parakeet: 30s audio input; numbers are latency / decode tps
- Missing latency = NPU not yet supported

| Device | LFM 1.2B | LFMVL 1.6B | Parakeet 1.1B | RAM |
|--------|----------|------------|---------------|-----|
| Mac M4 Pro | 582/100 | 0.2s/98 | 0.1s/900k+ | 76MB |
| iPad/Mac M3 | 350/60 | 0.3s/69 | 0.3s/800k+ | 70MB |
| iPhone 17 Pro | 327/48 | 0.3s/48 | 0.3s/300k+ | 108MB |
| iPhone 13 Mini | 148/34 | 0.3s/35 | 0.7s/90k+ | 1GB |
| Galaxy S25 Ultra | 255/37 | -/34 | -/250k+ | 1.5GB |
| Pixel 6a | 70/15 | -/15 | -/17k+ | 1GB |
| Galaxy A17 5G | 32/10 | -/11 | -/40k+ | 727MB |
| CMF Phone 2 Pro | - | - | - | - |
| Raspberry Pi 5 | 69/11 | 13.3s/11 | 4.5s/180k+ | 869MB |

## Roadmap

| Date | Status | Milestone |
|------|--------|-----------|
| Sep 2025 | Done | Released v1 |
| Oct 2025 | Done | Chunked prefill, KVCache Quant (2x prefill) |
| Nov 2025 | Done | Cactus Attention (10 & 1k prefill = same decode) |
| Dec 2025 | Done | Team grew by 6 research engineers |
| Jan 2026 | Done | Apple NPU/RAM, 5-11x faster on iOS/Mac |
| Feb 2026 | Done | Hybrid inference, INT4, lossless quantisation (1.5x) |
| Mar 2026 | Coming | Qualcomm/Google NPU, 5-11x faster on Android |
| Apr 2026 | Coming | Mediatek/Exynos NPU, Cactus@ICLR |
| May 2026 | Coming | Kernel→C++, Graph/Engine→Rust, Mac GPU & VR |
| Jun 2026 | Coming | Torch/JAX model transpiler |
| Jul 2026 | Coming | Wearables optimisation, Cactus@ICML |
| Aug 2026 | Coming | Orchestration |
| Sep 2026 | Coming | Full Cactus paper, chipmaker partnerships |

## Using this repo

```
Step 0: if on Linux (Ubuntu/Debian)
    sudo apt-get install python3 python3-venv python3-pip cmake \
        build-essential libcurl4-openssl-dev

Step 1: clone and setup
    git clone https://github.com/cactus-compute/cactus && cd cactus
    source ./setup

Step 2: use the commands

cactus auth                       manage Cloud API key
    --status                      show key status
    --clear                       remove saved key

cactus run                        opens playground (auto downloads)
    --precision INT4|INT8|FP16    quantization (default: INT4)
    --token                       HF token (gated models)
    --reconvert                   force reconversion from source

cactus transcribe [model]         live mic transcription (parakeet-1.1b)
    --file                        transcribe file instead of mic
    --precision INT4|INT8|FP16    quantization (default: INT4)
    --token                       HF token (gated models)
    --reconvert                   force reconversion from source

cactus download                   downloads model to ./weights
    --precision INT4|INT8|FP16    quantization (default: INT4)
    --token                       HuggingFace API token
    --reconvert                   force reconversion from source

cactus convert [dir]              convert model, supports LoRA merge
    --precision INT4|INT8|FP16    quantization (default: INT4)
    --lora                        LoRA adapter to merge
    --token                       HuggingFace API token

cactus build                      build for ARM → build/libcactus.a
    --apple                       Apple (iOS/macOS)
    --android                     Android
    --flutter                     Flutter (all platforms)
    --python                      shared lib for Python FFI

cactus test                       run unit tests and benchmarks
    --model                       default: LFM2-VL-450M
    --transcribe_model            default: moonshine-base
    --benchmark                   use larger models
    --precision INT4|INT8|FP16    regenerate weights with precision
    --reconvert                   force reconversion from source
    --no-rebuild                  skip building library
    --llm / --stt / --performance run specific test suite
    --ios                         run on connected iPhone
    --android                     run on connected Android

cactus clean                      remove all build artifacts
cactus --help                     show all commands and flags
```

## Supported models

| Model | Capabilities |
|-------|--------------|
| google/gemma-3-270m-it | completion |
| google/functiongemma-270m-it | completion, tools |
| LiquidAI/LFM2-350M | completion, tools, embed |
| Qwen/Qwen3-0.6B | completion, tools, embed |
| LiquidAI/LFM2-700M | completion, tools, embed |
| LiquidAI/LFM2-8B-A1B | completion, tools, embed |
| google/gemma-3-1b-it | completion |
| LiquidAI/LFM2.5-1.2B-Thinking | completion, tools, embed |
| LiquidAI/LFM2.5-1.2B-Instruct | completion, tools, embed |
| Qwen/Qwen3-1.7B | completion, tools, embed |
| LiquidAI/LFM2-2.6B | completion, tools, embed |
| LiquidAI/LFM2-VL-450M | vision, text & image embeddings, Apple NPU |
| LiquidAI/LFM2.5-VL-1.6B | vision, text & image embeddings, Apple NPU |
| UsefulSensors/moonshine-base | transcription, speech embeddings |
| openai/whisper-small | transcription, speech embeddings, Apple NPU |
| openai/whisper-medium | transcription, speech embeddings, Apple NPU |
| nvidia/parakeet-ctc-0.6b | transcription, speech embeddings, Apple NPU |
| nvidia/parakeet-ctc-1.1b | transcription, speech embeddings, Apple NPU |
| snakers4/silero-vad | vad |
| nomic-ai/nomic-embed-text-v2-moe | embed |
| Qwen/Qwen3-Embedding-0.6B | embed |

## Maintaining organisations

1. [Cactus Compute, Inc. (YC S25)](https://cactuscompute.com/)
2. [UCLA's BruinAI](https://bruinai.org/)
3. [Char (YC S25)](https://char.com/)
4. [Yale AI Society](https://www.yale-ai.org/team)
5. [NUS AI Society, National University of Singapore](https://www.nusaisociety.org/)
6. [UC Irvine's AI@UCI](https://aiclub.ics.uci.edu/)
7. [Imperial College AI Society](https://www.imperialcollegeunion.org/csp/1391)
8. [AI@Penn, University of Pennsylvania](https://ai-at-penn-main-105.vercel.app/)
9. [MSAIL, University of Michigan, Ann Arbor](https://msail.github.io/)
10. [CU Boulder AI Club](https://www.cuaiclub.org/)

## Citation

If you use Cactus in your research, please cite it as follows:

```bibtex
@software{cactus,
  title  = {Cactus: AI Inference Engine for Phones & Wearables},
  author = {Ndubuaku, Henry and Cactus Team},
  url    = {https://github.com/cactus-compute/cactus},
  year   = {2025}
}
```

**Note:** scroll up and click the badge links for resources!