NVIDIA/KAI-Scheduler

GitHub: NVIDIA/KAI-Scheduler

A Kubernetes-native scheduler for large-scale GPU clusters that provides optimized resource allocation and fair scheduling for AI workloads.

Stars: 1163 | Forks: 160
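The fair scheduling mentioned in the description above appears in the feature list as Fairness Policies built on Dominant Resource Fairness (DRF) across queues. The sketch below illustrates only the textbook DRF idea: a common progressive-filling variant that repeatedly gives one more task to the queue with the lowest dominant share and skips queues whose next task no longer fits. It is not KAI Scheduler's implementation, which per the feature list also accounts for quotas, over-quota weights, limits, and reclaim. The queue names, resource names, and per-task demands are hypothetical, and capacities are assumed to be positive.

```python
from typing import Dict

Resources = Dict[str, float]


def dominant_share(used: Resources, capacity: Resources) -> float:
    """Largest fraction of any single cluster resource that a queue holds."""
    return max(used[r] / capacity[r] for r in capacity)


def drf_allocate(demands: Dict[str, Resources], capacity: Resources) -> Dict[str, int]:
    """Textbook DRF progressive filling (illustrative, not KAI Scheduler's code).

    `demands` maps each hypothetical queue to the resources one of its tasks
    needs. Each round, the queue with the lowest dominant share whose next
    task still fits gets one more task; allocation stops when nothing fits.
    """
    for queue, demand in demands.items():
        # A task that requests nothing would always fit and loop forever.
        if not any(demand.get(r, 0) > 0 for r in capacity):
            raise ValueError(f"queue {queue!r} must request some resource")
    used = {q: {r: 0.0 for r in capacity} for q in demands}
    free = dict(capacity)
    tasks = {q: 0 for q in demands}
    while True:
        fitting = [q for q in demands
                   if all(demands[q].get(r, 0) <= free[r] for r in capacity)]
        if not fitting:
            return tasks
        # Ties are broken by the insertion order of `demands`.
        queue = min(fitting, key=lambda q: dominant_share(used[q], capacity))
        for r in capacity:
            need = demands[queue].get(r, 0)
            used[queue][r] += need
            free[r] -= need
        tasks[queue] += 1


# Hypothetical cluster and per-task demands (numbers from the classic DRF example).
capacity = {"gpu": 9, "cpu": 18}
demands = {
    "inference": {"gpu": 1, "cpu": 4},  # CPU is the dominant resource (4/18 > 1/9)
    "training": {"gpu": 3, "cpu": 1},   # GPU is the dominant resource (3/9 > 1/18)
}
print(drf_allocate(demands, capacity))  # {'inference': 3, 'training': 2}
```

Both queues end with the same dominant share of 2/3: `inference` holds 12 of 18 CPUs and `training` holds 6 of 9 GPUs. Equalizing dominant shares, rather than raw GPU counts, is what lets DRF stay fair between queues whose workloads stress different resources.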

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE) [![Coverage](https://static.pigsec.cn/wp-content/uploads/repos/2026/03/c4ad765a36004108.svg)](https://github.com/NVIDIA/KAI-Scheduler/blob/main/.github/workflows/update-coverage-badge.yaml) [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/NVIDIA/KAI-Scheduler) [![OpenSSF Best Practices](https://www.bestpractices.dev/projects/12064/badge)](https://www.bestpractices.dev/projects/12064)

# KAI Scheduler

KAI Scheduler is a robust, efficient, and scalable [Kubernetes scheduler](https://kubernetes.io/docs/concepts/scheduling-eviction/kube-scheduler/) that optimizes GPU resource allocation for AI and machine learning workloads.

KAI Scheduler is designed to manage large-scale GPU clusters, including thousands of nodes, and high-throughput workloads, which makes it well suited to extensive and demanding environments. It allows Kubernetes cluster administrators to dynamically allocate GPU resources to workloads.

KAI Scheduler supports the entire AI lifecycle, from small interactive jobs that need minimal resources to large-scale training and inference, all within the same cluster. It ensures optimal resource allocation while maintaining fairness between the different resource consumers. It can run alongside other schedulers installed on the cluster.

## Latest News 🔥

- [2025/11] **KubeCon NA 2025 talk:** Watch the recording of "[Lightning Talk: Mind the Topology: Smarter Scheduling for AI Workloads on Kubernetes](https://youtu.be/o5i7pTWZjfo?si=su5iTOAS4r4O1TPa)" to learn how KAI's Topology-Aware Scheduling (TAS) optimizes placement for modern disaggregated serving architectures.
- [2025/11] **Integration with [Grove](https://github.com/ai-dynamo/grove) and Dynamo:** KAI's topology-aware and hierarchical gang scheduling capabilities are integrated with Grove to orchestrate complex multi-component workloads, such as disaggregated serving and agentic pipelines, at scale. Read the [blog post](https://developer.nvidia.com/blog/streamline-complex-ai-inference-on-kubernetes-with-nvidia-grove/) for more details.
- [2025/10] **[v0.10.0 released:](https://github.com/NVIDIA/KAI-Scheduler/releases/tag/v0.10.0)** Major features shipped, including [Topology-Aware Scheduling (TAS)](https://github.com/NVIDIA/KAI-Scheduler/tree/main/docs/topology), [Hierarchical PodGroups](https://github.com/NVIDIA/KAI-Scheduler/tree/main/docs/developer/designs/hierarchical-podgroup), and [Time-based Fairshare](https://github.com/NVIDIA/KAI-Scheduler/tree/main/docs/time-based-fairshare).
- [2025/10] **KubeRay integration:** KAI Scheduler now integrates natively with [Ray workloads on Kubernetes](https://docs.ray.io/en/master/cluster/kubernetes/k8s-ecosystem/kai-scheduler.html).
- [2025/08] **Time-based Fairshare:** The [time-based fairshare proposal](https://github.com/NVIDIA/KAI-Scheduler/blob/main/docs/developer/designs/time-based-fairshare/time-based-fairshare.md) was discussed at batch-wg. [Watch the recording.](https://zoom.us/rec/play/uW5ex5dmQP8_7UqOv5UjOGq8IqZeIa8AhKILqvDUQ6CnBAIdJjPY-BLfUWnoYblvDP-ZIvAp48p7XJNv.Cx5t7x1DwGqJgIYB?eagerLoadZvaPages=&accessLevel=meeting&canPlayFromShare=true&from=share_recording_detail&startTime=1755010542000&componentName=rec-play&originRequestUrl=https%3A%2F%2Fzoom.us%2Frec%2Fshare%2Frd_j_7ZDpC8lXxGNdQwguK2ZunoM3R93HR1Eo4A9rxD7b5lWSbmojDKc8OZ00ZMK.QxgEeMOxMcuiDkIY%3FstartTime%3D1755010542000)
- [2025/04] **Project introduction:** Recording of [KAI Scheduler presented at the batch-wg meeting](https://zoom.us/rec/play/E1weaHroJpuTdXx6s9pjMu6oS78BiA53wsnvV9MWe_rIdwmDLFOG8J4XEPNW8-hIp4-HSFNdsbbP7mcv.YstbxFdS7z7tOfKw?eagerLoadZvaPages=&accessLevel=meeting&canPlayFromShare=true&from=share_recording_detail&startTime=1744124229000&componentName=rec-play&originRequestUrl=https%3A%2F%2Fzoom.us%2Frec%2Fshare%2FwP2WH6bqd7Dj8dupZD3YQTMWgG4AP5361_0h5vicI69LNb25JdQB8wn6fkvtLw2f.rLrRcQTSO1OCyRNu%3FstartTime%3D1744124229000).

## Key Features

- [Batch Scheduling](docs/batch/README.md): Ensures that all pods in a group are scheduled simultaneously or not at all.
- Bin Packing & Spread Scheduling: Optimizes node usage either by minimizing fragmentation (bin packing) or by increasing resiliency and load balancing (spread scheduling).
- [Workload Priority](docs/priority/README.md): Prioritizes workloads effectively within queues.
- [Separation of workload priority and preemptibility](https://github.com/NVIDIA/KAI-Scheduler/tree/main/docs/developer/designs/priority-preemptibility-separation): Treats workload priority and workload preemptibility as two independent policies.
- [Hierarchical Queues](docs/queues/README.md): Manages workloads with two-level queue hierarchies for flexible organizational control.
- [Resource distribution](docs/fairness/README.md#resource-division-algorithm): Customizes quotas, over-quota weights, limits, and priorities per queue.
- [Fairness Policies](docs/fairness/README.md#reclaim-strategies): Ensures equitable resource distribution using Dominant Resource Fairness (DRF) and resource reclamation across queues.
- [Time-based Fairshare](https://github.com/NVIDIA/KAI-Scheduler/tree/main/docs/time-based-fairshare): Distributes resources fairly over time, taking into account historical usage, time decay, and other tuning parameters.
- [Min-guaranteed-runtime](https://github.com/NVIDIA/KAI-Scheduler/tree/main/docs/developer/designs/min-runtime): Guarantees a period during which the scheduler will not preempt or reclaim a running workload, even if it is preemptible.
- Workload Consolidation: Intelligently reallocates running workloads to reduce fragmentation and increase cluster utilization.
- [Elastic Workloads](docs/elastic/README.md): Dynamically scales workloads within defined minimum and maximum pod counts.
- Dynamic Resource Allocation (DRA): Supports vendor-specific hardware resources (e.g., GPUs from NVIDIA or AMD) through Kubernetes ResourceClaims.
- [Topology-Aware Scheduling (TAS)](https://github.com/NVIDIA/KAI-Scheduler/tree/main/docs/topology): Optimizes placement through [topology aware scheduling](https://github.com/NVIDIA/KAI-Scheduler/tree/main/docs/developer/designs/topology-awareness) and hierarchical topology-aware scheduling for [Hierarchical PodGroups](https://github.com/NVIDIA/KAI-Scheduler/tree/main/docs/developer/designs/hierarchical-podgroup).
- [Hierarchical PodGroups](https://github.com/NVIDIA/KAI-Scheduler/tree/main/docs/developer/designs/hierarchical-podgroup): Provides optimized topology-aware gang scheduling for multi-level workloads, such as distributed and disaggregated workloads (e.g., Dynamo/Grove).
- DRA support: Supports DRA for NVIDIA ComputeResources (GB200/GB300).
- Workload signatures: KAI Scheduler uses workload signatures to optimize performance for large multi-pod submissions.
- Scheduler explainability: Every major step of the scheduling process is recorded as Kubernetes Events.
- [GPU Sharing](docs/gpu-sharing/README.md): Allows multiple workloads to share single or multiple GPUs efficiently, maximizing resource utilization.
- Cloud & On-premise Support: Fully compatible with dynamic cloud infrastructure (including autoscalers such as Karpenter) as well as static on-premise deployments.

## Prerequisites

Before installing KAI Scheduler, make sure you have:

- A running Kubernetes cluster
- The [Helm](https://helm.sh/docs/intro/install) CLI installed
- The [NVIDIA GPU-Operator](https://github.com/NVIDIA/gpu-operator) installed, in order to schedule workloads that request GPU resources

## Installation

KAI Scheduler will be installed in the `kai-scheduler` namespace.

### Installation Methods

KAI Scheduler can be installed:

- **From production (recommended)**
- **From source (build it yourself)**

#### Install from production

Find the latest release version on the [releases](https://github.com/NVIDIA/KAI-Scheduler/releases) page. Run the following command after replacing `` with the desired release version:

```
helm upgrade -i kai-scheduler oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler -n kai-scheduler --create-namespace --version
```

#### Build from source

Follow the instructions [here](docs/developer/building-from-source.md).

## Flavor-Specific Instructions

### Openshift

When `gpu-operator` is installed
