Lightning-AI/pytorch-lightning

GitHub: Lightning-AI/pytorch-lightning

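PyTorch Lightning is a deep learning framework that abstracts away training engineering code so researchers can focus on model logic. It scales from a single GPU to thousands of GPUs.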

Stars: 31121 | Forks: 3719

Lightning

**The deep learning framework to pretrain and finetune AI models.**

**Deploying models?** Use [LitServe](https://github.com/Lightning-AI/litserve?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) to build custom inference servers in pure Python.


[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pytorch-lightning)](https://pypi.org/project/pytorch-lightning/) [![PyPI Status](https://badge.fury.io/py/pytorch-lightning.svg)](https://badge.fury.io/py/pytorch-lightning) [![PyPI - Downloads](https://img.shields.io/pypi/dm/pytorch-lightning)](https://pepy.tech/project/pytorch-lightning) [![Conda](https://img.shields.io/conda/v/conda-forge/lightning?label=conda&color=success)](https://anaconda.org/conda-forge/lightning) [![codecov](https://codecov.io/gh/Lightning-AI/pytorch-lightning/graph/badge.svg?token=SmzX8mnKlA)](https://codecov.io/gh/Lightning-AI/pytorch-lightning) [![Discord](https://img.shields.io/discord/1077906959069626439?style=plastic)](https://discord.gg/VptPCZkGNa) ![GitHub commit activity](https://img.shields.io/github/commit-activity/w/lightning-ai/lightning) [![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/Lightning-AI/pytorch-lightning/blob/master/LICENSE)
Advanced install options

#### Install with optional dependencies

```
pip install lightning['extra']
```

#### Conda

```
conda install lightning -c conda-forge
```

#### Install stable version

Install the future release from source:

```
pip install https://github.com/Lightning-AI/lightning/archive/refs/heads/release/stable.zip -U
```

#### Install bleeding-edge

Install the nightly build from source (no guarantees):

```
pip install https://github.com/Lightning-AI/lightning/archive/refs/heads/master.zip -U
```

or from testing PyPI:

```
pip install -U -i https://test.pypi.org/simple/ pytorch-lightning
```
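After any of the installs above, it can help to confirm which distribution actually ended up in the environment. The following is a minimal sketch using only the standard library. The helper name `installed_version` is hypothetical, and the sketch assumes the distributions are named `lightning` and `pytorch-lightning`, as in the commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are taken from the install commands above.
    for name in ("lightning", "pytorch-lightning"):
        print(name, "->", installed_version(name) or "not installed")
```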
### PyTorch Lightning example

Define the training workflow. Here is an example ([explore real examples](https://lightning.ai/lightning-ai/studios?view=public&section=featured&query=pytorch+lightning&utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme)):

```
# main.py
# ! pip install torchvision
import torch, torch.nn as nn, torch.utils.data as data, torchvision as tv, torch.nn.functional as F
import lightning as L

# --------------------------------
# Step 1: Define a LightningModule
# --------------------------------
# A LightningModule (nn.Module subclass) defines a full *system*
# (i.e. an LLM, diffusion model, autoencoder, or simple image classifier).


class LitAutoEncoder(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 3))
        self.decoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 28 * 28))

    def forward(self, x):
        # in lightning, forward defines the prediction/inference actions
        embedding = self.encoder(x)
        return embedding

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop. It is independent of forward
        x, _ = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = F.mse_loss(x_hat, x)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer


# -------------------
# Step 2: Define data
# -------------------
dataset = tv.datasets.MNIST(".", download=True, transform=tv.transforms.ToTensor())
train, val = data.random_split(dataset, [55000, 5000])

# -------------------
# Step 3: Train
# -------------------
autoencoder = LitAutoEncoder()
trainer = L.Trainer()
trainer.fit(autoencoder, data.DataLoader(train), data.DataLoader(val))
```

Run the model in your terminal:

```
pip install torchvision
python main.py
```

# Convert from PyTorch to PyTorch Lightning

PyTorch Lightning is just organized PyTorch: Lightning disentangles PyTorch code to decouple the science from the engineering.

![PT to PL](https://raw.githubusercontent.com/Lightning-AI/pytorch-lightning/master/docs/source-pytorch/_static/images/general/pl_quick_start_full_compressed.gif)

### Examples

Explore the various types of training possible with PyTorch Lightning. Pretrain and finetune any kind of model to perform any task, such as classification, segmentation, summarization and more:

| Task | Description |
|------|--------------|
| [Hello world](https://lightning.ai/lightning-ai/studios/pytorch-lightning-hello-world?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Pretrain - Hello world example |
| [Image classification](https://lightning.ai/lightning-ai/studios/image-classification-with-pytorch-lightning?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Finetune - ResNet-34 model to classify images of cars |
| [Image segmentation](https://lightning.ai/lightning-ai/studios/image-segmentation-with-pytorch-lightning?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Finetune - ResNet-50 model to segment images |
| [Object detection](https://lightning.ai/lightning-ai/studios/object-detection-with-pytorch-lightning?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Finetune - Faster R-CNN model to detect objects |
| [Text classification](https://lightning.ai/lightning-ai/studios/text-classification-with-pytorch-lightning?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Finetune - text classifier (BERT model) |
| [Text summarization](https://lightning.ai/lightning-ai/studios/text-summarization-with-pytorch-lightning?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Finetune - text summarization (Hugging Face transformer model) |
| [Audio generation](https://lightning.ai/lightning-ai/studios/finetune-a-personal-ai-music-generator?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Finetune - audio generator (transformer model) |
| [LLM finetuning](https://lightning.ai/lightning-ai/studios/finetune-an-llm-with-pytorch-lightning?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Finetune - LLM (Meta Llama 3.1 8B) |
| [Image generation](https://lightning.ai/lightning-ai/studios/train-a-diffusion-model-with-pytorch-lightning?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Pretrain - image generator (diffusion model) |
| [Recommendation system](https://lightning.ai/lightning-ai/studios/recommendation-system-with-pytorch-lightning?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Train - recommendation system (factorization and embedding) |
| [Time-series forecasting](https://lightning.ai/lightning-ai/studios/time-series-forecasting-with-pytorch-lightning?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Train - time-series forecasting with LSTM |

## Advanced features

Lightning has over [40 advanced features](https://lightning.ai/docs/pytorch/stable/common/trainer.html?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme#trainer-flags) designed for professional AI research at scale.

Here are some examples:
Train on thousands of GPUs without code changes

```
from lightning import Trainer

# 8 GPUs
# no code changes needed
trainer = Trainer(accelerator="gpu", devices=8)

# 256 GPUs
trainer = Trainer(accelerator="gpu", devices=8, num_nodes=32)
```
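The 256-GPU figure above is simply `devices` multiplied by `num_nodes`. As a rough sketch of that arithmetic (the helper names are hypothetical and not part of Lightning, and the rank formula assumes the common convention of numbering processes node by node):

```python
def world_size(devices, num_nodes=1):
    """Total number of training processes: one per device on every node."""
    return devices * num_nodes


def global_rank(node_rank, local_rank, devices):
    """Global process index, assuming ranks are assigned node by node."""
    return node_rank * devices + local_rank


def effective_batch_size(per_device_batch, devices, num_nodes=1):
    """Samples consumed per optimizer step under plain data-parallel training."""
    return per_device_batch * world_size(devices, num_nodes)


print(world_size(8))                     # 8
print(world_size(8, num_nodes=32))       # 256
print(global_rank(31, 7, devices=8))     # 255
print(effective_batch_size(32, 8, 32))   # 8192
```

Because the effective batch size grows with the number of processes, hyperparameters such as the learning rate may need retuning when scaling out, even though the training code itself stays the same.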
Train on other accelerators, such as TPUs, without code changes

```
from lightning import Trainer

# no code changes needed
trainer = Trainer(accelerator="tpu", devices=8)
```
16-bit precision

```
from lightning import Trainer

# no code changes needed
trainer = Trainer(precision=16)
```
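Sixteen-bit floats halve the memory per value but carry fewer significant bits, which is the usual motivation for mixed-precision schemes that keep some values in 32-bit. A small standard-library sketch of that trade-off follows. It uses the half-float format from Python's `struct` module purely for illustration and does not reflect how Lightning implements precision; `to_half` is a hypothetical helper.

```python
import struct


def to_half(x):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("e", struct.pack("e", x))[0]


# Bytes per value in half and single precision.
print(struct.calcsize("e"), struct.calcsize("f"))  # 2 4

# 0.1 is only approximated in half precision.
print(to_half(0.1))

# Half precision has an 11-bit significand, so not every integer above 2048 is representable.
print(to_half(2049.0))  # 2048.0
```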
Experiment managers

```
from lightning import Trainer
from lightning.pytorch import loggers

# litlogger (assumes LitLogger has already been imported)
trainer = Trainer(logger=LitLogger())

# tensorboard
trainer = Trainer(logger=loggers.TensorBoardLogger("logs/"))

# weights and biases
trainer = Trainer(logger=loggers.WandbLogger())

# comet
trainer = Trainer(logger=loggers.CometLogger())

# mlflow
trainer = Trainer(logger=loggers.MLFlowLogger())

# ... and dozens more
```
Early stopping

```
from lightning import Trainer
from lightning.pytorch.callbacks import EarlyStopping

es = EarlyStopping(monitor="val_loss")
trainer = Trainer(callbacks=[es])
```
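Conceptually, early stopping watches the monitored metric and halts once it has failed to improve for a set number of checks. The sketch below is a plain-Python illustration of that patience rule, not Lightning's `EarlyStopping` implementation. The function name and the default values for `patience` and `min_delta` are assumptions chosen for illustration, with a lower metric treated as better.

```python
def stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which training would stop, or None if it never does.

    An epoch counts as an improvement only if the loss drops below the best
    value seen so far by more than `min_delta`.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# The loss improves for three epochs, then stalls for three: stop at epoch 5.
print(stopping_epoch([1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.5]))  # 5

# A steadily improving loss never triggers the rule.
print(stopping_epoch([1.0, 0.9, 0.8, 0.7]))  # None
```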
Checkpointing

```
from lightning import Trainer
from lightning.pytorch.callbacks import ModelCheckpoint

checkpointing = ModelCheckpoint(monitor="val_loss")
trainer = Trainer(callbacks=[checkpointing])
```
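Monitoring a metric for checkpointing amounts to ranking checkpoints by that metric and keeping only the best ones. A minimal plain-Python sketch of that bookkeeping follows. The helper `keep_best` and the file names are hypothetical, a lower `val_loss` is assumed to be better, and this is not Lightning's `ModelCheckpoint` code.

```python
def keep_best(checkpoints, k=1):
    """Return the k (val_loss, path) pairs with the lowest val_loss."""
    return sorted(checkpoints)[:k]


# Hypothetical (val_loss, path) pairs recorded after each validation run.
history = [
    (0.91, "epoch=0.ckpt"),
    (0.47, "epoch=1.ckpt"),
    (0.52, "epoch=2.ckpt"),
    (0.39, "epoch=3.ckpt"),
]

print(keep_best(history))       # [(0.39, 'epoch=3.ckpt')]
print(keep_best(history, k=2))  # [(0.39, 'epoch=3.ckpt'), (0.47, 'epoch=1.ckpt')]
```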
Export to torchscript (JIT) (production use)

```
# torchscript
# LitAutoEncoder is the module defined in the example above
autoencoder = LitAutoEncoder()
torch.jit.save(autoencoder.to_torchscript(), "model.pt")
```
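Export to ONNX (production use)

```
# onnx
import os
import tempfile

import torch

with tempfile.NamedTemporaryFile(suffix=".onnx", delete=False) as tmpfile:
    autoencoder = LitAutoEncoder()
    # the sample must match the encoder's input size (28 * 28 flattened pixels)
    input_sample = torch.randn((1, 28 * 28))
    autoencoder.to_onnx(tmpfile.name, input_sample, export_params=True)
    os.path.isfile(tmpfile.name)
```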
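## Advantages over unstructured PyTorch

- Models become hardware agnostic.
- Code is clear to read because engineering code is abstracted away.
- Easier to reproduce.
- Fewer mistakes, because Lightning handles the tricky engineering.
- Keeps all the flexibility (LightningModules are still PyTorch modules), but removes a ton of boilerplate.
- Lightning has dozens of integrations with popular machine learning tools.
- [Tested rigorously with every new PR](https://github.com/Lightning-AI/lightning/tree/master/tests). We test every combination of supported PyTorch and Python versions, every OS, multi-GPU and even TPUs.
- Minimal running speed overhead (about 300 ms per epoch compared with pure PyTorch).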
Read the PyTorch Lightning docs
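# Lightning Fabric: Expert control

Run on any device at any scale with expert-level control over the PyTorch training loop and scaling strategy. You can even write your own Trainer.

Fabric is designed for the most complex models, such as foundation model scaling, LLMs, diffusion, transformers, reinforcement learning and active learning, at any model size.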
What to change:

```
+ import lightning as L
  import torch; import torchvision as tv

  dataset = tv.datasets.CIFAR10("data", download=True,
                                train=True,
                                transform=tv.transforms.ToTensor())

+ fabric = L.Fabric()
+ fabric.launch()

  model = tv.models.resnet18()
  optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
- device = "cuda" if torch.cuda.is_available() else "cpu"
- model.to(device)
+ model, optimizer = fabric.setup(model, optimizer)

  dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)
+ dataloader = fabric.setup_dataloaders(dataloader)

  model.train()
  num_epochs = 10
  for epoch in range(num_epochs):
      for batch in dataloader:
          inputs, labels = batch
-         inputs, labels = inputs.to(device), labels.to(device)
          optimizer.zero_grad()
          outputs = model(inputs)
          loss = torch.nn.functional.cross_entropy(outputs, labels)
-         loss.backward()
+         fabric.backward(loss)
          optimizer.step()
          print(loss.data)
```

Resulting Fabric code (copy me!):

```
import lightning as L
import torch; import torchvision as tv

dataset = tv.datasets.CIFAR10("data", download=True,
                              train=True,
                              transform=tv.transforms.ToTensor())

fabric = L.Fabric()
fabric.launch()

model = tv.models.resnet18()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
model, optimizer = fabric.setup(model, optimizer)

dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)
dataloader = fabric.setup_dataloaders(dataloader)

model.train()
num_epochs = 10
for epoch in range(num_epochs):
    for batch in dataloader:
        inputs, labels = batch
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = torch.nn.functional.cross_entropy(outputs, labels)
        fabric.backward(loss)
        optimizer.step()
        print(loss.data)
```
## Key features
Easily switch from running on CPU to GPU (Apple Silicon, CUDA, …), TPU, multi-GPU or even multi-node training

```
from lightning import Fabric

# Use your available hardware
# no code changes needed
fabric = Fabric()

# Run on GPUs (CUDA or MPS)
fabric = Fabric(accelerator="gpu")

# 8 GPUs
fabric = Fabric(accelerator="gpu", devices=8)

# 256 GPUs, multi-node
fabric = Fabric(accelerator="gpu", devices=8, num_nodes=32)

# Run on TPUs
fabric = Fabric(accelerator="tpu")
```
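Use state-of-the-art distributed training strategies (DDP, FSDP, DeepSpeed) and mixed precision out of the box

```
from lightning import Fabric

# Use state-of-the-art distributed training techniques
fabric = Fabric(strategy="ddp")
fabric = Fabric(strategy="deepspeed")
fabric = Fabric(strategy="fsdp")

# Switch the precision
fabric = Fabric(precision="16-mixed")
fabric = Fabric(precision="64")
```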
All the device logic boilerplate is handled for you

```
  # no more of this!
- model.to(device)
- batch.to(device)
```
Build your own custom Trainer using Fabric primitives for training checkpointing, logging, and more

```
import lightning as L


class MyCustomTrainer:
    def __init__(self, accelerator="auto", strategy="auto", devices="auto", precision="32-true"):
        self.fabric = L.Fabric(accelerator=accelerator, strategy=strategy, devices=devices, precision=precision)

    def fit(self, model, optimizer, dataloader, loss_fn, max_epochs):
        # loss_fn: any callable mapping (output, target) to a scalar loss
        self.fabric.launch()

        model, optimizer = self.fabric.setup(model, optimizer)
        dataloader = self.fabric.setup_dataloaders(dataloader)
        model.train()

        for epoch in range(max_epochs):
            for batch in dataloader:
                input, target = batch
                optimizer.zero_grad()
                output = model(input)
                loss = loss_fn(output, target)
                self.fabric.backward(loss)
                optimizer.step()
```

You can find a more extensive example in our [examples](

## Examples

###### Self-supervised Learning

- [CPC transforms](https://lightning-bolts.readthedocs.io/en/stable/transforms/self_supervised.html#cpc-transforms)
- [Moco v2 transforms](https://lightning-bolts.readthedocs.io/en/stable/transforms/self_supervised.html#moco-v2-transforms)
- [SimCLR transforms](https://lightning-bolts.readthedocs.io/en/stable/transforms/self_supervised.html#simclr-transforms)

###### Convolutional Architectures

- [GPT-2](https://lightning-bolts.readthedocs.io/en/stable/models/convolutional.html#gpt-2)
- [UNet](https://lightning-bolts.readthedocs.io/en/stable/models/convolutional.html#unet)

###### Reinforcement Learning

- [DQN Loss](https://lightning-bolts.readthedocs.io/en/stable/losses.html#dqn-loss)
- [Double DQN Loss](https://lightning-bolts.readthedocs.io/en/stable/losses.html#double-dqn-loss)
- [Per DQN Loss](https://lightning-bolts.readthedocs.io/en/stable/losses.html#per-dqn-loss)

###### GANs

- [Basic GAN](https://lightning-bolts.readthedocs.io/en/stable/models/gans.html#basic-gan)
- [DCGAN](https://lightning-bolts.readthedocs.io/en/stable/models/gans.html#dcgan)

###### Classic ML

- [Logistic Regression](https://lightning-bolts.readthedocs.io/en/stable/models/classic_ml.html#logistic-regression)
- [Linear Regression](https://lightning-bolts.readthedocs.io/en/stable/models/classic_ml.html#linear-regression)

## Continuous Integration

Lightning is rigorously tested across multiple CPUs, GPUs and TPUs, and against major Python and PyTorch versions.

###### \*Codecov coverage is above 90%, but build delays may cause a lower figure to be shown
Current build statuses

| System / PyTorch ver. | 1.13 | 2.0 | 2.1 |
| :---: | :---: | :---: | :---: |
| Linux py3.9 \[GPU\] | | | [![Build Status](https://dev.azure.com/Lightning-AI/lightning/_apis/build/status%2Fpytorch-lightning%20%28GPUs%29?branchName=master)](https://dev.azure.com/Lightning-AI/lightning/_build/latest?definitionId=24&branchName=master) |
| Linux (multiple Python versions) | [![Test PyTorch](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/77426b626e181131.svg)](https://github.com/Lightning-AI/lightning/actions/workflows/ci-tests-pytorch.yml) | [![Test PyTorch](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/77426b626e181131.svg)](https://github.com/Lightning-AI/lightning/actions/workflows/ci-tests-pytorch.yml) | [![Test PyTorch](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/77426b626e181131.svg)](https://github.com/Lightning-AI/lightning/actions/workflows/ci-tests-pytorch.yml) |
| OSX (multiple Python versions) | [![Test PyTorch](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/77426b626e181131.svg)](https://github.com/Lightning-AI/lightning/actions/workflows/ci-tests-pytorch.yml) | [![Test PyTorch](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/77426b626e181131.svg)](https://github.com/Lightning-AI/lightning/actions/workflows/ci-tests-pytorch.yml) | [![Test PyTorch](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/77426b626e181131.svg)](https://github.com/Lightning-AI/lightning/actions/workflows/ci-tests-pytorch.yml) |
| Windows (multiple Python versions) | [![Test PyTorch](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/77426b626e181131.svg)](https://github.com/Lightning-AI/lightning/actions/workflows/ci-tests-pytorch.yml) | [![Test PyTorch](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/77426b626e181131.svg)](https://github.com/Lightning-AI/lightning/actions/workflows/ci-tests-pytorch.yml) | [![Test PyTorch](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/77426b626e181131.svg)](https://github.com/Lightning-AI/lightning/workflows/ci-tests-pytorch.yml) |
## Community

The Lightning community is maintained by:

- [10+ core contributors](https://lightning.ai/docs/pytorch/latest/community/governance.html), a mix of professional engineers, research scientists, and Ph.D. students from top AI labs.
- 800+ community contributors.

Want to help us build Lightning and reduce boilerplate for thousands of researchers? [Learn how to make your first contribution](https://lightning.ai/docs/pytorch/stable/generated/CONTRIBUTING.html?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme)

Lightning is also part of the [PyTorch ecosystem](https://pytorch.org/ecosystem/), which requires projects to have solid testing, documentation and support.

### Asking for help

If you have any questions, please:

1. [Read the docs](https://lightning.ai/docs?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme).
2. [Search through existing Discussions](https://github.com/Lightning-AI/lightning/discussions), or [add a new question](https://github.com/Lightning-AI/lightning/discussions/new).
3. [Join our Discord](https://discord.com/invite/tfXFetEZxv).