**The deep learning framework to pretrain and finetune AI models.**
**Need to deploy a model?** Use [LitServe](https://github.com/Lightning-AI/litserve?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) to build custom inference servers in pure Python.
Advanced install options
#### Install with optional dependencies
```
pip install "lightning[extra]"
```
#### Conda
```
conda install lightning -c conda-forge
```
#### Install stable version
Install a future release from the source
```
pip install https://github.com/Lightning-AI/lightning/archive/refs/heads/release/stable.zip -U
```
#### Install bleeding-edge
Install the nightly build from the source (no guarantees)
```
pip install https://github.com/Lightning-AI/lightning/archive/refs/heads/master.zip -U
```
or from testing PyPI
```
pip install -U -i https://test.pypi.org/simple/ pytorch-lightning
```
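To confirm which build ended up in the environment after any of the commands above, the installed version can be read from the package metadata. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of Lightning:

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Both distribution names appear in the install commands above.
for name in ("lightning", "pytorch-lightning"):
    print(name, installed_version(name))
```

A `None` result means that distribution name is not installed in the active environment.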
### PyTorch Lightning example
Define the training workflow. Here's a toy example ([explore real examples](https://lightning.ai/lightning-ai/studios?view=public&section=featured&query=pytorch+lightning&utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme)):
```
# main.py
# ! pip install torchvision
import torch, torch.nn as nn, torch.utils.data as data, torchvision as tv, torch.nn.functional as F
import lightning as L

# --------------------------------
# Step 1: Define a LightningModule
# --------------------------------
# A LightningModule (nn.Module subclass) defines a full *system*
# (ie: an LLM, diffusion model, autoencoder, or simple image classifier).


class LitAutoEncoder(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 3))
        self.decoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 28 * 28))

    def forward(self, x):
        # in lightning, forward defines the prediction/inference actions
        embedding = self.encoder(x)
        return embedding

    def training_step(self, batch, batch_idx):
        # training_step defines the train loop. It is independent of forward
        x, _ = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = F.mse_loss(x_hat, x)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer


# -------------------
# Step 2: Define data
# -------------------
dataset = tv.datasets.MNIST(".", download=True, transform=tv.transforms.ToTensor())
train, val = data.random_split(dataset, [55000, 5000])

# -------------------
# Step 3: Train
# -------------------
autoencoder = LitAutoEncoder()
trainer = L.Trainer()
trainer.fit(autoencoder, data.DataLoader(train), data.DataLoader(val))
```
Run the model on your terminal
```
pip install torchvision
python main.py
```
# Convert from PyTorch to PyTorch Lightning
PyTorch Lightning is just organized PyTorch: Lightning disentangles PyTorch code to decouple the science from the engineering.

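### Examples
Explore the kinds of training possible with PyTorch Lightning. Pretrain and finetune any kind of model to perform any task, such as classification, segmentation, summarization and more: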
| Task | Description |
|------|--------------|
| [Hello world](https://lightning.ai/lightning-ai/studios/pytorch-lightning-hello-world?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Pretrain - Hello world example |
| [Image classification](https://lightning.ai/lightning-ai/studios/image-classification-with-pytorch-lightning?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Finetune - ResNet-34 model to classify images of cars |
| [Image segmentation](https://lightning.ai/lightning-ai/studios/image-segmentation-with-pytorch-lightning?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Finetune - ResNet-50 model to segment images |
| [Object detection](https://lightning.ai/lightning-ai/studios/object-detection-with-pytorch-lightning?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Finetune - Faster R-CNN model to detect objects |
| [Text classification](https://lightning.ai/lightning-ai/studios/text-classification-with-pytorch-lightning?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Finetune - text classifier (BERT model) |
| [Text summarization](https://lightning.ai/lightning-ai/studios/text-summarization-with-pytorch-lightning?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Finetune - text summarization (Hugging Face transformer model) |
| [Audio generation](https://lightning.ai/lightning-ai/studios/finetune-a-personal-ai-music-generator?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Finetune - audio generator (transformer model) |
| [LLM finetuning](https://lightning.ai/lightning-ai/studios/finetune-an-llm-with-pytorch-lightning?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Finetune - LLM (Meta Llama 3.1 8B) |
| [Image generation](https://lightning.ai/lightning-ai/studios/train-a-diffusion-model-with-pytorch-lightning?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Pretrain - image generator (diffusion model) |
| [Recommendation system](https://lightning.ai/lightning-ai/studios/recommendation-system-with-pytorch-lightning?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Train - recommendation system (factorization and embedding) |
| [Time-series forecasting](https://lightning.ai/lightning-ai/studios/time-series-forecasting-with-pytorch-lightning?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) | Train - time-series forecasting with LSTM |
## Advanced features
Lightning has [40+ advanced features](https://lightning.ai/docs/pytorch/stable/common/trainer.html?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme#trainer-flags) designed for professional AI research at scale.
Here are some examples:
Train on 1000s of GPUs without code changes
```
from lightning import Trainer

# 8 GPUs
# no code changes needed
trainer = Trainer(accelerator="gpu", devices=8)

# 256 GPUs
trainer = Trainer(accelerator="gpu", devices=8, num_nodes=32)
```
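The "256 GPUs" comment follows from `devices` being a per-node count: the total number of training processes is `devices * num_nodes`. A tiny sketch of that arithmetic (the helper `world_size` is hypothetical and written for illustration only):

```python
def world_size(devices: int, num_nodes: int = 1) -> int:
    """Total number of training processes when `devices` is counted per node."""
    if devices < 1 or num_nodes < 1:
        raise ValueError("devices and num_nodes must be positive")
    return devices * num_nodes


print(world_size(devices=8))                # 8
print(world_size(devices=8, num_nodes=32))  # 256
```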
Train on other accelerators like TPUs without code changes
```
# no code changes needed
trainer = Trainer(accelerator="tpu", devices=8)
```
16-bit precision
```
# no code changes needed
trainer = Trainer(precision=16)
```
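For background, 16-bit floats trade range and resolution for speed and memory. The sketch below uses only the standard library's `struct` module (IEEE 754 half precision, format code `e`) to show the rounding and underflow that reduced-precision training has to work around. It does not use Lightning, and `to_half` is a hypothetical helper:

```python
import struct


def to_half(value: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision (16 bits)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


print(to_half(0.1))     # close to, but not exactly, 0.1
print(to_half(2049.0))  # integers above 2048 are no longer all representable
print(to_half(1e-8))    # very small values, such as tiny gradients, underflow to zero
```

Effects like these are one reason the Fabric examples in this document use a mixed setting such as `"16-mixed"` instead of running everything in 16 bits.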
Experiment managers
```
from lightning import Trainer
from lightning.pytorch import loggers

# litlogger (LitLogger must be imported separately; its import path is not shown here)
trainer = Trainer(logger=LitLogger())
# tensorboard
trainer = Trainer(logger=loggers.TensorBoardLogger("logs/"))
# weights and biases
trainer = Trainer(logger=loggers.WandbLogger())
# comet
trainer = Trainer(logger=loggers.CometLogger())
# mlflow
trainer = Trainer(logger=loggers.MLFlowLogger())
# ... and dozens more
```
Early stopping
```
from lightning.pytorch.callbacks import EarlyStopping

es = EarlyStopping(monitor="val_loss")
trainer = Trainer(callbacks=[es])
```
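Conceptually, early stopping watches a monitored value and halts training once it has failed to improve for a set number of checks. The sketch below illustrates that rule in plain Python. It is not Lightning's implementation; `PatienceTracker` is a hypothetical class, and its `patience` and `min_delta` arguments are named after the corresponding `EarlyStopping` arguments:

```python
class PatienceTracker:
    """Conceptual early-stopping rule: stop after `patience` checks without improvement."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")  # lower is better, as for a loss
        self.wait = 0

    def should_stop(self, value: float) -> bool:
        if value < self.best - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best = value
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience


tracker = PatienceTracker(patience=2)
history = [0.90, 0.70, 0.71, 0.72, 0.60]
stopped_at = next((i for i, v in enumerate(history) if tracker.should_stop(v)), None)
print(stopped_at)  # 3: the second check in a row without improvement
```

Note that the later improvement to 0.60 is never seen, which is the usual trade-off of a small patience value.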
Checkpointing
```
from lightning.pytorch.callbacks import ModelCheckpoint

checkpointing = ModelCheckpoint(monitor="val_loss")
trainer = Trainer(callbacks=[checkpointing])
```
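Monitoring a metric lets a checkpoint callback keep only the best checkpoints instead of every one. The plain-Python sketch below illustrates the bookkeeping behind a "keep the k best" policy, in the spirit of `ModelCheckpoint`'s `save_top_k` argument. It is not Lightning's implementation, and `TopKTracker` and its methods are hypothetical names:

```python
import heapq


class TopKTracker:
    """Conceptual top-k bookkeeping: keep the k lowest-scoring checkpoints."""

    def __init__(self, k: int = 1):
        self.k = k
        self._heap = []  # max-heap via negated scores, so the worst kept score pops first

    def update(self, score: float, path: str):
        """Record a checkpoint; return the path to discard (possibly the new one), or None."""
        heapq.heappush(self._heap, (-score, path))
        if len(self._heap) > self.k:
            _, evicted = heapq.heappop(self._heap)
            return evicted
        return None

    def kept(self):
        """Return (score, path) pairs of the retained checkpoints, best first."""
        return sorted((-neg, path) for neg, path in self._heap)


tracker = TopKTracker(k=2)
for epoch, val_loss in enumerate([0.9, 0.5, 0.7, 0.4]):
    tracker.update(val_loss, f"epoch={epoch}.ckpt")
print(tracker.kept())  # [(0.4, 'epoch=3.ckpt'), (0.5, 'epoch=1.ckpt')]
```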
Export to torchscript (JIT) (production use)
```
# torchscript
autoencoder = LitAutoEncoder()
torch.jit.save(autoencoder.to_torchscript(), "model.pt")
```
Export to ONNX (production use)
```
# onnx
import os
import tempfile

with tempfile.NamedTemporaryFile(suffix=".onnx", delete=False) as tmpfile:
    autoencoder = LitAutoEncoder()
    # the encoder above expects flattened 28 * 28 inputs
    input_sample = torch.randn((1, 28 * 28))
    autoencoder.to_onnx(tmpfile.name, input_sample, export_params=True)
    os.path.isfile(tmpfile.name)
```
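## Advantages over unstructured PyTorch
- Models become hardware agnostic
- Code is clear to read because engineering code is abstracted away
- Easier to reproduce
- Fewer mistakes, because Lightning handles the tricky engineering
- Keeps all the flexibility (LightningModules are still PyTorch modules), but removes a ton of boilerplate
- Lightning has dozens of integrations with popular machine learning tools.
- [Every new PR is rigorously tested](https://github.com/Lightning-AI/lightning/tree/master/tests). We test every combination of supported PyTorch and Python versions, every OS, multi-GPU and even TPUs.
- Minimal running speed overhead (about 300 ms per epoch compared with pure PyTorch).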
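# Lightning Fabric: Expert control
Run on any device at any scale with expert-level control over the PyTorch training loop and scaling strategy. You can even write your own Trainer.
Fabric is designed for the most complex models, such as foundation model scaling, LLMs, diffusion, transformers, reinforcement learning and active learning, at any model size.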
What to change
```
+ import lightning as L
  import torch; import torchvision as tv

  dataset = tv.datasets.CIFAR10("data", download=True,
                                train=True,
                                transform=tv.transforms.ToTensor())

+ fabric = L.Fabric()
+ fabric.launch()

  model = tv.models.resnet18()
  optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
- device = "cuda" if torch.cuda.is_available() else "cpu"
- model.to(device)
+ model, optimizer = fabric.setup(model, optimizer)

  dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)
+ dataloader = fabric.setup_dataloaders(dataloader)

  model.train()
  num_epochs = 10
  for epoch in range(num_epochs):
      for batch in dataloader:
          inputs, labels = batch
-         inputs, labels = inputs.to(device), labels.to(device)
          optimizer.zero_grad()
          outputs = model(inputs)
          loss = torch.nn.functional.cross_entropy(outputs, labels)
-         loss.backward()
+         fabric.backward(loss)
          optimizer.step()
          print(loss.data)
```
Resulting Fabric code (copy me!)
```
import lightning as L
import torch; import torchvision as tv

dataset = tv.datasets.CIFAR10("data", download=True,
                              train=True,
                              transform=tv.transforms.ToTensor())

fabric = L.Fabric()
fabric.launch()

model = tv.models.resnet18()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
model, optimizer = fabric.setup(model, optimizer)

dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)
dataloader = fabric.setup_dataloaders(dataloader)

model.train()
num_epochs = 10
for epoch in range(num_epochs):
    for batch in dataloader:
        inputs, labels = batch
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = torch.nn.functional.cross_entropy(outputs, labels)
        fabric.backward(loss)
        optimizer.step()
        print(loss.data)
```
## Key features
Easily switch from running on CPU to GPU (Apple Silicon, CUDA…), TPU, multi-GPU or even multi-node training
```
from lightning import Fabric

# Use your available hardware
# no code changes needed
fabric = Fabric()

# Run on GPUs (CUDA or MPS)
fabric = Fabric(accelerator="gpu")

# 8 GPUs
fabric = Fabric(accelerator="gpu", devices=8)

# 256 GPUs, multi-node
fabric = Fabric(accelerator="gpu", devices=8, num_nodes=32)

# Run on TPUs
fabric = Fabric(accelerator="tpu")
```
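Use state-of-the-art distributed training strategies (DDP, FSDP, DeepSpeed) and mixed precision out of the box
```
# Use state-of-the-art distributed training techniques
fabric = Fabric(strategy="ddp")
fabric = Fabric(strategy="deepspeed")
fabric = Fabric(strategy="fsdp")

# Switch the precision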
fabric = Fabric(precision="16-mixed")
fabric = Fabric(precision="64")
```
All the device logic boilerplate is handled for you
```
# no more of this!
- model.to(device)
- batch.to(device)
```
Build your own custom Trainer using Fabric primitives for training checkpointing, logging, and more
```
import lightning as L


class MyCustomTrainer:
    def __init__(self, accelerator="auto", strategy="auto", devices="auto", precision="32-true"):
        self.fabric = L.Fabric(accelerator=accelerator, strategy=strategy, devices=devices, precision=precision)

    def fit(self, model, optimizer, dataloader, max_epochs):
        self.fabric.launch()

        model, optimizer = self.fabric.setup(model, optimizer)
        dataloader = self.fabric.setup_dataloaders(dataloader)
        model.train()

        for epoch in range(max_epochs):
            for batch in dataloader:
                input, target = batch
                optimizer.zero_grad()
                output = model(input)
                # loss_fn is not defined in this snippet; supply your own loss function
                loss = loss_fn(output, target)
                self.fabric.backward(loss)
                optimizer.step()
```
You can find more in our examples.
## Examples
###### Self-supervised Learning
- [CPC transforms](https://lightning-bolts.readthedocs.io/en/stable/transforms/self_supervised.html#cpc-transforms)
- [Moco v2 transforms](https://lightning-bolts.readthedocs.io/en/stable/transforms/self_supervised.html#moco-v2-transforms)
- [SimCLR transforms](https://lightning-bolts.readthedocs.io/en/stable/transforms/self_supervised.html#simclr-transforms)
###### Convolutional Architectures
- [GPT-2](https://lightning-bolts.readthedocs.io/en/stable/models/convolutional.html#gpt-2)
- [UNet](https://lightning-bolts.readthedocs.io/en/stable/models/convolutional.html#unet)
###### Reinforcement Learning
- [DQN Loss](https://lightning-bolts.readthedocs.io/en/stable/losses.html#dqn-loss)
- [Double DQN Loss](https://lightning-bolts.readthedocs.io/en/stable/losses.html#double-dqn-loss)
- [Per DQN Loss](https://lightning-bolts.readthedocs.io/en/stable/losses.html#per-dqn-loss)
###### GANs
- [Basic GAN](https://lightning-bolts.readthedocs.io/en/stable/models/gans.html#basic-gan)
- [DCGAN](https://lightning-bolts.readthedocs.io/en/stable/models/gans.html#dcgan)
###### Classic ML
- [Logistic Regression](https://lightning-bolts.readthedocs.io/en/stable/models/classic_ml.html#logistic-regression)
- [Linear Regression](https://lightning-bolts.readthedocs.io/en/stable/models/classic_ml.html#linear-regression)
## Continuous Integration
Lightning is rigorously tested across multiple CPUs, GPUs and TPUs and against major Python and PyTorch versions.
###### \*Codecov is above 90%, but build delays may show less
Current build statuses
| System / PyTorch ver. | 1.13 | 2.0 | 2.1 |
| :---: | :---: | :---: | :---: |
| Linux py3.9 \[GPU\] | | | [build status](https://dev.azure.com/Lightning-AI/lightning/_build/latest?definitionId=24&branchName=master) |
| Linux (multiple Python versions) | [build status](https://github.com/Lightning-AI/lightning/actions/workflows/ci-tests-pytorch.yml) | [build status](https://github.com/Lightning-AI/lightning/actions/workflows/ci-tests-pytorch.yml) | [build status](https://github.com/Lightning-AI/lightning/actions/workflows/ci-tests-pytorch.yml) |
| OSX (multiple Python versions) | [build status](https://github.com/Lightning-AI/lightning/actions/workflows/ci-tests-pytorch.yml) | [build status](https://github.com/Lightning-AI/lightning/actions/workflows/ci-tests-pytorch.yml) | [build status](https://github.com/Lightning-AI/lightning/actions/workflows/ci-tests-pytorch.yml) |
| Windows (multiple Python versions) | [build status](https://github.com/Lightning-AI/lightning/actions/workflows/ci-tests-pytorch.yml) | [build status](https://github.com/Lightning-AI/lightning/actions/workflows/ci-tests-pytorch.yml) | [build status](https://github.com/Lightning-AI/lightning/workflows/ci-tests-pytorch.yml) |
## Community
The Lightning community is maintained by:
- [10+ core contributors](https://lightning.ai/docs/pytorch/latest/community/governance.html), a mix of professional engineers, research scientists, and Ph.D. students from top AI labs.
- 800+ community contributors.
Want to help us build Lightning and reduce boilerplate for thousands of researchers? [Learn how to make your first contribution here](https://lightning.ai/docs/pytorch/stable/generated/CONTRIBUTING.html?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme)
Lightning is also part of the [PyTorch ecosystem](https://pytorch.org/ecosystem/), which requires projects to have solid testing, documentation and support.
### Asking for help
If you have any questions, please:
1. [Read the docs](https://lightning.ai/docs?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme).
2. [Search through existing Discussions](https://github.com/Lightning-AI/lightning/discussions), or [add a new question](https://github.com/Lightning-AI/lightning/discussions/new)
3. [Join our Discord](https://discord.com/invite/tfXFetEZxv).