hpcaitech/Open-Sora
An open-source large video generation model project that delivers near-Sora-level text-to-video and image-to-video generation at low cost.
Stars: 28669 | Forks: 2905
[...] accelerated training, inference, and more. Our model can generate 2-second 512x512 videos after only 3 days of training. [[checkpoints]](#open-sora-10-model-weights)
[[blog]](https://hpc-ai.com/blog/open-sora-v1.0) [[report]](/docs/report_01.md)
- **[2024.03.04]** Open-Sora provides a training recipe with a 46% cost reduction.
[[blog]](https://hpc-ai.com/blog/open-sora)
📍 Since Open-Sora is under active development, we keep separate branches for different versions. The latest version is [main](https://github.com/hpcaitech/Open-Sora). Older versions include: [v1.0](https://github.com/hpcaitech/Open-Sora/tree/opensora/v1.0), [v1.1](https://github.com/hpcaitech/Open-Sora/tree/opensora/v1.1), [v1.2](https://github.com/hpcaitech/Open-Sora/tree/opensora/v1.2), [v1.3](https://github.com/hpcaitech/Open-Sora/tree/opensora/v1.3).
## 🎥 Latest Demo
For convenience, the demos are shown in compressed GIF format. For original-quality samples and their corresponding prompts, please visit our [Gallery](https://hpcaitech.github.io/Open-Sora/).
| **5s 1024×576** | **5s 576×1024** | **5s 576×1024** |
| --- | --- | --- |
| [video](https://streamable.com/e/8g9y9h?autoplay=1) | [video](https://streamable.com/e/k50mnv?autoplay=1) | [video](https://streamable.com/e/bzrn9n?autoplay=1) |
| [video](https://streamable.com/e/dsv8da?autoplay=1) | [video](https://streamable.com/e/3wif07?autoplay=1) | [video](https://streamable.com/e/us2w7h?autoplay=1) |
| [video](https://streamable.com/e/yfwk8i?autoplay=1) | [video](https://streamable.com/e/jgjil0?autoplay=1) | [video](https://streamable.com/e/lsoai1?autoplay=1) |
### OpenSora 1.3 Demo

| **5s 720×1280** | **5s 720×1280** | **5s 720×1280** |
| --- | --- | --- |
| [video](https://streamable.com/e/r0imrp?quality=highest&autoplay=1) | [video](https://streamable.com/e/hfvjkh?quality=highest&autoplay=1) | [video](https://streamable.com/e/kutmma?quality=highest&autoplay=1) |
| [video](https://streamable.com/e/osn1la?quality=highest&autoplay=1) | [video](https://streamable.com/e/l1pzws?quality=highest&autoplay=1) | [video](https://streamable.com/e/2vqari?quality=highest&autoplay=1) |
| [video](https://streamable.com/e/1in7d6?quality=highest&autoplay=1) | [video](https://streamable.com/e/e9bi4o?quality=highest&autoplay=1) | [video](https://streamable.com/e/09z7xi?quality=highest&autoplay=1) |
| [video](https://streamable.com/e/16c3hk?quality=highest&autoplay=1) | [video](https://streamable.com/e/wi250w?quality=highest&autoplay=1) | [video](https://streamable.com/e/vw5b64?quality=highest&autoplay=1) |
### OpenSora 1.2 Demo

| **4s 720×1280** | **4s 720×1280** | **4s 720×1280** |
| --- | --- | --- |
| [video](https://github.com/hpcaitech/Open-Sora/assets/99191637/7895aab6-ed23-488c-8486-091480c26327) | [video](https://github.com/hpcaitech/Open-Sora/assets/99191637/20f07c7b-182b-4562-bbee-f1df74c86c9a) | [video](https://github.com/hpcaitech/Open-Sora/assets/99191637/3d897e0d-dc21-453a-b911-b3bda838acc2) |
| [video](https://github.com/hpcaitech/Open-Sora/assets/99191637/644bf938-96ce-44aa-b797-b3c0b513d64c) | [video](https://github.com/hpcaitech/Open-Sora/assets/99191637/272d88ac-4b4a-484d-a665-8d07431671d0) | [video](https://github.com/hpcaitech/Open-Sora/assets/99191637/ebbac621-c34e-4bb4-9543-1c34f8989764) |
| [video](https://github.com/hpcaitech/Open-Sora/assets/99191637/a1e3a1a3-4abd-45f5-8df2-6cced69da4ca) | [video](https://github.com/hpcaitech/Open-Sora/assets/99191637/d6ce9c13-28e1-4dff-9644-cc01f5f11926) | [video](https://github.com/hpcaitech/Open-Sora/assets/99191637/561978f8-f1b0-4f4d-ae7b-45bec9001b4a) |
### OpenSora 1.1 Demo

| **2s 240×426** | **2s 240×426** |
| --- | --- |
| [video](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/c31ebc52-de39-4a4e-9b1e-9211d45e05b2) | [video](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/c31ebc52-de39-4a4e-9b1e-9211d45e05b2) |
| [video](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/f7ce4aaa-528f-40a8-be7a-72e61eaacbbd) | [video](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/5d58d71e-1fda-4d90-9ad3-5f2f7b75c6a9) |

| **2s 426×240** | **4s 480×854** |
| --- | --- |
| [video](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/34ecb4a0-4eef-4286-ad4c-8e3a87e5a9fd) | [video](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/c1619333-25d7-42ba-a91c-18dbc1870b18) |

| **16s 320×320** | **16s 224×448** | **2s 426×240** |
| --- | --- | --- |
| [video](https://github.com/hpcaitech/Open-Sora/assets/99191637/3cab536e-9b43-4b33-8da8-a0f9cf842ff2) | [video](https://github.com/hpcaitech/Open-Sora/assets/99191637/9fb0b9e0-c6f4-4935-b29e-4cac10b373c4) | [video](https://github.com/hpcaitech/Open-Sora-dev/assets/99191637/3e892ad2-9543-4049-b005-643a4c1bf3bf) |
### OpenSora 1.0 Demo

| **2s 512×512** | **2s 512×512** | **2s 512×512** |
| --- | --- | --- |
| [video](https://github.com/hpcaitech/Open-Sora/assets/99191637/de1963d3-b43b-4e68-a670-bb821ebb6f80) | [video](https://github.com/hpcaitech/Open-Sora/assets/99191637/13f8338f-3d42-4b71-8142-d234fbd746cc) | [video](https://github.com/hpcaitech/Open-Sora/assets/99191637/fa6a65a6-e32a-4d64-9a9e-eabb0ebb8c16) |
| A serene night scene in a forested area. [...] The video is a time-lapse, capturing the transition from day to night, with the lake and forest serving as a constant backdrop. | A soaring drone shot captures the majestic beauty of a coastal cliff, [...] The water gently laps at the rock base, and greenery clings to the top of the cliff. | The majestic beauty of a waterfall cascading down a cliff into a serene lake. [...] The camera angle provides a bird's-eye view of the waterfall. |
| [video](https://github.com/hpcaitech/Open-Sora/assets/99191637/64232f84-1b36-4750-a6c0-3e610fa9aa94) | [video](https://github.com/hpcaitech/Open-Sora/assets/99191637/983a1965-a374-41a7-a76b-c07941a6c1e9) | [video](https://github.com/hpcaitech/Open-Sora/assets/99191637/ec10c879-9767-4c31-865f-2e8d6cf11e65) |
| A bustling city street at night, filled with the glow of car headlights and the ambient light of streetlights. [...] | The vibrant beauty of a sunflower field. The sunflowers are arranged in neat rows, creating a sense of order and symmetry. [...] | A serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell [...] |

Videos are downsampled to `.gif` for display. Click to view the original videos. Prompts are trimmed for display;
see [here](/assets/texts/t2v_samples.txt) for the full prompts.
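### Prompt Refinement
We use ChatGPT to refine prompts. You can refine a prompt with the following command. This feature works for both text-to-video and image-to-video generation.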
```
export OPENAI_API_KEY=sk-xxxx
torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py configs/diffusion/inference/t2i2v_256px.py --save-dir samples --prompt "raining, sea" --refine-prompt True
```
### Reproducibility
To make results reproducible, you can set the random seed as follows:
```
torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py configs/diffusion/inference/t2i2v_256px.py --save-dir samples --prompt "raining, sea" --sampling_option.seed 42 --seed 42
```
Use `--num-sample k` to generate `k` samples for each prompt.
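When sweeping several seeds, it can help to assemble the command programmatically. The following is a minimal sketch, not part of the repository: the helper name `build_inference_command` and the per-seed output directory layout are hypothetical, and combining `--num-sample` with the two seed flags in one invocation is an assumption. The script path, config path, and flags are copied from the commands above.

```python
import shlex

# Script and config paths copied from the commands in this README.
SCRIPT = "scripts/diffusion/inference.py"
CONFIG = "configs/diffusion/inference/t2i2v_256px.py"


def build_inference_command(prompt, seed, num_samples=1, save_dir="samples"):
    """Return the torchrun argument list for one seeded inference run.

    Hypothetical helper: it only builds the argument list and runs nothing.
    """
    return [
        "torchrun", "--nproc_per_node", "1", "--standalone",
        SCRIPT, CONFIG,
        "--save-dir", save_dir,
        "--prompt", prompt,
        # The README example sets both seed flags to the same value.
        "--sampling_option.seed", str(seed),
        "--seed", str(seed),
        # Assumption: --num-sample can be combined with the seed flags.
        "--num-sample", str(num_samples),
    ]


if __name__ == "__main__":
    # Print one shell-quoted command per seed for inspection.
    for seed in (42, 43, 44):
        cmd = build_inference_command(
            "raining, sea", seed, save_dir=f"samples/seed_{seed}"
        )
        print(shlex.join(cmd))
```

To execute a command instead of printing it, the list can be passed to `subprocess.run(cmd, check=True)` from the repository root.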
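## Computational Efficiency
We tested the computational efficiency of text-to-video generation on H100/H800 GPUs. For 256x256, we use colossalai's tensor parallelism together with `--offload True`. For 768x768, we use colossalai's sequence parallelism. All tests use 50 sampling steps. Results are reported in the following format: $\color{blue}{\text{Total time (s)}}/\color{red}{\text{Peak GPU memory (GB)}}$
| Resolution | 1x GPU | 2x GPUs | 4x GPUs | 8x GPUs |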
| ---------- | -------------------------------------- | ------------------------------------- | ------------------------------------- | ------------------------------------- |
| 256x256 | $\color{blue}{60}/\color{red}{52.5}$ | $\color{blue}{40}/\color{red}{44.3}$ | $\color{blue}{34}/\color{red}{44.3}$ | |
| 768x768 | $\color{blue}{1656}/\color{red}{60.3}$ | $\color{blue}{863}/\color{red}{48.3}$ | $\color{blue}{466}/\color{red}{44.3}$ | $\color{blue}{276}/\color{red}{44.3}$ |
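
To read the table in terms of scaling, one can derive the speedup over a single GPU and the parallel efficiency (speedup divided by GPU count). The sketch below uses only the total-time figures from the table; the function name `scaling` is illustrative, and the derived numbers are not reported by the authors.

```python
# Total time in seconds, copied from the table above (50 sampling steps).
TOTAL_TIME_S = {
    "256x256": {1: 60, 2: 40, 4: 34},
    "768x768": {1: 1656, 2: 863, 4: 466, 8: 276},
}


def scaling(times):
    """Map GPU count -> (speedup vs. 1 GPU, parallel efficiency)."""
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in sorted(times.items())}


if __name__ == "__main__":
    for resolution, times in TOTAL_TIME_S.items():
        for n, (speedup, efficiency) in scaling(times).items():
            print(
                f"{resolution} {n}x GPU: "
                f"speedup {speedup:.2f}, efficiency {efficiency:.0%}"
            )
```

For example, at 768x768 the 8-GPU run takes 276 s against 1656 s on one GPU, a 6.0x speedup at 75% parallel efficiency. The 256x256 and 768x768 rows use different parallelism strategies, so their scaling figures are not directly comparable.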
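## Evaluation
On [VBench](https://huggingface.co/spaces/Vchitect/VBench_Leaderboard), Open-Sora 2.0 significantly narrows the gap with OpenAI's Sora compared with Open-Sora 1.2, from 4.52% to 0.69%.

Human preference results show that our model is on par with HunyuanVideo 11B and Step-Video 30B.

Alongside its strong performance, Open-Sora 2.0 is cost-effective.
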
## Acknowledgement
Here we list only some of the projects. For other works and datasets, please refer to our report.
- [ColossalAI](https://github.com/hpcaitech/ColossalAI): A powerful parallel acceleration and optimization system for large models.
- [DiT](https://github.com/facebookresearch/DiT): Scalable Diffusion Models with Transformers.
- [OpenDiT](https://github.com/NUS-HPC-AI-Lab/OpenDiT): An acceleration solution for DiT training. We adopt valuable acceleration strategies from OpenDiT to speed up training.
- [PixArt](https://github.com/PixArt-alpha/PixArt-alpha): An open-source DiT-based text-to-image model.
- [Flux](https://github.com/black-forest-labs/flux): A powerful text-to-image model.
- [Latte](https://github.com/Vchitect/Latte): An attempt to efficiently train DiT for video.
- [HunyuanVideo](https://github.com/Tencent/HunyuanVideo/tree/main?tab=readme-ov-file): An open-source text-to-video model.
- [StabilityAI VAE](https://huggingface.co/stabilityai/sd-vae-ft-mse-original): A powerful image VAE model.
- [DC-AE](https://github.com/mit-han-lab/efficientvit): Deep Compression AutoEncoder for image compression.
- [CLIP](https://github.com/openai/CLIP): A powerful text-image embedding model.
- [T5](https://github.com/google-research/text-to-text-transfer-transformer): A powerful text encoder.
- [LLaVA](https://github.com/haotian-liu/LLaVA): A powerful image captioning model based on [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) and [Yi-34B](https://huggingface.co/01-ai/Yi-34B).
- [PLLaVA](https://github.com/magic-research/PLLaVA): A powerful video captioning model.
- [MiraData](https://github.com/mira-space/MiraData): A large-scale video dataset with long durations and structured captions.
## Citation
```
@article{opensora,
title={Open-sora: Democratizing efficient video production for all},
author={Zheng, Zangwei and Peng, Xiangyu and Yang, Tianji and Shen, Chenhui and Li, Shenggui and Liu, Hongxin and Zhou, Yukun and Li, Tianyi and You, Yang},
journal={arXiv preprint arXiv:2412.20404},
year={2024}
}
@article{opensora2,
title={Open-Sora 2.0: Training a Commercial-Level Video Generation Model in \$200k},
author={Xiangyu Peng and Zangwei Zheng and Chenhui Shen and Tom Young and Xinying Guo and Binluo Wang and Hang Xu and Hongxin Liu and Mingyan Jiang and Wenjun Li and Yuhui Wang and Anbang Ye and Gang Ren and Qianran Ma and Wanying Liang and Xiang Lian and Xiwen Wu and Yuting Zhong and Zhuangyan Li and Chaoyu Gong and Guojun Lei and Leijun Cheng and Limin Zhang and Minghao Li and Ruijie Zhang and Silan Hu and Shijie Huang and Xiaokang Wang and Yuanheng Zhao and Yuqi Wang and Ziang Wei and Yang You},
year={2025},
journal={arXiv preprint arXiv:2503.09642},
}
```
## Star History
[Star History Chart](https://star-history.com/#hpcaitech/Open-Sora&Date)