Update README.md #12
by feifeiobama · opened

README.md CHANGED

# ⚡️Pyramid Flow⚡️

[[Paper]](https://arxiv.org/abs/2410.05954) [[Project Page ✨]](https://pyramid-flow.github.io) [[Code 🚀]](https://github.com/jy0205/Pyramid-Flow) [[Demo 🤗]](https://huggingface.co/spaces/Pyramid-Flow/pyramid-flow)

This is the official repository for Pyramid Flow, a training-efficient **Autoregressive Video Generation** method based on **Flow Matching**. Trained only on open-source datasets, it generates high-quality 10-second videos at 768p resolution and 24 FPS, and it naturally supports image-to-video generation.

## News

* `COMING SOON` ⚡️⚡️⚡️ Training code and new model checkpoints trained from scratch.
* `2024.10.11` 🤗🤗🤗 [Hugging Face demo](https://huggingface.co/spaces/Pyramid-Flow/pyramid-flow) is available. Thanks [@multimodalart](https://huggingface.co/multimodalart) for the commit!
* `2024.10.10` 🚀🚀🚀 We release the [technical report](https://arxiv.org/abs/2410.05954), [project page](https://pyramid-flow.github.io) and [model checkpoint](https://huggingface.co/rain1011/pyramid-flow-sd3) of Pyramid Flow.

## Installation

We recommend setting up the environment with conda. The codebase currently uses Python 3.8.10 and PyTorch 2.1.2, and we are actively working to support a wider range of versions.

```bash
git clone https://github.com/jy0205/Pyramid-Flow
cd Pyramid-Flow

# create env using conda
conda create -n pyramid python==3.8.10
conda activate pyramid
pip install -r requirements.txt
```
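
As a quick, optional sanity check, you can confirm the environment matches the versions mentioned above:

```python
# Print interpreter and framework versions; compare against 3.8.10 / 2.1.2.
import sys
import torch

print(sys.version.split()[0])   # expected: 3.8.10
print(torch.__version__)        # expected: 2.1.2
```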

Then, you can directly download the model from [Huggingface](https://huggingface.co/rain1011/pyramid-flow-sd3). We provide model checkpoints for both 768p and 384p video generation: the 384p checkpoint supports 5-second video generation at 24 FPS, while the 768p checkpoint supports up to 10-second video generation at 24 FPS.

```python
from huggingface_hub import snapshot_download

model_path = 'PATH'   # The local directory to save downloaded checkpoint
snapshot_download("rain1011/pyramid-flow-sd3", local_dir=model_path, local_dir_use_symlinks=False, repo_type='model')
```
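
If you only need one of the two variants, `snapshot_download` also accepts `ignore_patterns` to skip part of the repository. A sketch, assuming the 768p weights live in a `diffusion_transformer_768p/` subfolder (verify the folder names against the repo's file listing):

```python
from huggingface_hub import snapshot_download

model_path = 'PATH'   # The local directory to save downloaded checkpoint

snapshot_download(
    "rain1011/pyramid-flow-sd3",
    local_dir=model_path,
    repo_type='model',
    # Assumed folder name; check the repo page before relying on it.
    ignore_patterns=["diffusion_transformer_768p/*"],
)
```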

## Usage

To use our model, please follow the inference code in [`video_generation_demo.ipynb`](https://github.com/jy0205/Pyramid-Flow/blob/main/video_generation_demo.ipynb). We further simplify it into the following two-step procedure. First, load the downloaded model:

```python
import torch
from pyramid_dit import PyramidDiTForVideoGeneration
from diffusers.utils import load_image, export_to_video

torch.cuda.set_device(0)
model_dtype, torch_dtype = 'bf16', torch.bfloat16   # Use bf16 (fp16 is not supported yet)

model = PyramidDiTForVideoGeneration(
    'PATH',   # The downloaded checkpoint dir
    model_dtype,
    # ... remaining constructor arguments as in video_generation_demo.ipynb ...
)
```
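
The demo notebook then moves the submodules to the GPU before generation; a minimal sketch, assuming the model exposes `vae`, `dit` and `text_encoder` attributes as in the notebook:

```python
# Place submodules on the GPU (attribute names follow the demo notebook).
model.vae.to("cuda")
model.dit.to("cuda")
model.text_encoder.to("cuda")
model.vae.enable_tiling()   # Tiled VAE decoding reduces peak memory
```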

Then, run text-to-video generation:

```python
prompt = "Your text prompt here"   # Placeholder; see the demo notebook for a full example

with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
    frames = model.generate(
        prompt=prompt,
        # ... sampling arguments elided here; see the demo notebook ...
        height=768,
        width=1280,
        temp=16,                    # temp=16: 5s, temp=31: 10s
        guidance_scale=9.0,         # The guidance for the first frame; set it to 7 for the 384p variant
        video_guidance_scale=5.0,   # The guidance for the other video latents
        output_type="pil",
        save_memory=True,           # If you have enough GPU memory, set it to `False` to speed up VAE decoding
    )

export_to_video(frames, "./text_to_video_sample.mp4", fps=24)
```
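
For intuition on `temp`: the mapping in the comment (temp=16 → 5s, temp=31 → 10s at 24 FPS) matches the first temporal latent decoding to one frame and each further latent to eight frames. A small helper built on that assumption:

```python
def approx_duration_s(temp: int, fps: int = 24) -> float:
    """Rough clip length, assuming 1 frame for the first temporal latent
    and 8 frames for each additional one (inferred from temp=16 -> ~5s)."""
    return (1 + 8 * (temp - 1)) / fps

print(approx_duration_s(16))   # ~5.04
print(approx_duration_s(31))   # ~10.04
```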

Similarly, for image-to-video generation:

```python
image = load_image('PATH_TO_IMAGE')   # Placeholder path for the conditioning image

with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
    frames = model.generate_i2v(   # i2v entry point, as in the demo notebook
        prompt=prompt,
        input_image=image,
        temp=16,
        video_guidance_scale=4.0,
        output_type="pil",
        save_memory=True,   # If you have enough GPU memory, set it to `False` to speed up VAE decoding
    )

export_to_video(frames, "./image_to_video_sample.mp4", fps=24)
```

We also support CPU offloading, which allows inference with **less than 12GB** of GPU memory by passing `cpu_offloading=True`. This feature was contributed by [@Ednaordinary](https://github.com/Ednaordinary); see [#23](https://github.com/jy0205/Pyramid-Flow/pull/23) for details.
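
A minimal sketch of how the flag slots into the generation call, assuming `cpu_offloading` is passed as a keyword argument as introduced in that PR (check [#23](https://github.com/jy0205/Pyramid-Flow/pull/23) for the exact signature):

```python
with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
    frames = model.generate(
        prompt=prompt,
        cpu_offloading=True,   # Assumed kwarg per PR #23; enables <12GB-VRAM inference
        # ... remaining arguments as in the text-to-video example above ...
    )
```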

## Usage tips

* The `guidance_scale` parameter controls the visual quality. We suggest using a guidance within [7, 9] for the 768p checkpoint during text-to-video generation, and 7 for the 384p checkpoint.
* The `video_guidance_scale` parameter controls the motion. A larger value increases the dynamic degree and mitigates autoregressive generation degradation, while a smaller value stabilizes the video (see the sketch below).
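
The two tips combine into illustrative presets (values taken from the bullets above; the names here are hypothetical):

```python
# Hypothetical presets summarizing the tips above.
GUIDANCE_PRESETS = {
    "768p_text_to_video": {"guidance_scale": 9.0},   # any value in [7, 9] works
    "384p_text_to_video": {"guidance_scale": 7.0},
    # Raise video_guidance_scale for more motion; lower it for stability.
}
```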