ExVideo

ExVideo is a post-tuning technique designed to enhance the capabilities of video generation models. We have extended Stable Video Diffusion to generate long videos of up to 128 frames.

Example: Text-to-video via extended Stable Video Diffusion

Generate the first frame with a text-to-image model, then animate it with our extended image-to-video model. See ExVideo_svd_test.py.

https://github.com/modelscope/DiffSynth-Studio/assets/35051019/d97f6aa9-8064-4b5b-9d49-ed6001bb9acc
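
In outline, the test script loads the extended SVD weights and animates a first frame produced by a text-to-image model. The sketch below is only a rough approximation assuming DiffSynth-Studio's ModelManager / SVDVideoPipeline API; parameter names and defaults may differ from what ExVideo_svd_test.py actually does.

# Rough sketch of extended-SVD inference, assuming DiffSynth-Studio's
# ModelManager / SVDVideoPipeline API; see ExVideo_svd_test.py for the
# authoritative version (parameter names here are illustrative).
import torch
from PIL import Image
from diffsynth import ModelManager, SVDVideoPipeline, save_video

model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
model_manager.load_models(["models/stable_video_diffusion/svd_xt.safetensors"])
pipe = SVDVideoPipeline.from_model_manager(model_manager)

# The first frame can come from any text-to-image model of your choice.
image = Image.open("first_frame.png").resize((512, 512))
video = pipe(input_image=image, num_frames=128, height=512, width=512)
save_video(video, "output.mp4", fps=30)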

Train

  • Step 1: Install additional packages
pip install lightning deepspeed
  • Step 2: Download the base model (from HuggingFace or ModelScope) and save it to models/stable_video_diffusion/svd_xt.safetensors.
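
If you prefer to fetch the weights programmatically, a sketch with huggingface_hub is shown below; the repo id is an assumption about where svd_xt.safetensors is hosted, so adjust it to whichever source you actually use.

# Hedged sketch: download svd_xt.safetensors with huggingface_hub and place it
# where the training/test scripts expect it. The repo id below is an assumption.
import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

target = Path("models/stable_video_diffusion/svd_xt.safetensors")
target.parent.mkdir(parents=True, exist_ok=True)
downloaded = hf_hub_download(
    repo_id="stabilityai/stable-video-diffusion-img2vid-xt",  # assumed source repo
    filename="svd_xt.safetensors",
)
shutil.copy(downloaded, target)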

  • Step 3: Prepare datasets

path/to/your/dataset
├── metadata.json
└── videos
    ├── video_1.mp4
    ├── video_2.mp4
    └── video_3.mp4

where metadata.json lists the relative path of each video:

[
    {
        "path": "videos/video_1.mp4"
    },
    {
        "path": "videos/video_2.mp4"
    },
    {
        "path": "videos/video_3.mp4"
    }
]
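
If your clips already sit in the videos/ folder, a small helper like the one below (hypothetical, not part of the repo) can generate metadata.json:

# Hypothetical helper (not part of DiffSynth-Studio): build metadata.json
# by listing every .mp4 under the videos/ folder.
import json
from pathlib import Path

dataset_root = Path("path/to/your/dataset")
entries = [
    {"path": f"videos/{video.name}"}
    for video in sorted((dataset_root / "videos").glob("*.mp4"))
]
with open(dataset_root / "metadata.json", "w") as f:
    json.dump(entries, f, indent=4)
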
  • Step 4: Run
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" python -u ExVideo_svd_train.py \
  --pretrained_path "models/stable_video_diffusion/svd_xt.safetensors" \
  --dataset_path "path/to/your/dataset" \
  --output_path "path/to/save/models" \
  --steps_per_epoch 8000 \
  --num_frames 128 \
  --height 512 \
  --width 512 \
  --dataloader_num_workers 2 \
  --learning_rate 1e-5 \
  --max_epochs 100
  • Step 5: Post-process checkpoints

Compute the Exponential Moving Average (EMA) of the saved checkpoints and package the result using safetensors.

python ExVideo_ema.py --output_path "path/to/save/models/lightning_logs/version_xx" --gamma 0.9
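
Conceptually, the EMA weights are a running average over the saved checkpoints, theta_ema <- gamma * theta_ema + (1 - gamma) * theta. The sketch below illustrates that averaging only; the checkpoint names are placeholders, and the actual conversion from Lightning checkpoints is handled by ExVideo_ema.py.

# Illustrative sketch of the EMA update applied across checkpoints; file names
# and state-dict layout are assumptions, not the actual logic of ExVideo_ema.py.
from safetensors.torch import load_file, save_file

gamma = 0.9
checkpoint_paths = ["step_1000.safetensors", "step_2000.safetensors"]  # placeholders

ema_state = None
for path in checkpoint_paths:
    state = load_file(path)
    if ema_state is None:
        ema_state = {k: v.clone().float() for k, v in state.items()}
    else:
        for k, v in state.items():
            # theta_ema <- gamma * theta_ema + (1 - gamma) * theta
            ema_state[k].mul_(gamma).add_(v.float(), alpha=1 - gamma)

save_file({k: v.half() for k, v in ema_state.items()}, "ema.safetensors")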
  • Step 6: Enjoy your model

The EMA model is stored at path/to/save/models/lightning_logs/version_xx/checkpoints/epoch=xx-step=yyy-ema.safetensors. Load it in ExVideo_svd_test.py to run inference with your fine-tuned model.
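
As a quick sanity check before wiring the file into ExVideo_svd_test.py, you can confirm that the EMA checkpoint loads cleanly (the path below keeps the placeholders from above):

# Hypothetical sanity check (not part of the repo): open the EMA safetensors
# file and list its size and a tensor name before using it for inference.
from safetensors.torch import load_file

ema_path = "path/to/save/models/lightning_logs/version_xx/checkpoints/epoch=xx-step=yyy-ema.safetensors"
state = load_file(ema_path)
print(f"{len(state)} tensors, first key: {next(iter(state))}")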