tags:
- Text-to-Video
zeroscope_v2 30x448x256
A version of Modelscope without the watermark, with an aspect ratio close to 16:9 and smoother output.
Trained at 30 frames, 448x256 resolution
Trained on 9,923 clips and 29,769 tagged frames
This low-res Modelscope model is intended to be upscaled with potat1 using vid2vid in the 1111 text2video extension by kabachuha.
Example output upscaled to 1152x640 with potat1
1111 text2video extension usage
- Rename zeroscope_v2_30x448x256.pth to text2video_pytorch_model.pth
- Rename zeroscope_v2_30x448x256_text.bin to open_clip_pytorch_model.bin
- Replace the existing files in stable-diffusion-webui\models\ModelScope\t2v
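The three steps above can be sketched as a small shell snippet. The destination path comes from the list above; the source-file location and the `touch` stand-ins are assumptions for illustration (in real use, the two source files are the weights you downloaded).

```shell
# Sketch of the rename/replace steps for the 1111 text2video extension.
# Run from the directory containing the downloaded zeroscope_v2 files.
DEST="stable-diffusion-webui/models/ModelScope/t2v"
mkdir -p "$DEST"

# Stand-ins for the downloaded weights (illustration only -- in real use
# these files already exist from the download, so skip this line).
touch zeroscope_v2_30x448x256.pth zeroscope_v2_30x448x256_text.bin

# Rename to the filenames the extension expects, replacing any existing files.
cp zeroscope_v2_30x448x256.pth      "$DEST/text2video_pytorch_model.pth"
cp zeroscope_v2_30x448x256_text.bin "$DEST/open_clip_pytorch_model.bin"
```

Note the extension loads models by these fixed filenames, which is why renaming (rather than pointing it at the original files) is required.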
Upscaling
I recommend upscaling this using vid2vid in the 1111 extension to 1152x640 with a denoise strength between 0.66 and 0.85. Use the same prompt and settings used to create the original clip.
Known issues
Using a lower resolution or fewer frames will result in worse output
Many clips come out with cuts. This will be fixed in version 2.1, which is trained on a much cleaner dataset
Some clips come out too slow and may need prompt engineering for a faster pace
Thanks to camenduru, kabachuha, ExponentialML, polyware, tin2tin