tags:
- Text-to-Video
zeroscope_v2 30x448x256
A version of Modelscope without the watermark, with an aspect ratio close to 16:9 and smoother output.
Trained at 30 frames, 448x256 resolution
Trained on 9,923 clips and 29,769 tagged frames
This low-res Modelscope model is intended to be upscaled with potat1 using vid2vid in the 1111 text2video extension by kabachuha.
Example output upscaled to 1152x640 with potat1
1111 text2video extension usage
- Rename zeroscope_v2_30x448x256.pth to text2video_pytorch_model.pth
- Rename zeroscope_v2_30x448x256_text.bin to open_clip_pytorch_model.bin
- Replace the existing files in stable-diffusion-webui\models\ModelScope\t2v
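The three steps above can be sketched as a small shell snippet. The destination path comes from the list above; the source-file location and the `touch` stand-ins are assumptions for illustration (in real use, the two source files are the weights you downloaded).

```shell
# Sketch of the rename/replace steps for the 1111 text2video extension.
# Run from the directory containing the downloaded zeroscope_v2 files.
DEST="stable-diffusion-webui/models/ModelScope/t2v"
mkdir -p "$DEST"

# Stand-ins for the downloaded weights (illustration only -- in real use
# these files already exist from the download, so skip this line).
touch zeroscope_v2_30x448x256.pth zeroscope_v2_30x448x256_text.bin

# Rename to the filenames the extension expects, replacing any existing files.
cp zeroscope_v2_30x448x256.pth      "$DEST/text2video_pytorch_model.pth"
cp zeroscope_v2_30x448x256_text.bin "$DEST/open_clip_pytorch_model.bin"
```

Note the extension loads models by these fixed filenames, which is why renaming (rather than pointing it at the original files) is required.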
Upscaling
I recommend upscaling this using vid2vid in the 1111 extension to 1152x640 with a denoise strength between 0.66 and 0.85. Use the same prompt and settings used to create the original clip.
Known issues
Using a lower resolution or fewer frames will result in worse output
Many clips come out with cuts. This will be fixed in version 2.1, which is trained on a much cleaner dataset
Some clips come out too slow and may need prompt engineering for a faster pace
Thanks to camenduru, kabachuha, ExponentialML, polyware, tin2tin