---
tags:
  - Text-to-Video
---

*(model example video)*

zeroscope_v2 30x448x256

Modelscope without the watermark, in a ratio close to 16:9, with smoother output. Trained at 30 frames, 448x256 resolution, on 9,923 clips and 29,769 tagged frames.

This low-resolution modelscope model is intended to be upscaled with potat1 using vid2vid in kabachuha's 1111 text2video extension.

Example output upscaled to 1152x640 with potat1

1111 text2video extension usage

  1. Rename `zeroscope_v2_30x448x256.pth` to `text2video_pytorch_model.pth`
  2. Rename `zeroscope_v2_30x448x256_text.bin` to `open_clip_pytorch_model.bin`
  3. Replace the existing files in `stable-diffusion-webui\models\ModelScope\t2v`
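The three steps above can be sketched as a small install script. The `downloads` and webui paths are assumptions for illustration; adjust them to where the weights were downloaded and where your webui lives.

```python
from pathlib import Path
import shutil

# Map the downloaded filenames to the names the extension expects.
RENAMES = {
    "zeroscope_v2_30x448x256.pth": "text2video_pytorch_model.pth",
    "zeroscope_v2_30x448x256_text.bin": "open_clip_pytorch_model.bin",
}

def install(downloads: Path, t2v_dir: Path) -> None:
    """Copy the zeroscope weights into the t2v folder under their expected names."""
    t2v_dir.mkdir(parents=True, exist_ok=True)
    for src_name, dst_name in RENAMES.items():
        # copy2 overwrites any existing file of the same name,
        # replacing the previous ModelScope weights.
        shutil.copy2(downloads / src_name, t2v_dir / dst_name)

# Example (paths are assumptions):
# install(Path("~/Downloads").expanduser(),
#         Path("stable-diffusion-webui/models/ModelScope/t2v"))
```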

Upscaling

I recommend upscaling output from this model using vid2vid in the 1111 extension, to 1152x640, with a denoise strength between 0.66 and 0.85. Use the same prompt and settings that produced the original clip.
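As a sketch, the recommended vid2vid pass can be captured in a settings dictionary. The key names here are hypothetical, not the extension's actual parameter names; match them to the fields in your UI.

```python
# Illustrative vid2vid upscale settings; key names are hypothetical.
upscale_settings = {
    "model": "potat1",           # upscale model recommended above
    "mode": "vid2vid",
    "width": 1152,
    "height": 640,
    "denoising_strength": 0.75,  # recommended range: 0.66 - 0.85
    # Reuse the prompt and settings from the original low-res clip.
    "prompt": "<same prompt as the original clip>",
}

def denoise_in_range(strength: float) -> bool:
    """Check a denoise strength against the recommended 0.66-0.85 window."""
    return 0.66 <= strength <= 0.85
```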

Known issues

- Using a lower resolution or fewer frames will result in worse output.
- Many clips come out with cuts. This will be fixed soon in 2.1, trained on a much cleaner dataset.
- Some clips come out too slow and may need prompt engineering for a faster pace.

Thanks to camenduru, kabachuha, ExponentialML, polyware, tin2tin