Request for information on how fine-tuning was done

#1
by GeeveGeorge - opened

I hope you can kindly provide some documentation on how you fine-tuned the 2B and 5B models,
and what resources it took to fine-tune those models.

Thanks for your interest. Both the 2B and 5B models were fine-tuned using about 40k video clips on 4 NVIDIA A800 GPUs.

@bertjiazheng thanks for the clarification. I have a few more questions.

  1. How long did it take to train (total training time) on the 4x A800 GPUs?
  2. What was the average duration of the 40k clips, and what resolution were they (half HD, full HD, or some other specific resolution)?
  3. Could you provide the training/fine-tuning script for CogVideoX, or link to it here if it is publicly available?
  1. I trained both models so that each video is seen once (about 2,500 iterations); a rough sketch of the batch size these numbers imply follows below this list. Training takes about 1 day for the 2B model and 2 days for the 5B model, respectively.
  2. The durations range from 3 to 10 seconds. Each video is at least 720x480, as CogVideoX can only handle this resolution.
  3. I directly use the official fine-tuning script. You can follow the instructions here.
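For readers trying to reproduce this setup, here is a minimal back-of-the-envelope sketch (not from the authors) of what these numbers imply about batch size. Only the clip count, iteration count, and GPU count come from the thread; the per-GPU split is an assumption that no gradient accumulation is used.

```python
# Figures stated in the thread
num_clips = 40_000            # ~40k video clips
iterations_per_epoch = 2_500  # one pass over the data
num_gpus = 4                  # 4x NVIDIA A800

# Implied effective batch size: 40,000 / 2,500 = 16
effective_batch_size = num_clips // iterations_per_epoch

# Assumption (not stated in the thread): no gradient accumulation,
# so the effective batch splits evenly as 4 samples per GPU.
per_gpu_batch_size = effective_batch_size // num_gpus

print(f"effective batch size: {effective_batch_size}")   # 16
print(f"per-GPU batch size (assumed): {per_gpu_batch_size}")  # 4
```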
bertjiazheng changed discussion status to closed
