Request for information on how fine-tuning was done

#1
by GeeveGeorge - opened

I hope you can kindly provide some documentation on how you fine-tuned the 2B and 5B models,
and what resources it took to fine-tune those models.

Thanks for your interest. Both the 2B and 5B models were fine-tuned using about 40k video clips on 4 NVIDIA A800 GPUs.

@bertjiazheng thanks for the clarification. I have a few more questions.

  1. How long did it take to train (total training time) on the 4x A800 GPUs?
  2. What was the average duration of the 40k clips, and what resolution were they (half HD, full HD, or some other specific resolution)?
  3. Could you provide the training/fine-tuning script for CogVideoX, or link to it here if it is publicly available?
  1. I trained both models so that each video is seen once (about 2,500 iterations); a rough sketch of the batch size these numbers imply follows below this list. Training takes about 1 day for the 2B model and 2 days for the 5B model, respectively.
  2. The durations range from 3 to 10 seconds. Each video is at least 720x480, as CogVideoX can only handle this resolution.
  3. I directly use the official fine-tuning script. You can follow the instructions here.
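For readers trying to reproduce this setup, here is a minimal back-of-the-envelope sketch (not from the authors) of what these numbers imply about batch size. Only the clip count, iteration count, and GPU count come from the thread; the per-GPU split is an assumption that no gradient accumulation is used.

```python
# Figures stated in the thread
num_clips = 40_000            # ~40k video clips
iterations_per_epoch = 2_500  # one pass over the data
num_gpus = 4                  # 4x NVIDIA A800

# Implied effective batch size: 40,000 / 2,500 = 16
effective_batch_size = num_clips // iterations_per_epoch

# Assumption (not stated in the thread): no gradient accumulation,
# so the effective batch splits evenly as 4 samples per GPU.
per_gpu_batch_size = effective_batch_size // num_gpus

print(f"effective batch size: {effective_batch_size}")   # 16
print(f"per-GPU batch size (assumed): {per_gpu_batch_size}")  # 4
```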
bertjiazheng changed discussion status to closed
