---
license: apache-2.0
---

# ExVideo

ExVideo is a post-tuning technique designed to extend the capabilities of video generation models. We have extended CogVideoX-5B to generate videos up to 129 frames long.

This is our second publicly released model. It adds LoRA weights to the CogVideoX-5B architecture.

* [Project Page](https://ecnu-cilab.github.io/ExVideoProjectPage/)
* [Source Code](https://github.com/modelscope/DiffSynth-Studio)
* [Technical report](https://arxiv.org/abs/2406.14130)

## Usage

```python
from diffsynth import ModelManager, CogVideoPipeline, save_video, download_models
import torch


# Download the base CogVideoX-5B weights and the ExVideo LoRA.
download_models(["CogVideoX-5B", "ExVideo-CogVideoX-LoRA-129f-v1"])

# Load the text encoder, transformer, and VAE in bfloat16.
model_manager = ModelManager(torch_dtype=torch.bfloat16)
model_manager.load_models([
    "models/CogVideo/CogVideoX-5b/text_encoder",
    "models/CogVideo/CogVideoX-5b/transformer",
    "models/CogVideo/CogVideoX-5b/vae/diffusion_pytorch_model.safetensors",
])
# Apply the ExVideo LoRA, which extends generation to 129 frames.
model_manager.load_lora("models/lora/ExVideo-CogVideoX-LoRA-129f-v1.safetensors")
pipe = CogVideoPipeline.from_model_manager(model_manager)

# Fix the seed for reproducibility, then generate a 129-frame 480x720 video.
torch.manual_seed(6)
video = pipe(
    prompt="an astronaut riding a horse on Mars.",
    height=480, width=720, num_frames=129,
    cfg_scale=7.0, num_inference_steps=100,
)
save_video(video, "video_with_lora.mp4", fps=8, quality=5)
```

Please refer to [DiffSynth](https://github.com/modelscope/DiffSynth-Studio) for more information.

## Examples

Prompt: an astronaut riding a horse on Mars.

Prompt: Static camera, two men shake hands happily, the background is in a modern office.

Prompt: The camera captures the northern lights dancing across an Arctic sky, with stars twinkling above a snow-covered landscape, creating a serene and magical atmosphere.

Prompt: FPV aerial shot, the sunshine shines on the snow capped mountains, a quiet atmosphere.

Prompt: A Chinese mother, draped in a soft, pastel-colored robe, gently rocks back and forth in a cozy rocking chair positioned in the tranquil setting of a nursery.
The dimly lit bedroom is adorned with whimsical mobiles dangling from the ceiling, casting shadows that dance on the walls. Her baby, swaddled in a delicate, patterned blanket, rests against her chest, the child's earlier cries now replaced by contented coos as the mother's soothing voice lulls the little one to sleep. The scent of lavender fills the air, adding to the serene atmosphere, while a warm, orange glow from a nearby nightlight illuminates the scene with a gentle hue, capturing a moment of tender love and comfort.

When comparing the model with and without the ExVideo extension module, we found that the original model loses noticeable detail when generating long videos, whereas the ExVideo extension module substantially improves video detail.
Comparison videos: without the ExVideo extension module vs. with the ExVideo extension module.
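The `num_frames=129` and `fps=8` values in the usage example determine how long the saved clip plays. The sketch below is a hypothetical sanity check: the helper names are illustrative and are not part of DiffSynth. It also assumes that the CogVideoX VAE compresses time by a factor of 4 while keeping the first frame, which would explain why 129 (= 4 × 32 + 1) is used as the frame count.

```python
# Hypothetical helpers (not part of DiffSynth) for checking the num_frames / fps
# values used in the usage example.

# Assumption: the CogVideoX VAE keeps the first frame and compresses the
# remaining frames in groups of 4.
TEMPORAL_COMPRESSION = 4


def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Playback length in seconds of a clip saved with save_video(..., fps=fps)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


def latent_frame_count(num_frames: int, temporal_compression: int = TEMPORAL_COMPRESSION) -> int:
    """Number of latent frames under the 4k + 1 assumption described above."""
    if num_frames < 1 or (num_frames - 1) % temporal_compression != 0:
        raise ValueError(
            f"num_frames must have the form {temporal_compression}k + 1, got {num_frames}"
        )
    return (num_frames - 1) // temporal_compression + 1


if __name__ == "__main__":
    for frames in (49, 129):
        # Prints "49 6.125 13", then "129 16.125 33".
        print(frames, clip_duration_seconds(frames, fps=8), latent_frame_count(frames))
```

Under these assumptions, a 129-frame video saved at 8 fps plays for about 16 seconds, compared with about 6 seconds for a 49-frame clip.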