---
license: apache-2.0
base_model:
  - THUDM/CogVideoX-5b
language:
  - en
tags:
  - video-generation
  - paddlemix
---

简体中文 | [English](README.md)

# VCtrl

🤗 Huggingface Space | 🌐 Github | 📜 arxiv | 📷 Project

## Model Introduction

**VCtrl** is a general-purpose control model for video generation. By introducing an auxiliary condition encoder, it can flexibly interface with a variety of control modules without modifying the original generator, thereby avoiding large-scale retraining. The model relies on sparse residual connections to propagate control signals efficiently, while a unified condition-encoding pipeline converts diverse control inputs into a standardized representation, which is then combined with task-specific masks to improve adaptability (a minimal illustrative sketch of this control-injection idea follows the table below). Thanks to this unified and flexible design, VCtrl can be applied broadly to video-generation scenarios such as **character animation**, **scene transitions**, and **video editing**. The table below lists the video-generation models provided in this release:
| Model Name | VCtrl-Canny | VCtrl-Mask | VCtrl-Pose |
| --- | --- | --- | --- |
| Video Resolution | 720 * 480 | 720 * 480 | 720 * 480 & 480 * 720 |
| Inference Precision | FP16 (recommended) | FP16 (recommended) | FP16 (recommended) |
| Single-GPU Memory Usage | V100: 32 GB minimum* | V100: 32 GB minimum* | V100: 32 GB minimum* |
| Inference Speed (Step = 25, FP16) | A100 (single GPU): ~300 s (49 frames)<br>V100 (single GPU): ~400 s (49 frames) | A100 (single GPU): ~300 s (49 frames)<br>V100 (single GPU): ~400 s (49 frames) | A100 (single GPU): ~300 s (49 frames)<br>V100 (single GPU): ~400 s (49 frames) |
| Prompt Language | English* | English* | English* |
| Maximum Prompt Length | 224 tokens | 224 tokens | 224 tokens |
| Video Length | T2V model: 49 frames only; I2V model: extendable to an arbitrary number of frames | T2V model: 49 frames only; I2V model: extendable to an arbitrary number of frames | T2V model: 49 frames only; I2V model: extendable to an arbitrary number of frames |
| Frame Rate | 30 frames per second | 30 frames per second | 30 frames per second |
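To make the design described above concrete, here is a minimal, illustrative sketch of the control-injection idea, not the actual VCtrl implementation: a small condition encoder maps a control input to features, a task-specific mask restricts where they apply, and the features enter a frozen generator block through a residual addition, so only a sparse subset of blocks needs to receive control. All class and parameter names below are hypothetical.

```python
import paddle.nn as nn


class ToyConditionEncoder(nn.Layer):
    """Illustrative only: encodes a control input (edge map, mask, pose map) into features."""

    def __init__(self, in_channels=3, hidden_dim=64):
        super().__init__()
        # A single projection stands in for the unified condition-encoding pipeline.
        self.proj = nn.Conv2D(in_channels, hidden_dim, kernel_size=3, padding=1)

    def forward(self, control, task_mask=None):
        feat = self.proj(control)
        if task_mask is not None:
            # The task-specific mask limits where the control signal is applied.
            feat = feat * task_mask
        return feat


class ControlledBlock(nn.Layer):
    """Illustrative only: a frozen generator block that may receive control features."""

    def __init__(self, frozen_block, receives_control=True):
        super().__init__()
        self.frozen_block = frozen_block          # pretrained block, weights left untouched
        self.receives_control = receives_control  # only a sparse subset of blocks is wired up

    def forward(self, x, control_feat):
        out = self.frozen_block(x)
        if self.receives_control:
            out = out + control_feat              # sparse residual connection
        return out
```

In the released model this role is played by the `VCtrlModel` auxiliary network, used together with `CogVideoXVCtrlPipeline` in the Quick Start example below.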
## Quick Start

🤗 This model can already be deployed with the ppdiffusers library from paddlemix; you can follow the steps below. **We recommend visiting our [github](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/ppvctrl) for a better experience.**

1. Install the required dependencies

```shell
# Clone the PaddleMIX repository
git clone https://github.com/PaddlePaddle/PaddleMIX.git

# Install paddlemix
cd PaddleMIX
pip install -e .

# Install ppdiffusers
pip install -e ppdiffusers

# Install paddlenlp
pip install paddlenlp==v3.0.0-beta2

# Enter the vctrl directory
cd ppdiffusers/examples/ppvctrl

# Install the remaining dependencies
pip install -r requirements.txt

# Install paddlex
pip install paddlex==3.0.0b2
```

2. Run the code

```python
import os
import re

import numpy as np
import paddle
from decord import VideoReader
from moviepy.editor import ImageSequenceClip
from PIL import Image

from ppdiffusers import (
    CogVideoXDDIMScheduler,
    CogVideoXTransformer3DVCtrlModel,
    CogVideoXVCtrlPipeline,
    VCtrlModel,
)


def write_mp4(video_path, samples, fps=8):
    clip = ImageSequenceClip(samples, fps=fps)
    clip.write_videofile(video_path, audio_codec="aac")


def save_vid_side_by_side(batch_output, validation_control_images, output_folder, fps):
    # Flatten the batch of predicted frame lists into a single list of frames.
    flattened_batch_output = [img for sublist in batch_output for img in sublist]
    ori_video_path = output_folder + "/origin_predict.mp4"
    video_path = output_folder + "/test_1.mp4"
    ori_final_images = []
    final_images = []
    outputs = []

    def get_concat_h(im1, im2):
        # Concatenate two frames horizontally for side-by-side comparison.
        dst = Image.new("RGB", (im1.width + im2.width, max(im1.height, im2.height)))
        dst.paste(im1, (0, 0))
        dst.paste(im2, (im1.width, 0))
        return dst

    for image_list in zip(validation_control_images, flattened_batch_output):
        predict_img = image_list[1].resize(image_list[0].size)
        result = get_concat_h(image_list[0], predict_img)
        ori_final_images.append(np.array(image_list[1]))
        final_images.append(np.array(result))
        outputs.append(np.array(predict_img))

    write_mp4(ori_video_path, ori_final_images, fps=fps)
    write_mp4(video_path, final_images, fps=fps)
    output_path = output_folder + "/output.mp4"
    write_mp4(output_path, outputs, fps=fps)


def load_images_from_folder_to_pil(folder):
    images = []
    valid_extensions = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tiff"}

    def frame_number(filename):
        # Sort by frame index, supporting both "frame_<n>_7fps" names and plain numeric names.
        new_pattern_match = re.search(r"frame_(\d+)_7fps", filename)
        if new_pattern_match:
            return int(new_pattern_match.group(1))
        matches = re.findall(r"\d+", filename)
        if matches:
            if matches[-1] == "0000" and len(matches) > 1:
                return int(matches[-2])
            return int(matches[-1])
        return float("inf")

    sorted_files = sorted(os.listdir(folder), key=frame_number)
    for filename in sorted_files:
        ext = os.path.splitext(filename)[1].lower()
        if ext in valid_extensions:
            img = Image.open(os.path.join(folder, filename)).convert("RGB")
            images.append(img)
    return images


def load_images_from_video_to_pil(video_path):
    images = []
    vr = VideoReader(video_path)
    length = len(vr)
    for idx in range(length):
        frame = vr[idx].asnumpy()
        images.append(Image.fromarray(frame))
    return images


validation_control_images = load_images_from_video_to_pil('your_path')
prompt = 'Group of fishes swimming in aquarium.'
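# NOTE (illustrative addition, not part of the official pipeline): VCtrl-Canny expects the
# conditioning frames to be edge maps. If 'your_path' points to a raw RGB video rather than a
# pre-extracted Canny video, something like the commented sketch below could produce edge
# frames; OpenCV (cv2) and the thresholds (100, 200) are assumptions here, and the extraction
# scripts in the PaddleMIX repo remain the authoritative reference.
# import cv2
# validation_control_images = [
#     Image.fromarray(cv2.Canny(np.array(img.convert("L")), 100, 200)).convert("RGB")
#     for img in validation_control_images
# ]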
vctrl = VCtrlModel.from_pretrained(
    "paddlemix/vctrl-5b-t2v-canny",
    low_cpu_mem_usage=True,
    paddle_dtype=paddle.float16,
)
pipeline = CogVideoXVCtrlPipeline.from_pretrained(
    "paddlemix/cogvideox-5b-vctrl",
    vctrl=vctrl,
    paddle_dtype=paddle.float16,
    low_cpu_mem_usage=True,
    map_location="cpu",
)
pipeline.scheduler = CogVideoXDDIMScheduler.from_config(pipeline.scheduler.config, timestep_spacing="trailing")
# Reduce VAE memory usage so inference fits on a single GPU.
pipeline.vae.enable_tiling()
pipeline.vae.enable_slicing()

task = 'canny'
num_frames = 49
final_result = []
video = pipeline(
    prompt=prompt,
    num_inference_steps=25,
    num_frames=num_frames,
    guidance_scale=35,
    generator=paddle.Generator().manual_seed(42),
    conditioning_frames=validation_control_images[:num_frames],
    conditioning_frame_indices=list(range(num_frames)),
    conditioning_scale=1.0,
    width=720,
    height=480,
    task=task,
    conditioning_masks=validation_mask_images[:num_frames] if task == "mask" else None,
    vctrl_layout_type='spacing',
).frames[0]
final_result.append(video)

# The helper writes several comparison videos into this directory.
output_dir = "output"
os.makedirs(output_dir, exist_ok=True)
save_vid_side_by_side(final_result, validation_control_images[:num_frames], output_dir, fps=30)
```

## Learn More

Visit our [github](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/ppvctrl), where you will find:

1. A more detailed introduction to the technical details, with code explanations.
2. Details of the algorithms used to extract the control conditions.
3. The full inference code for the model.
4. The project changelog and updates, with more opportunities for interaction.
5. The PaddleMIX toolchain, which helps you make better use of the model.