alibaba-pai
/

EasyAnimateV5-12b-zh-InP

Diffusers

Safetensors

EasyAnimatePipeline_Multi_Text_Encoder


bubbliiiing committed on Nov 22

Commit

5dfb363

•

1 Parent(s): f625293

Update Readme

Files changed (2)

README.md +88 -32
README_en.md +93 -18

README.md CHANGED

@@ -30,21 +30,6 @@ tasks:
 #- vllm
 ---
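-# EasyAnimate | An End-to-End Solution for High-Resolution and Long Video Generation
-😊 EasyAnimate is an end-to-end solution for generating high-resolution and long videos. We can train transformer-based diffusion generators, train VAEs for processing long videos, and preprocess metadata.
-😊 Based on DiT, we use a transformer as the diffuser for video and image generation.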
-😊 Welcome!
-[![Arxiv Page](https://img.shields.io/badge/Arxiv-Page-red)](https://arxiv.org/abs/2405.18991)
-[![Project Page](https://img.shields.io/badge/Project-Website-green)](https://easyanimate.github.io/)
-[![Modelscope Studio](https://img.shields.io/badge/Modelscope-Studio-blue)](https://modelscope.cn/studios/PAI/EasyAnimate/summary)
-[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-yellow)](https://huggingface.co/spaces/alibaba-pai/EasyAnimate)
-[![Discord Page](https://img.shields.io/badge/Discord-Page-blue)](https://discord.gg/UzkpB4Bn)
-[English](./README.md) | 简体中文
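 # Table of Contents
 - [Table of Contents](#table-of-contents)
 - [Introduction](#introduction)
@@ -143,6 +128,39 @@ Details for Linux:
 We need about 60GB of free disk space; please check!
 #### b. Weight placement
 We recommend placing the [weights](#model-zoo) at the specified paths: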
@@ -161,8 +179,7 @@ EasyAnimateV5:
 ### EasyAnimateV5-12b-zh-InP
-Resolution-1024
 <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
   <tr>
       <td>
@@ -181,8 +198,6 @@ Resolution-1024
 </table>
-Resolution-768
 <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
   <tr>
       <td>
@@ -200,8 +215,6 @@ Resolution-768
   </tr>
 </table>
-Resolution-512
 <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
   <tr>
       <td>
@@ -219,6 +232,41 @@ Resolution-512
   </tr>
 </table>
 ### EasyAnimateV5-12b-zh-Control
 <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
@@ -364,6 +412,13 @@ sh scripts/train.sh
 # Model Zoo
 EasyAnimateV5:
 | Name | Type | Storage Space | Hugging Face | Model Scope | Description |
 |--|--|--|--|--|--|
 | EasyAnimateV5-12b-zh-InP | EasyAnimateV5 | 34 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-12b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-12b-zh-InP)| Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
@@ -373,29 +428,29 @@ EasyAnimateV5:
 <details>
   <summary>(Obsolete) EasyAnimateV4:</summary>
-| Name | Type | Storage Space | Download | Hugging Face | Description |
 |--|--|--|--|--|--|
-| EasyAnimateV4-XL-2-InP.tar.gz | EasyAnimateV4 | Before extraction: 8.9 GB / After extraction: 14.0 GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV4-XL-2-InP.tar.gz) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV4-XL-2-InP)| Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280), trained with 144 frames at 24 frames per second. |
 </details>
 <details>
   <summary>(Obsolete) EasyAnimateV3:</summary>
-| Name | Type | Storage Space | Download | Hugging Face | Description |
 |--|--|--|--|--|--|
-| EasyAnimateV3-XL-2-InP-512x512.tar | EasyAnimateV3 | 18.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-512x512.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-512x512)| Official image-to-video weights at 512x512 resolution. Trained with 144 frames at 24 frames per second. |
-| EasyAnimateV3-XL-2-InP-768x768.tar | EasyAnimateV3 | 18.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-768x768.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-768x768) | Official image-to-video weights at 768x768 resolution. Trained with 144 frames at 24 frames per second. |
-| EasyAnimateV3-XL-2-InP-960x960.tar | EasyAnimateV3 | 18.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-960x960.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-960x960) | Official image-to-video weights at 960x960 (720P) resolution. Trained with 144 frames at 24 frames per second. |
 </details>
 <details>
   <summary>(Obsolete) EasyAnimateV2:</summary>
-| Name | Type | Storage Space | Download | Hugging Face | Description |
-|--|--|--|--|--|--|
-| EasyAnimateV2-XL-2-512x512.tar | EasyAnimateV2 | 16.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV2-XL-2-512x512.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-512x512)| Official weights for 512x512 resolution. Trained with 144 frames at 24 frames per second. |
-| EasyAnimateV2-XL-2-768x768.tar | EasyAnimateV2 | 16.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV2-XL-2-768x768.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-768x768) | Official weights for 768x768 resolution. Trained with 144 frames at 24 frames per second. |
-| easyanimatev2_minimalism_lora.safetensors | Lora of Pixart | 485.1MB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Personalized_Model/easyanimatev2_minimalism_lora.safetensors)| - | A LoRA trained on a specific type of images. The images can be downloaded [here](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/webui/Minimalism.zip). |
 </details>
 <details>
@@ -426,6 +481,7 @@ EasyAnimateV5:
 # References
 - CogVideo: https://github.com/THUDM/CogVideo/
 - magvit: https://github.com/google-research/magvit
 - PixArt: https://github.com/PixArt-alpha/PixArt-alpha
 - Open-Sora-Plan: https://github.com/PKU-YuanGroup/Open-Sora-Plan

 #- vllm
 ---
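 # Table of Contents
 - [Table of Contents](#table-of-contents)
 - [Introduction](#introduction)
 We need about 60GB of free disk space; please check!
+The video sizes that EasyAnimateV5-12B can generate with different amounts of GPU memory are: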
+| GPU memory |384x672x72|384x672x49|576x1008x25|576x1008x49|768x1344x25|768x1344x49|
+|----------|----------|----------|----------|----------|----------|----------|
+| 16GB | 🧡 | 🧡 | ❌ | ❌ | ❌ | ❌ |
+| 24GB | 🧡 | 🧡 | 🧡 | 🧡 | ❌ | ❌ |
+| 40GB | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
+| 80GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+✅ means it can run with "model_cpu_offload", 🧡 means it can run with "model_cpu_offload_and_qfloat8", ⭕️ means it can run with "sequential_cpu_offload", and ❌ means it cannot run. Note that running with sequential_cpu_offload is slower.
+Some GPUs that do not support torch.bfloat16, such as the 2080ti and V100, require changing weight_dtype in app.py and the predict files to torch.float16 in order to run.
+The generation times for EasyAnimateV5-12B on different GPUs over 25 steps are as follows:
+| GPU |384x672x72|384x672x49|576x1008x25|576x1008x49|768x1344x25|768x1344x49|
+|----------|----------|----------|----------|----------|----------|----------|
+| A10 24GB | ~120s (4.8s/it) | ~240s (9.6s/it) | ~320s (12.7s/it) | ~750s (29.8s/it) | ❌ | ❌ |
+| A100 80GB | ~45s (1.75s/it) | ~90s (3.7s/it) | ~120s (4.7s/it) | ~300s (11.4s/it) | ~265s (10.6s/it) | ~710s (28.3s/it) |
+(⭕️) means it can run with low_gpu_memory_mode=True, but more slowly, and ❌ means it cannot run.
+<details>
+  <summary>(Obsolete) EasyAnimateV3:</summary>
+The video sizes that EasyAnimateV3 can generate with different amounts of GPU memory are:
+| GPU memory | 384x672x72 | 384x672x144 | 576x1008x72 | 576x1008x144 | 720x1280x72 | 720x1280x144 |
+|----------|----------|----------|----------|----------|----------|----------|
+| 12GB | ⭕️ | ⭕️ | ⭕️ | ⭕️ | ❌ | ❌ |
+| 16GB | ✅ | ✅ | ⭕️ | ⭕️ | ⭕️ | ❌ |
+| 24GB | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+| 40GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| 80GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+</details>
 #### b. Weight placement
 We recommend placing the [weights](#model-zoo) at the specified paths:
 ### EasyAnimateV5-12b-zh-InP
+#### I2V
 <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
   <tr>
       <td>
 </table>
 <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
   <tr>
       <td>
   </tr>
 </table>
 <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
   <tr>
       <td>
   </tr>
 </table>
+#### T2V
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+  <tr>
+      <td>
+          <video src="https://github.com/user-attachments/assets/eccb0797-4feb-48e9-91d3-5769ce30142b" width="100%" controls autoplay loop></video>
+      </td>
+      <td>
+          <video src="https://github.com/user-attachments/assets/76b3db64-9c7a-4d38-8854-dba940240ceb" width="100%" controls autoplay loop></video>
+      </td>
+       <td>
+          <video src="https://github.com/user-attachments/assets/0b8fab66-8de7-44ff-bd43-8f701bad6bb7" width="100%" controls autoplay loop></video>
+     </td>
+      <td>
+          <video src="https://github.com/user-attachments/assets/9fbddf5f-7fcd-4cc6-9d7c-3bdf1d4ce59e" width="100%" controls autoplay loop></video>
+     </td>
+  </tr>
+</table>
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+  <tr>
+      <td>
+          <video src="https://github.com/user-attachments/assets/19c1742b-e417-45ac-97d6-8bf3a80d8e13" width="100%" controls autoplay loop></video>
+      </td>
+      <td>
+          <video src="https://github.com/user-attachments/assets/641e56c8-a3d9-489d-a3a6-42c50a9aeca1" width="100%" controls autoplay loop></video>
+      </td>
+       <td>
+          <video src="https://github.com/user-attachments/assets/2b16be76-518b-44c6-a69b-5c49d76df365" width="100%" controls autoplay loop></video>
+     </td>
+      <td>
+          <video src="https://github.com/user-attachments/assets/e7d9c0fc-136f-405c-9fab-629389e196be" width="100%" controls autoplay loop></video>
+     </td>
+  </tr>
+</table>
 ### EasyAnimateV5-12b-zh-Control
 <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
 # Model Zoo
 EasyAnimateV5:
+7B:
+| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
+|--|--|--|--|--|--|
+| EasyAnimateV5-7b-zh-InP | EasyAnimateV5 | 22 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-7b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-7b-zh-InP)| Official 7B image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
+| EasyAnimateV5-7b-zh | EasyAnimateV5 | 22 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-7b-zh) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-7b-zh)| Official 7B text-to-video weights. Can be used for fine-tuning on downstream tasks. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
+12B:
 | Name | Type | Storage Space | Hugging Face | Model Scope | Description |
 |--|--|--|--|--|--|
 | EasyAnimateV5-12b-zh-InP | EasyAnimateV5 | 34 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-12b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-12b-zh-InP)| Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
 <details>
   <summary>(Obsolete) EasyAnimateV4:</summary>
+| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
 |--|--|--|--|--|--|
+| EasyAnimateV4-XL-2-InP.tar.gz | EasyAnimateV4 | Before extraction: 8.9 GB / After extraction: 14.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV4-XL-2-InP)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV4-XL-2-InP)| Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280), trained with 144 frames at 24 frames per second. |
 </details>
 <details>
   <summary>(Obsolete) EasyAnimateV3:</summary>
+| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
 |--|--|--|--|--|--|
+| EasyAnimateV3-XL-2-InP-512x512.tar | EasyAnimateV3 | 18.2GB| [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-512x512)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-512x512)| Official image-to-video weights at 512x512 resolution. Trained with 144 frames at 24 frames per second. |
+| EasyAnimateV3-XL-2-InP-768x768.tar | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-768x768) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-768x768)| Official image-to-video weights at 768x768 resolution. Trained with 144 frames at 24 frames per second. |
+| EasyAnimateV3-XL-2-InP-960x960.tar | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-960x960) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-960x960)| Official image-to-video weights at 960x960 (720P) resolution. Trained with 144 frames at 24 frames per second. |
 </details>
 <details>
   <summary>(Obsolete) EasyAnimateV2:</summary>
+| Name | Type | Storage Space | Download | Hugging Face | Model Scope | Description |
+|--|--|--|--|--|--|--|
+| EasyAnimateV2-XL-2-512x512.tar | EasyAnimateV2 | 16.2GB | - | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-512x512)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV2-XL-2-512x512)| Official weights for 512x512 resolution. Trained with 144 frames at 24 frames per second. |
+| EasyAnimateV2-XL-2-768x768.tar | EasyAnimateV2 | 16.2GB | - | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-768x768) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV2-XL-2-768x768)| Official weights for 768x768 resolution. Trained with 144 frames at 24 frames per second. |
+| easyanimatev2_minimalism_lora.safetensors | Lora of Pixart | 485.1MB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Personalized_Model/easyanimatev2_minimalism_lora.safetensors)| - | - | A LoRA trained on a specific type of images. The images can be downloaded [here](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/webui/Minimalism.zip). |
 </details>
 <details>
 # References
 - CogVideo: https://github.com/THUDM/CogVideo/
+- Flux: https://github.com/black-forest-labs/flux
 - magvit: https://github.com/google-research/magvit
 - PixArt: https://github.com/PixArt-alpha/PixArt-alpha
 - Open-Sora-Plan: https://github.com/PKU-YuanGroup/Open-Sora-Plan

README_en.md CHANGED

@@ -112,6 +112,41 @@ The details for Linux:
 - GPU：Nvidia-V100 16G & Nvidia-A10 24G & Nvidia-A100 40G & Nvidia-A100 80G
 We need about 60GB of free disk space (for saving weights); please check!
 #### b. Weights
 We recommend placing the [weights](#model-zoo) at the specified paths:
@@ -131,8 +166,7 @@ The results displayed are all based on image.
 ### EasyAnimateV5-12b-zh-InP
-Resolution-1024
 <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
   <tr>
       <td>
@@ -151,8 +185,6 @@ Resolution-1024
 </table>
-Resolution-768
 <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
   <tr>
       <td>
@@ -170,8 +202,6 @@ Resolution-768
   </tr>
 </table>
-Resolution-512
 <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
   <tr>
       <td>
@@ -189,6 +219,41 @@ Resolution-512
   </tr>
 </table>
 ### EasyAnimateV5-12b-zh-Control
 <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
@@ -335,6 +400,13 @@ For details on setting some parameters, please refer to [Readme Train](scripts/R
 EasyAnimateV5:
 | Name | Type | Storage Space | Hugging Face | Model Scope | Description |
 |--|--|--|--|--|--|
 | EasyAnimateV5-12b-zh-InP | EasyAnimateV5 | 34 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-12b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-12b-zh-InP) | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
@@ -344,28 +416,29 @@ EasyAnimateV5:
 <details>
   <summary>(Obsolete) EasyAnimateV4:</summary>
-| Name | Type | Storage Space | Url | Hugging Face | Description |
 |--|--|--|--|--|--|
-| EasyAnimateV4-XL-2-InP.tar.gz | EasyAnimateV4 | Before extraction: 8.9 GB / After extraction: 14.0 GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV4-XL-2-InP.tar.gz) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV4-XL-2-InP)| Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280), trained with 144 frames at 24 frames per second. |
 </details>
 <details>
   <summary>(Obsolete) EasyAnimateV3:</summary>
-| Name | Type | Storage Space | Url | Hugging Face | Description |
 |--|--|--|--|--|--|
-| EasyAnimateV3-XL-2-InP-512x512.tar | EasyAnimateV3 | 18.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-512x512.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-512x512) | Official EasyAnimateV3 text-to-video and image-to-video weights at 512x512 resolution. Trained with 144 frames at 24 fps. |
-| EasyAnimateV3-XL-2-InP-768x768.tar | EasyAnimateV3 | 18.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-768x768.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-768x768) | Official EasyAnimateV3 text-to-video and image-to-video weights at 768x768 resolution. Trained with 144 frames at 24 fps. |
-| EasyAnimateV3-XL-2-InP-960x960.tar | EasyAnimateV3 | 18.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-960x960.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-960x960) | Official EasyAnimateV3 text-to-video and image-to-video weights at 960x960 resolution. Trained with 144 frames at 24 fps. |
 </details>
 <details>
   <summary>(Obsolete) EasyAnimateV2:</summary>
-| Name | Type | Storage Space | Url | Hugging Face | Description |
-|--|--|--|--|--|--|
-| EasyAnimateV2-XL-2-512x512.tar | EasyAnimateV2 | 16.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV2-XL-2-512x512.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-512x512) | Official EasyAnimateV2 weights for 512x512 resolution. Trained with 144 frames at 24 fps. |
-| EasyAnimateV2-XL-2-768x768.tar | EasyAnimateV2 | 16.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV2-XL-2-768x768.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-768x768) | Official EasyAnimateV2 weights for 768x768 resolution. Trained with 144 frames at 24 fps. |
-| easyanimatev2_minimalism_lora.safetensors | Lora of Pixart | 485.1MB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Personalized_Model/easyanimatev2_minimalism_lora.safetensors) | - | A LoRA trained on a specific type of images. The images can be downloaded from [Url](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v2/Minimalism.zip). |
 </details>
 <details>
@@ -397,6 +470,8 @@ EasyAnimateV5:
 # Reference
 - magvit: https://github.com/google-research/magvit
 - PixArt: https://github.com/PixArt-alpha/PixArt-alpha
 - Open-Sora-Plan: https://github.com/PKU-YuanGroup/Open-Sora-Plan
@@ -406,4 +481,4 @@ EasyAnimateV5:
 - HunYuan DiT: https://github.com/tencent/HunyuanDiT
 # License
-This project is licensed under the [Apache License (Version 2.0)](https://github.com/modelscope/modelscope/blob/master/LICENSE).

 - GPU：Nvidia-V100 16G & Nvidia-A10 24G & Nvidia-A100 40G & Nvidia-A100 80G
 We need about 60GB of free disk space (for saving weights); please check!
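The disk-space check above can be scripted. The following is a minimal sketch using only the Python standard library; the helper names `free_disk_gb` and `has_room_for_weights` are hypothetical and not part of EasyAnimate, and "GB" is assumed to mean 10^9 bytes.

```python
import shutil

REQUIRED_GB = 60  # figure quoted in the README


def free_disk_gb(path="."):
    """Return the free space, in GB (10^9 bytes), on the filesystem holding `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_room_for_weights(path=".", required_gb=REQUIRED_GB):
    """True if the filesystem holding `path` has at least `required_gb` GB free."""
    return free_disk_gb(path) >= required_gb


if __name__ == "__main__":
    print(f"free: {free_disk_gb():.1f} GB, enough for weights: {has_room_for_weights()}")
```

Pass the directory where the weights will be stored as `path`, since it may sit on a different filesystem than the working directory.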
+The video sizes that EasyAnimateV5-12B can generate with different amounts of GPU memory are:
+| GPU memory | 384x672x72 | 384x672x49 | 576x1008x25 | 576x1008x49 | 768x1344x25 | 768x1344x49 |
+|------------|------------|------------|------------|------------|------------|------------|
+| 16GB       | 🧡         | 🧡         | ❌         | ❌         | ❌         | ❌         |
+| 24GB       | 🧡         | 🧡         | 🧡         | 🧡         | ❌         | ❌         |
+| 40GB       | ✅         | ✅         | ✅         | ✅         | ❌         | ❌         |
+| 80GB       | ✅         | ✅         | ✅         | ✅         | ✅         | ✅         |
+✅ indicates it can run under "model_cpu_offload", 🧡 represents it can run under "model_cpu_offload_and_qfloat8", ⭕️ indicates it can run under "sequential_cpu_offload", ❌ means it can't run. Please note that running with sequential_cpu_offload will be slower.
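As an illustration, the EasyAnimateV5-12B memory table can be encoded as a lookup. This is only a sketch: `suggest_mode` is a hypothetical helper, not part of EasyAnimate, and it assumes that a GPU whose memory falls between two rows behaves like the next smaller row.

```python
# Rows and columns copied from the EasyAnimateV5-12B GPU memory table.
SIZES = ["384x672x72", "384x672x49", "576x1008x25",
         "576x1008x49", "768x1344x25", "768x1344x49"]
QF8 = "model_cpu_offload_and_qfloat8"  # 🧡 in the table
OFF = "model_cpu_offload"              # ✅ in the table
TABLE = {
    16: [QF8, QF8, None, None, None, None],
    24: [QF8, QF8, QF8, QF8, None, None],
    40: [OFF, OFF, OFF, OFF, None, None],
    80: [OFF, OFF, OFF, OFF, OFF, OFF],
}


def suggest_mode(gpu_memory_gb, size):
    """Return the mode listed for the largest row that fits, or None for ❌."""
    rows = [gb for gb in TABLE if gb <= gpu_memory_gb]
    if not rows:
        return None  # less memory than the smallest listed row
    return TABLE[max(rows)][SIZES.index(size)]


print(suggest_mode(24, "576x1008x49"))
print(suggest_mode(40, "768x1344x25"))
```

An unknown size string raises `ValueError` from `SIZES.index`, which keeps typos from silently returning a mode.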
+Some GPUs that do not support torch.bfloat16, such as 2080ti and V100, require changing the weight_dtype in app.py and predict files to torch.float16 in order to run.
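The dtype choice can be sketched as a rule of thumb. The helper `pick_weight_dtype` is hypothetical, and the sketch assumes that native bfloat16 needs CUDA compute capability 8.0 (Ampere) or newer, which matches the GPUs named here. With PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` pair, and `torch.cuda.is_bf16_supported()` may serve as a more direct check.

```python
def pick_weight_dtype(compute_capability_major):
    """Return the dtype name to assign to weight_dtype."""
    # Assumption: native bfloat16 requires compute capability 8.0 or newer.
    if compute_capability_major >= 8:
        return "torch.bfloat16"
    return "torch.float16"


# Compute capabilities (major, minor) of the GPUs mentioned in this README.
GPUS = {"2080ti": (7, 5), "V100": (7, 0), "A10": (8, 6), "A100": (8, 0)}

for name, (major, minor) in GPUS.items():
    print(f"{name} (sm_{major}{minor}): weight_dtype = {pick_weight_dtype(major)}")
```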
+The generation time for EasyAnimateV5-12B using different GPUs over 25 steps is as follows:
+| GPU       | 384x672x72       | 384x672x49       | 576x1008x25      | 576x1008x49      | 768x1344x25      | 768x1344x49     |
+|-----------|------------------|------------------|------------------|------------------|------------------|-----------------|
+| A10 24GB  | ~120s (4.8s/it)  | ~240s (9.6s/it)  | ~320s (12.7s/it) | ~750s (29.8s/it) | ❌               | ❌              |
+| A100 80GB | ~45s (1.75s/it)  | ~90s (3.7s/it)   | ~120s (4.7s/it)  | ~300s (11.4s/it) | ~265s (10.6s/it) | ~710s (28.3s/it) |
+(⭕️) indicates it can run with low_gpu_memory_mode=True, but at a slower speed, and ❌ means it can't run.
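The totals in the timing table appear to be roughly the per-iteration time multiplied by the 25 steps, then rounded. A small sketch of that arithmetic follows; `estimated_seconds` is a hypothetical helper, and it assumes the total is dominated by the denoising steps.

```python
def estimated_seconds(seconds_per_iteration, steps=25):
    """Approximate generation time as seconds-per-iteration times step count."""
    return seconds_per_iteration * steps


# s/it values copied from the A10 24GB row of the table.
a10_row = [("384x672x72", 4.8), ("384x672x49", 9.6),
           ("576x1008x25", 12.7), ("576x1008x49", 29.8)]

for size, s_per_it in a10_row:
    print(f"{size}: about {estimated_seconds(s_per_it):.0f} s")
```

The same helper gives a rough estimate for other step counts, for example `estimated_seconds(4.8, steps=50)`.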
+<details>
+  <summary>(Obsolete) EasyAnimateV3:</summary>
+The video sizes that EasyAnimateV3 can generate with different amounts of GPU memory are:
+| GPU memory | 384x672x72 | 384x672x144 | 576x1008x72 | 576x1008x144 | 720x1280x72 | 720x1280x144 |
+|------------|------------|-------------|-------------|--------------|-------------|--------------|
+| 12GB       | ⭕️         | ⭕️          | ⭕️          | ⭕️           | ❌          | ❌           |
+| 16GB       | ✅         | ✅          | ⭕️          | ⭕️           | ⭕️          | ❌           |
+| 24GB       | ✅         | ✅          | ✅          | ✅           | ✅          | ❌           |
+| 40GB       | ✅         | ✅          | ✅          | ✅           | ✅          | ✅           |
+| 80GB       | ✅         | ✅          | ✅          | ✅           | ✅          | ✅           |
+</details>
 #### b. Weights
 We recommend placing the [weights](#model-zoo) at the specified paths:
 ### EasyAnimateV5-12b-zh-InP
+#### I2V
 <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
   <tr>
       <td>
 </table>
 <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
   <tr>
       <td>
   </tr>
 </table>
 <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
   <tr>
       <td>
   </tr>
 </table>
+#### T2V
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+  <tr>
+      <td>
+          <video src="https://github.com/user-attachments/assets/eccb0797-4feb-48e9-91d3-5769ce30142b" width="100%" controls autoplay loop></video>
+      </td>
+      <td>
+          <video src="https://github.com/user-attachments/assets/76b3db64-9c7a-4d38-8854-dba940240ceb" width="100%" controls autoplay loop></video>
+      </td>
+       <td>
+          <video src="https://github.com/user-attachments/assets/0b8fab66-8de7-44ff-bd43-8f701bad6bb7" width="100%" controls autoplay loop></video>
+     </td>
+      <td>
+          <video src="https://github.com/user-attachments/assets/9fbddf5f-7fcd-4cc6-9d7c-3bdf1d4ce59e" width="100%" controls autoplay loop></video>
+     </td>
+  </tr>
+</table>
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+  <tr>
+      <td>
+          <video src="https://github.com/user-attachments/assets/19c1742b-e417-45ac-97d6-8bf3a80d8e13" width="100%" controls autoplay loop></video>
+      </td>
+      <td>
+          <video src="https://github.com/user-attachments/assets/641e56c8-a3d9-489d-a3a6-42c50a9aeca1" width="100%" controls autoplay loop></video>
+      </td>
+       <td>
+          <video src="https://github.com/user-attachments/assets/2b16be76-518b-44c6-a69b-5c49d76df365" width="100%" controls autoplay loop></video>
+     </td>
+      <td>
+          <video src="https://github.com/user-attachments/assets/e7d9c0fc-136f-405c-9fab-629389e196be" width="100%" controls autoplay loop></video>
+     </td>
+  </tr>
+</table>
 ### EasyAnimateV5-12b-zh-Control
 <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
 EasyAnimateV5:
+7B:
+| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
+|--|--|--|--|--|--|
+| EasyAnimateV5-7b-zh-InP | EasyAnimateV5 | 22 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-7b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-7b-zh-InP) | Official 7B image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
+| EasyAnimateV5-7b-zh | EasyAnimateV5 | 22 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-7b-zh) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-7b-zh) | Official 7B text-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
+12B:
 | Name | Type | Storage Space | Hugging Face | Model Scope | Description |
 |--|--|--|--|--|--|
 | EasyAnimateV5-12b-zh-InP | EasyAnimateV5 | 34 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-12b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-12b-zh-InP) | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
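For a sense of scale, the frame counts and frame rates in the descriptions translate into clip lengths as follows. This is a sketch; `clip_seconds` is a hypothetical helper, and it assumes videos are played back at the training frame rate.

```python
def clip_seconds(num_frames, fps):
    """Duration in seconds of a clip with `num_frames` frames played at `fps`."""
    return num_frames / fps


# EasyAnimateV5: trained with 49 frames at 8 frames per second.
print(clip_seconds(49, 8))
# EasyAnimateV3 and V4: trained with 144 frames at 24 frames per second.
print(clip_seconds(144, 24))
```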
 <details>
   <summary>(Obsolete) EasyAnimateV4:</summary>
+| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
 |--|--|--|--|--|--|
+| EasyAnimateV4-XL-2-InP.tar.gz | EasyAnimateV4 | Before extraction: 8.9 GB / After extraction: 14.0 GB |[🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV4-XL-2-InP)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV4-XL-2-InP)| Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280), trained with 144 frames at 24 frames per second. |
 </details>
 <details>
   <summary>(Obsolete) EasyAnimateV3:</summary>
+| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
 |--|--|--|--|--|--|
+| EasyAnimateV3-XL-2-InP-512x512.tar | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-512x512)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-512x512) | Official EasyAnimateV3 text-to-video and image-to-video weights at 512x512 resolution. Trained with 144 frames at 24 fps. |
+| EasyAnimateV3-XL-2-InP-768x768.tar | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-768x768) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-768x768) | Official EasyAnimateV3 text-to-video and image-to-video weights at 768x768 resolution. Trained with 144 frames at 24 fps. |
+| EasyAnimateV3-XL-2-InP-960x960.tar | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-960x960) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-960x960) | Official EasyAnimateV3 text-to-video and image-to-video weights at 960x960 resolution. Trained with 144 frames at 24 fps. |
 </details>
 <details>
   <summary>(Obsolete) EasyAnimateV2:</summary>
+| Name | Type | Storage Space | Url | Hugging Face | Model Scope | Description |
+|--|--|--|--|--|--|--|
+| EasyAnimateV2-XL-2-512x512.tar | EasyAnimateV2 | 16.2GB | - | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-512x512)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV2-XL-2-512x512)| Official EasyAnimateV2 weights for 512x512 resolution. Trained with 144 frames at 24 fps. |
+| EasyAnimateV2-XL-2-768x768.tar | EasyAnimateV2 | 16.2GB | - | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-768x768) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV2-XL-2-768x768)| Official EasyAnimateV2 weights for 768x768 resolution. Trained with 144 frames at 24 fps. |
+| easyanimatev2_minimalism_lora.safetensors | Lora of Pixart | 485.1MB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Personalized_Model/easyanimatev2_minimalism_lora.safetensors)| - | - | A LoRA trained on a specific type of images. The images can be downloaded from [Url](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v2/Minimalism.zip). |
 </details>
 <details>
 # Reference
+- CogVideo: https://github.com/THUDM/CogVideo/
+- Flux: https://github.com/black-forest-labs/flux
 - magvit: https://github.com/google-research/magvit
 - PixArt: https://github.com/PixArt-alpha/PixArt-alpha
 - Open-Sora-Plan: https://github.com/PKU-YuanGroup/Open-Sora-Plan
 - HunYuan DiT: https://github.com/tencent/HunyuanDiT
 # License
+This project is licensed under the [Apache License (Version 2.0)](https://github.com/modelscope/modelscope/blob/master/LICENSE).