bubbliiiing committed on
Commit
5dfb363
1 Parent(s): f625293

Update Readme

Files changed (2)
  1. README.md +88 -32
  2. README_en.md +93 -18
README.md CHANGED
@@ -30,21 +30,6 @@ tasks:
30
  #- vllm
31
  ---
32
 
33
- # EasyAnimate | An End-to-End Solution for High-Resolution and Long Video Generation
34
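- 😊 EasyAnimate is an end-to-end solution for generating high-resolution and long videos. We can train transformer-based diffusion generators, train VAEs for processing long videos, and preprocess metadata.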
35
-
36
- 😊 Building on DiT, we use a transformer as the diffuser for video and image generation.
37
-
38
- 😊 Welcome!
39
-
40
- [![Arxiv Page](https://img.shields.io/badge/Arxiv-Page-red)](https://arxiv.org/abs/2405.18991)
41
- [![Project Page](https://img.shields.io/badge/Project-Website-green)](https://easyanimate.github.io/)
42
- [![Modelscope Studio](https://img.shields.io/badge/Modelscope-Studio-blue)](https://modelscope.cn/studios/PAI/EasyAnimate/summary)
43
- [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-yellow)](https://huggingface.co/spaces/alibaba-pai/EasyAnimate)
44
- [![Discord Page](https://img.shields.io/badge/Discord-Page-blue)](https://discord.gg/UzkpB4Bn)
45
-
46
- [English](./README.md) | Simplified Chinese
47
-
48
  # Table of Contents
49
  - [Table of Contents](#目录)
50
  - [Introduction](#简介)
@@ -143,6 +128,39 @@ Details for Linux:
143
 
144
  We need about 60GB of free disk space, please check!
145

146
  #### b. Weights
147
  We recommend placing the [weights](#model-zoo) along the specified path:
148
 
@@ -161,8 +179,7 @@ EasyAnimateV5:
161
 
162
  ### EasyAnimateV5-12b-zh-InP
163
 
164
- Resolution-1024
165
-
166
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
167
  <tr>
168
  <td>
@@ -181,8 +198,6 @@ Resolution-1024
181
  </table>
182
 
183
 
184
- Resolution-768
185
-
186
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
187
  <tr>
188
  <td>
@@ -200,8 +215,6 @@ Resolution-768
200
  </tr>
201
  </table>
202
 
203
- Resolution-512
204
-
205
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
206
  <tr>
207
  <td>
@@ -219,6 +232,41 @@ Resolution-512
219
  </tr>
220
  </table>
221

222
  ### EasyAnimateV5-12b-zh-Control
223
 
224
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
@@ -364,6 +412,13 @@ sh scripts/train.sh
364
  # Model Zoo
365
  EasyAnimateV5:
366

367
  | Name | Type | Storage Space | Hugging Face | Model Scope | Description |
368
  |--|--|--|--|--|--|
369
  | EasyAnimateV5-12b-zh-InP | EasyAnimateV5 | 34 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-12b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-12b-zh-InP)| Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
@@ -373,29 +428,29 @@ EasyAnimateV5:
373
  <details>
374
  <summary>(Obsolete) EasyAnimateV4:</summary>
375
 
376
- | Name | Type | Storage Space | Download URL | Hugging Face | Description |
377
  |--|--|--|--|--|--|
378
- | EasyAnimateV4-XL-2-InP.tar.gz | EasyAnimateV4 | Before extraction: 8.9 GB / After extraction: 14.0 GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV4-XL-2-InP.tar.gz) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV4-XL-2-InP)| Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280), trained with 144 frames at 24 frames per second. |
379
  </details>
380
 
381
  <details>
382
  <summary>(Obsolete) EasyAnimateV3:</summary>
383
 
384
- | Name | Type | Storage Space | Download URL | Hugging Face | Description |
385
  |--|--|--|--|--|--|
386
- | EasyAnimateV3-XL-2-InP-512x512.tar | EasyAnimateV3 | 18.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-512x512.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-512x512)| Official image-to-video weights at 512x512 resolution, trained with 144 frames at 24 frames per second. |
387
- | EasyAnimateV3-XL-2-InP-768x768.tar | EasyAnimateV3 | 18.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-768x768.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-768x768) | Official image-to-video weights at 768x768 resolution, trained with 144 frames at 24 frames per second. |
388
- | EasyAnimateV3-XL-2-InP-960x960.tar | EasyAnimateV3 | 18.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-960x960.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-960x960) | Official image-to-video weights at 960x960 (720P) resolution, trained with 144 frames at 24 frames per second. |
389
  </details>
390
 
391
  <details>
392
  <summary>(Obsolete) EasyAnimateV2:</summary>
393
 
394
- | Name | Type | Storage Space | Download URL | Hugging Face | Description |
395
- |--|--|--|--|--|--|
396
- | EasyAnimateV2-XL-2-512x512.tar | EasyAnimateV2 | 16.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV2-XL-2-512x512.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-512x512)| Official weights for 512x512 resolution, trained with 144 frames at 24 frames per second. |
397
- | EasyAnimateV2-XL-2-768x768.tar | EasyAnimateV2 | 16.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV2-XL-2-768x768.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-768x768) | Official weights for 768x768 resolution, trained with 144 frames at 24 frames per second. |
398
- | easyanimatev2_minimalism_lora.safetensors | Lora of Pixart | 485.1MB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Personalized_Model/easyanimatev2_minimalism_lora.safetensors)| - | A LoRA trained on a specific type of images. The images can be downloaded [here](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/webui/Minimalism.zip). |
399
  </details>
400
 
401
  <details>
@@ -426,6 +481,7 @@ EasyAnimateV5:
426
 
427
  # Reference
428
  - CogVideo: https://github.com/THUDM/CogVideo/
 
429
  - magvit: https://github.com/google-research/magvit
430
  - PixArt: https://github.com/PixArt-alpha/PixArt-alpha
431
  - Open-Sora-Plan: https://github.com/PKU-YuanGroup/Open-Sora-Plan
 
30
  #- vllm
31
  ---
32

33
  # Table of Contents
34
  - [Table of Contents](#目录)
35
  - [Introduction](#简介)
 
128
 
129
  We need about 60GB of free disk space, please check!
130
 
131
+ The video sizes EasyAnimateV5-12B can generate with different amounts of GPU memory are as follows:
132
+ | GPU memory |384x672x72|384x672x49|576x1008x25|576x1008x49|768x1344x25|768x1344x49|
133
+ |----------|----------|----------|----------|----------|----------|----------|
134
+ | 16GB | 🧡 | 🧡 | ❌ | ❌ | ❌ | ❌ |
135
+ | 24GB | 🧡 | 🧡 | 🧡 | 🧡 | ❌ | ❌ |
136
+ | 40GB | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
137
+ | 80GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
138
+
139
+ ✅ means it can run with "model_cpu_offload", 🧡 means it can run with "model_cpu_offload_and_qfloat8", ⭕️ means it can run with "sequential_cpu_offload", and ❌ means it cannot run. Note that running with sequential_cpu_offload is slower.
140
+
141
+ Some GPUs, such as the 2080 Ti and V100, do not support torch.bfloat16; to run on them, change weight_dtype in app.py and the predict files to torch.float16.
142
+
143
+ The generation time for EasyAnimateV5-12B on different GPUs over 25 steps is as follows:
144
+ | GPU |384x672x72|384x672x49|576x1008x25|576x1008x49|768x1344x25|768x1344x49|
145
+ |----------|----------|----------|----------|----------|----------|----------|
146
+ | A10 24GB |~120s (4.8s/it)|~240s (9.6s/it)|~320s (12.7s/it)| ~750s (29.8s/it)| ❌ | ❌ |
147
+ | A100 80GB |~45s (1.75s/it)|~90s (3.7s/it)|~120s (4.7s/it)|~300s (11.4s/it)|~265s (10.6s/it)| ~710s (28.3s/it)|
148
+
149
+ (⭕️) means it can run with low_gpu_memory_mode=True, but at a slower speed, and ❌ means it cannot run.
150
+
151
+ <details>
152
+ <summary>(Obsolete) EasyAnimateV3:</summary>
153
+
154
+ The video sizes EasyAnimateV3 can generate with different amounts of GPU memory are as follows:
155
+ | GPU memory | 384x672x72 | 384x672x144 | 576x1008x72 | 576x1008x144 | 720x1280x72 | 720x1280x144 |
156
+ |----------|----------|----------|----------|----------|----------|----------|
157
+ | 12GB | ⭕️ | ⭕️ | ⭕️ | ⭕️ | ❌ | ❌ |
158
+ | 16GB | ✅ | ✅ | ⭕️ | ⭕️ | ⭕️ | ❌ |
159
+ | 24GB | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
160
+ | 40GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
161
+ | 80GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
162
+ </details>
163
+
164
  #### b. Weights
165
  We recommend placing the [weights](#model-zoo) along the specified path:
166
 
 
179
 
180
  ### EasyAnimateV5-12b-zh-InP
181
 
182
+ #### I2V
 
183
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
184
  <tr>
185
  <td>
 
198
  </table>
199
 
200
 
 
 
201
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
202
  <tr>
203
  <td>
 
215
  </tr>
216
  </table>
217
 
 
 
218
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
219
  <tr>
220
  <td>
 
232
  </tr>
233
  </table>
234
 
235
+ #### T2V
236
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
237
+ <tr>
238
+ <td>
239
+ <video src="https://github.com/user-attachments/assets/eccb0797-4feb-48e9-91d3-5769ce30142b" width="100%" controls autoplay loop></video>
240
+ </td>
241
+ <td>
242
+ <video src="https://github.com/user-attachments/assets/76b3db64-9c7a-4d38-8854-dba940240ceb" width="100%" controls autoplay loop></video>
243
+ </td>
244
+ <td>
245
+ <video src="https://github.com/user-attachments/assets/0b8fab66-8de7-44ff-bd43-8f701bad6bb7" width="100%" controls autoplay loop></video>
246
+ </td>
247
+ <td>
248
+ <video src="https://github.com/user-attachments/assets/9fbddf5f-7fcd-4cc6-9d7c-3bdf1d4ce59e" width="100%" controls autoplay loop></video>
249
+ </td>
250
+ </tr>
251
+ </table>
252
+
253
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
254
+ <tr>
255
+ <td>
256
+ <video src="https://github.com/user-attachments/assets/19c1742b-e417-45ac-97d6-8bf3a80d8e13" width="100%" controls autoplay loop></video>
257
+ </td>
258
+ <td>
259
+ <video src="https://github.com/user-attachments/assets/641e56c8-a3d9-489d-a3a6-42c50a9aeca1" width="100%" controls autoplay loop></video>
260
+ </td>
261
+ <td>
262
+ <video src="https://github.com/user-attachments/assets/2b16be76-518b-44c6-a69b-5c49d76df365" width="100%" controls autoplay loop></video>
263
+ </td>
264
+ <td>
265
+ <video src="https://github.com/user-attachments/assets/e7d9c0fc-136f-405c-9fab-629389e196be" width="100%" controls autoplay loop></video>
266
+ </td>
267
+ </tr>
268
+ </table>
269
+
270
  ### EasyAnimateV5-12b-zh-Control
271
 
272
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
 
412
  # Model Zoo
413
  EasyAnimateV5:
414
 
415
+ 7B:
416
+ | Name | Type | Storage Space | Hugging Face | Model Scope | Description |
417
+ |--|--|--|--|--|--|
418
+ | EasyAnimateV5-7b-zh-InP | EasyAnimateV5 | 22 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-7b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-7b-zh-InP)| Official 7B image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
419
+ | EasyAnimateV5-7b-zh | EasyAnimateV5 | 22 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-7b-zh) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-7b-zh)| Official 7B text-to-video weights. Can be used for fine-tuning on downstream tasks. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
420
+
421
+ 12B:
422
  | Name | Type | Storage Space | Hugging Face | Model Scope | Description |
423
  |--|--|--|--|--|--|
424
  | EasyAnimateV5-12b-zh-InP | EasyAnimateV5 | 34 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-12b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-12b-zh-InP)| Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
 
428
  <details>
429
  <summary>(Obsolete) EasyAnimateV4:</summary>
430
 
431
+ | Name | Type | Storage Space | Hugging Face | Model Scope | Description |
432
  |--|--|--|--|--|--|
433
+ | EasyAnimateV4-XL-2-InP.tar.gz | EasyAnimateV4 | Before extraction: 8.9 GB / After extraction: 14.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV4-XL-2-InP)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV4-XL-2-InP)| Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280), trained with 144 frames at 24 frames per second. |
434
  </details>
435
 
436
  <details>
437
  <summary>(Obsolete) EasyAnimateV3:</summary>
438
 
439
+ | Name | Type | Storage Space | Hugging Face | Model Scope | Description |
440
  |--|--|--|--|--|--|
441
+ | EasyAnimateV3-XL-2-InP-512x512.tar | EasyAnimateV3 | 18.2GB| [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-512x512)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-512x512)| Official image-to-video weights at 512x512 resolution, trained with 144 frames at 24 frames per second. |
442
+ | EasyAnimateV3-XL-2-InP-768x768.tar | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-768x768) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-768x768)| Official image-to-video weights at 768x768 resolution, trained with 144 frames at 24 frames per second. |
443
+ | EasyAnimateV3-XL-2-InP-960x960.tar | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-960x960) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-960x960)| Official image-to-video weights at 960x960 (720P) resolution, trained with 144 frames at 24 frames per second. |
444
  </details>
445
 
446
  <details>
447
  <summary>(Obsolete) EasyAnimateV2:</summary>
448
 
449
+ | Name | Type | Storage Space | Download URL | Hugging Face | Model Scope | Description |
450
+ |--|--|--|--|--|--|--|
451
+ | EasyAnimateV2-XL-2-512x512.tar | EasyAnimateV2 | 16.2GB | - | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-512x512)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV2-XL-2-512x512)| Official weights for 512x512 resolution, trained with 144 frames at 24 frames per second. |
452
+ | EasyAnimateV2-XL-2-768x768.tar | EasyAnimateV2 | 16.2GB | - | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-768x768) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV2-XL-2-768x768)| Official weights for 768x768 resolution, trained with 144 frames at 24 frames per second. |
453
+ | easyanimatev2_minimalism_lora.safetensors | Lora of Pixart | 485.1MB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Personalized_Model/easyanimatev2_minimalism_lora.safetensors)| - | - | A LoRA trained on a specific type of images. The images can be downloaded [here](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/webui/Minimalism.zip). |
454
  </details>
455
 
456
  <details>
 
481
 
482
  # Reference
483
  - CogVideo: https://github.com/THUDM/CogVideo/
484
+ - Flux: https://github.com/black-forest-labs/flux
485
  - magvit: https://github.com/google-research/magvit
486
  - PixArt: https://github.com/PixArt-alpha/PixArt-alpha
487
  - Open-Sora-Plan: https://github.com/PKU-YuanGroup/Open-Sora-Plan
README_en.md CHANGED
@@ -112,6 +112,41 @@ The detailed of Linux:
112
  - GPU:Nvidia-V100 16G & Nvidia-A10 24G & Nvidia-A100 40G & Nvidia-A100 80G
113
 
114
  We need about 60GB available on disk (for saving weights), please check!
115
 
116
  #### b. Weights
117
  We recommend placing the [weights](#model-zoo) along the specified path:
@@ -131,8 +166,7 @@ The results displayed are all based on image.
131
 
132
  ### EasyAnimateV5-12b-zh-InP
133
 
134
- Resolution-1024
135
-
136
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
137
  <tr>
138
  <td>
@@ -151,8 +185,6 @@ Resolution-1024
151
  </table>
152
 
153
 
154
- Resolution-768
155
-
156
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
157
  <tr>
158
  <td>
@@ -170,8 +202,6 @@ Resolution-768
170
  </tr>
171
  </table>
172
 
173
- Resolution-512
174
-
175
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
176
  <tr>
177
  <td>
@@ -189,6 +219,41 @@ Resolution-512
189
  </tr>
190
  </table>
191

192
  ### EasyAnimateV5-12b-zh-Control
193
 
194
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
@@ -335,6 +400,13 @@ For details on setting some parameters, please refer to [Readme Train](scripts/R
335
 
336
  EasyAnimateV5:
337

338
  | Name | Type | Storage Space | Hugging Face | Model Scope | Description |
339
  |--|--|--|--|--|--|
340
  | EasyAnimateV5-12b-zh-InP | EasyAnimateV5 | 34 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-12b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-12b-zh-InP) | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
@@ -344,28 +416,29 @@ EasyAnimateV5:
344
  <details>
345
  <summary>(Obsolete) EasyAnimateV4:</summary>
346
 
347
- | Name | Type | Storage Space | Url | Hugging Face | Description |
348
  |--|--|--|--|--|--|
349
- | EasyAnimateV4-XL-2-InP.tar.gz | EasyAnimateV4 | Before extraction: 8.9 GB \/ After extraction: 14.0 GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV4-XL-2-InP.tar.gz) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV4-XL-2-InP)| Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280), trained with 144 frames at 24 frames per second. |
350
  </details>
351
 
352
  <details>
353
  <summary>(Obsolete) EasyAnimateV3:</summary>
354
 
355
- | Name | Type | Storage Space | Url | Hugging Face | Description |
356
  |--|--|--|--|--|--|
357
- | EasyAnimateV3-XL-2-InP-512x512.tar | EasyAnimateV3 | 18.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-512x512.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-512x512) | EasyAnimateV3 official weights for 512x512 text and image to video resolution. Training with 144 frames and fps 24 |
358
- | EasyAnimateV3-XL-2-InP-768x768.tar | EasyAnimateV3 | 18.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-768x768.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-768x768) | EasyAnimateV3 official weights for 768x768 text and image to video resolution. Training with 144 frames and fps 24 |
359
- | EasyAnimateV3-XL-2-InP-960x960.tar | EasyAnimateV3 | 18.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-960x960.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-960x960) | EasyAnimateV3 official weights for 960x960 text and image to video resolution. Training with 144 frames and fps 24 |
360
  </details>
361
 
362
  <details>
363
  <summary>(Obsolete) EasyAnimateV2:</summary>
364
- | Name | Type | Storage Space | Url | Hugging Face | Description |
365
- |--|--|--|--|--|--|
366
- | EasyAnimateV2-XL-2-512x512.tar | EasyAnimateV2 | 16.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV2-XL-2-512x512.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-512x512) | EasyAnimateV2 official weights for 512x512 resolution. Training with 144 frames and fps 24 |
367
- | EasyAnimateV2-XL-2-768x768.tar | EasyAnimateV2 | 16.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV2-XL-2-768x768.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-768x768) | EasyAnimateV2 official weights for 768x768 resolution. Training with 144 frames and fps 24 |
368
- | easyanimatev2_minimalism_lora.safetensors | Lora of Pixart | 485.1MB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Personalized_Model/easyanimatev2_minimalism_lora.safetensors) | - | A LoRA trained on a specific type of images. The images can be downloaded from [Url](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v2/Minimalism.zip). |
 
369
  </details>
370
 
371
  <details>
@@ -397,6 +470,8 @@ EasyAnimateV5:
397
 
398
 
399
  # Reference
 
 
400
  - magvit: https://github.com/google-research/magvit
401
  - PixArt: https://github.com/PixArt-alpha/PixArt-alpha
402
  - Open-Sora-Plan: https://github.com/PKU-YuanGroup/Open-Sora-Plan
@@ -406,4 +481,4 @@ EasyAnimateV5:
406
  - HunYuan DiT: https://github.com/tencent/HunyuanDiT
407
 
408
  # License
409
- This project is licensed under the [Apache License (Version 2.0)](https://github.com/modelscope/modelscope/blob/master/LICENSE).
 
112
  - GPU:Nvidia-V100 16G & Nvidia-A10 24G & Nvidia-A100 40G & Nvidia-A100 80G
113
 
114
  We need about 60GB available on disk (for saving weights), please check!
115
+ The video sizes EasyAnimateV5-12B can generate with different amounts of GPU memory are as follows:
116
+
117
+ | GPU memory | 384x672x72 | 384x672x49 | 576x1008x25 | 576x1008x49 | 768x1344x25 | 768x1344x49 |
118
+ |------------|------------|------------|------------|------------|------------|------------|
119
+ | 16GB | 🧡 | 🧡 | ❌ | ❌ | ❌ | ❌ |
120
+ | 24GB | 🧡 | 🧡 | 🧡 | 🧡 | ❌ | ❌ |
121
+ | 40GB | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
122
+ | 80GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
123
+
124
+ ✅ means it can run with "model_cpu_offload", 🧡 means it can run with "model_cpu_offload_and_qfloat8", ⭕️ means it can run with "sequential_cpu_offload", and ❌ means it cannot run. Note that running with sequential_cpu_offload is slower.
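
The table and legend above can be read as a lookup from GPU memory and video size to an offload mode. A minimal Python sketch of that lookup follows. The names `suggested_mode`, `SIZES`, and `MODES_BY_MEMORY_GB` are illustrative and not part of EasyAnimate, and rounding a card down to the nearest listed memory tier is an assumption made here to stay conservative.

```python
# Offload modes transcribed from the EasyAnimateV5-12B memory table above.
# All names in this block are illustrative, not part of EasyAnimate.
SIZES = [
    "384x672x72", "384x672x49", "576x1008x25",
    "576x1008x49", "768x1344x25", "768x1344x49",
]
QFLOAT8 = "model_cpu_offload_and_qfloat8"  # 🧡 in the table
OFFLOAD = "model_cpu_offload"              # ✅ in the table

# One row per GPU memory tier (GB); None stands for ❌ (cannot run).
MODES_BY_MEMORY_GB = {
    16: [QFLOAT8, QFLOAT8, None, None, None, None],
    24: [QFLOAT8, QFLOAT8, QFLOAT8, QFLOAT8, None, None],
    40: [OFFLOAD, OFFLOAD, OFFLOAD, OFFLOAD, None, None],
    80: [OFFLOAD, OFFLOAD, OFFLOAD, OFFLOAD, OFFLOAD, OFFLOAD],
}


def suggested_mode(gpu_memory_gb, size):
    """Return the table's mode for this GPU memory and video size, or None.

    Assumption: a card between two tiers is treated as the lower tier.
    """
    tiers = [tier for tier in MODES_BY_MEMORY_GB if tier <= gpu_memory_gb]
    if not tiers or size not in SIZES:
        return None
    return MODES_BY_MEMORY_GB[max(tiers)][SIZES.index(size)]
```

For example, `suggested_mode(24, "576x1008x49")` gives the 🧡 mode, while `suggested_mode(40, "768x1344x25")` gives `None` because the table marks that cell ❌.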
125
+
126
+ Some GPUs, such as the 2080 Ti and V100, do not support torch.bfloat16; to run on them, change weight_dtype in app.py and the predict files to torch.float16.
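
One way to choose the dtype automatically is to check the GPU's CUDA compute capability. The sketch below assumes that native bfloat16 support starts at compute capability 8.0 (Ampere), which excludes the V100 (7.0) and the 2080 Ti (7.5). `pick_weight_dtype` is a hypothetical helper, not part of EasyAnimate, and it returns the dtype name as a string so that it runs without PyTorch installed.

```python
def pick_weight_dtype(compute_capability):
    """Pick a weight dtype name from a (major, minor) compute capability.

    Assumption: bfloat16 is natively supported from compute capability 8.0.
    Hypothetical helper; EasyAnimate itself expects weight_dtype to be set
    by hand in app.py and the predict files.
    """
    major, _minor = compute_capability
    return "torch.bfloat16" if major >= 8 else "torch.float16"


# V100 (7.0) and 2080 Ti (7.5) fall back to float16; A100 (8.0) keeps bfloat16.
for name, capability in [("V100", (7, 0)), ("2080 Ti", (7, 5)), ("A100", (8, 0))]:
    print(name, pick_weight_dtype(capability))
```

With PyTorch available, the tuple could come from `torch.cuda.get_device_capability()`.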
127
+
128
+ The generation time for EasyAnimateV5-12B using different GPUs over 25 steps is as follows:
129
+
130
+ | GPU | 384x672x72 | 384x672x49 | 576x1008x25 | 576x1008x49 | 768x1344x25 | 768x1344x49 |
131
+ |-----------|------------------|------------------|------------------|------------------|------------------|-----------------|
132
+ | A10 24GB | ~120s (4.8s/it) | ~240s (9.6s/it) | ~320s (12.7s/it) | ~750s (29.8s/it) | ❌ | ❌ |
133
+ | A100 80GB | ~45s (1.75s/it) | ~90s (3.7s/it) | ~120s (4.7s/it) | ~300s (11.4s/it) | ~265s (10.6s/it) | ~710s (28.3s/it) |
134
+
135
+ (⭕️) indicates it can run with low_gpu_memory_mode=True, but at a slower speed, and ❌ means it can't run.
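
The totals in the timing table are roughly the per-iteration time multiplied by the 25 sampling steps (for example, 4.8 s/it × 25 ≈ 120 s). Below is a small sketch for estimating other step counts, assuming time scales linearly with the number of steps and ignoring model loading and decoding overhead. `estimate_seconds` is an illustrative name, not an EasyAnimate function.

```python
def estimate_seconds(seconds_per_iteration, steps=25):
    """Estimate sampling time as steps * s/it (linear-scaling assumption)."""
    if steps < 0 or seconds_per_iteration < 0:
        raise ValueError("steps and seconds_per_iteration must be non-negative")
    return seconds_per_iteration * steps


# A10 24GB at 384x672x72: 4.8 s/it over the table's 25 steps.
print(round(estimate_seconds(4.8)))
# The same configuration with a hypothetical 50-step schedule.
print(round(estimate_seconds(4.8, steps=50)))
```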
136
+
137
+ <details>
138
+ <summary>(Obsolete) EasyAnimateV3:</summary>
139
+
140
+ The video sizes EasyAnimateV3 can generate with different amounts of GPU memory are as follows:
141
+
142
+ | GPU memory | 384x672x72 | 384x672x144 | 576x1008x72 | 576x1008x144 | 720x1280x72 | 720x1280x144 |
143
+ |------------|------------|-------------|-------------|--------------|-------------|--------------|
144
+ | 12GB | ⭕️ | ⭕️ | ⭕️ | ⭕️ | ❌ | ❌ |
145
+ | 16GB | ✅ | ✅ | ⭕️ | ⭕️ | ⭕️ | ❌ |
146
+ | 24GB | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
147
+ | 40GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
148
+ | 80GB | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
149
+ </details>
150
 
151
  #### b. Weights
152
  We recommend placing the [weights](#model-zoo) along the specified path:
 
166
 
167
  ### EasyAnimateV5-12b-zh-InP
168
 
169
+ #### I2V
 
170
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
171
  <tr>
172
  <td>
 
185
  </table>
186
 
187
 
 
 
188
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
189
  <tr>
190
  <td>
 
202
  </tr>
203
  </table>
204
 
 
 
205
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
206
  <tr>
207
  <td>
 
219
  </tr>
220
  </table>
221
 
222
+ #### T2V
223
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
224
+ <tr>
225
+ <td>
226
+ <video src="https://github.com/user-attachments/assets/eccb0797-4feb-48e9-91d3-5769ce30142b" width="100%" controls autoplay loop></video>
227
+ </td>
228
+ <td>
229
+ <video src="https://github.com/user-attachments/assets/76b3db64-9c7a-4d38-8854-dba940240ceb" width="100%" controls autoplay loop></video>
230
+ </td>
231
+ <td>
232
+ <video src="https://github.com/user-attachments/assets/0b8fab66-8de7-44ff-bd43-8f701bad6bb7" width="100%" controls autoplay loop></video>
233
+ </td>
234
+ <td>
235
+ <video src="https://github.com/user-attachments/assets/9fbddf5f-7fcd-4cc6-9d7c-3bdf1d4ce59e" width="100%" controls autoplay loop></video>
236
+ </td>
237
+ </tr>
238
+ </table>
239
+
240
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
241
+ <tr>
242
+ <td>
243
+ <video src="https://github.com/user-attachments/assets/19c1742b-e417-45ac-97d6-8bf3a80d8e13" width="100%" controls autoplay loop></video>
244
+ </td>
245
+ <td>
246
+ <video src="https://github.com/user-attachments/assets/641e56c8-a3d9-489d-a3a6-42c50a9aeca1" width="100%" controls autoplay loop></video>
247
+ </td>
248
+ <td>
249
+ <video src="https://github.com/user-attachments/assets/2b16be76-518b-44c6-a69b-5c49d76df365" width="100%" controls autoplay loop></video>
250
+ </td>
251
+ <td>
252
+ <video src="https://github.com/user-attachments/assets/e7d9c0fc-136f-405c-9fab-629389e196be" width="100%" controls autoplay loop></video>
253
+ </td>
254
+ </tr>
255
+ </table>
256
+
257
  ### EasyAnimateV5-12b-zh-Control
258
 
259
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
 
400
 
401
  EasyAnimateV5:
402
 
403
+ 7B:
404
+ | Name | Type | Storage Space | Hugging Face | Model Scope | Description |
405
+ |--|--|--|--|--|--|
406
+ | EasyAnimateV5-7b-zh-InP | EasyAnimateV5 | 22 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-7b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-7b-zh-InP) | Official 7B image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
407
+ | EasyAnimateV5-7b-zh | EasyAnimateV5 | 22 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-7b-zh) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-7b-zh) | Official 7B text-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
408
+
409
+ 12B:
410
  | Name | Type | Storage Space | Hugging Face | Model Scope | Description |
411
  |--|--|--|--|--|--|
412
  | EasyAnimateV5-12b-zh-InP | EasyAnimateV5 | 34 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-12b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-12b-zh-InP) | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
 
416
  <details>
417
  <summary>(Obsolete) EasyAnimateV4:</summary>
418
 
419
+ | Name | Type | Storage Space | Hugging Face | Model Scope | Description |
420
  |--|--|--|--|--|--|
421
+ | EasyAnimateV4-XL-2-InP.tar.gz | EasyAnimateV4 | Before extraction: 8.9 GB \/ After extraction: 14.0 GB |[🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV4-XL-2-InP)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV4-XL-2-InP)| Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280), trained with 144 frames at 24 frames per second. |
422
  </details>
423
 
424
  <details>
425
  <summary>(Obsolete) EasyAnimateV3:</summary>
426
 
427
+ | Name | Type | Storage Space | Hugging Face | Model Scope | Description |
428
  |--|--|--|--|--|--|
429
+ | EasyAnimateV3-XL-2-InP-512x512.tar | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-512x512)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-512x512) | EasyAnimateV3 official weights for 512x512 text and image to video resolution. Training with 144 frames and fps 24 |
430
+ | EasyAnimateV3-XL-2-InP-768x768.tar | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-768x768) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-768x768) | EasyAnimateV3 official weights for 768x768 text and image to video resolution. Training with 144 frames and fps 24 |
431
+ | EasyAnimateV3-XL-2-InP-960x960.tar | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-960x960) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-960x960) | EasyAnimateV3 official weights for 960x960 text and image to video resolution. Training with 144 frames and fps 24 |
432
  </details>
433
 
434
  <details>
435
  <summary>(Obsolete) EasyAnimateV2:</summary>
436
+
437
+ | Name | Type | Storage Space | Url | Hugging Face | Model Scope | Description |
438
+ |--|--|--|--|--|--|--|
439
+ | EasyAnimateV2-XL-2-512x512.tar | EasyAnimateV2 | 16.2GB | - | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-512x512)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV2-XL-2-512x512)| EasyAnimateV2 official weights for 512x512 resolution. Training with 144 frames and fps 24 |
440
+ | EasyAnimateV2-XL-2-768x768.tar | EasyAnimateV2 | 16.2GB | - | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV2-XL-2-768x768) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV2-XL-2-768x768)| EasyAnimateV2 official weights for 768x768 resolution. Training with 144 frames and fps 24 |
441
+ | easyanimatev2_minimalism_lora.safetensors | Lora of Pixart | 485.1MB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Personalized_Model/easyanimatev2_minimalism_lora.safetensors)| - | - | A LoRA trained on a specific type of images. The images can be downloaded from [Url](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v2/Minimalism.zip). |
442
  </details>
443
 
444
  <details>
 
470
 
471
 
472
  # Reference
473
+ - CogVideo: https://github.com/THUDM/CogVideo/
474
+ - Flux: https://github.com/black-forest-labs/flux
475
  - magvit: https://github.com/google-research/magvit
476
  - PixArt: https://github.com/PixArt-alpha/PixArt-alpha
477
  - Open-Sora-Plan: https://github.com/PKU-YuanGroup/Open-Sora-Plan
 
481
  - HunYuan DiT: https://github.com/tencent/HunyuanDiT
482
 
483
  # License
484
+ This project is licensed under the [Apache License (Version 2.0)](https://github.com/modelscope/modelscope/blob/master/LICENSE).