maxin-cn and Fabrice-TIERCELIN committed on
Commit
2964625
•
1 Parent(s): 10a9008

This PR adds a description & tags (#3)


- This PR adds a description & tags (6c6202960a0d4ada88cca5d1e2438d840a83712b)


Co-authored-by: Fabrice TIERCELIN <[email protected]>

Files changed (1)
  1. README.md +141 -133
README.md CHANGED
@@ -1,133 +1,141 @@
1
- ---
2
- title: Cinemo
3
- app_file: demo.py
4
- sdk: gradio
5
- sdk_version: 4.37.2
6
- ---
1
+ ---
2
+ title: Cinemo
3
+ app_file: demo.py
4
+ sdk: gradio
5
+ sdk_version: 4.39.0
6
+ tags:
7
+ - Image-2-Video
8
+ - LLM
9
+ - Large Language Model
10
+ short_description: Multimodal Image-to-Video
11
+ emoji: 🎥
12
+ colorFrom: green
13
+ colorTo: indigo
14
+ ---
15
+ ## Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models<br><sub>Official PyTorch Implementation</sub>
16
+
17
+
18
+ [![Arxiv](https://img.shields.io/badge/Arxiv-b31b1b.svg)](https://arxiv.org/abs/2407.15642)
19
+ [![Project Page](https://img.shields.io/badge/Project-Website-blue)](https://maxin-cn.github.io/cinemo_project/)
20
+
21
+
22
+ This repo contains the pre-trained weights and sampling code for our paper on image animation with motion diffusion models (Cinemo). You can find more visualizations on our [project page](https://maxin-cn.github.io/cinemo_project/).
23
+
24
+ In this project, we propose Cinemo, a novel method that performs motion-controllable image animation with strong consistency and smoothness. To improve motion smoothness, Cinemo learns the distribution of motion residuals rather than directly generating subsequent frames. To control motion intensity, we introduce a method based on the structural similarity index (SSIM). Finally, a noise refinement technique based on the discrete cosine transform (DCT) ensures temporal consistency. Together, these components let Cinemo generate highly consistent, smooth, and motion-controllable animations, with simpler and more precise user control and better generative quality than previous methods.
25
+
26
+ <div align="center">
27
+ <img src="visuals/pipeline.svg">
28
+ </div>
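To make the noise-refinement component above concrete, here is a small conceptual sketch of DCT-based noise refinement: the low-frequency DCT band of the (encoded) input image is combined with the high-frequency band of Gaussian noise before sampling. This is only an illustration of the idea; the tensor shapes and the `cutoff` parameter are assumptions, not the implementation used in this repo.

```python
# Conceptual sketch of DCT-based noise refinement (illustration only, not the
# code used in this repo): keep the low-frequency DCT coefficients of the
# static image latent and the high-frequency coefficients of Gaussian noise.
import numpy as np
from scipy.fft import dctn, idctn

def refine_noise(image_latent: np.ndarray, noise: np.ndarray, cutoff: int = 8) -> np.ndarray:
    """image_latent, noise: (C, H, W) arrays; `cutoff` is a hypothetical band size."""
    img_freq = dctn(image_latent, axes=(-2, -1), norm="ortho")
    noise_freq = dctn(noise, axes=(-2, -1), norm="ortho")

    # Low frequencies live in the top-left corner of the DCT coefficient grid.
    low_pass = np.zeros_like(noise_freq)
    low_pass[..., :cutoff, :cutoff] = 1.0

    mixed = low_pass * img_freq + (1.0 - low_pass) * noise_freq
    return idctn(mixed, axes=(-2, -1), norm="ortho")

# Toy example with random stand-ins for the image latent and the initial noise.
latent = np.random.randn(4, 32, 32)
initial_noise = np.random.randn(4, 32, 32)
print(refine_noise(latent, initial_noise).shape)  # (4, 32, 32)
```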
29
+
30
+ ## News
31
+
32
+ - (🔥 New) Jul. 23, 2024. 💥 Our paper is released on [arXiv](https://arxiv.org/abs/2407.15642).
33
+
34
+ - (🔥 New) Jun. 2, 2024. 💥 The inference code is released. The checkpoint can be found [here](https://huggingface.co/maxin-cn/Cinemo/tree/main).
35
+
36
+
37
+ ## Setup
38
+
39
+ First, download and set up the repo:
40
+
41
+ ```bash
42
+ git clone https://github.com/maxin-cn/Cinemo
43
+ cd Cinemo
44
+ ```
45
+
46
+ We provide an [`environment.yml`](environment.yml) file that can be used to create a Conda environment. If you only want
47
+ to run pre-trained models locally on CPU, you can remove the `cudatoolkit` and `pytorch-cuda` requirements from the file.
48
+
49
+ ```bash
50
+ conda env create -f environment.yml
51
+ conda activate cinemo
52
+ ```
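After activating the environment, an optional sanity check confirms that PyTorch imports correctly and reports whether a GPU is visible; on a CPU-only setup the second line simply prints `False`.

```python
# Optional sanity check after `conda activate cinemo`.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # False is expected on CPU-only setups
```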
53
+
54
+
55
+ ## Animation
56
+
57
+ You can sample from our **pre-trained Cinemo models** with [`animation.py`](pipelines/animation.py). The pre-trained weights can be found [here](https://huggingface.co/maxin-cn/Cinemo/tree/main). The script exposes various arguments for adjusting the number of sampling steps, the classifier-free guidance scale, and so on:
58
+
59
+ ```bash
60
+ bash pipelines/animation.sh
61
+ ```
62
+
63
+ All required checkpoints are downloaded automatically on the first run.
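If you prefer to fetch the weights ahead of time instead of relying on the automatic download, a minimal sketch using `huggingface_hub` and the `maxin-cn/Cinemo` repository linked above is:

```python
# Optional: pre-download the Cinemo checkpoints into the default Hugging Face cache.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="maxin-cn/Cinemo")
print("Checkpoints cached at:", local_dir)
```

With the checkpoints in place, the script should produce results like the following: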
64
+
65
+ <table style="width:100%; text-align:center;">
66
+ <tr>
67
+ <td align="center">Input image</td>
68
+ <td align="center">Output video</td>
69
+ <td align="center">Input image</td>
70
+ <td align="center">Output video</td>
71
+ </tr>
72
+ <tr>
73
+ <td align="center"><img src="visuals/animations/people_walking/0.jpg" width="100%"></td>
74
+ <td align="center"><img src="visuals/animations/people_walking/people_walking.gif" width="100%"></td>
75
+ <td align="center"><img src="visuals/animations/sea_swell/0.jpg" width="100%"></td>
76
+ <td align="center"><img src="visuals/animations/sea_swell/sea_swell.gif" width="100%"></td>
77
+ </tr>
78
+ <tr>
79
+ <td align="center" colspan="2">"People Walking"</td>
80
+ <td align="center" colspan="2">"Sea Swell"</td>
81
+ </tr>
82
+ <tr>
83
+ <td align="center"><img src="visuals/animations/girl_dancing_under_the_stars/0.jpg" width="100%"></td>
84
+ <td align="center"><img src="visuals/animations/girl_dancing_under_the_stars/girl_dancing_under_the_stars.gif" width="100%"></td>
85
+ <td align="center"><img src="visuals/animations/dragon_glowing_eyes/0.jpg" width="100%"></td>
86
+ <td align="center"><img src="visuals/animations/dragon_glowing_eyes/dragon_glowing_eyes.gif" width="100%"></td>
87
+ </tr>
88
+ <tr>
89
+ <td align="center" colspan="2">"Girl Dancing under the Stars"</td>
90
+ <td align="center" colspan="2">"Dragon Glowing Eyes"</td>
91
+ </tr>
92
+
93
+ </table>
94
+
95
+
96
+ ## Other Applications
97
+
98
+ You can also utilize Cinemo for other applications, such as motion transfer and video editing:
99
+
100
+ ```bash
101
+ bash pipelines/video_editing.sh
102
+ ```
103
+
104
+ All required checkpoints are downloaded automatically, and you should get results like the following:
105
+
106
+ <table style="width:100%; text-align:center;">
107
+ <tr>
108
+ <td align="center">Input video</td>
109
+ <td align="center">First frame</td>
110
+ <td align="center">Edited first frame</td>
111
+ <td align="center">Output video</td>
112
+ </tr>
113
+ <tr>
114
+ <td align="center"><img src="visuals/video_editing/origin/a_corgi_walking_in_the_park_at_sunrise_oil_painting_style.gif" width="100%"></td>
115
+ <td align="center"><img src="visuals/video_editing/origin/0.jpg" width="100%"></td>
116
+ <td align="center"><img src="visuals/video_editing/edit/0.jpg" width="100%"></td>
117
+ <td align="center"><img src="visuals/video_editing/edit/editing_a_corgi_walking_in_the_park_at_sunrise_oil_painting_style.gif" width="100%"></td>
118
+ </tr>
119
+
120
+ </table>
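The editing workflow above starts from the first frame of the input clip, which is edited (here, restyled as an oil painting) before the pipeline re-animates it. If you need to extract that frame yourself, here is a minimal sketch using OpenCV; this is an extra dependency and the file names are placeholders.

```python
# Grab the first frame of an input clip so it can be edited before running
# the video-editing pipeline. Requires `opencv-python`; paths are placeholders.
import cv2

capture = cv2.VideoCapture("input_video.mp4")
success, frame = capture.read()  # first frame, in BGR channel order
capture.release()

if success:
    cv2.imwrite("first_frame.jpg", frame)
else:
    raise RuntimeError("Could not read a frame from input_video.mp4")
```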
121
+
122
+
123
+
124
+ ## Citation
125
+ If you find this work useful for your research, please consider citing it.
126
+ ```bibtex
127
+ @article{ma2024cinemo,
128
+ title={Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models},
129
+ author={Ma, Xin and Wang, Yaohui and Jia, Gengyun and Chen, Xinyuan and Li, Yuan-Fang and Chen, Cunjian and Qiao, Yu},
130
+ journal={arXiv preprint arXiv:2407.15642},
131
+ year={2024}
132
+ }
133
+ ```
134
+
135
+
136
+ ## Acknowledgments
137
+ Cinemo has been greatly inspired by the following amazing works and teams: [LaVie](https://github.com/Vchitect/LaVie) and [SEINE](https://github.com/Vchitect/SEINE). We thank all of their contributors for open-sourcing their work.
138
+
139
+
140
+ ## License
141
+ The code and model weights are licensed under [LICENSE](LICENSE).