zzc0208 committed
Commit 221b639 · verified · 1 Parent(s): f1f9265

Update README.md

Files changed (1)
  1. README.md +19 -395

README.md CHANGED
@@ -1,401 +1,25 @@
- <p align="center" style="border-radius: 10px">
- <img src="asset/logo.png" width="35%" alt="logo"/>
- </p>
-
- # ⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
-
- ### <div align="center"> ICLR 2025 Oral Presentation </div>
-
- <div align="center">
- <a href="https://nvlabs.github.io/Sana/"><img src="https://img.shields.io/static/v1?label=Project&message=Github&color=blue&logo=github-pages"></a> &ensp;
- <a href="https://hanlab.mit.edu/projects/sana/"><img src="https://img.shields.io/static/v1?label=Page&message=MIT&color=darkred&logo=github-pages"></a> &ensp;
- <a href="https://arxiv.org/abs/2410.10629"><img src="https://img.shields.io/static/v1?label=Arxiv&message=Sana&color=red&logo=arxiv"></a> &ensp;
- <a href="https://nv-sana.mit.edu/"><img src="https://img.shields.io/static/v1?label=Demo:6x3090&message=MIT&color=yellow"></a> &ensp;
- <a href="https://nv-sana.mit.edu/4bit/"><img src="https://img.shields.io/static/v1?label=Demo:1x3090&message=4bit&color=yellow"></a> &ensp;
- <a href="https://nv-sana.mit.edu/ctrlnet/"><img src="https://img.shields.io/static/v1?label=Demo:1x3090&message=ControlNet&color=yellow"></a> &ensp;
- <a href="https://replicate.com/chenxwh/sana"><img src="https://img.shields.io/static/v1?label=API:H100&message=Replicate&color=pink"></a> &ensp;
- <a href="https://discord.gg/rde6eaE5Ta"><img src="https://img.shields.io/static/v1?label=Discuss&message=Discord&color=purple&logo=discord"></a> &ensp;
- </div>
-
- <p align="center" style="border-radius: 10px">
- <img src="asset/Sana.jpg" width="90%" alt="teaser_page1"/>
- </p>
-
- ## 💡 Introduction
-
- We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096 × 4096 resolution.
- Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, and it is deployable on a laptop GPU.
- Core designs include:
-
- (1) [**DC-AE**](https://hanlab.mit.edu/projects/dc-ae): unlike traditional AEs, which compress images only 8×, we trained an AE that can compress images 32×, effectively reducing the number of latent tokens. \
- (2) **Linear DiT**: we replace all vanilla attention in DiT with linear attention, which is more efficient at high resolutions without sacrificing quality (see the sketch below). \
- (3) **Decoder-only text encoder**: we replaced T5 with a modern decoder-only small LLM as the text encoder and designed complex human instructions with in-context learning to enhance image-text alignment. \
- (4) **Efficient training and sampling**: we propose **Flow-DPM-Solver** to reduce sampling steps, with efficient caption labeling and selection to accelerate convergence.
-
- As a result, Sana-0.6B is very competitive with modern giant diffusion models (e.g., Flux-12B), being 20 times smaller and 100+ times faster in measured throughput. Moreover, Sana-0.6B can be deployed on a 16GB laptop GPU, taking less than 1 second to generate a 1024 × 1024 resolution image. Sana enables content creation at low cost.
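-
- To make design (2) concrete, here is a minimal, hedged sketch of the linear attention idea: applying a positive feature map to queries and keys lets the key-value product be aggregated once, so attention cost grows linearly with token count instead of quadratically. This is an illustration only, not Sana's exact kernel, and every name in it is ours.
-
- ```python
- # Illustrative linear attention (not Sana's exact implementation).
- import torch
- import torch.nn.functional as F
-
-
- def linear_attention(q, k, v, eps=1e-6):
-     """q, k, v: (batch, heads, tokens, dim); cost is O(tokens), not O(tokens^2)."""
-     q, k = F.relu(q), F.relu(k)  # simple positive feature map (our assumption)
-     kv = torch.einsum("bhnd,bhne->bhde", k, v)  # aggregate K^T V once
-     z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps)
-     return torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)
-
-
- # With the 32x DC-AE, a 1024 x 1024 image maps to a 32 x 32 latent grid
- # (about 1024 tokens), which keeps the linear-cost attention above cheap.
- x = torch.randn(1, 8, 1024, 64)
- print(linear_attention(x, x, x).shape)  # torch.Size([1, 8, 1024, 64])
- ```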
-
- <p align="center" style="border-radius: 10px">
- <img src="asset/model-incremental.jpg" width="90%" alt="teaser_page2"/>
- </p>
-
- ## 🔥🔥 News
-
- - (🔥 New) \[2025/2/10\] 🚀Sana + ControlNet is released. [\[Guidance\]](asset/docs/sana_controlnet.md) | [\[Model\]](asset/docs/model_zoo.md) | [\[Demo\]](https://nv-sana.mit.edu/ctrlnet/)
- - (🔥 New) \[2025/1/30\] CAME-8bit optimizer code is released, saving more GPU memory during training. [\[How to config\]](https://github.com/NVlabs/Sana/blob/main/configs/sana_config/1024ms/Sana_1600M_img1024_CAME8bit.yaml#L86)
- - (🔥 New) \[2025/1/29\] 🎉 🎉 🎉**SANA 1.5 is out! Figure out how to do efficient training & inference scaling!** 🚀[\[Tech Report\]](https://arxiv.org/abs/2501.18427)
- - (🔥 New) \[2025/1/24\] 4bit-Sana is released, powered by the [SVDQuant and Nunchaku](https://github.com/mit-han-lab/nunchaku) inference engine. Now run Sana within **8GB** of GPU VRAM. [\[Guidance\]](asset/docs/4bit_sana.md) [\[Demo\]](https://svdquant.mit.edu/) [\[Model\]](asset/docs/model_zoo.md)
- - (🔥 New) \[2025/1/24\] DCAE-1.1 is released with better reconstruction quality. [\[Model\]](https://huggingface.co/mit-han-lab/dc-ae-f32c32-sana-1.1) [\[diffusers\]](https://huggingface.co/mit-han-lab/dc-ae-f32c32-sana-1.1-diffusers)
- - (🔥 New) \[2025/1/23\] **Sana is accepted as an Oral by ICLR 2025.** 🎉🎉🎉
-
- ______________________________________________________________________
-
- - (🔥 New) \[2025/1/12\] DC-AE tiling lets Sana-4K generate 4096x4096px images within 22GB of GPU memory; with model offload and 8-bit/4-bit quantization, 4K Sana runs within **8GB** of GPU VRAM. [\[Guidance\]](asset/docs/model_zoo.md#-3-4k-models)
- - (🔥 New) \[2025/1/11\] The Sana codebase license changed to Apache 2.0.
- - (🔥 New) \[2025/1/10\] Run Sana inference with 8-bit quantization. [\[Guidance\]](asset/docs/8bit_sana.md#quantization)
- - (🔥 New) \[2025/1/8\] 4K resolution [Sana models](asset/docs/model_zoo.md) are supported in [Sana-ComfyUI](https://github.com/Efficient-Large-Model/ComfyUI_ExtraModels), and a [workflow](asset/docs/ComfyUI/Sana_FlowEuler_4K.json) is also prepared. [\[4K guidance\]](asset/docs/ComfyUI/comfyui.md)
- - (🔥 New) \[2025/1/8\] 1.6B 4K resolution [Sana models](asset/docs/model_zoo.md) are released: [\[BF16 pth\]](https://huggingface.co/Efficient-Large-Model/Sana_1600M_4Kpx_BF16) or [\[BF16 diffusers\]](https://huggingface.co/Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers). 🚀 Get your 4096x4096 resolution images within 20 seconds! Find more samples on the [Sana page](https://nvlabs.github.io/Sana/). Thanks to [SUPIR](https://github.com/Fanghua-Yu/SUPIR) for their wonderful work and support.
- - (🔥 New) \[2025/1/2\] A bug in the `diffusers` pipeline is fixed. [Fix PR](https://github.com/huggingface/diffusers/pull/10431)
- - (🔥 New) \[2025/1/2\] 2K resolution [Sana models](asset/docs/model_zoo.md) are supported in [Sana-ComfyUI](https://github.com/Efficient-Large-Model/ComfyUI_ExtraModels), and a [workflow](asset/docs/ComfyUI/Sana_FlowEuler_2K.json) is also prepared.
- - ✅ \[2024/12\] 1.6B 2K resolution [Sana models](asset/docs/model_zoo.md) are released: [\[BF16 pth\]](https://huggingface.co/Efficient-Large-Model/Sana_1600M_2Kpx_BF16) or [\[BF16 diffusers\]](https://huggingface.co/Efficient-Large-Model/Sana_1600M_2Kpx_BF16_diffusers). 🚀 Get your 2K resolution images within 4 seconds! Find more samples on the [Sana page](https://nvlabs.github.io/Sana/). Thanks to [SUPIR](https://github.com/Fanghua-Yu/SUPIR) for their wonderful work and support.
- - ✅ \[2024/12\] `diffusers` supports Sana-LoRA fine-tuning! Sana-LoRA's training and convergence speed is super fast. [\[Guidance\]](asset/docs/sana_lora_dreambooth.md) or [\[diffusers docs\]](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sana.md).
- - ✅ \[2024/12\] `diffusers` has Sana! [All Sana models in diffusers safetensors](https://huggingface.co/collections/Efficient-Large-Model/sana-673efba2a57ed99843f11f9e) are released, and the diffusers pipelines `SanaPipeline`, `SanaPAGPipeline`, and `DPMSolverMultistepScheduler` (with FlowMatching) are all supported now. We have prepared a [Model Card](asset/docs/model_zoo.md) to help you choose.
- - ✅ \[2024/12\] The 1.6B BF16 [Sana model](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16) is released for stable fine-tuning.
- - ✅ \[2024/12\] We release the [ComfyUI node](https://github.com/Efficient-Large-Model/ComfyUI_ExtraModels) for Sana. [\[Guidance\]](asset/docs/ComfyUI/comfyui.md)
- - ✅ \[2024/11\] All multilingual (Emoji & Chinese & English) SFT models are released: [1.6B-512px](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_MultiLing), [1.6B-1024px](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_MultiLing), [600M-512px](https://huggingface.co/Efficient-Large-Model/Sana_600M_512px), [600M-1024px](https://huggingface.co/Efficient-Large-Model/Sana_600M_1024px). Metric performance is shown [here](#performance).
- - ✅ \[2024/11\] The Sana Replicate API is live at [Sana-API](https://replicate.com/chenxwh/sana).
- - ✅ \[2024/11\] 1.6B [Sana models](https://huggingface.co/collections/Efficient-Large-Model/sana-673efba2a57ed99843f11f9e) are released.
- - ✅ \[2024/11\] Training, inference, and metrics code are released.
- - ✅ \[2024/11\] Working on [`diffusers`](https://github.com/huggingface/diffusers/pull/9982).
- - \[2024/10\] [Demo](https://nv-sana.mit.edu/) is released.
- - \[2024/10\] [DC-AE code](https://github.com/mit-han-lab/efficientvit/blob/master/applications/dc_ae/README.md) and [weights](https://huggingface.co/collections/mit-han-lab/dc-ae-670085b9400ad7197bb1009b) are released!
- - \[2024/10\] The [paper](https://arxiv.org/abs/2410.10629) is on arXiv!
-
- ## Performance
-
- | Methods (1024x1024) | Throughput (samples/s) | Latency (s) | Params (B) | Speedup | FID 👇 | CLIP 👆 | GenEval 👆 | DPG 👆 |
- |-----------------------------------------------------------------------------------------------------|------------------------|-------------|------------|---------|-------------|--------------|-------------|-------------|
- | FLUX-dev | 0.04 | 23.0 | 12.0 | 1.0× | 10.15 | 27.47 | _0.67_ | 84.0 |
- | **Sana-0.6B** | 1.7 | 0.9 | 0.6 | 39.5× | _5.81_ | 28.36 | 0.64 | 83.6 |
- | **[Sana-0.6B-MultiLing](https://huggingface.co/Efficient-Large-Model/Sana_600M_1024px)** | 1.7 | 0.9 | 0.6 | 39.5× | **5.61** | <u>28.80</u> | <u>0.68</u> | _84.2_ |
- | **Sana-1.6B** | 1.0 | 1.2 | 1.6 | 23.3× | <u>5.76</u> | _28.67_ | 0.66 | **84.8** |
- | **[Sana-1.6B-MultiLing](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_MultiLing)** | 1.0 | 1.2 | 1.6 | 23.3× | 5.92 | **28.94** | **0.69** | <u>84.5</u> |
-
- <details>
- <summary><h3>Click to show all</h3></summary>
-
- | Methods | Throughput (samples/s) | Latency (s) | Params (B) | Speedup | FID 👇 | CLIP 👆 | GenEval 👆 | DPG 👆 |
- |------------------------------|------------------------|-------------|------------|-----------|-------------|--------------|-------------|-------------|
- | _**512 × 512 resolution**_ | | | | | | | | |
- | PixArt-α | 1.5 | 1.2 | 0.6 | 1.0× | 6.14 | 27.55 | 0.48 | 71.6 |
- | PixArt-Σ | 1.5 | 1.2 | 0.6 | 1.0× | _6.34_ | _27.62_ | <u>0.52</u> | _79.5_ |
- | **Sana-0.6B** | 6.7 | 0.8 | 0.6 | 5.0× | <u>5.67</u> | <u>27.92</u> | _0.64_ | <u>84.3</u> |
- | **Sana-1.6B** | 3.8 | 0.6 | 1.6 | 2.5× | **5.16** | **28.19** | **0.66** | **85.5** |
- | _**1024 × 1024 resolution**_ | | | | | | | | |
- | LUMINA-Next | 0.12 | 9.1 | 2.0 | 2.8× | 7.58 | 26.84 | 0.46 | 74.6 |
- | SDXL | 0.15 | 6.5 | 2.6 | 3.5× | 6.63 | _29.03_ | 0.55 | 74.7 |
- | PlayGroundv2.5 | 0.21 | 5.3 | 2.6 | 4.9× | _6.09_ | **29.13** | 0.56 | 75.5 |
- | Hunyuan-DiT | 0.05 | 18.2 | 1.5 | 1.2× | 6.54 | 28.19 | 0.63 | 78.9 |
- | PixArt-Σ | 0.4 | 2.7 | 0.6 | 9.3× | 6.15 | 28.26 | 0.54 | 80.5 |
- | DALLE3 | - | - | - | - | - | - | _0.67_ | 83.5 |
- | SD3-medium | 0.28 | 4.4 | 2.0 | 6.5× | 11.92 | 27.83 | 0.62 | <u>84.1</u> |
- | FLUX-dev | 0.04 | 23.0 | 12.0 | 1.0× | 10.15 | 27.47 | _0.67_ | _84.0_ |
- | FLUX-schnell | 0.5 | 2.1 | 12.0 | 11.6× | 7.94 | 28.14 | **0.71** | **84.8** |
- | **Sana-0.6B** | 1.7 | 0.9 | 0.6 | **39.5×** | <u>5.81</u> | 28.36 | 0.64 | 83.6 |
- | **Sana-1.6B** | 1.0 | 1.2 | 1.6 | **23.3×** | **5.76** | <u>28.67</u> | <u>0.66</u> | **84.8** |
-
- </details>
-
- ## Contents
-
- - [Env](#-1-dependencies-and-installation)
- - [Demo](#-2-how-to-play-with-sana-inference)
- - [Model Zoo](asset/docs/model_zoo.md)
- - [Training](#-3-how-to-train-sana)
- - [Testing](#-4-metric-toolkit)
- - [TODO](#to-do-list)
- - [Citation](#bibtex)
-
- # 🔧 1. Dependencies and Installation
-
- - Python >= 3.10.0 (we recommend [Anaconda](https://www.anaconda.com/download/#linux) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html))
- - [PyTorch >= 2.0.1+cu12.1](https://pytorch.org/)
-
- ```bash
- git clone https://github.com/NVlabs/Sana.git
- cd Sana
-
- ./environment_setup.sh sana
- # or install each component step by step following environment_setup.sh
- ```
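-
- As a quick sanity check that the environment meets the requirements above, something like the following can be run (a hedged sketch, nothing Sana-specific; it assumes the common `packaging` library is installed):
-
- ```python
- # Check Python and PyTorch versions against the stated requirements.
- import sys
-
- import torch
- from packaging import version  # present in most pip environments
-
- assert sys.version_info >= (3, 10), "Python >= 3.10.0 is required"
- assert version.parse(torch.__version__).release >= (2, 0, 1), "PyTorch >= 2.0.1 is required"
- print("CUDA build:", torch.version.cuda)  # expect 12.x
- print("GPU available:", torch.cuda.is_available())
- ```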
-
- # 💻 2. How to Play with Sana (Inference)
-
- ## 💰Hardware requirement
-
- - 9GB VRAM is required for the 0.6B model and 12GB VRAM for the 1.6B model. Our upcoming quantized version will require less than 8GB for inference.
- - All the tests are done on A100 GPUs; results may differ on other GPU models.
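-
- Before choosing a model, here is a quick hedged check of available VRAM against the 9GB/12GB requirements above (illustrative only):
-
- ```python
- # Report total VRAM and which Sana variant it can host (illustrative).
- import torch
-
- props = torch.cuda.get_device_properties(0)
- total_gb = props.total_memory / 1024**3
- print(f"{props.name}: {total_gb:.1f} GB VRAM")
- if total_gb >= 12:
-     print("OK for the 1.6B model (12GB) and the 0.6B model (9GB)")
- elif total_gb >= 9:
-     print("OK for the 0.6B model (9GB)")
- else:
-     print("Consider the quantized variants (<8GB)")
- ```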
-
- ## 🔛 Choose your model: [Model card](asset/docs/model_zoo.md)
-
- ## 🔛 Quick start with [Gradio](https://www.gradio.app/guides/quickstart)
-
- ```bash
- # official online demo
- DEMO_PORT=15432 \
- python app/app_sana.py \
-     --share \
-     --config=configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
-     --model_path=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth \
-     --image_size=1024
- ```
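-
- Once the demo is up, its endpoints can be inspected programmatically with `gradio_client` (a sketch: the port matches `DEMO_PORT` above, and the endpoint names depend on `app_sana.py`):
-
- ```python
- # Connect to the locally launched Gradio demo and list its API endpoints.
- from gradio_client import Client
-
- client = Client("http://127.0.0.1:15432")
- client.view_api()  # prints callable endpoints and their parameters
- ```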
-
- ### 1. How to use `SanaPipeline` with `🧨diffusers`
-
- > \[!IMPORTANT\]
- > Upgrade `diffusers` to >= 0.32.0.dev to make `SanaPipeline` and `SanaPAGPipeline` available!
- >
- > ```bash
- > pip install git+https://github.com/huggingface/diffusers
- > ```
- >
- > Make sure to set `pipe.transformer`'s `torch_dtype` and `variant` to the defaults listed in the [Model Card](asset/docs/model_zoo.md).
- >
- > Set `pipe.text_encoder` to BF16 and `pipe.vae` to FP32 or BF16. See the [docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana#sanapipeline) for more info.
-
- ```python
- # run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
- import torch
- from diffusers import SanaPipeline
-
- pipe = SanaPipeline.from_pretrained(
-     "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers",
-     variant="bf16",
-     torch_dtype=torch.bfloat16,
- )
- pipe.to("cuda")
-
- pipe.vae.to(torch.bfloat16)
- pipe.text_encoder.to(torch.bfloat16)
-
- prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
- image = pipe(
-     prompt=prompt,
-     height=1024,
-     width=1024,
-     guidance_scale=4.5,
-     num_inference_steps=20,
-     generator=torch.Generator(device="cuda").manual_seed(42),
- )[0]  # indexing the pipeline output returns the list of generated images
-
- image[0].save("sana.png")
- ```
-
- ### 2. How to use `SanaPAGPipeline` with `🧨diffusers`
-
- ```python
- # run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
- import torch
- from diffusers import SanaPAGPipeline
-
- pipe = SanaPAGPipeline.from_pretrained(
-     "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
-     variant="fp16",
-     torch_dtype=torch.float16,
-     pag_applied_layers="transformer_blocks.8",
- )
- pipe.to("cuda")
-
- pipe.text_encoder.to(torch.bfloat16)
- pipe.vae.to(torch.bfloat16)
-
- prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
- image = pipe(
-     prompt=prompt,
-     guidance_scale=5.0,
-     pag_scale=2.0,
-     num_inference_steps=20,
-     generator=torch.Generator(device="cuda").manual_seed(42),
- )[0]  # indexing the pipeline output returns the list of generated images
- image[0].save("sana.png")
- ```
-
- <details>
- <summary><h3>3. How to use Sana in this repo</h3></summary>
-
- ```python
- import torch
- from app.sana_pipeline import SanaPipeline
- from torchvision.utils import save_image
-
- device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
- generator = torch.Generator(device=device).manual_seed(42)
-
- sana = SanaPipeline("configs/sana_config/1024ms/Sana_1600M_img1024.yaml")
- sana.from_pretrained("hf://Efficient-Large-Model/Sana_1600M_1024px_BF16/checkpoints/Sana_1600M_1024px_BF16.pth")
- prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
-
- image = sana(
-     prompt=prompt,
-     height=1024,
-     width=1024,
-     guidance_scale=5.0,
-     pag_guidance_scale=2.0,
-     num_inference_steps=18,
-     generator=generator,
- )
- save_image(image, "output/sana.png", nrow=1, normalize=True, value_range=(-1, 1))
- ```
-
- </details>
-
- <details>
- <summary><h3>4. Run Sana (Inference) with Docker</h3></summary>
-
- ```bash
- # Pull related models
- huggingface-cli download google/gemma-2b-it
- huggingface-cli download google/shieldgemma-2b
- huggingface-cli download mit-han-lab/dc-ae-f32c32-sana-1.0
- huggingface-cli download Efficient-Large-Model/Sana_1600M_1024px
-
- # Run with docker
- docker build . -t sana
- docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
-     -v ~/.cache:/root/.cache \
-     sana
- ```
-
- </details>
-
- ## 🔛 Run inference with TXT or JSON files
-
- ```bash
- # Run samples in a txt file
- python scripts/inference.py \
-     --config=configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
-     --model_path=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth \
-     --txt_file=asset/samples/samples_mini.txt
-
- # Run samples in a json file
- python scripts/inference.py \
-     --config=configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
-     --model_path=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth \
-     --json_file=asset/samples/samples_mini.json
- ```
-
- where each line of [`asset/samples/samples_mini.txt`](asset/samples/samples_mini.txt) contains a prompt to generate an image from.
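-
- For example, a custom prompt list for `--txt_file` can be written like this (the prompts and the `my_samples.txt` name are placeholders):
-
- ```python
- # Write one prompt per line, matching the txt_file format described above.
- prompts = [
-     'a cyberpunk cat with a neon sign that says "Sana"',
-     "a watercolor landscape of snowy mountains at sunrise",
- ]
- with open("my_samples.txt", "w") as f:
-     f.write("\n".join(prompts) + "\n")
- ```
-
- Then pass `--txt_file=my_samples.txt` to `scripts/inference.py`.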
287
-
288
- # 🔥 3. How to Train Sana
289
-
290
- ## 💰Hardware requirement
291
-
292
- - 32GB VRAM is required for both 0.6B and 1.6B model's training
293
-
294
- ### 1). Train with image-text pairs in directory
295
-
296
- We provide a training example here and you can also select your desired config file from [config files dir](configs/sana_config) based on your data structure.
297
-
298
- To launch Sana training, you will first need to prepare data in the following formats. [Here](asset/example_data) is an example for the data structure for reference.
299
-
300
- ```bash
301
- asset/example_data
302
- ├── AAA.txt
303
- ├── AAA.png
304
- ├── BCC.txt
305
- ├── BCC.png
306
- ├── ......
307
- ├── CCC.txt
308
- └── CCC.png
309
- ```
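-
- Before launching, a quick check that every image has a matching caption file can save a failed run (an illustrative sketch, not part of the repo):
-
- ```python
- # Verify the AAA.png / AAA.txt pairing shown above (illustrative).
- from pathlib import Path
-
- data_dir = Path("asset/example_data")
- images = sorted(data_dir.glob("*.png"))
- for img in images:
-     caption = img.with_suffix(".txt")
-     assert caption.exists(), f"missing caption for {img.name}"
- print(f"{len(images)} image-text pairs look consistent")
- ```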
-
- Then Sana's training can be launched via
-
- ```bash
- # Example of training Sana 0.6B with 512x512 resolution from scratch
- bash train_scripts/train.sh \
-     configs/sana_config/512ms/Sana_600M_img512.yaml \
-     --data.data_dir="[asset/example_data]" \
-     --data.type=SanaImgDataset \
-     --model.multi_scale=false \
-     --train.train_batch_size=32
-
- # Example of fine-tuning Sana 1.6B with 1024x1024 resolution
- bash train_scripts/train.sh \
-     configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
-     --data.data_dir="[asset/example_data]" \
-     --data.type=SanaImgDataset \
-     --model.load_from=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth \
-     --model.multi_scale=false \
-     --train.train_batch_size=8
- ```
-
- ### 2). Train with image-text pairs in WebDataset format
-
- We also provide conversion scripts to convert your data into the required WebDataset format. You can refer to the [data conversion scripts](asset/data_conversion_scripts) for more details.
-
- ```bash
- python tools/convert_ImgDataset_to_WebDatasetMS_format.py
- ```
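-
- To spot-check the converted tars, the `webdataset` library can iterate them (a sketch: the tar name and the `png`/`txt` key names are assumptions about the converter's output):
-
- ```python
- # Peek at the first sample of a converted tar (names are assumptions).
- import webdataset as wds
-
- ds = wds.WebDataset("asset/example_data_tar/00000.tar").decode("pil").to_tuple("png", "txt")
- for image, caption in ds:
-     print(image.size, caption[:60])
-     break
- ```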
-
- Then Sana's training can be launched via
-
- ```bash
- # Example of training Sana 0.6B with 512x512 resolution from scratch
- bash train_scripts/train.sh \
-     configs/sana_config/512ms/Sana_600M_img512.yaml \
-     --data.data_dir="[asset/example_data_tar]" \
-     --data.type=SanaWebDatasetMS \
-     --model.multi_scale=true \
-     --train.train_batch_size=32
- ```
-
- # 💻 4. Metric toolkit
-
- Refer to [Toolkit Manual](asset/docs/metrics_toolkit.md).
-
- # 💪To-Do List
-
- We will try our best to release:
-
- - \[✅\] Training code
- - \[✅\] Inference code
- - \[✅\] Model zoo
- - \[✅\] ComfyUI
- - \[✅\] DC-AE Diffusers
- - \[✅\] Sana merged in Diffusers (https://github.com/huggingface/diffusers/pull/9982)
- - \[✅\] LoRA training by [@sayakpaul](https://github.com/sayakpaul) (`diffusers`: https://github.com/huggingface/diffusers/pull/10234)
- - \[✅\] 2K/4K resolution models (thanks to [SUPIR](https://github.com/Fanghua-Yu/SUPIR) for providing a 4K super-resolution model)
- - \[✅\] 8bit / 4bit laptop deployment
- - \[💻\] ControlNet (train & inference & models)
- - \[💻\] Larger model size
- - \[💻\] F32/F64 VAEs with better reconstruction
- - \[💻\] **Sana 1.5 (focus on: human body / human face / text rendering / realism / efficiency)**
-
-
- # 🤗Acknowledgements
-
- **Thanks to the following open-source codebases for their wonderful work!**
-
- - [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha)
- - [PixArt-Σ](https://github.com/PixArt-alpha/PixArt-sigma)
- - [Efficient-ViT](https://github.com/mit-han-lab/efficientvit)
- - [ComfyUI_ExtraModels](https://github.com/city96/ComfyUI_ExtraModels)
- - [SVDQuant and Nunchaku](https://github.com/mit-han-lab/nunchaku)
- - [diffusers](https://github.com/huggingface/diffusers)
-
- ## 🌟 Star History
-
- [![Star History Chart](https://api.star-history.com/svg?repos=NVlabs/Sana&type=Date)](https://star-history.com/#NVlabs/sana&Date)
-
- # 📖BibTeX
-
- ```bibtex
- @misc{xie2024sana,
-       title={Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer},
-       author={Enze Xie and Junsong Chen and Junyu Chen and Han Cai and Haotian Tang and Yujun Lin and Zhekai Zhang and Muyang Li and Ligeng Zhu and Yao Lu and Song Han},
-       year={2024},
-       eprint={2410.10629},
-       archivePrefix={arXiv},
-       primaryClass={cs.CV},
-       url={https://arxiv.org/abs/2410.10629},
- }
- ```
 
+ ---
+ title: Twig-V0-Alpha-Demo
+ emoji: 🖼
+ colorFrom: blue
+ colorTo: indigo
+ sdk: gradio
+ sdk_version: 4.44.0
+ app_file: app.py
+ pinned: false
+ license: apache-2.0
+ short_description: Twig-v0-t2i
+ ---

+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+ # Links

+ * [Swarmeta-AI/Twig-v0-alpha](https://huggingface.co/Swarmeta-AI/Twig-v0-alpha)

+ # Dependencies

+ * PyTorch version: 2.6.0
+ * Google gemma-2b-it (text encoder)
+ * sdk_version: 4.44.0
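
+ A minimal check that the running environment matches the versions listed above (a hedged sketch, not part of the Space):

+ ```python
+ # Verify the environment against the listed dependency versions.
+ import gradio
+ import torch
+
+ assert torch.__version__.startswith("2.6"), torch.__version__
+ assert gradio.__version__ == "4.44.0", gradio.__version__
+ print("environment matches the listed versions")
+ ```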