Update README.md
README.md
CHANGED
@@ -11,7 +11,7 @@ Language: **English**

 ## Instructions

-The dependencies and installation are basically the same as the [**original model**](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.

 We provide two types of trained LoRA weights for you to test.

@@ -21,47 +21,50 @@ Language: **English**
 cd HunyuanDiT
 # Use the huggingface-cli tool to download the model.
 huggingface-cli download Tencent-Hunyuan/HYDiT-LoRA --local-dir ./ckpts/t2i/lora
 ```

 ## Training

-We provide three types of weights for fine-tuning
-
-Here is an example, we load the `ema` weights into the main model and perform LoRA fine-tuning through the `--ema-to-module` parameter.

-

 If multiple resolutions are used, add the `--multireso` and `--reso-step 64` parameters.

 ```bash
-model='DiT-g/2'
-task_flag="lora_porcelain_ema_rank64"
-
-index_file=dataset/porcelain/jsons/porcelain.json
-results_dir=./log_EXP
-batch_size=1
-image_size=1024
-grad_accu_steps=2
-warmup_num_steps=0
-lr=0.0001
-ckpt_every=100
-ckpt_latest_every=2000
-rank=64
-max_training_steps=2000

 PYTHONPATH=./ deepspeed hydit/train_deepspeed.py \
     --task-flag ${task_flag} \
     --model ${model} \
-    --
     --rank ${rank} \
-    --resume
-    --resume ${
-    --ema-to-module \
     --lr ${lr} \
-    --noise-schedule scaled_linear --beta-start 0.00085 --beta-end 0.
     --predict-type v_prediction \
-    --uncond-p 0
-    --uncond-p-t5 0
     --index-file ${index_file} \
     --random-flip \
     --batch-size ${batch_size} \
@@ -110,33 +113,28 @@ Make sure you have activated the conda environment before running the following
 ```shell
 # jade style

-# By default, we start a Chinese UI.
-python app/hydit_app.py --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

 # Using Flash Attention for acceleration.
 python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

 # You can disable the enhancement model if the GPU memory is insufficient.
 # The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
-python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

 # Start with English UI
-python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

 # porcelain style

-# By default, we start a Chinese UI.
-python app/hydit_app.py --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
-
 # Using Flash Attention for acceleration.
-python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

 # You can disable the enhancement model if the GPU memory is insufficient.
 # The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
-python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

 # Start with English UI
-python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
 ```


@@ -148,30 +146,24 @@ We provide several commands to quick start:
 # jade style

 # Prompt Enhancement + Text-to-Image. Torch mode
-python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

 # Only Text-to-Image. Torch mode
-python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
-
-# Only Text-to-Image. Flash Attention mode
-python sample_t2i.py --infer-mode fa --prompt "玉石绘画风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

 # Generate an image with other image sizes.
-python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

 # porcelain style

 # Prompt Enhancement + Text-to-Image. Torch mode
-python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

 # Only Text-to-Image. Torch mode
-python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
-
-# Only Text-to-Image. Flash Attention mode
-python sample_t2i.py --infer-mode fa --prompt "青花瓷风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

 # Generate an image with other image sizes.
-python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
 ```


@@ -209,7 +201,7 @@ def load_hunyuan_dit_lora(transformer_state_dict, lora_state_dict, lora_scale):

     return transformer_state_dict

-pipe = HunyuanDiTPipeline.from_pretrained("Tencent-Hunyuan/HunyuanDiT-v1.
 pipe.to("cuda")

 from safetensors import safe_open

@@ -11,7 +11,7 @@ Language: **English**

 ## Instructions

+The dependencies and installation are basically the same as the [**original model**](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2).

 We provide two types of trained LoRA weights for you to test.

@@ -21,47 +21,50 @@ Language: **English**
 cd HunyuanDiT
 # Use the huggingface-cli tool to download the model.
 huggingface-cli download Tencent-Hunyuan/HYDiT-LoRA --local-dir ./ckpts/t2i/lora
+
+# Quick start
+python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --no-enhance --load-key ema --lora-ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa
 ```

 ## Training

+We provide three types of weights for LoRA fine-tuning: `ema`, `module`, and `distill`. Choose whichever gives the best results for your data; `ema` is the default.

+Here is an example of LoRA fine-tuning with HunyuanDiT v1.2. We load the `distill` weights into the main model by setting `resume_module_root=./ckpts/t2i/model/pytorch_model_distill.pt`, then perform LoRA fine-tuning.

 If multiple resolutions are used, add the `--multireso` and `--reso-step 64` parameters.

+To train LoRA with HunyuanDiT v1.1, add `--use-style-cond`, `--size-cond 1024 1024`, and `--beta-end 0.03`.
+
+
 ```bash
+model='DiT-g/2'                                               # model type
+task_flag="lora_porcelain_ema_rank64"                         # task flag
+resume_module_root=./ckpts/t2i/model/pytorch_model_distill.pt # checkpoint to resume from
+index_file=dataset/porcelain/jsons/porcelain.json             # index file of the selected data
+results_dir=./log_EXP                                         # root directory for results
+batch_size=1                                                  # training batch size
+image_size=1024                                               # training image resolution
+grad_accu_steps=2                                             # gradient accumulation steps
+warmup_num_steps=0                                            # warm-up steps
+lr=0.0001                                                     # learning rate
+ckpt_every=100                                                # save a checkpoint every `ckpt_every` steps
+ckpt_latest_every=2000                                        # save a checkpoint named `latest.pt` every `ckpt_latest_every` steps
+rank=64                                                       # rank of the LoRA
+max_training_steps=2000                                       # maximum number of training steps

 PYTHONPATH=./ deepspeed hydit/train_deepspeed.py \
     --task-flag ${task_flag} \
     --model ${model} \
+    --training-parts lora \
     --rank ${rank} \
+    --resume \
+    --resume-module-root ${resume_module_root} \
     --lr ${lr} \
+    --noise-schedule scaled_linear --beta-start 0.00085 --beta-end 0.018 \
     --predict-type v_prediction \
+    --uncond-p 0 \
+    --uncond-p-t5 0 \
     --index-file ${index_file} \
     --random-flip \
     --batch-size ${batch_size} \

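The training text in the hunk above names two sets of optional flags: `--multireso` with `--reso-step 64` for multi-resolution data, and `--use-style-cond`, `--size-cond 1024 1024`, `--beta-end 0.03` for HunyuanDiT v1.1. A minimal sketch of how those could be collected before being appended to the `deepspeed` command follows. The helper name `extra_training_flags` and its parameters are hypothetical; only the flag names and values come from the README.

```python
def extra_training_flags(version="1.2", multireso=False):
    """Return optional flags to append to the train_deepspeed.py command.

    Hypothetical helper: only the flag names and values are taken from the README.
    """
    flags = []
    if multireso:
        # The README asks for both flags when multiple resolutions are used.
        flags += ["--multireso", "--reso-step", "64"]
    if version == "1.1":
        # Additions the README lists for HunyuanDiT v1.1.
        flags += ["--use-style-cond", "--size-cond", "1024", "1024",
                  "--beta-end", "0.03"]
    elif version != "1.2":
        raise ValueError(f"unsupported version: {version!r}")
    return flags


print(" ".join(extra_training_flags(version="1.1", multireso=True)))
```

The v1.2 example command already passes `--beta-end 0.018`, and this sketch does not remove it. When combining the two, keep only one `--beta-end` value.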
@@ -110,33 +113,28 @@ Make sure you have activated the conda environment before running the following
 ```shell
 # jade style


 # Using Flash Attention for acceleration.
 python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

 # You can disable the enhancement model if the GPU memory is insufficient.
 # The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
+python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade --infer-mode fa

 # Start with English UI
+python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade --infer-mode fa

 # porcelain style

 # Using Flash Attention for acceleration.
+python app/hydit_app.py --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa

 # You can disable the enhancement model if the GPU memory is insufficient.
 # The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
+python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa

 # Start with English UI
+python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa
 ```


@@ -148,30 +146,24 @@ We provide several commands to quick start:
 # jade style

 # Prompt Enhancement + Text-to-Image. Torch mode
+python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade --infer-mode fa

 # Only Text-to-Image. Torch mode
+python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade --infer-mode fa

 # Generate an image with other image sizes.
+python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade --infer-mode fa

 # porcelain style

 # Prompt Enhancement + Text-to-Image. Torch mode
+python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa

 # Only Text-to-Image. Torch mode
+python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa

 # Generate an image with other image sizes.
+python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa
 ```


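The sampling commands above differ only in style, prompt, enhancement, and image size. A minimal sketch of assembling them programmatically follows. The helper `build_sample_cmd` and its parameters are hypothetical; the flags are copied from the commands in this diff. The Quick start line spells the option `--lora-ckpt` while the other commands use `--lora_ckpt`, so check the script's argument parser for the spelling it accepts.

```python
import shlex


def build_sample_cmd(style, prompt, enhance=True, image_size=None, infer_mode="fa"):
    """Build a sample_t2i.py argument list (hypothetical helper)."""
    if style not in ("jade", "porcelain"):
        raise ValueError(f"unknown LoRA style: {style!r}")
    cmd = ["python", "sample_t2i.py", "--prompt", prompt]
    if not enhance:
        # Skip the prompt-enhancement model, e.g. when GPU memory is tight.
        cmd.append("--no-enhance")
    if image_size is not None:
        # Two integers, passed in the same order as in the README example.
        cmd += ["--image-size", str(image_size[0]), str(image_size[1])]
    cmd += ["--load-key", "ema",
            "--lora_ckpt", f"./ckpts/t2i/lora/{style}",
            "--infer-mode", infer_mode]
    return cmd


cmd = build_sample_cmd("porcelain", "青花瓷风格,一只猫在追蝴蝶",
                       enhance=False, image_size=(1280, 768))
print(shlex.join(cmd))  # shell-quoted string, safe to paste into a terminal
```

Returning a list rather than a string lets the same value be passed directly to `subprocess.run(cmd)` without shell quoting issues.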
@@ -209,7 +201,7 @@ def load_hunyuan_dit_lora(transformer_state_dict, lora_state_dict, lora_scale):

     return transformer_state_dict

+pipe = HunyuanDiTPipeline.from_pretrained("Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers", torch_dtype=torch.float16)
 pipe.to("cuda")

 from safetensors import safe_open

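The body of `load_hunyuan_dit_lora` is not visible in this diff, so the following is not the repository's implementation. It is a generic sketch of how a LoRA update is commonly merged into a base weight, `W' = W + lora_scale * (up @ down)`, written with plain nested lists so it runs without any dependencies. The names `merge_lora`, `lora_down`, and `lora_up` are assumptions, and any extra alpha/rank scaling or state-dict key mapping the real function performs is omitted.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_down, lora_up, lora_scale):
    """Return weight + lora_scale * (lora_up @ lora_down)."""
    delta = matmul(lora_up, lora_down)  # low-rank update with the same shape as weight
    return [[w + lora_scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


weight = [[1.0, 0.0], [0.0, 1.0]]  # base weight, shape (2, 2)
lora_down = [[1.0, 2.0]]           # rank-1 "down" matrix, shape (1, 2)
lora_up = [[0.5], [1.0]]           # rank-1 "up" matrix, shape (2, 1)

print(merge_lora(weight, lora_down, lora_up, lora_scale=2.0))
# prints [[2.0, 2.0], [2.0, 5.0]]
```

With real checkpoints the same arithmetic would be applied per layer on tensors, and `lora_scale` controls how strongly the style is applied.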