Zhiminli committed
Commit 28b6504, 1 parent: 59b294b

Update README.md

Files changed (1)
  1. README.md +41 -49
README.md CHANGED
@@ -11,7 +11,7 @@ Language: **English**
 
  ## Instructions
 
- The dependencies and installation are basically the same as the [**original model**](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.1).
 
  We provide two types of trained LoRA weights for you to test.
 
@@ -21,47 +21,50 @@ Language: **English**
  cd HunyuanDiT
  # Use the huggingface-cli tool to download the model.
  huggingface-cli download Tencent-Hunyuan/HYDiT-LoRA --local-dir ./ckpts/t2i/lora
  ```
 
  ## Training
 
- We provide three types of weights for fine-tuning HY-DiT LoRA, `ema`, `module` and `distill`, and you can choose according to the actual effect. By default, we use `ema` weights.
-
- Here is an example, we load the `ema` weights into the main model and perform LoRA fine-tuning through the `--ema-to-module` parameter.
 
- If you want to load the `module` weights into the main model, just remove the `--ema-to-module` parameter.
 
  If multiple resolution are used, you need to add the `--multireso` and `--reso-step 64 ` parameter.
 
  ```bash
- model='DiT-g/2' # model type
- task_flag="lora_porcelain_ema_rank64" # task flag
- resume=./ckpts/t2i/model/ # resume checkpoint
- index_file=dataset/porcelain/jsons/porcelain.json # the selected data indices
- results_dir=./log_EXP # save root for results
- batch_size=1 # training batch size
- image_size=1024 # training image resolution
- grad_accu_steps=2 # gradient accumulation steps
- warmup_num_steps=0 # warm-up steps
- lr=0.0001 # learning rate
- ckpt_every=100 # create a ckpt every a few steps.
- ckpt_latest_every=2000 # create a ckpt named `latest.pt` every a few steps.
- rank=64 # rank of lora
- max_training_steps=2000 # Maximum training iteration steps
 
  PYTHONPATH=./ deepspeed hydit/train_deepspeed.py \
  --task-flag ${task_flag} \
  --model ${model} \
- --training_parts lora \
  --rank ${rank} \
- --resume-split \
- --resume ${resume} \
- --ema-to-module \
  --lr ${lr} \
- --noise-schedule scaled_linear --beta-start 0.00085 --beta-end 0.03 \
  --predict-type v_prediction \
- --uncond-p 0.44 \
- --uncond-p-t5 0.44 \
  --index-file ${index_file} \
  --random-flip \
  --batch-size ${batch_size} \
@@ -110,33 +113,28 @@ Make sure you have activated the conda environment before running the following
  ```shell
  # jade style
 
- # By default, we start a Chinese UI.
- python app/hydit_app.py --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
 
  # Using Flash Attention for acceleration.
  python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
 
  # You can disable the enhancement model if the GPU memory is insufficient.
  # The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
- python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
 
  # Start with English UI
- python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
 
  # porcelain style
 
- # By default, we start a Chinese UI.
- python app/hydit_app.py --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
-
  # Using Flash Attention for acceleration.
- python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
 
  # You can disable the enhancement model if the GPU memory is insufficient.
  # The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
- python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
 
  # Start with English UI
- python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
  ```
 
 
@@ -148,30 +146,24 @@ We provide several commands to quick start:
  # jade style
 
  # Prompt Enhancement + Text-to-Image. Torch mode
- python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
 
  # Only Text-to-Image. Torch mode
- python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
-
- # Only Text-to-Image. Flash Attention mode
- python sample_t2i.py --infer-mode fa --prompt "玉石绘画风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
 
  # Generate an image with other image sizes.
- python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
 
  # porcelain style
 
  # Prompt Enhancement + Text-to-Image. Torch mode
- python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
 
  # Only Text-to-Image. Torch mode
- python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
-
- # Only Text-to-Image. Flash Attention mode
- python sample_t2i.py --infer-mode fa --prompt "青花瓷风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
 
  # Generate an image with other image sizes.
- python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
  ```
 
 
@@ -209,7 +201,7 @@ def load_hunyuan_dit_lora(transformer_state_dict, lora_state_dict, lora_scale):
 
  return transformer_state_dict
 
- pipe = HunyuanDiTPipeline.from_pretrained("Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers", torch_dtype=torch.float16)
  pipe.to("cuda")
 
  from safetensors import safe_open
 
 
  ## Instructions
 
+ The dependencies and installation are basically the same as the [**original model**](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2).
 
  We provide two types of trained LoRA weights for you to test.
 
 
  cd HunyuanDiT
  # Use the huggingface-cli tool to download the model.
  huggingface-cli download Tencent-Hunyuan/HYDiT-LoRA --local-dir ./ckpts/t2i/lora
+
+ # Quick start
+ python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --no-enhance --load-key ema --lora-ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa
  ```
 
  ## Training
 
+ We provide three types of weights for fine-tuning LoRA, `ema`, `module` and `distill`, and you can choose according to the actual effect. By default, we use `ema` weights.
 
+ Here is an example for LoRA with HunYuanDiT v1.2, we load the `distill` weights into the main model and perform LoRA fine-tuning through the `resume_module_root=./ckpts/t2i/model/pytorch_model_distill.pt` setting.
 
  If multiple resolution are used, you need to add the `--multireso` and `--reso-step 64 ` parameter.
 
+ If you want to train LoRA with HunYuanDiT v1.1, you could add `--use-style-cond`, `--size-cond 1024 1024` and `--beta-end 0.03`.
+
+
  ```bash
+ model='DiT-g/2' # model type
+ task_flag="lora_porcelain_ema_rank64" # task flag
+ resume_module_root=./ckpts/t2i/model/pytorch_model_distill.pt # resume checkpoint
+ index_file=dataset/porcelain/jsons/porcelain.json # the selected data indices
+ results_dir=./log_EXP # save root for results
+ batch_size=1 # training batch size
+ image_size=1024 # training image resolution
+ grad_accu_steps=2 # gradient accumulation steps
+ warmup_num_steps=0 # warm-up steps
+ lr=0.0001 # learning rate
+ ckpt_every=100 # create a ckpt every a few steps.
+ ckpt_latest_every=2000 # create a ckpt named `latest.pt` every a few steps.
+ rank=64 # rank of lora
+ max_training_steps=2000 # Maximum training iteration steps
 
  PYTHONPATH=./ deepspeed hydit/train_deepspeed.py \
  --task-flag ${task_flag} \
  --model ${model} \
+ --training-parts lora \
  --rank ${rank} \
+ --resume \
+ --resume-module-root ${resume_module_root} \
  --lr ${lr} \
+ --noise-schedule scaled_linear --beta-start 0.00085 --beta-end 0.018 \
  --predict-type v_prediction \
+ --uncond-p 0 \
+ --uncond-p-t5 0 \
  --index-file ${index_file} \
  --random-flip \
  --batch-size ${batch_size} \
 
  ```shell
  # jade style
 
 
  # Using Flash Attention for acceleration.
  python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
 
  # You can disable the enhancement model if the GPU memory is insufficient.
  # The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
+ python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade --infer-mode fa
 
  # Start with English UI
+ python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade --infer-mode fa
 
  # porcelain style
 
  # Using Flash Attention for acceleration.
+ python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa
 
  # You can disable the enhancement model if the GPU memory is insufficient.
  # The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
+ python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa
 
  # Start with English UI
+ python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa
  ```
 
 
 
  # jade style
 
  # Prompt Enhancement + Text-to-Image. Torch mode
+ python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade --infer-mode fa
 
  # Only Text-to-Image. Torch mode
+ python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade --infer-mode fa
 
  # Generate an image with other image sizes.
+ python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade --infer-mode fa
 
  # porcelain style
 
  # Prompt Enhancement + Text-to-Image. Torch mode
+ python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa
 
  # Only Text-to-Image. Torch mode
+ python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa
 
  # Generate an image with other image sizes.
+ python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa
  ```
 
 
 
 
  return transformer_state_dict
 
+ pipe = HunyuanDiTPipeline.from_pretrained("Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers", torch_dtype=torch.float16)
  pipe.to("cuda")
 
  from safetensors import safe_open