Tencent-Hunyuan/HYDiT-LoRA

Zhiminli committed on Jul 9, 2024

Commit 28b6504 (verified) · 1 parent: 59b294b

Update README.md

Files changed (1)

README.md +41 -49

@@ -11,7 +11,7 @@ Language: **English**
 ## Instructions
- The dependencies and installation are basically the same as the [**original model**](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.1).
  We provide two types of trained LoRA weights for you to test.
@@ -21,47 +21,50 @@ Language: **English**
 cd HunyuanDiT
 # Use the huggingface-cli tool to download the model.
 huggingface-cli download Tencent-Hunyuan/HYDiT-LoRA --local-dir ./ckpts/t2i/lora
 ```
 ## Training
-We provide three types of weights for fine-tuning HY-DiT LoRA, `ema`, `module` and `distill`, and you can choose according to the actual effect. By default, we use `ema` weights.
-Here is an example, we load the `ema` weights into the main model and perform LoRA fine-tuning through the `--ema-to-module` parameter.
-If you want to load the `module` weights into the main model, just remove the `--ema-to-module` parameter.
 If multiple resolution are used, you need to add the `--multireso` and `--reso-step 64 ` parameter.
 ```bash
-model='DiT-g/2'                                        # model type
-task_flag="lora_porcelain_ema_rank64"                  # task flag
-resume=./ckpts/t2i/model/                              # resume checkpoint
-index_file=dataset/porcelain/jsons/porcelain.json      # the selected data indices
-results_dir=./log_EXP                                  # save root for results
-batch_size=1                                           # training batch size
-image_size=1024                                        # training image resolution
-grad_accu_steps=2                                      # gradient accumulation steps
-warmup_num_steps=0                                     # warm-up steps
-lr=0.0001                                              # learning rate
-ckpt_every=100                                         # create a ckpt every a few steps.
-ckpt_latest_every=2000                                 # create a ckpt named `latest.pt` every a few steps.
-rank=64                                                # rank of lora
-max_training_steps=2000                                # Maximum training iteration steps
 PYTHONPATH=./ deepspeed hydit/train_deepspeed.py \
     --task-flag ${task_flag} \
     --model ${model} \
-    --training_parts lora \
     --rank ${rank} \
-    --resume-split \
-    --resume ${resume} \
-    --ema-to-module \
     --lr ${lr} \
-    --noise-schedule scaled_linear --beta-start 0.00085 --beta-end 0.03 \
     --predict-type v_prediction \
-    --uncond-p 0.44 \
-    --uncond-p-t5 0.44 \
     --index-file ${index_file} \
     --random-flip \
     --batch-size ${batch_size} \
@@ -110,33 +113,28 @@ Make sure you have activated the conda environment before running the following
 ```shell
 # jade style
-# By default, we start a Chinese UI.
-python app/hydit_app.py  --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
 # Using Flash Attention for acceleration.
 python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
 # You can disable the enhancement model if the GPU memory is insufficient.
 # The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
-python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt  ./ckpts/t2i/lora/jade
 # Start with English UI
-python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
 # porcelain style
-# By default, we start a Chinese UI.
-python app/hydit_app.py  --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
 # Using Flash Attention for acceleration.
-python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
 # You can disable the enhancement model if the GPU memory is insufficient.
 # The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
-python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt  ./ckpts/t2i/lora/porcelain
 # Start with English UI
-python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
 ```
@@ -148,30 +146,24 @@ We provide several commands to quick start:
 # jade style
 # Prompt Enhancement + Text-to-Image. Torch mode
-python sample_t2i.py --prompt "玉石绘画风格，一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
 # Only Text-to-Image. Torch mode
-python sample_t2i.py --prompt "玉石绘画风格，一只猫在追蝴蝶" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
-# Only Text-to-Image. Flash Attention mode
-python sample_t2i.py --infer-mode fa --prompt "玉石绘画风格，一只猫在追蝴蝶" --load-key ema --lora_ckpt  ./ckpts/t2i/lora/jade
 # Generate an image with other image sizes.
-python sample_t2i.py --prompt "玉石绘画风格，一只猫在追蝴蝶" --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
 # porcelain style
 # Prompt Enhancement + Text-to-Image. Torch mode
-python sample_t2i.py --prompt "青花瓷风格，一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
 # Only Text-to-Image. Torch mode
-python sample_t2i.py --prompt "青花瓷风格，一只猫在追蝴蝶"  --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
-# Only Text-to-Image. Flash Attention mode
-python sample_t2i.py --infer-mode fa --prompt "青花瓷风格，一只猫在追蝴蝶"  --load-key ema --lora_ckpt  ./ckpts/t2i/lora/porcelain
 # Generate an image with other image sizes.
-python sample_t2i.py --prompt "青花瓷风格，一只猫在追蝴蝶"  --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
 ```
@@ -209,7 +201,7 @@ def load_hunyuan_dit_lora(transformer_state_dict, lora_state_dict, lora_scale):
     return transformer_state_dict
-pipe = HunyuanDiTPipeline.from_pretrained("Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers", torch_dtype=torch.float16)
 pipe.to("cuda")
 from safetensors import safe_open

 ## Instructions
+ The dependencies and installation are basically the same as the [**original model**](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2).
  We provide two types of trained LoRA weights for you to test.
 cd HunyuanDiT
 # Use the huggingface-cli tool to download the model.
 huggingface-cli download Tencent-Hunyuan/HYDiT-LoRA --local-dir ./ckpts/t2i/lora
+# Quick start
+python sample_t2i.py --prompt "青花瓷风格，一只猫在追蝴蝶"  --no-enhance --load-key ema --lora-ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa
 ```
 ## Training
+We provide three types of weights for fine-tuning LoRA, `ema`, `module` and `distill`, and you can choose according to the actual effect. By default, we use `ema` weights.
+Here is an example for LoRA with HunYuanDiT v1.2, we load the `distill` weights into the main model and perform LoRA fine-tuning through the `resume_module_root=./ckpts/t2i/model/pytorch_model_distill.pt` setting.
 If multiple resolution are used, you need to add the `--multireso` and `--reso-step 64 ` parameter.
+If you want to train LoRA with HunYuanDiT v1.1, you could add `--use-style-cond`, `--size-cond 1024 1024` and `--beta-end 0.03`.
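
Editor's note (not part of the commit): the training command in this diff passes `--noise-schedule scaled_linear` with `--beta-end 0.018`, while the sentence above gives `--beta-end 0.03` for v1.1. The sketch below shows what that difference means numerically. It assumes `scaled_linear` has the meaning it has in common diffusion libraries (linear interpolation in sqrt(beta) space, then squaring) and assumes 1000 diffusion steps; neither is confirmed by this diff, and the helper names are hypothetical.

```python
import math

def scaled_linear_betas(beta_start, beta_end, num_steps=1000):
    """Interpolate linearly in sqrt(beta) space, then square each value.

    num_steps=1000 is an assumption; the diff does not state the step count.
    """
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    step = (hi - lo) / (num_steps - 1)
    return [(lo + i * step) ** 2 for i in range(num_steps)]

def final_alpha_bar(betas):
    """Cumulative product of (1 - beta): the signal fraction left at the last step."""
    prod = 1.0
    for b in betas:
        prod *= 1.0 - b
    return prod

betas_v12 = scaled_linear_betas(0.00085, 0.018)  # --beta-end 0.018 (v1.2 example)
betas_v11 = scaled_linear_betas(0.00085, 0.03)   # --beta-end 0.03 (v1.1 setting)

# A larger beta_end adds more noise per step, so less signal survives to the end.
print(final_alpha_bar(betas_v12), final_alpha_bar(betas_v11))
```

Under these assumptions the two schedules share the same first beta and differ only in how fast the noise level grows, which is one reason to keep the `--beta-end` value matched to the base model version being fine-tuned.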
 ```bash
+model='DiT-g/2'                                                   # model type
+task_flag="lora_porcelain_ema_rank64"                             # task flag
+resume_module_root=./ckpts/t2i/model/pytorch_model_distill.pt     # resume checkpoint
+index_file=dataset/porcelain/jsons/porcelain.json                 # the selected data indices
+results_dir=./log_EXP                                             # save root for results
+batch_size=1                                                      # training batch size
+image_size=1024                                                   # training image resolution
+grad_accu_steps=2                                                 # gradient accumulation steps
+warmup_num_steps=0                                                # warm-up steps
+lr=0.0001                                                         # learning rate
+ckpt_every=100                                                    # create a ckpt every a few steps.
+ckpt_latest_every=2000                                            # create a ckpt named `latest.pt` every a few steps.
+rank=64                                                           # rank of lora
+max_training_steps=2000                                           # Maximum training iteration steps
 PYTHONPATH=./ deepspeed hydit/train_deepspeed.py \
     --task-flag ${task_flag} \
     --model ${model} \
+    --training-parts lora \
     --rank ${rank} \
+    --resume \
+    --resume-module-root ${resume_module_root} \
     --lr ${lr} \
+    --noise-schedule scaled_linear --beta-start 0.00085 --beta-end 0.018 \
     --predict-type v_prediction \
+    --uncond-p 0 \
+    --uncond-p-t5 0 \
     --index-file ${index_file} \
     --random-flip \
     --batch-size ${batch_size} \
 ```shell
 # jade style
 # Using Flash Attention for acceleration.
 python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
 # You can disable the enhancement model if the GPU memory is insufficient.
 # The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
+python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt  ./ckpts/t2i/lora/jade --infer-mode fa
 # Start with English UI
+python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade --infer-mode fa
 # porcelain style
 # Using Flash Attention for acceleration.
+python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa
 # You can disable the enhancement model if the GPU memory is insufficient.
 # The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
+python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt  ./ckpts/t2i/lora/porcelain --infer-mode fa
 # Start with English UI
+python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa
 ```
 # jade style
 # Prompt Enhancement + Text-to-Image. Torch mode
+python sample_t2i.py --prompt "玉石绘画风格，一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade --infer-mode fa
 # Only Text-to-Image. Torch mode
+python sample_t2i.py --prompt "玉石绘画风格，一只猫在追蝴蝶" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade --infer-mode fa
 # Generate an image with other image sizes.
+python sample_t2i.py --prompt "玉石绘画风格，一只猫在追蝴蝶" --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade --infer-mode fa
 # porcelain style
 # Prompt Enhancement + Text-to-Image. Torch mode
+python sample_t2i.py --prompt "青花瓷风格，一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa
 # Only Text-to-Image. Torch mode
+python sample_t2i.py --prompt "青花瓷风格，一只猫在追蝴蝶"  --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa
 # Generate an image with other image sizes.
+python sample_t2i.py --prompt "青花瓷风格，一只猫在追蝴蝶"  --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain --infer-mode fa
 ```
     return transformer_state_dict
+pipe = HunyuanDiTPipeline.from_pretrained("Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers", torch_dtype=torch.float16)
 pipe.to("cuda")
 from safetensors import safe_open
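
Editor's note (not part of the commit): the body of `load_hunyuan_dit_lora(transformer_state_dict, lora_state_dict, lora_scale)` is not shown in this diff. As a rough illustration of what merging LoRA weights with a scale usually means, the sketch below applies the standard LoRA update `W' = W + lora_scale * (up @ down)` to a toy matrix. It is not a reproduction of that function: the helper names, the nested-list matrices, and all values are hypothetical, and no state-dict key names are assumed.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora_delta(weight, lora_up, lora_down, lora_scale):
    """Return weight + lora_scale * (lora_up @ lora_down) without mutating inputs."""
    delta = matmul(lora_up, lora_down)
    return [[w + lora_scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Toy 2x2 base weight with a rank-1 update (all values are made up).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_up = [[1.0], [2.0]]      # shape (2, rank=1)
lora_down = [[0.5, -0.5]]     # shape (rank=1, 2)

merged = merge_lora_delta(weight, lora_up, lora_down, lora_scale=1.0)
unchanged = merge_lora_delta(weight, lora_up, lora_down, lora_scale=0.0)
print(merged)
print(unchanged)
```

With `lora_scale=0.0` the base weight comes back unchanged, so the scale acts as a dial between the base model and the fully applied style LoRA.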