---
license: other
license_name: tencent-hunyuan-community
license_link: https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/blob/main/LICENSE.txt
language:
- en
---

# HunyuanDiT LoRA

Language: **English**

## Instructions

The dependencies and installation are basically the same as for the [**original model**](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT).

We provide two trained LoRA weights (`jade` and `porcelain`) for you to test. Download them with the following commands:

```bash
cd HunyuanDiT
# Use the huggingface-cli tool to download the model.
huggingface-cli download Tencent-Hunyuan/HYDiT-LoRA --local-dir ./ckpts/t2i/lora
```

## Training

We provide three types of weights for fine-tuning HY-DiT LoRA: `ema`, `module` and `distill`. Choose among them according to the results you observe. By default, we use the `ema` weights.

In the example below, the `--ema-to-module` parameter loads the `ema` weights into the main model before LoRA fine-tuning. To load the `module` weights into the main model instead, remove the `--ema-to-module` parameter.

To train on multiple resolutions, add the `--multireso` and `--reso-step 64` parameters.

```bash
model='DiT-g/2'                                    # model type
task_flag="lora_jade_ema_rank64"                   # task flag
resume=./ckpts/t2i/model/                          # resume checkpoint
index_file=dataset/index_v2_json/jade.json         # index file
results_dir=./log_EXP                              # save root for results
batch_size=1                                       # training batch size
image_size=1024                                    # training image resolution
grad_accu_steps=2                                  # gradient accumulation steps
warmup_num_steps=0                                 # warm-up steps
lr=0.0001                                          # learning rate
ckpt_every=100                                     # save a checkpoint every `ckpt_every` steps
ckpt_latest_every=2000                             # save a checkpoint named `latest.pt` every `ckpt_latest_every` steps
max_training_steps=2000                            # total training steps (recommended value from the table below)
rank=64                                            # rank of lora

PYTHONPATH=./ deepspeed hydit/train_large_deepspeed.py \
    --task-flag ${task_flag} \
    --model ${model} \
    --training_parts lora \
    --rank ${rank} \
    --resume-split \
    --resume ${resume} \
    --ema-to-module \
    --lr ${lr} \
    --noise-schedule scaled_linear --beta-start 0.00085 --beta-end 0.03 \
    --predict-type v_prediction \
    --uncond-p 0.44 \
    --uncond-p-t5 0.44 \
    --index-file ${index_file} \
    --random-flip \
    --batch-size ${batch_size} \
    --image-size ${image_size} \
    --global-seed 999 \
    --grad-accu-steps ${grad_accu_steps} \
    --warmup-num-steps ${warmup_num_steps} \
    --use-flash-attn \
    --use-fp16 \
    --ema-dtype fp32 \
    --results-dir ${results_dir} \
    --ckpt-every ${ckpt_every} \
    --max-training-steps ${max_training_steps} \
    --ckpt-latest-every ${ckpt_latest_every} \
    --log-every 10 \
    --deepspeed \
    --deepspeed-optimizer \
    --use-zero-stage 2 \
    --qk-norm \
    --rope-img base512 \
    --rope-real \
    "$@"
```

Recommended parameter settings

| Parameter | Description | Recommended Value | Note |
|:---------------:|:---------:|:---------------------------------------------------:|:--:|
| `--batch-size` | Training batch size | 1 | Depends on GPU memory |
| `--grad-accu-steps` | Number of gradient accumulation steps | 2 | |
| `--rank` | Rank of LoRA | 64 | Any value from 8 to 128 works |
| `--max-training-steps` | Training steps | 2000 | Varies with the amount of training data; about 2000 steps are enough for 100 images |
| `--lr` | Learning rate | 0.0001 | |

## Inference

### Using Gradio

Make sure you have activated the conda environment before running the following commands.

> ⚠️ Important Reminder:
> We recommend not using prompt enhancement, as it may remove the style words from the prompt.

```shell
# jade style

# By default, we start a Chinese UI.
python app/hydit_app.py --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

# Using Flash Attention for acceleration.
python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

# You can disable the enhancement model if the GPU memory is insufficient.
# The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

# Start with English UI
python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

# porcelain style

# By default, we start a Chinese UI.
python app/hydit_app.py --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

# Using Flash Attention for acceleration.
python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

# You can disable the enhancement model if the GPU memory is insufficient.
# The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

# Start with English UI
python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
```

### Using Command Line

We provide several commands for a quick start:

```shell
# jade style
# The prompt below means "Jade painting style, a cat chasing a butterfly".

# Prompt Enhancement + Text-to-Image. Torch mode
python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

# Only Text-to-Image. Torch mode
python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

# Only Text-to-Image. Flash Attention mode
python sample_t2i.py --infer-mode fa --prompt "玉石绘画风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

# Generate an image with other image sizes.
python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

# porcelain style
# The prompt below means "Blue-and-white porcelain style, a cat chasing a butterfly".

# Prompt Enhancement + Text-to-Image. Torch mode
python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

# Only Text-to-Image. Torch mode
python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

# Only Text-to-Image. Flash Attention mode
python sample_t2i.py --infer-mode fa --prompt "青花瓷风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

# Generate an image with other image sizes.
python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
```

More example prompts can be found in [example_prompts.txt](example_prompts.txt)
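
The command-line examples above differ only in the prompt and the LoRA directory, so they can be generated in a loop. The following is a minimal sketch, not part of the repository: it assumes the weights were downloaded to `./ckpts/t2i/lora` as shown in the instructions, and the `run_style` function and `DRY_RUN` variable are hypothetical names introduced here for illustration. It passes `--no-enhance`, in line with the reminder about prompt enhancement.

```shell
#!/usr/bin/env bash
# Sketch: build one sample_t2i.py command per LoRA style.
# DRY_RUN and run_style are hypothetical helpers of this sketch,
# not options or functions provided by HunyuanDiT.

lora_root=./ckpts/t2i/lora
styles=(jade porcelain)
prompts=("玉石绘画风格,一只猫在追蝴蝶" "青花瓷风格,一只猫在追蝴蝶")

run_style() {
  # $1: LoRA directory name under $lora_root, $2: prompt text
  local style=$1 prompt=$2
  local cmd=(python sample_t2i.py --prompt "$prompt" --no-enhance
             --load-key ema --lora_ckpt "$lora_root/$style")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run (the default): print the command only.
    # The printed line is for inspection and is not shell-quoted.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Pair each style with the prompt at the same index.
for i in "${!styles[@]}"; do
  run_style "${styles[$i]}" "${prompts[$i]}"
done
```

By default the script only prints the commands. Run it with `DRY_RUN=0` from the `HunyuanDiT` directory to actually generate the images.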