# ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation
<a href="https://arxiv.org/pdf/2302.13848.pdf"><img src="https://img.shields.io/badge/arXiv-2302.13848-b31b1b.svg" height=22.5></a>
<a href="https://huggingface.co/spaces/ELITE-library/ELITE"><img src="https://img.shields.io/static/v1?label=HuggingFace&message=gradio demo&color=darkgreen" height=22.5></a>
## Getting Started
----
### Environment Setup
```shell
git clone https://github.com/csyxwei/ELITE.git
cd ELITE
conda create -n elite python=3.9
conda activate elite
pip install -r requirements.txt
```
### Pretrained Models
We provide pretrained checkpoints on [Google Drive](https://drive.google.com/drive/folders/1VkiVZzA_i9gbfuzvHaLH2VYh7kOTzE0x?usp=sharing). Download them and save them to the `checkpoints` directory.
### Setting up Diffusers
Our code is built on [diffusers](https://github.com/huggingface/diffusers/). Follow the guide [here](https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion#cat-toy-example) to set it up.
### Customized Generation
We provide a test dataset in [test_datasets](./test_datasets), which contains both images and object masks. To run inference on it:
```
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATA_DIR='./test_datasets/'
CUDA_VISIBLE_DEVICES=0 python inference_local.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--test_data_dir=$DATA_DIR \
--output_dir="./outputs/local_mapping" \
--suffix="object" \
--template="a photo of a S" \
--llambda="0.8" \
--global_mapper_path="./checkpoints/global_mapper.pt" \
--local_mapper_path="./checkpoints/local_mapper.pt"
```
or you can use the shell script:
```
bash inference_local.sh
```
To test on your own dataset, align each image so that the object is at the center, and provide the corresponding object mask. The mask can be obtained with [image-matting-app](https://huggingface.co/spaces/SankarSrin/image-matting-app) or any other image matting method.
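One way to center the object is to crop a square around the mask's bounding box. The following is a minimal sketch, not part of this repository: `center_crop_box` and its `margin` default are hypothetical, and it assumes the mask is a 2-D array that is nonzero on the object.

```python
import numpy as np


def center_crop_box(mask, margin=0.1):
    """Return (left, top, right, bottom) of a square crop around the masked object.

    mask: 2-D array, nonzero where the object is.
    margin: extra border as a fraction of the object's longer side (illustrative default).
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask is empty")
    top, bottom = ys.min(), ys.max() + 1
    left, right = xs.min(), xs.max() + 1
    # Square side: longer side of the object's bounding box plus a margin on both ends.
    side = int(round(max(bottom - top, right - left) * (1 + 2 * margin)))
    side = min(side, mask.shape[0], mask.shape[1])  # the crop must fit in the image
    cy, cx = (top + bottom) / 2, (left + right) / 2
    # Clamp the top-left corner so the square stays inside the image bounds.
    y0 = int(min(max(cy - side / 2, 0), mask.shape[0] - side))
    x0 = int(min(max(cx - side / 2, 0), mask.shape[1] - side))
    return x0, y0, x0 + side, y0 + side
```

The returned box uses the same (left, upper, right, lower) order as Pillow's `Image.crop`, so it can be applied to both the image and its mask. When the object sits near an image border, the clamping keeps the crop inside the image, so the object will be only approximately centered.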
## Training
----
### Preparing Dataset
We train ELITE on the **test** set of Open-Images V6. You can prepare the dataset as follows:
- Download the Open-Images test images from [CVDF's site](https://github.com/cvdfoundation/open-images-dataset#download-images-with-bounding-boxes-annotations) and unzip them to the directory `datasets/Open_Images/images/test`.
- Download the attribute names file `oidv6-attributes-description.csv` of the Open-Images test dataset from the [Open-Images official site](https://storage.googleapis.com/openimages/web/download_v7.html#download-manually) and save it to the directory `datasets/Open_Images/annotations/`.
- Download the bbox annotations file `test-annotations-bbox.csv` of the Open-Images test dataset from the [Open-Images official site](https://storage.googleapis.com/openimages/web/download_v7.html#download-manually) and save it to the directory `datasets/Open_Images/annotations/`.
- Download the segmentation annotations of the Open-Images test dataset from the [Open-Images official site](https://storage.googleapis.com/openimages/web/download_v7.html#download-manually) and unzip them to the directory `datasets/Open_Images/segs/test`. Then put `test-annotations-object-segmentation.csv` into `datasets/Open_Images/annotations/`.
- Compute the mask bounding boxes by running the following command:
```shell
python data_scripts/cal_bbox_by_seg.py
```
The final directory structure should look like this:
```
datasets
├── Open_Images
│   ├── annotations
│   │   ├── oidv6-class-descriptions.csv
│   │   ├── test-annotations-object-segmentation.csv
│   │   ├── test-annotations-bbox.csv
│   ├── images
│   │   ├── test
│   │   │   ├── xxx.jpg
│   │   │   ├── ...
│   ├── segs
│   │   ├── test
│   │   │   ├── xxx.png
│   │   │   ├── ...
│   │   ├── test_bbox_dict.npy
```
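Before launching training, it may be worth checking that the layout matches the tree above. The sketch below is a hypothetical helper, not a script shipped with this repository. It checks only a subset of the entries in the tree, so extend `REQUIRED` as needed.

```python
from pathlib import Path

# Relative paths taken from the tree above (a subset; extend as needed).
REQUIRED = (
    "annotations/test-annotations-bbox.csv",
    "annotations/test-annotations-object-segmentation.csv",
    "images/test",
    "segs/test",
    "segs/test_bbox_dict.npy",
)


def missing_entries(root="datasets/Open_Images", required=REQUIRED):
    """Return the required relative paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in required if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing dataset entries:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Dataset layout looks complete.")
```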
### Training Global Mapping Network
To train the global mapping network, run the following command:
```Shell
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATA_DIR='./datasets/Open_Images/'
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --config_file 4_gpu.json --main_process_port 25656 train_global.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_data_dir=$DATA_DIR \
--placeholder_token="S" \
--resolution=512 \
--train_batch_size=4 \
--gradient_accumulation_steps=4 \
--max_train_steps=200000 \
--learning_rate=1e-06 --scale_lr \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--output_dir="./elite_experiments/global_mapping" \
--save_steps 200
```
or you can use the shell script:
```shell
bash train_global.sh
```
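As a sanity check on the hyperparameters above, the effective batch size is the per-GPU batch size times the gradient accumulation steps times the number of processes. The sketch below rests on two assumptions: that `4_gpu.json` launches four processes, and that `--scale_lr` behaves as in the diffusers textual inversion example, which multiplies the base learning rate by that same product. Whether `train_global.py` does exactly this is not confirmed here.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_processes):
    """Number of samples contributing to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_processes


def scaled_learning_rate(base_lr, per_device_batch, grad_accum_steps, num_processes):
    """Learning rate after --scale_lr, ASSUMING the diffusers textual inversion rule."""
    return base_lr * effective_batch_size(per_device_batch, grad_accum_steps, num_processes)


if __name__ == "__main__":
    # Values from the global mapping command: batch 4, accumulation 4, 4 GPUs (assumed).
    print(effective_batch_size(4, 4, 4))
    print(scaled_learning_rate(1e-6, 4, 4, 4))
```

Under these assumptions, the global mapping run uses an effective batch size of 64. If you train on fewer GPUs, the effective batch size and any scaled learning rate shrink proportionally.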
### Training Local Mapping Network
After the global mapping is trained, you can train the local mapping by running the following command:
```Shell
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATA_DIR='./datasets/Open_Images/'
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --config_file 4_gpu.json --main_process_port 25657 train_local.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_data_dir=$DATA_DIR \
--placeholder_token="S" \
--resolution=512 \
--train_batch_size=2 \
--gradient_accumulation_steps=4 \
--max_train_steps=200000 \
--learning_rate=1e-5 --scale_lr \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--global_mapper_path "./elite_experiments/global_mapping/mapper_070000.pt" \
--output_dir="./elite_experiments/local_mapping" \
--save_steps 200
```
or you can use the shell script:
```shell
bash train_local.sh
```
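The `--global_mapper_path` argument points at one of the checkpoints saved during global training. To list what is available, the sketch below finds the highest-step checkpoint in a directory. It assumes files are named `mapper_<step>.pt`, a pattern inferred only from the single filename shown above. `latest_mapper` is a hypothetical helper, not part of this repository.

```python
import re
from pathlib import Path

# Pattern inferred from the filename in the command above (mapper_070000.pt).
_CKPT_RE = re.compile(r"mapper_(\d+)\.pt$")


def latest_mapper(checkpoint_dir):
    """Return the Path of the highest-step mapper checkpoint, or None if there is none."""
    best_step, best_path = -1, None
    for path in Path(checkpoint_dir).glob("mapper_*.pt"):
        match = _CKPT_RE.match(path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


if __name__ == "__main__":
    print(latest_mapper("./elite_experiments/global_mapping"))
```

The command above uses the step-70000 checkpoint rather than the last one, so the latest checkpoint is not necessarily the one to pass.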
## Citation
```
@article{wei2023elite,
title={ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation},
author={Wei, Yuxiang and Zhang, Yabo and Ji, Zhilong and Bai, Jinfeng and Zhang, Lei and Zuo, Wangmeng},
journal={arXiv preprint arXiv:2302.13848},
year={2023}
}
```
## Acknowledgements
This code is built on [diffusers](https://github.com/huggingface/diffusers/). We thank the authors for sharing the code.