Add pipeline tag and library name to model card
#1
by nielsr - opened

README.md CHANGED
---
license: mit
pipeline_tag: image-to-image
library_name: diffusers
---

# PhotoDoodle

> **PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data**
> <br>
> [Huang Shijie](https://scholar.google.com/citations?user=HmqYYosAAAAJ),
> [Yiren Song](https://scholar.google.com.hk/citations?user=L2YS0jgAAAAJ),
> [Yuxuan Zhang](https://xiaojiu-z.github.io/YuxuanZhang.github.io/),
> [Hailong Guo](https://github.com/logn-2024),
> Xueyin Wang,
> [Mike Zheng Shou](https://sites.google.com/view/showlab),
> and [Liu Jiaming](https://scholar.google.com/citations?user=SmL7oMQAAAAJ&hl=en)
> <br>
> [Show Lab](https://sites.google.com/view/showlab), National University of Singapore
> <br>

<a href="https://arxiv.org/abs/2502.14397"><img src="https://img.shields.io/badge/arXiv-2502.14397-A42C25.svg" alt="arXiv"></a>
<a href="https://huggingface.co/nicolaus-huang/PhotoDoodle"><img src="https://img.shields.io/badge/🤗_HuggingFace-Model-ffbd45.svg" alt="HuggingFace"></a>
<a href="https://huggingface.co/datasets/nicolaus-huang/PhotoDoodle/"><img src="https://img.shields.io/badge/🤗_HuggingFace-Dataset-ffbd45.svg" alt="HuggingFace"></a>

<br>

<img src='./assets/teaser.png' width='100%' />

## Quick Start
### 1. Configuration
#### 1. **Environment setup**
```bash
git clone git@github.com:showlab/PhotoDoodle.git
cd PhotoDoodle

conda create -n doodle python=3.11.10
conda activate doodle
```
#### 2. **Requirements installation**
```bash
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install --upgrade -r requirements.txt
```

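As an optional sanity check (not part of the original setup instructions), you can verify that the CUDA-enabled PyTorch build was installed correctly:

```python
import torch

# Optional check: confirm the CUDA 12.4 build of PyTorch is active and a GPU is visible.
print(torch.__version__)          # expected to report 2.5.1+cu124
print(torch.cuda.is_available())  # expected to be True on a CUDA-capable machine
```
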
### 2. Inference
We provide a Diffusers pipeline integration for our model and have uploaded the model weights to the Hugging Face Hub, so it is easy to use the model as shown in the example below:

```python
from src.pipeline_pe_clone import FluxPipeline
import torch
from PIL import Image

pretrained_model_name_or_path = "black-forest-labs/FLUX.1-dev"
pipeline = FluxPipeline.from_pretrained(
    pretrained_model_name_or_path,
    torch_dtype=torch.bfloat16,
).to('cuda')

# Load and fuse the pre-trained PhotoDoodle checkpoint into the base model,
# then load the effect-specific LoRA on top of it.
pipeline.load_lora_weights("nicolaus-huang/PhotoDoodle", weight_name="pretrain.safetensors")
pipeline.fuse_lora()
pipeline.unload_lora_weights()

pipeline.load_lora_weights("nicolaus-huang/PhotoDoodle", weight_name="sksmagiceffects.safetensors")

height = 768
width = 512

validation_image = "assets/1.png"
validation_prompt = "add a halo and wings for the cat by sksmagiceffects"
# Note: PIL's Image.resize expects (width, height).
condition_image = Image.open(validation_image).resize((width, height)).convert("RGB")

result = pipeline(prompt=validation_prompt,
                  condition_image=condition_image,
                  height=height,
                  width=width,
                  guidance_scale=3.5,
                  num_inference_steps=20,
                  max_sequence_length=512).images[0]

result.save("output.png")
```

or simply run the inference script:
```bash
python inference.py
```

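If GPU memory is limited, and assuming the bundled `FluxPipeline` inherits the standard Diffusers offloading hooks, replacing `.to('cuda')` with model CPU offloading may help; a sketch, not part of the original example:

```python
from src.pipeline_pe_clone import FluxPipeline
import torch

# Sketch: lower-VRAM variant of the pipeline setup above.
# Assumes the custom FluxPipeline supports the standard Diffusers offloading API.
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)
pipeline.enable_model_cpu_offload()  # moves submodules to the GPU only when needed
```
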
### 3. Weights
You can download the trained PhotoDoodle checkpoints for inference. The available models are listed below; each checkpoint name is also its trigger word.

You need to load and fuse the `pretrained` checkpoint before loading any of the other models (see the sketch below the table).

| **Model** | **Description** | **Resolution** |
| :---: | :---: | :---: |
| [pretrained](https://huggingface.co/nicolaus-huang/PhotoDoodle/blob/main/pretrain.safetensors) | PhotoDoodle model trained on the `SeedEdit` dataset | 768, 768 |
| [sksmonstercalledlulu](https://huggingface.co/nicolaus-huang/PhotoDoodle/blob/main/sksmonstercalledlulu.safetensors) | PhotoDoodle model trained on the `Cartoon monster` dataset | 768, 512 |
| [sksmagiceffects](https://huggingface.co/nicolaus-huang/PhotoDoodle/blob/main/sksmagiceffects.safetensors) | PhotoDoodle model trained on the `3D effects` dataset | 768, 512 |
| [skspaintingeffects](https://huggingface.co/nicolaus-huang/PhotoDoodle/blob/main/skspaintingeffects.safetensors) | PhotoDoodle model trained on the `Flowing color blocks` dataset | 768, 512 |
| [sksedgeeffect](https://huggingface.co/nicolaus-huang/PhotoDoodle/blob/main/sksedgeeffect.safetensors) | PhotoDoodle model trained on the `Hand-drawn outline` dataset | 768, 512 |

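For instance, to switch from `sksmagiceffects` to another effect on an already-constructed pipeline, a minimal sketch (reusing the `pipeline` object from the inference example, with `pretrain.safetensors` already fused) could look like this:

```python
# Sketch: swap the effect LoRA on an existing pipeline.
# Assumes `pipeline` was set up as in the inference example and the
# pre-trained checkpoint has already been loaded and fused.
pipeline.unload_lora_weights()  # drop the currently active effect LoRA
pipeline.load_lora_weights("nicolaus-huang/PhotoDoodle",
                           weight_name="sksmonstercalledlulu.safetensors")
# The checkpoint name is the trigger word, so include it in the prompt,
# e.g. "... by sksmonstercalledlulu".
```
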
### 4. Dataset
<span id="dataset_setting"></span>
#### 4.1 Settings for the dataset
The training process uses a paired dataset stored in a `.jsonl` file, where each entry contains image file paths and the corresponding text description. Each entry includes the source image path, the target (modified) image path, and a caption describing the modification.

Example format:

```json
{"source": "path/to/source.jpg", "target": "path/to/modified.jpg", "caption": "Instruction of modifications"}
{"source": "path/to/source2.jpg", "target": "path/to/modified2.jpg", "caption": "Another instruction"}
```

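For illustration only (this helper is not part of the repository), entries in that format can be read back with standard-library Python:

```python
import json

# Hypothetical helper: load (source, target, caption) triplets from a .jsonl file.
def load_pairs(jsonl_path):
    pairs = []
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            pairs.append((entry["source"], entry["target"], entry["caption"]))
    return pairs
```
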
We have uploaded our datasets to [Hugging Face](https://huggingface.co/datasets/nicolaus-huang/PhotoDoodle).


### 5. Results

![results_1](assets/results1.png)

### 6. Acknowledgments

1. Thanks to **[Yuxuan Zhang](https://xiaojiu-z.github.io/YuxuanZhang.github.io/)** and **[Hailong Guo](mailto:[email protected])** for providing the code base.
2. Thanks to **[Diffusers](https://github.com/huggingface/diffusers)** for the open-source project.

## Citation
```bibtex
@misc{huang2025photodoodlelearningartisticimage,
      title={PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data},
      author={Shijie Huang and Yiren Song and Yuxuan Zhang and Hailong Guo and Xueyin Wang and Mike Zheng Shou and Jiaming Liu},
      year={2025},
      eprint={2502.14397},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.14397},
}
```