---
license: apache-2.0
---

# IterComp (ICLR 2025)

Official Repository of the paper: *[IterComp](https://arxiv.org/abs/2410.07171)*.
<p align="left">
  <a href='https://arxiv.org/abs/2410.07171'>
  <img src='https://img.shields.io/badge/Arxiv-2410.07171-A42C25?style=flat&logo=arXiv&logoColor=A42C25'></a> 
  <a href='https://github.com/YangLing0818/IterComp'>
    <img src='https://img.shields.io/badge/GitHub-Code-black?style=flat&logo=github&logoColor=white'></a> 
</p>

<img src="./itercomp.png" style="zoom:50%;" />

## News🔥🔥🔥

**[2025.02]** We open-source three composition-aware reward models in our [HuggingFace repo](https://huggingface.co/comin/IterComp/tree/main/reward_models). They can be used for preference learning and as **new image generation evaluators**.

**[2025.02]** We enhance IterComp-RPG with LLMs that have strong reasoning capabilities, including [**DeepSeek-R1**](https://github.com/deepseek-ai/DeepSeek-R1), [**OpenAI o3-mini**](https://openai.com/index/openai-o3-mini/), and [**OpenAI o1**](https://openai.com/index/learning-to-reason-with-llms/), to achieve outstanding compositional image generation for complex prompts.

**[2025.01]** IterComp is accepted at ICLR 2025!!!

**[2024.10]** Checkpoints of the base diffusion model are publicly available in our [HuggingFace repo](https://huggingface.co/comin/IterComp).

**[2024.10]** The main code of IterComp is released.

## Introduction

IterComp is one of the new state-of-the-art compositional generation methods. In this repository, we release the model trained from [SDXL Base 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).

## Text-to-Image Usage

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("comin/IterComp", torch_dtype=torch.float16, use_safetensors=True)
pipe.to("cuda")
# if using torch < 2.0
# pipe.enable_xformers_memory_efficient_attention()

prompt = "An astronaut riding a green horse"
image = pipe(prompt=prompt).images[0]
image.save("output.png")
```

IterComp can **serve as a powerful backbone for various compositional generation methods**, such as [RPG](https://github.com/YangLing0818/RPG-DiffusionMaster) and [Omost](https://github.com/lllyasviel/Omost). We recommend integrating IterComp into these approaches to achieve more advanced compositional generation results.
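Since the released reward models are described as usable image generation evaluators, one generic way to put an evaluator to work is best-of-N selection: generate one candidate per seed and keep the highest-scoring image. The sketch below is not the authors' method, only an illustration under stated assumptions. `best_of_n`, `generate`, and `score_fn` are hypothetical names; in particular, `score_fn(prompt, image) -> float` stands in for one of the reward models, whose loading and inference interface is not documented in this card. With the pipeline above, `generate` could be something like `lambda p, s: pipe(prompt=p, generator=torch.Generator(device="cuda").manual_seed(s)).images[0]`.

```python
from typing import Any, Callable, List, Optional, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str, int], Any],
    score_fn: Callable[[str, Any], float],
    seeds: List[int],
) -> Tuple[Any, Optional[int], float]:
    """Generate one candidate per seed and return (image, seed, score) of the best.

    `generate(prompt, seed)` should return an image (e.g. a PIL image).
    `score_fn(prompt, image)` is a HYPOTHETICAL stand-in for a
    composition-aware reward model; higher scores are assumed to be better.
    On ties, the earliest seed wins.
    """
    if not seeds:
        raise ValueError("seeds must be non-empty")
    best_image, best_seed, best_score = None, None, float("-inf")
    for seed in seeds:
        image = generate(prompt, seed)
        score = score_fn(prompt, image)
        if score > best_score:  # strict '>' keeps the first of equal scores
            best_image, best_seed, best_score = image, seed, score
    return best_image, best_seed, best_score


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without a GPU or a model download.
    fake_generate = lambda prompt, seed: f"{prompt}#{seed}"
    fake_score = lambda prompt, image: -abs(int(image.split("#")[1]) - 2)
    image, seed, score = best_of_n(
        "An astronaut riding a green horse", fake_generate, fake_score, [0, 1, 2, 3]
    )
    print(image, seed, score)
```

Keeping generation and scoring as plain callables means the same selection loop works whether the candidates come from IterComp directly or from an RPG/Omost pipeline built on top of it.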

## Citation

```
@article{zhang2024itercomp,
  title={IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation},
  author={Zhang, Xinchen and Yang, Ling and Li, Guohao and Cai, Yaqi and Xie, Jiake and Tang, Yong and Yang, Yujiu and Wang, Mengdi and Cui, Bin},
  journal={arXiv preprint arXiv:2410.07171},
  year={2024}
}
```
