---
license: apache-2.0
language:
- en
tags:
- instancediffusion
- layout-to-image
library_name: diffusers
---

# Diffusers 🧨 port of [InstanceDiffusion: Instance-level Control for Image Generation (CVPR 2024)](https://arxiv.org/abs/2402.03290)

- Original authors: Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, Ishan Misra
- Original GitHub repo by the authors: https://github.com/frank-xwang/InstanceDiffusion
- Converted to Diffusers: Kyeongryeol Go

# Checkpoint

- Original checkpoint: https://huggingface.co/xudongw/InstanceDiffusion/resolve/main/instancediffusion_sd15.pth
- Original configuration YAML: https://github.com/frank-xwang/InstanceDiffusion/blob/main/configs/test_sd15.yaml

# Install

`StableDiffusionINSTDIFFPipeline` has not yet been merged into diffusers. Until it is, install the fork below.

```bash
git clone -b instancediffusion https://github.com/gokyeongryeol/diffusers.git
cd diffusers && pip install -e .
```

# Example Usage

```python
import torch
from diffusers import StableDiffusionINSTDIFFPipeline

pipe = StableDiffusionINSTDIFFPipeline.from_pretrained(
    "kyeongry/instancediffusion_sd15",
    # uncomment to load the half-precision weights
    # variant="fp16", torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a yellow American robin, brown Maltipoo dog, a gray British Shorthair in a stream, alongside with trees and rocks"
negative_prompt = "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality"

# normalized (xmin, ymin, xmax, ymax), each value in [0, 1]
boxes = [
    [0.0, 0.099609375, 0.349609375, 0.548828125],
    [0.349609375, 0.19921875, 0.6484375, 0.498046875],
    [0.6484375, 0.19921875, 0.998046875, 0.697265625],
    [0.0, 0.69921875, 1.0, 0.998046875],
]
# one phrase per box, in the same order
phrases = [
    "a gray British Shorthair standing on a rock in the woods",
    "a yellow American robin standing on the rock",
    "a brown Maltipoo dog standing on the rock",
    "a close up of a small waterfall in the woods",
]

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    instdiff_phrases=phrases,
    instdiff_boxes=boxes,
    instdiff_scheduled_sampling_alpha=0.8,  # fraction of sampling steps that use gated self-attention
    instdiff_scheduled_sampling_beta=0.36,  # fraction of sampling steps that use the multi-instance sampler
    guidance_scale=7.5,
    output_type="pil",
    num_inference_steps=50,
).images[0]

image.save("./instancediffusion-sd15-layout2image-generation.jpg")
```
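
The box values in the example are exact multiples of 1/512, which suggests they were derived from pixel coordinates on a 512x512 canvas. The sketch below shows that conversion under this assumption. `normalize_boxes` and `pixel_boxes` are hypothetical names used for illustration; they are not part of the pipeline.

```python
def normalize_boxes(pixel_boxes, width=512, height=512):
    """Convert pixel-space (xmin, ymin, xmax, ymax) boxes to the [0, 1] range.

    Hypothetical helper: the 512x512 default is an assumption, not a
    documented requirement of the pipeline.
    """
    normalized = []
    for xmin, ymin, xmax, ymax in pixel_boxes:
        # Reject boxes that are empty, inverted, or outside the canvas.
        if not (0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height):
            raise ValueError(
                f"box {(xmin, ymin, xmax, ymax)} does not fit a {width}x{height} canvas"
            )
        normalized.append(
            [xmin / width, ymin / height, xmax / width, ymax / height]
        )
    return normalized


# Pixel coordinates that reproduce the normalized boxes used in the example.
pixel_boxes = [
    [0, 51, 179, 281],
    [179, 102, 332, 255],
    [332, 102, 511, 357],
    [0, 358, 512, 511],
]
boxes = normalize_boxes(pixel_boxes)
```

The resulting `boxes` list can be passed as `instdiff_boxes`. Working in pixel coordinates first can make it easier to lay out instances against a reference image before normalizing.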

# Sample Output

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/640f071006c3b5ca883ea2d6/G1YVfIhmr0OABbzmPAc91.jpeg)