🧩 TokenCompose SD14 Model Card
🎬CVPR 2024
TokenCompose_SD14_A is a latent text-to-image diffusion model finetuned from the Stable-Diffusion-v1-4 checkpoint at a resolution of 512x512 on the VSR split of COCO image-caption pairs for 24,000 steps with a learning rate of 5e-6. In addition to the standard denoising loss, the training objective includes token-level grounding terms intended to improve multi-category instance composition and photorealism. The "_A"/"_B" suffix distinguishes separate finetuning runs that use the same configuration described above.
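The card does not spell out the grounding terms, so the NumPy sketch below is only a rough illustration of how a denoising loss can be combined with token-level grounding terms. It assumes one text token's cross-attention map `attn` and a binary segmentation mask `mask` for the matching object. The two grounding terms (an attention-mass term and a per-pixel binary cross-entropy), all function names, and the weights `lambda_token` and `lambda_pixel` are hypothetical choices for this example, not the exact TokenCompose objective; see the paper for the actual formulation.

```python
import numpy as np


def denoising_loss(eps_pred, eps_true):
    # Standard latent-diffusion objective: mean squared error between
    # the predicted noise and the noise actually added to the latent
    return float(np.mean((eps_pred - eps_true) ** 2))


def token_grounding_loss(attn, mask):
    # Hypothetical token-level term: penalize the share of one token's
    # cross-attention mass that falls outside that token's object mask.
    # Assumes attn is non-negative with a non-zero sum.
    inside = np.sum(attn * mask) / np.sum(attn)
    return float((1.0 - inside) ** 2)


def pixel_grounding_loss(attn, mask, eps=1e-6):
    # Hypothetical pixel-level term: binary cross-entropy between the
    # max-normalized attention map and the binary mask
    p = np.clip(attn / np.max(attn), eps, 1.0 - eps)
    return float(-np.mean(mask * np.log(p) + (1.0 - mask) * np.log(1.0 - p)))


def total_loss(eps_pred, eps_true, attn, mask, lambda_token=1.0, lambda_pixel=1.0):
    # Denoising loss plus weighted grounding terms; the weights are
    # placeholders, not the values used to train TokenCompose
    return (
        denoising_loss(eps_pred, eps_true)
        + lambda_token * token_grounding_loss(attn, mask)
        + lambda_pixel * pixel_grounding_loss(attn, mask)
    )


# Toy example: a 4x4 attention map for one token and a mask covering the top-left 2x2 block
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0
focused_attn = mask.copy()      # all attention inside the mask
diffuse_attn = np.ones((4, 4))  # attention spread over the whole image
noise = np.zeros((4, 4))        # perfect noise prediction, so the denoising term is 0

print(total_loss(noise, noise, focused_attn, mask))  # close to 0
print(total_loss(noise, noise, diffuse_attn, mask))  # noticeably larger
```

With a perfect noise prediction, only the grounding terms differ between the two calls: attention that stays inside the object mask is barely penalized, while attention spread over the whole image is penalized by both terms.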
📄 Paper
Please follow this link.
🧨Example Usage
We strongly recommend using the 🤗 Diffusers library to run our model.
```python
import torch
from diffusers import StableDiffusionPipeline

model_id = "mlpc-lab/TokenCompose_SD14_A"
device = "cuda"

# Load the finetuned pipeline from the Hugging Face Hub and move it to the GPU
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
pipe = pipe.to(device)

# Generate an image for a multi-category prompt and save it
prompt = "A cat and a wine glass"
image = pipe(prompt).images[0]
image.save("cat_and_wine_glass.png")
```
⬆️Improvements over SD14
Object Accuracy and the MG2 to MG5 columns on COCO and ADE20K measure multi-category instance composition; the two FID columns measure photorealism; latency measures efficiency.

Method | Object Accuracy | COCO MG2 | COCO MG3 | COCO MG4 | COCO MG5 | ADE20K MG2 | ADE20K MG3 | ADE20K MG4 | ADE20K MG5 | FID (COCO) | FID (Flickr30K) | Latency
---|---|---|---|---|---|---|---|---|---|---|---|---
SD 1.4 | 29.86 | 90.72 ± 1.33 | 50.74 ± 0.89 | 11.68 ± 0.45 | 0.88 ± 0.21 | 89.81 ± 0.40 | 53.96 ± 1.14 | 16.52 ± 1.13 | 1.89 ± 0.34 | 20.88 | 71.46 | 7.54 ± 0.17
TokenCompose (Ours) | 52.15 | 98.08 ± 0.40 | 76.16 ± 1.04 | 28.81 ± 0.95 | 3.28 ± 0.48 | 97.75 ± 0.34 | 76.93 ± 1.09 | 33.92 ± 1.47 | 6.21 ± 0.62 | 20.19 | 71.13 | 7.56 ± 0.14
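The improvements in the table are plain differences between the two rows. The short script below recomputes the absolute change from SD 1.4 to TokenCompose for a few of the mean values reported above (the ± values are omitted); the dictionary layout, metric names, and the `deltas` helper are just for illustration.

```python
# Mean values copied from the table above (± values omitted)
sd14 = {
    "object_accuracy": 29.86,
    "coco_mg4": 11.68,
    "ade20k_mg5": 1.89,
    "fid_coco": 20.88,
    "latency": 7.54,
}
tokencompose = {
    "object_accuracy": 52.15,
    "coco_mg4": 28.81,
    "ade20k_mg5": 6.21,
    "fid_coco": 20.19,
    "latency": 7.56,
}


def deltas(baseline, ours):
    # Absolute change per metric (ours minus baseline), rounded to two decimals
    return {name: round(ours[name] - baseline[name], 2) for name in baseline}


for name, delta in deltas(sd14, tokencompose).items():
    print(f"{name}: {delta:+.2f}")
```

Positive deltas are gains for the accuracy and MG columns. For FID and latency lower is better, so the FID change of -0.69 is an improvement, while latency is essentially unchanged (+0.02).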
📰 Citation
```bibtex
@InProceedings{Wang2024TokenCompose,
    author    = {Wang, Zirui and Sha, Zhizhou and Ding, Zheng and Wang, Yilin and Tu, Zhuowen},
    title     = {TokenCompose: Text-to-Image Diffusion with Token-level Supervision},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {8553--8564}
}
```