🍰 Hybrid-sd-small-vae-xl for Stable Diffusion XL

Hybrid-sd-small-vae-xl is a pruned and fine-tuned VAE that uses the same "latent API" as the base SDXL-VAE. Compared with the SDXL VAE, it is smaller and faster, and it performs well on image saturation and clarity. Specifically, we reduce the parameter count from 83.65M to 62.395M and the decoder inference time from 1802.60 ms to 611.78 ms, and cut decoder GPU memory usage by about 43.7% (31023 MiB -> 17469 MiB) without losing T2I generation quality. The model is useful for real-time previewing of the SDXL generation process, and you are very welcome to try it!

Index Table

| Model | Params (M) | Decoder inference time (ms) | Decoder GPU memory usage (MiB) |
|---|---|---|---|
| SDXL | 83.65 | 1802.60 | 31023 |
| Hybrid-sd-small-vae-xl | 62.395 ↓ | 611.78 ↓ | 17469 ↓ |
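
The relative savings implied by the table can be checked with a few lines of arithmetic. The sketch below only restates the numbers reported above; the helper name `reduction_pct` is ours for illustration and is not part of the release.

```python
# Figures copied from the table above: (SDXL, Hybrid-sd-small-vae-xl).
metrics = {
    "params_M": (83.65, 62.395),
    "decoder_time_ms": (1802.60, 611.78),
    "decoder_memory_MiB": (31023, 17469),
}


def reduction_pct(baseline, pruned):
    """Relative reduction of `pruned` versus `baseline`, in percent."""
    return 100.0 * (baseline - pruned) / baseline


# Percentage saved per metric, rounded to one decimal place.
reductions = {
    name: round(reduction_pct(base, small), 1)
    for name, (base, small) in metrics.items()
}

# Decoder speedup factor (baseline time divided by pruned time).
speedup = round(metrics["decoder_time_ms"][0] / metrics["decoder_time_ms"][1], 2)

for name, pct in reductions.items():
    print(f"{name}: {pct}% lower")
print(f"decoder speedup: {speedup}x")
```

In other words, the decoder runs roughly three times faster with about a quarter fewer parameters, and the 43.7% memory figure quoted in the introduction follows directly from the MiB values.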

T2I comparison using one A100 GPU. Image order from left to right: SDXL-VAE -> Hybrid-sd-small-vae-xl

This repo contains .safetensors versions of the Hybrid-sd-small-vae-xl weights. For SD1.x, use Hybrid-sd-small-vae instead (the SD and SDXL VAEs are incompatible).

Using in 🧨 diffusers

First, clone our repository, which provides the AutoencoderKL class needed to load the weights:

git clone https://github.com/bytedance/Hybrid-SD.git

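import torch
from diffusers import DiffusionPipeline

# AutoencoderKL here is the class from the cloned Hybrid-SD repository, not the diffusers one;
# the bytenn_autoencoder_kl module must be importable (e.g. via PYTHONPATH).
from bytenn_autoencoder_kl import AutoencoderKL

# Load the SDXL base pipeline in half precision.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)

# Swap in the pruned VAE, then move the whole pipeline to the GPU.
vae = AutoencoderKL.from_pretrained("cqyan/hybrid-sd-small-vae-xl", torch_dtype=torch.float16)
pipe.vae = vae
pipe = pipe.to("cuda")

# Generate an image and save it to disk.
prompt = "A warm and loving family portrait, highly detailed, hyper-realistic, 8k resolution, photorealistic, soft and natural lighting"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("family.png")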
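
Because the decoder is cheaper, one way to use it for live previews is to decode the intermediate latents every few denoising steps. The sketch below is not taken from the Hybrid-SD repository: `make_preview_callback` and `decode_fn` are hypothetical names, and it assumes a diffusers version whose pipelines accept `callback_on_step_end` with the signature `(pipe, step, timestep, callback_kwargs)`. The commented wiring further assumes the repository's AutoencoderKL exposes a diffusers-style `decode(...).sample` and `config.scaling_factor`.

```python
def make_preview_callback(decode_fn, every_n=5):
    """Build a step-end callback that decodes latents every `every_n` steps.

    `decode_fn` is any callable mapping latents to an image (hypothetical;
    supplied by the caller). Decoded previews are collected in `previews`.
    """
    previews = []

    def on_step_end(pipe, step, timestep, callback_kwargs):
        # `step` is zero-based, so step + 1 is the number of completed steps.
        if (step + 1) % every_n == 0:
            previews.append((step, decode_fn(callback_kwargs["latents"])))
        # The callback must hand the tensors back to the pipeline.
        return callback_kwargs

    return on_step_end, previews


# Assumed wiring with the pipeline above (not run here):
#   decode = lambda z: pipe.vae.decode(z / pipe.vae.config.scaling_factor).sample
#   callback, previews = make_preview_callback(decode, every_n=5)
#   image = pipe(prompt, num_inference_steps=25, callback_on_step_end=callback).images[0]
```

Decoding less often (a larger `every_n`) trades preview smoothness for lower overhead; the decoded tensors would still need converting to images before display.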
