---
library_name: diffusers
base_model:
- stabilityai/stable-diffusion-xl-base-1.0
- etri-vilab/koala-700m
---


# 🍰 Hybrid-sd-xl-700m for Stable Diffusion XL

[Hybrid-sd-xl-700m](https://huggingface.co/cqyan/hybrid-sd-xl-700m) is a pruned and finetuned UNet based on SDXL 1.0 and Koala-700M.
Compared to the 2,560M-parameter SDXL UNet, our UNet is 3.2x smaller and 2.4x faster, while generating more realistic photographic images with richer detail and more varied backgrounds. We used several training tricks and a carefully curated dataset to finetune the model, resulting in higher image aesthetics and authenticity. Hybrid-sd-xl-700m is useful for real-time previewing of the SDXL generation process, and you are very welcome to try it!
<br>

**Benchmark comparison**
| Model | Params (M) | UNet 1-step inference time (ms) | GPU Memory Usage (MiB) | 
|----|----|----|----|
| SDXL | 2560 | 448.47 | 18431 |
| **Hybrid-sd-xl-700m**| **780 ↓** | **185.93 ↓** | **10651 ↓** |
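The table above reports the authors' own measurements. As a rough, minimal sketch of how a 1-step UNet timing in fp16 could be reproduced on a single GPU (the latent/embedding shapes and the timing harness below are our assumptions for a 1024x1024 SDXL setup, not the original benchmark script):

```python
import torch
from diffusers import UNet2DConditionModel

# Assumption: the pruned UNet keeps the standard SDXL conditioning interface.
unet = UNet2DConditionModel.from_pretrained(
    "cqyan/hybrid-sd-xl-700m", torch_dtype=torch.float16
).to("cuda")

# Dummy SDXL-shaped inputs: 4x128x128 latent (1024x1024 image), 77x2048 text states,
# plus the pooled text embedding and time ids required by the SDXL UNet.
latents = torch.randn(1, 4, 128, 128, dtype=torch.float16, device="cuda")
timestep = torch.tensor([500], device="cuda")
prompt_embeds = torch.randn(1, 77, 2048, dtype=torch.float16, device="cuda")
added_cond_kwargs = {
    "text_embeds": torch.randn(1, 1280, dtype=torch.float16, device="cuda"),
    "time_ids": torch.randn(1, 6, dtype=torch.float16, device="cuda"),
}

with torch.no_grad():
    # Warm-up pass, then time a single denoising step with CUDA events.
    unet(latents, timestep, prompt_embeds, added_cond_kwargs=added_cond_kwargs)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    unet(latents, timestep, prompt_embeds, added_cond_kwargs=added_cond_kwargs)
    end.record()
    torch.cuda.synchronize()
    print(f"1-step UNet inference: {start.elapsed_time(end):.2f} ms")
```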

T2I comparison on a single A100 GPU. Image order, from left to right: SDXL 1.0 -> Koala-700M -> Hybrid-sd-xl-700m
![image/png](https://cdn-uploads.huggingface.co/production/uploads/664afcc45fdb7108205a15c3/HkKRO9asKInTMF5eZ3sO7.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/664afcc45fdb7108205a15c3/O-M8PTI0QDyazN93FVtQV.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/664afcc45fdb7108205a15c3/6TEEYH-jegsWfSgyUr56u.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/664afcc45fdb7108205a15c3/q4ovAxeYy420eVd5FnD7O.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/664afcc45fdb7108205a15c3/IFDIuHenYHEANIad7br8y.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/664afcc45fdb7108205a15c3/64vrWfTBEFIWilT9V-0uq.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/664afcc45fdb7108205a15c3/2sMuufgomfBdsM-6xcBtA.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/664afcc45fdb7108205a15c3/uWJWWfPT4_34zAtFhYJqE.png)


This repo contains `.safetensors` versions of the Hybrid-sd-xl-700m weights.

## Using in 🧨 diffusers


```python
import torch
from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel

# Load the pruned 700M UNet and drop it into the standard SDXL pipeline.
unet = UNet2DConditionModel.from_pretrained(
    "cqyan/hybrid-sd-xl-700m",
    torch_dtype=torch.float16,
)
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    unet=unet,
    torch_dtype=torch.float16,
)
pipe.to("cuda")

prompt = "full body, cat dressed as a Viking, with weapon in his paws, battle coloring, glow hyper-detail, hyper-realism, cinematic"
image = pipe(prompt, num_inference_steps=25, guidance_scale=7).images[0]
image.save("cat.png")
```
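Because only the UNet is swapped while the SDXL text encoders, VAE, and scheduler stay unchanged, the same pipeline can also serve for quick draft previews. The snippet below is a hypothetical usage sketch; the step count is our choice, not a recommendation from the authors.

```python
# Hypothetical quick-preview pass (our assumption): fewer steps trade quality for speed.
preview = pipe(prompt, num_inference_steps=8, guidance_scale=7).images[0]
preview.save("cat_preview.png")
```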