---
license: mit
base_model:
- Shitao/OmniGen-v1
pipeline_tag: text-to-image
tags:
- image-to-image
---
This repo contains bitsandbytes 4-bit NF4 model weights for [OmniGen-v1](https://huggingface.co/Shitao/OmniGen-v1). For details about OmniGen, see the [original model card](https://huggingface.co/Shitao/OmniGen-v1).
- 8-bit weights: [gryan/OmniGen-v1-bnb-8bit](https://huggingface.co/gryan/OmniGen-v1-bnb-8bit)
- 4-bit (fp16, nf4) weights: [gryan/OmniGen-v1-fp16-bnb-4bit](https://huggingface.co/gryan/OmniGen-v1-fp16-bnb-4bit), intended for GPUs older than Ampere (pre-RTX 30xx) and for Colab users.
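As a rough guide to choosing between the two 4-bit repos, the sketch below picks one from the GPU's CUDA compute capability. The helper name `pick_4bit_repo` is hypothetical, and the rule is an assumption based on the note above: Ampere cards report compute capability 8.x, so anything lower is treated as an "older GPU".

```python
def pick_4bit_repo(compute_capability):
    """Return a 4-bit repo id for a (major, minor) CUDA compute capability.

    Assumption: Ampere and newer GPUs report major >= 8; older GPUs
    (for example the Turing cards commonly found on Colab) report less.
    """
    major, _minor = compute_capability
    if major >= 8:
        return "gryan/OmniGen-v1-bnb-4bit"
    return "gryan/OmniGen-v1-fp16-bnb-4bit"


# With PyTorch installed, the capability can be read with:
#   import torch
#   capability = torch.cuda.get_device_capability()
turing_choice = pick_4bit_repo((7, 5))  # a pre-Ampere GPU
ampere_choice = pick_4bit_repo((8, 6))  # an RTX 30xx-class GPU
```

The returned string can be passed to `OmniGen.from_pretrained` as shown in the Usage section.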
## Usage
Set up your environment by following the original [Quick Start Guide](https://huggingface.co/Shitao/OmniGen-v1#5-quick-start) before getting started.
> [!IMPORTANT]
> NOTE: This feature is not officially supported yet. You'll need to install the repo from [this pull request](https://github.com/VectorSpaceLab/OmniGen/pull/151).
```python
from OmniGen import OmniGenPipeline, OmniGen
# pass the quantized model in the pipeline
model = OmniGen.from_pretrained('gryan/OmniGen-v1-bnb-4bit')
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1", model=model)
# proceed as normal!
## Text to Image
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_t2i.png") # save output PIL Image
## Multi-modal to Image
# In the prompt, we use the placeholder to represent the image. The image placeholder should be in the format of <img><|image_*|></img>
# You can add multiple images in the input_images. Please ensure that each image has its placeholder. For example, for the list input_images [img1_path, img2_path], the prompt needs to have two placeholders: <img><|image_1|></img>, <img><|image_2|></img>.
images = pipe(
    prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
    input_images=["./imgs/test_cases/two_man.jpg"],
    height=1024,
    width=1024,
    guidance_scale=2.5,
    img_guidance_scale=1.6,
    seed=0,
)
images[0].save("example_ti2i.png") # save output PIL image
```
## Image Comparisons
<img src="./assets/text_only_1111_4bit_bf16.png" alt="Text Only Comparison">
<img src="./assets/single_img_1111_4bit_bf16.png" alt="Single Image Comparison">
<img src="./assets/double_img_1111_4bit_bf16.png" alt="Double Image Comparison">
## Performance
Peak memory and generation time for the 4-bit NF4 quantized model on an RTX 3090 GPU (24 GB):
| Settings | Only Text | Text + Single Image | Text + Two Images |
|:-------------|:----------:|:-------------------:|:---------------------:|
| use_kv_cache=False | 6.8G, 1m16s | 7.2G, 3m30s | 7.7G, 5m47s |
| use_kv_cache | 9.9G, 1m14s | 20.4G†, 8m5s | OOM (36.7G†, >1h10m) |
| use_kv_cache,offload_kv_cache | 6.8G, 1m16s | 7.2G, 2m49s | 8.4G, 4m3s |
| use_kv_cache,offload_kv_cache,separate_cfg_infer | 6.8G, 1m20s | 7.0G, 2m31s | 7.4G, 3m31s |
| use_kv_cache,offload_kv_cache,offload_model* | 5.0G, 1m35s | 6.0G, 3m7s | 8.0G, 4m21s |
| use_kv_cache,offload_kv_cache,separate_cfg_infer,offload_model* | 5.0G, 1m58s | 5.3G, 3m29s | 5.6G, 4m19s |
- † - `memory_reserved` exceeded 24 GB and spilled over into system RAM.
- \* - Only the VAE is offloaded; a model loaded in 4-bit cannot be offloaded.
See the original [inference settings table](https://github.com/VectorSpaceLab/OmniGen/blob/main/docs/inference.md#requiremented-resources) for bfloat16 performance.
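
The entries in the Settings column read like keyword arguments to the pipeline call. The sketch below turns one table cell into an explicit keyword dictionary. `settings_to_kwargs` is a hypothetical helper, and it assumes `pipe(...)` accepts `use_kv_cache`, `offload_kv_cache`, `separate_cfg_infer`, and `offload_model` as booleans, as described in the upstream inference docs.

```python
def settings_to_kwargs(settings):
    """Convert a 'Settings' cell from the table into keyword arguments.

    Every flag is set explicitly, so the result does not depend on the
    library's defaults. Flags missing from the cell are set to False.
    """
    kwargs = {
        "use_kv_cache": False,
        "offload_kv_cache": False,
        "separate_cfg_infer": False,
        "offload_model": False,
    }
    for item in settings.split(","):
        item = item.strip().rstrip("*")  # drop the table's footnote marker
        if not item:
            continue
        name, _, value = item.partition("=")
        if name not in kwargs:
            raise ValueError(f"unknown setting: {name}")
        # A bare flag means True; only an explicit '=False' turns it off.
        kwargs[name] = value != "False"
    return kwargs


low_memory = settings_to_kwargs("use_kv_cache,offload_kv_cache,separate_cfg_infer")

# Assumed usage with the pipeline from the Usage section:
# images = pipe(
#     prompt="A curly-haired man in a red shirt is drinking tea.",
#     height=1024,
#     width=1024,
#     guidance_scale=2.5,
#     seed=0,
#     **low_memory,
# )
```

In the table, this combination stays at or below 7.4G across all three input types on the 3090, which makes it a reasonable starting point when memory is the main constraint.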