---
tags:
- image-to-image
---
This repo contains bitsandbytes 4bit-NF4 model weights for [OmniGen-v1](https://huggingface.co/Shitao/OmniGen-v1).

See the [original model card](https://huggingface.co/Shitao/OmniGen-v1) for more info.

## Usage

Set up your environment by following the original [Quick Start Guide](https://huggingface.co/Shitao/OmniGen-v1#5-quick-start) before getting started.

> [!IMPORTANT]
> This feature is not officially supported yet. You'll need to install OmniGen from [this pull request](https://github.com/VectorSpaceLab/OmniGen/pull/151).
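If you haven't installed from a pull request before, one common pattern is `pip install git+https://github.com/VectorSpaceLab/OmniGen.git@refs/pull/151/head`, which builds directly from the PR branch. This is a general GitHub/pip convention, not an instruction from the upstream docs, so check the PR discussion if it fails.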

```python
from OmniGen import OmniGenPipeline, OmniGen

# pass the quantized model into the pipeline
model = OmniGen.from_pretrained('gryan/OmniGen-v1-bnb-nf4')
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1", model=model)

# proceed as normal!

## Text to Image
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_t2i.png")  # save output PIL Image

## Multi-modal to Image
# In the prompt, use a placeholder to represent each image, in the format <img><|image_*|></img>.
# You can pass multiple images in input_images; make sure each image has its own placeholder.
# For example, for the list input_images=[img1_path, img2_path], the prompt needs two
# placeholders: <img><|image_1|></img> and <img><|image_2|></img>.
images = pipe(
    prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
    input_images=["./imgs/test_cases/two_man.jpg"],
    height=1024,
    width=1024,
    guidance_scale=2.5,
    img_guidance_scale=1.6,
    seed=0,
)
images[0].save("example_ti2i.png")  # save output PIL image
```
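As a quick sanity check that the weights really loaded in 4-bit, you can count the quantized layers. This is a minimal sketch, assuming OmniGen is a standard `torch.nn.Module` and that the NF4 checkpoint uses `bitsandbytes` `Linear4bit` layers; neither detail is guaranteed by this card.

```python
import bitsandbytes as bnb

# Count bitsandbytes 4-bit linear layers in the loaded model (assumption:
# the NF4 checkpoint replaces nn.Linear with bnb.nn.Linear4bit).
n_4bit = sum(1 for m in model.modules() if isinstance(m, bnb.nn.Linear4bit))
print(f"Linear4bit layers found: {n_4bit}")  # expect a nonzero count
```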

## Image Comparisons

<img src="./assets/text_only_1111_4bit_bf16.png" alt="Text Only Comparison">
<img src="./assets/single_img_1111_4bit_bf16.png" alt="Single Image Comparison">
<img src="./assets/double_img_1111_4bit_bf16.png" alt="Double Image Comparison">

## Performance

Results for the 4bit-NF4 quantized model on an RTX 3090 GPU (24 GB). Each cell lists memory usage and generation time:

| Settings | Only Text | Text + Single Image | Text + Two Images |
|:---|:---:|:---:|:---:|
| use_kv_cache=False | 6.8 GB, 1m16s | 7.2 GB, 3m30s | 7.7 GB, 5m47s |
| use_kv_cache | 9.9 GB, 1m14s | 20.4 GB†, 8m5s | OOM (36.7 GB†, >1h10m) |
| use_kv_cache, offload_kv_cache | 6.8 GB, 1m16s | 7.2 GB, 2m49s | 8.4 GB, 4m3s |
| use_kv_cache, offload_kv_cache, separate_cfg_infer | 6.8 GB, 1m20s | 7.0 GB, 2m31s | 7.4 GB, 3m31s |
| use_kv_cache, offload_kv_cache, offload_model\* | 5.0 GB, 1m35s | 6.0 GB, 3m7s | 8.0 GB, 4m21s |
| use_kv_cache, offload_kv_cache, separate_cfg_infer, offload_model\* | 5.0 GB, 1m58s | 5.3 GB, 3m29s | 5.6 GB, 4m19s |

- † `memory_reserved` exceeded 24 GB, spilling over into system RAM.
- \* VAE offload only; a model loaded in 4-bit cannot be offloaded.

See the original [inference settings table](https://github.com/VectorSpaceLab/OmniGen/blob/main/docs/inference.md#requiremented-resources) for bfloat16 performance.
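The settings in the table correspond to keyword arguments on the pipeline call described in the upstream inference docs. Below is a minimal sketch of the lowest-memory row, assuming the pipeline accepts these flags as documented there:

```python
# Lowest-VRAM configuration from the last table row (~5 GB for text-only).
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    use_kv_cache=True,        # reuse transformer KV states across diffusion steps
    offload_kv_cache=True,    # keep the KV cache in CPU RAM instead of VRAM
    separate_cfg_infer=True,  # run guidance branches separately to cut peak memory
    offload_model=True,       # only the VAE offloads for 4-bit weights (see * above)
    seed=0,
)
```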