---
tags:
- image-to-image
---
This repo contains bitsandbytes 4bit-NF4 model weights for [OmniGen-v1](https://huggingface.co/Shitao/OmniGen-v1).

See the [original model card](https://huggingface.co/Shitao/OmniGen-v1) for more info.

## Usage

Set up your environment by following the original [Quick Start Guide](https://huggingface.co/Shitao/OmniGen-v1#5-quick-start) before getting started.

> [!IMPORTANT]
> This feature is not officially supported yet. You'll need to install OmniGen from [this pull request](https://github.com/VectorSpaceLab/OmniGen/pull/151).
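If you haven't installed from a pull request before, one common pattern is `pip install git+https://github.com/VectorSpaceLab/OmniGen.git@refs/pull/151/head`, which builds directly from the PR branch. This is a general GitHub/pip convention, not an instruction from the upstream docs, so check the PR discussion if it fails.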

```python
from OmniGen import OmniGenPipeline, OmniGen

# pass the quantized model into the pipeline
model = OmniGen.from_pretrained('gryan/OmniGen-v1-bnb-nf4')
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1", model=model)

# proceed as normal!

## Text to Image
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_t2i.png")  # save output PIL Image

## Multi-modal to Image
# In the prompt, use a placeholder to represent each image, in the format <img><|image_*|></img>.
# You can pass multiple images in input_images; make sure each image has its own placeholder.
# For example, for the list input_images=[img1_path, img2_path], the prompt needs two
# placeholders: <img><|image_1|></img> and <img><|image_2|></img>.
images = pipe(
    prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
    input_images=["./imgs/test_cases/two_man.jpg"],
    height=1024,
    width=1024,
    guidance_scale=2.5,
    img_guidance_scale=1.6,
    seed=0,
)
images[0].save("example_ti2i.png")  # save output PIL image
```
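As a quick sanity check that the weights really loaded in 4-bit, you can count the quantized layers. This is a minimal sketch, assuming OmniGen is a standard `torch.nn.Module` and that the NF4 checkpoint uses `bitsandbytes` `Linear4bit` layers; neither detail is guaranteed by this card.

```python
import bitsandbytes as bnb

# Count bitsandbytes 4-bit linear layers in the loaded model (assumption:
# the NF4 checkpoint replaces nn.Linear with bnb.nn.Linear4bit).
n_4bit = sum(1 for m in model.modules() if isinstance(m, bnb.nn.Linear4bit))
print(f"Linear4bit layers found: {n_4bit}")  # expect a nonzero count
```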

## Image Comparisons

<img src="./assets/text_only_1111_4bit_bf16.png" alt="Text Only Comparison">
<img src="./assets/single_img_1111_4bit_bf16.png" alt="Single Image Comparison">
<img src="./assets/double_img_1111_4bit_bf16.png" alt="Double Image Comparison">

## Performance

Results for the 4bit-NF4 quantized model on an RTX 3090 GPU (24 GB). Each cell lists memory usage and generation time:

| Settings | Only Text | Text + Single Image | Text + Two Images |
|:---|:---:|:---:|:---:|
| use_kv_cache=False | 6.8 GB, 1m16s | 7.2 GB, 3m30s | 7.7 GB, 5m47s |
| use_kv_cache | 9.9 GB, 1m14s | 20.4 GB†, 8m5s | OOM (36.7 GB†, >1h10m) |
| use_kv_cache, offload_kv_cache | 6.8 GB, 1m16s | 7.2 GB, 2m49s | 8.4 GB, 4m3s |
| use_kv_cache, offload_kv_cache, separate_cfg_infer | 6.8 GB, 1m20s | 7.0 GB, 2m31s | 7.4 GB, 3m31s |
| use_kv_cache, offload_kv_cache, offload_model\* | 5.0 GB, 1m35s | 6.0 GB, 3m7s | 8.0 GB, 4m21s |
| use_kv_cache, offload_kv_cache, separate_cfg_infer, offload_model\* | 5.0 GB, 1m58s | 5.3 GB, 3m29s | 5.6 GB, 4m19s |

- † `memory_reserved` exceeded 24 GB, spilling over into system RAM.
- \* VAE offload only; a model loaded in 4-bit cannot be offloaded.

See the original [inference settings table](https://github.com/VectorSpaceLab/OmniGen/blob/main/docs/inference.md#requiremented-resources) for bfloat16 performance.
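The settings in the table correspond to keyword arguments on the pipeline call described in the upstream inference docs. Below is a minimal sketch of the lowest-memory row, assuming the pipeline accepts these flags as documented there:

```python
# Lowest-VRAM configuration from the last table row (~5 GB for text-only).
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    use_kv_cache=True,        # reuse transformer KV states across diffusion steps
    offload_kv_cache=True,    # keep the KV cache in CPU RAM instead of VRAM
    separate_cfg_infer=True,  # run guidance branches separately to cut peak memory
    offload_model=True,       # only the VAE offloads for 4-bit weights (see * above)
    seed=0,
)
```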