gryan committed (verified) · Commit 628b5d3 · Parent(s): c0e85b3 · Update README.md

- image-to-image
---

This repo contains bitsandbytes 4bit-NF4 model weights for [OmniGen-v1](https://huggingface.co/Shitao/OmniGen-v1).

See the [original model card](https://huggingface.co/Shitao/OmniGen-v1) for more info.
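
For intuition, "4bit-NF4" refers to bitsandbytes' block-wise 4-bit NormalFloat quantization: each weight is stored as a 4-bit index into a 16-level codebook plus a per-block scale. The sketch below illustrates only the general idea — it uses a generic evenly spaced codebook (not the real NF4 table), and `quantize_block`/`dequantize_block` are hypothetical helpers, not bitsandbytes APIs:

```python
# Illustration only: block-wise 4-bit quantization with a generic,
# evenly spaced 16-level codebook (NOT the actual NF4 table).

def quantize_block(weights, levels):
    # Scale the block by its absolute maximum, then store each weight
    # as the index of the nearest codebook level.
    absmax = max(abs(w) for w in weights) or 1.0
    idx = [min(range(len(levels)), key=lambda i: abs(w / absmax - levels[i]))
           for w in weights]
    return idx, absmax

def dequantize_block(idx, absmax, levels):
    # Reverse the mapping: look up each level and rescale.
    return [levels[i] * absmax for i in idx]

# Hypothetical symmetric 16-level codebook spanning [-1, 1].
LEVELS = [i / 7.5 - 1.0 for i in range(16)]

w = [0.31, -0.8, 0.05, 0.99]
idx, scale = quantize_block(w, LEVELS)
w_hat = dequantize_block(idx, scale, LEVELS)
```

The real NF4 codebook is non-uniform (denser near zero, matching a normal weight distribution), which is why it loses less accuracy than an evenly spaced grid at the same 4-bit budget.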

## Usage

Set up your environment by following the original [Quick Start Guide](https://huggingface.co/Shitao/OmniGen-v1#5-quick-start) before getting started.

> [!IMPORTANT]
> This feature is not officially supported yet. You'll need to install the repo from [this pull request](https://github.com/VectorSpaceLab/OmniGen/pull/151).
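
Until that PR is merged, one way to install it is directly from the pull-request ref — a sketch, assuming a standard pip + git setup (GitHub exposes each PR's head commit at `refs/pull/<N>/head`):

```shell
# Install OmniGen from the open pull request (adjust to your environment).
pip install "git+https://github.com/VectorSpaceLab/OmniGen.git@refs/pull/151/head"
```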

```python
from OmniGen import OmniGenPipeline, OmniGen

# pass the quantized model into the pipeline
model = OmniGen.from_pretrained('gryan/OmniGen-v1-bnb-nf4')
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1", model=model)

# proceed as normal!

## Text to Image
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_t2i.png")  # save output PIL Image

## Multi-modal to Image
# In the prompt, we use a placeholder to represent each image. The image placeholder should be in the format of <img><|image_*|></img>.
# You can add multiple images in input_images. Please ensure that each image has its own placeholder: for the list input_images [img1_path, img2_path], the prompt needs two placeholders, <img><|image_1|></img> and <img><|image_2|></img>.
images = pipe(
    prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
    input_images=["./imgs/test_cases/two_man.jpg"],
    height=1024,
    width=1024,
    guidance_scale=2.5,
    img_guidance_scale=1.6,
    seed=0,
)
images[0].save("example_ti2i.png")  # save output PIL image
```

## Image Comparisons

<img src="./assets/text_only_1111_4bit_bf16.png" alt="Text Only Comparison">
<img src="./assets/single_img_1111_4bit_bf16.png" alt="Single Image Comparison">
<img src="./assets/double_img_1111_4bit_bf16.png" alt="Double Image Comparison">

## Performance

For the 4bit-NF4 quantized model on an RTX 3090 GPU (24 GB):

| Settings | Only Text | Text + Single Image | Text + Two Images |
|:-------------|:----------:|:-------------------:|:---------------------:|
| use_kv_cache=False | 6.8G, 1m16s | 7.2G, 3m30s | 7.7G, 5m47s |
| use_kv_cache | 9.9G, 1m14s | 20.4G†, 8m5s | OOM (36.7G†, >1h10m) |
| use_kv_cache, offload_kv_cache | 6.8G, 1m16s | 7.2G, 2m49s | 8.4G, 4m3s |
| use_kv_cache, offload_kv_cache, separate_cfg_infer | 6.8G, 1m20s | 7.0G, 2m31s | 7.4G, 3m31s |
| use_kv_cache, offload_kv_cache, offload_model\* | 5.0G, 1m35s | 6.0G, 3m7s | 8.0G, 4m21s |
| use_kv_cache, offload_kv_cache, separate_cfg_infer, offload_model\* | 5.0G, 1m58s | 5.3G, 3m29s | 5.6G, 4m19s |

- †: memory_reserved exceeded 24 GB, with spillover into system RAM
- \*: only the VAE is offloaded; a model loaded in 4bit cannot be offloaded

See the original [inference settings table](https://github.com/VectorSpaceLab/OmniGen/blob/main/docs/inference.md#requiremented-resources) for bfloat16 performance.
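
The settings in the table map to keyword arguments on the pipeline call. As a sketch (flag names are taken from the table above, not confirmed against the API — verify them in the OmniGen version you installed), the lowest-VRAM row would look like:

```python
# Memory-saving flags from the Performance table, collected as kwargs.
# Flag names come from the table; confirm them in your OmniGen install.
LOW_VRAM_SETTINGS = {
    "use_kv_cache": True,
    "offload_kv_cache": True,
    "separate_cfg_infer": True,
    "offload_model": True,   # only the VAE offloads when the model is 4bit
}

# Hypothetical usage with a pipeline built as in the Usage section:
# images = pipe(prompt="...", height=1024, width=1024,
#               guidance_scale=2.5, seed=0, **LOW_VRAM_SETTINGS)
```

Per the table, this combination trades roughly 2x slower generation for the smallest footprint (about 5-6 GB across all three input types).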