Rypo committed · Commit c0e85b3 · 1 parent: 5965218

remove vae, trim readme

.gitattributes CHANGED
@@ -33,7 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
- demo_cases.png filter=lfs diff=lfs merge=lfs -text
  assets/text_only_1111_4bit_bf16.png filter=lfs diff=lfs merge=lfs -text
  assets/single_img_1111_4bit_bf16.png filter=lfs diff=lfs merge=lfs -text
  assets/double_img_1111_4bit_bf16.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -8,219 +8,8 @@ tags:
  ---
 
  > [!NOTE]
- > This repo contains bitsandbytes 4bit-NF4 model weights for [OmniGen-v1](https://huggingface.co/Shitao/OmniGen-v1).
+ > This repo contains bitsandbytes 4bit-NF4 model weights for [OmniGen-v1](https://huggingface.co/Shitao/OmniGen-v1). See the original model card for more info.
 
  <img src="./assets/text_only_1111_4bit_bf16.png" alt="Text Only Comparison">
  <img src="./assets/single_img_1111_4bit_bf16.png" alt="Single Image Comparison">
- <img src="./assets/double_img_1111_4bit_bf16.png" alt="Double Image Comparison">
+ <img src="./assets/double_img_1111_4bit_bf16.png" alt="Double Image Comparison">
- 
- Original model card:
- 
- ---
- 
- <h1 align="center">OmniGen: Unified Image Generation</h1>
- 
- For more information, please refer to our repo: https://github.com/VectorSpaceLab/OmniGen
- 
- <p align="center">
-     <a href="https://vectorspacelab.github.io/OmniGen/">
-         <img alt="Build" src="https://img.shields.io/badge/Project%20Page-OmniGen-yellow">
-     </a>
-     <a href="https://arxiv.org/abs/2409.11340">
-         <img alt="Build" src="https://img.shields.io/badge/arXiv%20paper-2409.11340-b31b1b.svg">
-     </a>
-     <a href="https://huggingface.co/spaces/Shitao/OmniGen">
-         <img alt="License" src="https://img.shields.io/badge/HF%20Demo-🤗-lightblue">
-     </a>
-     <a href="https://huggingface.co/Shitao/OmniGen-v1">
-         <img alt="Build" src="https://img.shields.io/badge/HF%20Model-🤗-yellow">
-     </a>
-     <a href="https://replicate.com/chenxwh/omnigen">
-         <img alt="Build" src="https://replicate.com/chenxwh/omnigen/badge">
-     </a>
- </p>
- 
- <h4 align="center">
-     <p>
-         <a href=#1-news>News</a> |
-         <a href=#3-methodology>Methodology</a> |
-         <a href=#4-what-can-omnigen-do>Capabilities</a> |
-         <a href=#5-quick-start>Quick Start</a> |
-         <a href="#6-finetune">Finetune</a> |
-         <a href="#license">License</a> |
-         <a href="#citation">Citation</a>
-     </p>
- </h4>
- 
- 
- ## 1. News
- - 2024-10-28: We released a new version of the inference code, optimizing memory usage and inference time. Refer to [docs/inference.md](docs/inference.md#requiremented-resources) for details.
- - 2024-10-22: :fire: We released the code for OmniGen. Inference: [docs/inference.md](docs/inference.md) Train: [docs/fine-tuning.md](docs/fine-tuning.md)
- - 2024-10-22: :fire: We released the first version of OmniGen. Model weights: [Shitao/OmniGen-v1](https://huggingface.co/Shitao/OmniGen-v1) HF Demo: [🤗](https://huggingface.co/spaces/Shitao/OmniGen)
- 
- 
- ## 2. Overview
- 
- OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It is designed to be simple, flexible, and easy to use. We provide [inference code](#5-quick-start) so that everyone can explore more functionalities of OmniGen.
- 
- Existing image generation models often require loading several additional network modules (such as ControlNet, IP-Adapter, Reference-Net) and performing extra preprocessing steps (e.g., face detection, pose estimation, cropping) to generate a satisfactory image. However, **we believe the future image generation paradigm should be simpler and more flexible: generating various images directly from arbitrary multi-modal instructions, without additional plugins or operations, similar to how GPT works in language generation.**
- 
- Due to limited resources, OmniGen still has room for improvement. We will continue to optimize it, and we hope it inspires more universal image-generation models. You can also easily fine-tune OmniGen without designing networks for specific tasks: just prepare the corresponding data and run the [script](#6-finetune). Imagination is no longer limited; everyone can construct any image-generation task, and perhaps we can achieve very interesting, wonderful, and creative things.
- 
- If you have any questions, ideas, or interesting tasks you want OmniGen to accomplish, feel free to discuss with us: [email protected], [email protected], [email protected]. We welcome any feedback to help us improve the model.
- 
- 
- ## 3. Methodology
- 
- You can see the details in our [paper](https://arxiv.org/abs/2409.11340).
- 
- 
- ## 4. What Can OmniGen Do?
- 
- OmniGen is a unified image generation model that you can use to perform various tasks, including but not limited to text-to-image generation, subject-driven generation, identity-preserving generation, image editing, and image-conditioned generation. **OmniGen doesn't need additional plugins or operations; it automatically identifies the features (e.g., required objects, human poses, depth maps) in input images according to the text prompt.**
- We showcase some examples in [inference.ipynb](inference.ipynb), and in [inference_demo.ipynb](inference_demo.ipynb) we show an interesting pipeline to generate and modify an image.
- 
- You can control image generation flexibly via OmniGen:
- ![demo](demo_cases.png)
- 
- If you are not entirely satisfied with certain functionalities or wish to add new capabilities, you can try [fine-tuning OmniGen](#6-finetune).
- 
- 
- ## 5. Quick Start
- 
- ### Using OmniGen
- Install via GitHub:
- ```bash
- git clone https://github.com/staoxiao/OmniGen.git
- cd OmniGen
- pip install -e .
- ```
- 
- You can also create a new environment to avoid conflicts:
- ```bash
- # Create a Python 3.10.12 conda env (you could also use virtualenv)
- conda create -n omnigen python=3.10.12
- conda activate omnigen
- 
- # Install PyTorch for your CUDA version, e.g.
- pip install torch==2.3.1+cu118 torchvision --extra-index-url https://download.pytorch.org/whl/cu118
- 
- git clone https://github.com/staoxiao/OmniGen.git
- cd OmniGen
- pip install -e .
- ```
- 
- Here are some examples:
- ```python
- from OmniGen import OmniGenPipeline
- 
- pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
- # Note: a local model path also works, e.g. pipe = OmniGenPipeline.from_pretrained(your_local_model_path),
- # where the files in your_local_model_path are organized as in https://huggingface.co/Shitao/OmniGen-v1/tree/main
- 
- ## Text to Image
- images = pipe(
-     prompt="A curly-haired man in a red shirt is drinking tea.",
-     height=1024,
-     width=1024,
-     guidance_scale=2.5,
-     seed=0,
- )
- images[0].save("example_t2i.png")  # save output PIL Image
- 
- ## Multi-modal to Image
- # In the prompt, we use a placeholder of the form <img><|image_*|></img> to represent each input image.
- # You can pass multiple images in input_images; please ensure each image has its own placeholder. For example,
- # for the list input_images=[img1_path, img2_path], the prompt needs two placeholders: <img><|image_1|></img> and <img><|image_2|></img>.
- images = pipe(
-     prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
-     input_images=["./imgs/test_cases/two_man.jpg"],
-     height=1024,
-     width=1024,
-     guidance_scale=2.5,
-     img_guidance_scale=1.6,
-     seed=0
- )
- images[0].save("example_ti2i.png")  # save output PIL Image
- ```
- - If you run out of memory, set `offload_model=True`. If inference takes too long with multiple input images, reduce `max_input_image_size` (see the sketch after this list). For the required resources and how to run OmniGen efficiently, refer to [docs/inference.md#requiremented-resources](docs/inference.md#requiremented-resources).
- - For more image generation examples, see [inference.ipynb](inference.ipynb) and [inference_demo.ipynb](inference_demo.ipynb).
- - For more details about the inference arguments, see [docs/inference.md](docs/inference.md).
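As referenced in the first tip above, here is a hedged sketch of those two memory options, reusing the multi-modal call from the example. Both parameter names come from the tips themselves, but passing them directly as keyword arguments of `pipe(...)` is an assumption, not something this diff confirms:

```python
# Sketch only: offload_model and max_input_image_size are taken from the
# tips above; treating them as direct pipe(...) kwargs is an assumption.
images = pipe(
    prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
    input_images=["./imgs/test_cases/two_man.jpg"],
    height=1024,
    width=1024,
    guidance_scale=2.5,
    img_guidance_scale=1.6,
    seed=0,
    offload_model=True,        # offload weights to CPU to lower VRAM usage
    max_input_image_size=768,  # downscale inputs to cut multi-image inference time
)
```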
- 
- 
- ### Using Diffusers
- 
- Coming soon.
- 
- 
- ### Gradio Demo
- 
- We host an online demo on [Hugging Face](https://huggingface.co/spaces/Shitao/OmniGen).
- 
- For a local gradio demo, install the extra dependencies and run the app:
- ```bash
- pip install gradio spaces
- python app.py
- ```
- 
- #### Use Google Colab
- To use OmniGen with Google Colab, run the following commands:
- 
- ```
- !git clone https://github.com/staoxiao/OmniGen.git
- %cd OmniGen
- !pip install -e .
- !pip install gradio spaces
- !python app.py --share
- ```
- 
- ## 6. Finetune
- We provide a training script `train.py` to fine-tune OmniGen.
- Here is a toy example of LoRA fine-tuning:
- ```bash
- accelerate launch --num_processes=1 train.py \
-     --model_name_or_path Shitao/OmniGen-v1 \
-     --batch_size_per_device 2 \
-     --condition_dropout_prob 0.01 \
-     --lr 1e-3 \
-     --use_lora \
-     --lora_rank 8 \
-     --json_file ./toy_data/toy_subject_data.jsonl \
-     --image_path ./toy_data/images \
-     --max_input_length_limit 18000 \
-     --keep_raw_resolution \
-     --max_image_size 1024 \
-     --gradient_accumulation_steps 1 \
-     --ckpt_every 10 \
-     --epochs 200 \
-     --log_every 1 \
-     --results_dir ./results/toy_finetune_lora
- ```
- 
- Please refer to [docs/fine-tuning.md](docs/fine-tuning.md) for more details (e.g., full fine-tuning).
- 
- ### Contributors:
- Thanks to all our contributors for their efforts, and a warm welcome to new members joining in!
- 
- <a href="https://github.com/VectorSpaceLab/OmniGen/graphs/contributors">
-     <img src="https://contrib.rocks/image?repo=VectorSpaceLab/OmniGen" />
- </a>
- 
- ## License
- This repo is licensed under the [MIT License](LICENSE).
- 
- 
- ## Citation
- If you find this repository useful, please consider giving it a star ⭐ and a citation:
- ```
- @article{xiao2024omnigen,
-     title={Omnigen: Unified image generation},
-     author={Xiao, Shitao and Wang, Yueze and Zhou, Junjie and Yuan, Huaying and Xing, Xingrun and Yan, Ruiran and Wang, Shuting and Huang, Tiejun and Liu, Zheng},
-     journal={arXiv preprint arXiv:2409.11340},
-     year={2024}
- }
- ```
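The trimmed README no longer carries usage instructions, so a minimal loading sketch may help. It assumes the `OmniGen` package from the original model card, that `from_pretrained` resolves these quantized weights without extra arguments, and it uses `Rypo/OmniGen-v1-nf4` as a hypothetical stand-in for this repository's id:

```python
# Minimal sketch, not from the diff: load the 4bit-NF4 weights with the
# OmniGenPipeline shown in the original model card.
# "Rypo/OmniGen-v1-nf4" is a hypothetical id standing in for this repo.
from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Rypo/OmniGen-v1-nf4")
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_t2i_4bit.png")
```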
demo_cases.png DELETED

Git LFS Details

  • SHA256: 0517c97c947f8226f0f39b4ca2ac61b058e52faa59ec5085668062d0162dd21e
  • Pointer size: 132 Bytes
  • Size of remote file: 3.42 MB
vae/config.json DELETED
@@ -1,31 +0,0 @@
- {
-   "_class_name": "AutoencoderKL",
-   "_diffusers_version": "0.18.0.dev0",
-   "_name_or_path": ".",
-   "act_fn": "silu",
-   "block_out_channels": [
-     128,
-     256,
-     512,
-     512
-   ],
-   "down_block_types": [
-     "DownEncoderBlock2D",
-     "DownEncoderBlock2D",
-     "DownEncoderBlock2D",
-     "DownEncoderBlock2D"
-   ],
-   "in_channels": 3,
-   "latent_channels": 4,
-   "layers_per_block": 2,
-   "norm_num_groups": 32,
-   "out_channels": 3,
-   "sample_size": 1024,
-   "scaling_factor": 0.13025,
-   "up_block_types": [
-     "UpDecoderBlock2D",
-     "UpDecoderBlock2D",
-     "UpDecoderBlock2D",
-     "UpDecoderBlock2D"
-   ]
- }
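With the bundled VAE deleted by this commit, the pipeline presumably has to source its `AutoencoderKL` elsewhere. A minimal sketch, assuming the `diffusers` library and that the upstream [Shitao/OmniGen-v1](https://huggingface.co/Shitao/OmniGen-v1) repo still ships the same VAE under a `vae/` subfolder:

```python
# Minimal sketch, assuming upstream Shitao/OmniGen-v1 still hosts the VAE
# that this commit deletes locally.
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("Shitao/OmniGen-v1", subfolder="vae")
print(vae.config.scaling_factor)  # 0.13025, matching the deleted config above
```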
vae/diffusion_pytorch_model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:1598f3d24932bcfe6634e8b618ea1e30ab1d57f5aad13a6d2de446d2199f2341
- size 334643268