thingthatis Warlord-K committed on
Commit
73b4ee9
0 Parent(s):

Duplicate from segmind/SSD-1B


Co-authored-by: Yatharth Gupta <[email protected]>

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,234 @@
+ ---
+ license: apache-2.0
+ tags:
+ - text-to-image
+ - ultra-realistic
+ - text-to-image
+ - stable-diffusion
+ - distilled-model
+ - knowledge-distillation
+ pinned: true
+ datasets:
+ - zzliang/GRIT
+ - wanng/midjourney-v5-202304-clean
+ library_name: diffusers
+ ---
+
+ # Segmind Stable Diffusion 1B (SSD-1B) Model Card
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62039c2d91d53938a643317d/WveKcu7q5PyZEwNezyyMC.png)
+
+ ## Demo
+
+ Try the model on [Segmind SSD-1B](https://www.segmind.com/models/ssd-1b) for the ⚡ fastest inference. You can also try it on [🤗 Spaces](https://huggingface.co/spaces/segmind/Segmind-Stable-Diffusion).
+
+ ## Model Description
+
+ The Segmind Stable Diffusion Model (SSD-1B) is a **distilled, 50% smaller** version of Stable Diffusion XL (SDXL) that offers a **60% speedup** while maintaining high-quality text-to-image generation. It was trained on diverse datasets, including GRIT and Midjourney scrape data, to broaden the range of visual content it can create from text prompts.
+
+ The model was trained with knowledge distillation, learning from several expert models in succession (SDXL, ZavyChromaXL, and JuggernautXL) to combine their strengths and produce impressive visual outputs.
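The card does not spell out the training objective. As a rough, hypothetical illustration of what output-level knowledge distillation for a noise-predicting UNet can look like, the sketch below adds a task loss (student vs. the noise actually added) to a distillation loss (student vs. a frozen teacher's prediction on the same noisy input). The function names, the `kd_weight` parameter, and the use of plain Python lists instead of tensors are assumptions for illustration; this is not Segmind's actual loss. Under this sketch, distilling from several teachers "in succession" would mean changing which model produces `teacher_pred` between training stages.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_pred, teacher_pred, true_noise, kd_weight=1.0):
    """Hypothetical output-level KD objective for a noise-predicting UNet.

    task term: distance between the student's prediction and the true noise
    kd term:   distance between the student's and the frozen teacher's predictions
    """
    task = mse(student_pred, true_noise)
    kd = mse(student_pred, teacher_pred)
    return task + kd_weight * kd


# Toy values standing in for flattened noise predictions.
student = [0.1, 0.2]
teacher = [0.0, 0.2]
noise = [0.0, 0.0]
print(distillation_loss(student, teacher, noise))  # approximately 0.03
```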
+
+ Special thanks to the HF team 🤗, especially [Sayak](https://huggingface.co/sayakpaul), [Patrick](https://github.com/patrickvonplaten), and [Poli](https://huggingface.co/multimodalart), for their collaboration and guidance on this work.
+
+ ## Image Comparison (SDXL-1.0 vs SSD-1B)
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62039c2d91d53938a643317d/mOM_OMxbivVBELad1QQYj.png)
+
+ ## Usage
+ This model can be used via the 🧨 Diffusers library.
+
+ Make sure to install `diffusers` from source by running:
+ ```bash
+ pip install git+https://github.com/huggingface/diffusers
+ ```
+
+ In addition, please install `transformers`, `safetensors`, and `accelerate`:
+ ```bash
+ pip install transformers accelerate safetensors
+ ```
+
+ To use the model, run the following:
+
+ ```py
+ import torch
+ from diffusers import StableDiffusionXLPipeline
+ pipe = StableDiffusionXLPipeline.from_pretrained("segmind/SSD-1B", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
+ pipe.to("cuda")
+ # If using torch < 2.0, enable memory-efficient attention via xFormers:
+ # pipe.enable_xformers_memory_efficient_attention()
+ prompt = "An astronaut riding a green horse" # Your prompt here
+ neg_prompt = "ugly, blurry, poor quality" # Negative prompt here
+ image = pipe(prompt=prompt, negative_prompt=neg_prompt).images[0]
+ ```
+ ### Update: The model should now be usable in ComfyUI.
+ ### Please use negative prompting and a CFG scale of around 9.0 for the best quality!
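In 🧨 Diffusers, the CFG scale recommended above corresponds to the pipeline's `guidance_scale` argument, and the negative prompt to `negative_prompt`. A minimal sketch, assuming `pipe` is the `StableDiffusionXLPipeline` loaded in the Usage section; the helper name `generate` and its default negative prompt are illustrative choices, not an API provided by the model authors.

```python
DEFAULT_NEGATIVE_PROMPT = "ugly, blurry, poor quality"


def generate(pipe, prompt, negative_prompt=DEFAULT_NEGATIVE_PROMPT, guidance_scale=9.0, **kwargs):
    """Run an SDXL-style pipeline with a negative prompt and a CFG scale of about 9.0.

    Extra keyword arguments (e.g. num_inference_steps, height, width, generator)
    are forwarded to the pipeline call unchanged.
    """
    result = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=guidance_scale,
        **kwargs,
    )
    return result.images[0]


# Usage with the pipeline from the Usage section:
# image = generate(pipe, "An astronaut riding a green horse")
# image.save("astronaut.png")
```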
+ ### Model Details
+
+ - **Developed by:** [Segmind](https://www.segmind.com/)
+ - **Developers:** [Yatharth Gupta](https://huggingface.co/Warlord-K) and [Vishnu Jaddipal](https://huggingface.co/Icar)
+ - **Model type:** Diffusion-based text-to-image generative model
+ - **License:** Apache 2.0
+ - **Distilled from:** [stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
+
+
+ ### Key Features
+
+ - **Text-to-Image Generation:** The model excels at generating images from text prompts, enabling a wide range of creative applications.
+
+ - **Distilled for Speed:** Designed for efficiency, this model offers a 60% speedup, making it a practical choice for real-time applications and scenarios where rapid image generation is essential.
+
+ - **Diverse Training Data:** Trained on diverse datasets, the model can handle a variety of textual prompts and generate corresponding images effectively.
+
+ - **Knowledge Distillation:** By distilling knowledge from multiple expert models, the Segmind Stable Diffusion Model combines their strengths and minimizes their limitations, resulting in improved performance.
+
+ ### Model Architecture
+
+ SSD-1B is a 1.3B-parameter model created by removing several layers from the base SDXL model.
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62039c2d91d53938a643317d/Qa8Ow-moLQhOvzp-5kGt4.png)
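A quick sanity check on the 1.3B figure, using only numbers in this commit: the fp16 `.safetensors` files store 2 bytes per parameter, so dividing each component's file size (from its Git LFS pointer further down) by 2 approximates its parameter count. This ignores the small JSON header that safetensors writes before the tensor data, so it slightly overestimates. Under that assumption the UNet alone comes to roughly 1.33B parameters, which suggests the 1.3B figure refers to the UNet rather than the whole pipeline. The helper name `approx_params` is hypothetical.

```python
# File sizes in bytes, copied from the Git LFS pointers of the *.fp16.safetensors files.
FP16_SIZES = {
    "unet": 2_662_790_608,
    "text_encoder": 246_144_864,
    "text_encoder_2": 1_389_382_880,
    "vae": 167_335_342,
}
BYTES_PER_FP16_PARAM = 2


def approx_params(size_bytes, bytes_per_param=BYTES_PER_FP16_PARAM):
    """Estimate parameter count from checkpoint size (header overhead ignored)."""
    return size_bytes // bytes_per_param


for name, size in FP16_SIZES.items():
    print(f"{name:>15}: ~{approx_params(size) / 1e9:.2f}B parameters")

# The fp32 UNet file (5_325_459_760 bytes, 4 bytes per parameter) gives nearly the same count.
print(approx_params(5_325_459_760, bytes_per_param=4))
```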
+
+ ### Training info
+
+ These are the key hyperparameters used during training:
+
+ * Steps: 251000
+ * Learning rate: 1e-5
+ * Batch size: 32
+ * Gradient accumulation steps: 4
+ * Image resolution: 1024
+ * Mixed-precision: fp16
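How these numbers combine depends on details the card does not state. Assuming "Batch size" is the number of samples per forward/backward pass on a single device and "Steps" counts optimizer updates, gradient accumulation gives the effective batch size and total samples processed below. If "Steps" instead counts individual forward/backward passes, the total would be 4x smaller.

```python
STEPS = 251_000       # "Steps" above, assumed to count optimizer updates
BATCH_SIZE = 32       # assumed: samples per forward/backward pass on one device
GRAD_ACCUM_STEPS = 4  # micro-batches accumulated before each optimizer update

effective_batch_size = BATCH_SIZE * GRAD_ACCUM_STEPS  # samples per optimizer update
samples_processed = STEPS * effective_batch_size      # includes repeats across epochs

print(effective_batch_size)  # 128
print(samples_processed)     # 32128000
```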
+
+ ### Multi-Resolution Support
+
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/62039c2d91d53938a643317d/IwIaIB4nBdMx6Vs5q82cL.jpeg)
+
+ SSD-1B supports the following output resolutions:
+
+ * 1024 x 1024 (1:1 Square)
+ * 1152 x 896 (9:7)
+ * 896 x 1152 (7:9)
+ * 1216 x 832 (19:13)
+ * 832 x 1216 (13:19)
+ * 1344 x 768 (7:4 Horizontal)
+ * 768 x 1344 (4:7 Vertical)
+ * 1536 x 640 (12:5 Horizontal)
+ * 640 x 1536 (5:12 Vertical)
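The listed sizes all stay close to the pixel count of the 1024 training resolution (roughly one megapixel) while varying the aspect ratio, and every dimension is a multiple of 64. The short script below checks both properties and reduces each size to its aspect ratio. To generate at one of these sizes, the Diffusers SDXL pipeline accepts `width` and `height` arguments, for example `pipe(prompt, width=1216, height=832)`.

```python
from math import gcd

# (width, height) pairs from the list above.
RESOLUTIONS = [
    (1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216),
    (1344, 768), (768, 1344), (1536, 640), (640, 1536),
]


def aspect_ratio(width, height):
    """Reduce width:height to lowest terms, e.g. 1216 x 832 -> '19:13'."""
    g = gcd(width, height)
    return f"{width // g}:{height // g}"


for w, h in RESOLUTIONS:
    megapixels = w * h / 1e6
    multiple_of_64 = w % 64 == 0 and h % 64 == 0
    print(f"{w:>4} x {h:<4}  {aspect_ratio(w, h):>5}  {megapixels:.2f} MP  multiple of 64: {multiple_of_64}")
```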
+
+
+ ### Speed Comparison
+
+ We have observed that SSD-1B is up to 60% faster than the base SDXL model. Below is a comparison on an A100 80GB GPU.
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62039c2d91d53938a643317d/TyymF1OkUjXLrHUp1XF0t.png)
+
+ Below are the speedup metrics on an RTX 4090 GPU.
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62039c2d91d53938a643317d/moMZrlDr-HTFkZlqWHUjQ.png)
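Speedups like these depend on the GPU, resolution, number of inference steps, and software versions, so it can be worth measuring on your own hardware. Below is a minimal, library-agnostic timing helper; the name `median_latency` and the warmup/run counts are illustrative assumptions. Warmup calls are excluded because the first pipeline calls typically include one-time overheads. Because the pipeline returns decoded PIL images by default, each timed call should include the GPU work needed to produce them.

```python
import statistics
import time


def median_latency(fn, warmup=2, runs=5):
    """Return the median wall-clock time of fn() in seconds.

    fn is called `warmup` times without timing, then `runs` times with timing.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Hypothetical usage, with `ssd_pipe` and `sdxl_pipe` loaded as in the Usage section
# (the SDXL one from stabilityai/stable-diffusion-xl-base-1.0):
# prompt = "An astronaut riding a green horse"
# t_ssd = median_latency(lambda: ssd_pipe(prompt, num_inference_steps=25))
# t_sdxl = median_latency(lambda: sdxl_pipe(prompt, num_inference_steps=25))
# print(f"SSD-1B runs {t_sdxl / t_ssd:.2f}x as fast as SDXL on this setup")
```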
+
+ ### Model Sources
+
+ For research and development purposes, the SSD-1B Model can be accessed via the Segmind AI platform. For more information and access details, please visit [Segmind](https://www.segmind.com/models/ssd-1b).
+
+ ## Uses
+
+
+ ### Direct Use
+
+ The Segmind Stable Diffusion Model is suitable for research and practical applications in various domains, including:
+
+ - **Art and Design:** It can be used to generate artworks, designs, and other creative content, providing inspiration and enhancing the creative process.
+
+ - **Education:** The model can be applied in educational tools to create visual content for teaching and learning purposes.
+
+ - **Research:** Researchers can use the model to explore generative models, evaluate its performance, and push the boundaries of text-to-image generation.
+
+ - **Safe Content Generation:** It offers a safe and controlled way to generate content, reducing the risk of harmful or inappropriate outputs.
+
+ - **Bias and Limitation Analysis:** Researchers and developers can use the model to probe its limitations and biases, contributing to a better understanding of generative models' behavior.
+
+ ### Downstream Use
+
+ The Segmind Stable Diffusion Model can also be used directly with the 🧨 Diffusers library training scripts for further training, including:
+
+ - **[LoRA](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora_sdxl.py):**
+ ```bash
+ export MODEL_NAME="segmind/SSD-1B"
+ export VAE_NAME="madebyollin/sdxl-vae-fp16-fix"
+ export DATASET_NAME="lambdalabs/pokemon-blip-captions"
+
+ accelerate launch train_text_to_image_lora_sdxl.py \
+ --pretrained_model_name_or_path=$MODEL_NAME \
+ --pretrained_vae_model_name_or_path=$VAE_NAME \
+ --dataset_name=$DATASET_NAME --caption_column="text" \
+ --resolution=1024 --random_flip \
+ --train_batch_size=1 \
+ --num_train_epochs=2 --checkpointing_steps=500 \
+ --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
+ --mixed_precision="fp16" \
+ --seed=42 \
+ --output_dir="sd-pokemon-model-lora-ssd" \
+ --validation_prompt="cute dragon creature" --report_to="wandb" \
+ --push_to_hub
+ ```
+
+ - **[Fine-Tune](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_sdxl.py):**
+ ```bash
+ export MODEL_NAME="segmind/SSD-1B"
+ export VAE_NAME="madebyollin/sdxl-vae-fp16-fix"
+ export DATASET_NAME="lambdalabs/pokemon-blip-captions"
+
+ accelerate launch train_text_to_image_sdxl.py \
+ --pretrained_model_name_or_path=$MODEL_NAME \
+ --pretrained_vae_model_name_or_path=$VAE_NAME \
+ --dataset_name=$DATASET_NAME \
+ --enable_xformers_memory_efficient_attention \
+ --resolution=512 --center_crop --random_flip \
+ --proportion_empty_prompts=0.2 \
+ --train_batch_size=1 \
+ --gradient_accumulation_steps=4 --gradient_checkpointing \
+ --max_train_steps=10000 \
+ --use_8bit_adam \
+ --learning_rate=1e-06 --lr_scheduler="constant" --lr_warmup_steps=0 \
+ --mixed_precision="fp16" \
+ --report_to="wandb" \
+ --validation_prompt="a cute Sundar Pichai creature" --validation_epochs 5 \
+ --checkpointing_steps=5000 \
+ --output_dir="ssd-pokemon-model" \
+ --push_to_hub
+ ```
+ - **[DreamBooth LoRA](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora_sdxl.py):**
+ ```bash
+ export MODEL_NAME="segmind/SSD-1B"
+ export INSTANCE_DIR="dog"
+ export OUTPUT_DIR="lora-trained-xl"
+ export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"
+
+ accelerate launch train_dreambooth_lora_sdxl.py \
+ --pretrained_model_name_or_path=$MODEL_NAME \
+ --instance_data_dir=$INSTANCE_DIR \
+ --pretrained_vae_model_name_or_path=$VAE_PATH \
+ --output_dir=$OUTPUT_DIR \
+ --mixed_precision="fp16" \
+ --instance_prompt="a photo of sks dog" \
+ --resolution=1024 \
+ --train_batch_size=1 \
+ --gradient_accumulation_steps=4 \
+ --learning_rate=1e-5 \
+ --report_to="wandb" \
+ --lr_scheduler="constant" \
+ --lr_warmup_steps=0 \
+ --max_train_steps=500 \
+ --validation_prompt="A photo of sks dog in a bucket" \
+ --validation_epochs=25 \
+ --seed="0" \
+ --push_to_hub
+ ```
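After training, the outputs can be loaded back for inference. A minimal sketch, assuming `pipe` is the SSD-1B `StableDiffusionXLPipeline` from the Usage section: the LoRA scripts write adapter weights to their `--output_dir`, which Diffusers' `pipe.load_lora_weights` can load from a local directory or from the Hub repo created by `--push_to_hub`. The helper name `generate_with_lora` is hypothetical. The full fine-tuning script instead saves a complete pipeline, which should load directly with `StableDiffusionXLPipeline.from_pretrained("ssd-pokemon-model")`.

```python
def generate_with_lora(pipe, lora_path, prompt, **kwargs):
    """Load LoRA weights into an already-loaded SDXL-style pipeline and render one image.

    lora_path: the training script's --output_dir (e.g. "lora-trained-xl" for the
    DreamBooth LoRA example) or the Hub repo it pushed to.
    Extra keyword arguments are forwarded to the pipeline call.
    """
    pipe.load_lora_weights(lora_path)
    return pipe(prompt=prompt, **kwargs).images[0]


# Hypothetical usage after the DreamBooth LoRA example above:
# image = generate_with_lora(pipe, "lora-trained-xl", "A photo of sks dog in a bucket")
# image.save("sks_dog.png")
```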
+
+ ### Out-of-Scope Use
+
+ SSD-1B is not suitable for creating factual or accurate representations of people, events, or real-world information, and it is not intended for tasks that require high precision and accuracy.
+
+ ## Limitations and Bias
+
+
+ SSD-1B does not always achieve full photorealism, especially in depictions of people. It also struggles to render legible text and to preserve the fidelity of complex compositions, limitations attributed to its autoencoding approach; these are areas for future improvement. Training on a diverse dataset is not a cure for ingrained societal and digital biases, but it is a foundational step toward more equitable technology. Users are encouraged to use the model with an understanding of its current limitations and to expect it to keep evolving.
SSD-1B-modelspec.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7cb406ec0662e91570a79f3c4fb8f0ea5325bffe6af5d9382edae838698f72bd
+ size 4465933050
SSD-1B.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0bf1ce6b065a6b969ab02dc8e8fa21eb20ee189b10935c49ce68c77a7e432c1c
+ size 4465671506
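The `.safetensors` entries in this commit are Git LFS pointer files: the repository stores only the spec version, the SHA-256 `oid`, and the byte `size`, while the weights themselves live in LFS storage. If you download a checkpoint directly, you can check it against its pointer. A minimal sketch using only the Python standard library; the function names are hypothetical, and the pointer text is copied from `SSD-1B.safetensors` above.

```python
import hashlib
import os

SSD1B_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:0bf1ce6b065a6b969ab02dc8e8fa21eb20ee189b10935c49ce68c77a7e432c1c
size 4465671506
"""


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its 'key value' lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def verify_against_pointer(path, pointer_text, chunk_size=1 << 20):
    """Return True if the file at `path` matches the pointer's size and sha256 oid."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    if os.path.getsize(path) != int(fields["size"]):
        return False  # cheap size check before hashing several gigabytes
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_digest


# Hypothetical usage once the checkpoint has been downloaded:
# print(verify_against_pointer("SSD-1B.safetensors", SSD1B_POINTER))
```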
model_index.json ADDED
@@ -0,0 +1,33 @@
+ {
+ "_class_name": "StableDiffusionXLPipeline",
+ "_diffusers_version": "0.19.0",
+ "force_zeros_for_empty_prompt": true,
+ "scheduler": [
+ "diffusers",
+ "EulerDiscreteScheduler"
+ ],
+ "text_encoder": [
+ "transformers",
+ "CLIPTextModel"
+ ],
+ "text_encoder_2": [
+ "transformers",
+ "CLIPTextModelWithProjection"
+ ],
+ "tokenizer": [
+ "transformers",
+ "CLIPTokenizer"
+ ],
+ "tokenizer_2": [
+ "transformers",
+ "CLIPTokenizer"
+ ],
+ "unet": [
+ "diffusers",
+ "UNet2DConditionModel"
+ ],
+ "vae": [
+ "diffusers",
+ "AutoencoderKL"
+ ]
+ }
scheduler/scheduler_config.json ADDED
@@ -0,0 +1,18 @@
+ {
+ "_class_name": "EulerDiscreteScheduler",
+ "_diffusers_version": "0.19.0",
+ "beta_end": 0.012,
+ "beta_schedule": "scaled_linear",
+ "beta_start": 0.00085,
+ "clip_sample": false,
+ "interpolation_type": "linear",
+ "num_train_timesteps": 1000,
+ "prediction_type": "epsilon",
+ "sample_max_value": 1.0,
+ "set_alpha_to_one": false,
+ "skip_prk_steps": true,
+ "steps_offset": 1,
+ "timestep_spacing": "leading",
+ "trained_betas": null,
+ "use_karras_sigmas": false
+ }
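Two groups of settings in this scheduler config determine which noise levels the sampler visits. With `beta_schedule: "scaled_linear"`, Diffusers interpolates linearly between the square roots of `beta_start` and `beta_end` and squares the result. With `timestep_spacing: "leading"` and `steps_offset: 1`, inference timesteps are evenly spaced multiples of `num_train_timesteps // num_inference_steps`, shifted by one. The pure-Python sketch below reproduces both calculations based on our reading of the Diffusers scheduler source (worth verifying against your installed version); it is an illustration, not a replacement for `EulerDiscreteScheduler`, and the function names are hypothetical.

```python
import math

# Values copied from scheduler/scheduler_config.json above.
BETA_START = 0.00085
BETA_END = 0.012
NUM_TRAIN_TIMESTEPS = 1000
STEPS_OFFSET = 1


def scaled_linear_betas(beta_start=BETA_START, beta_end=BETA_END, n=NUM_TRAIN_TIMESTEPS):
    """Linearly interpolate sqrt(beta) over n steps, then square each value."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (n - 1)) ** 2 for i in range(n)]


def leading_timesteps(num_inference_steps, n=NUM_TRAIN_TIMESTEPS, offset=STEPS_OFFSET):
    """Evenly spaced timesteps, highest noise first, shifted by steps_offset."""
    step_ratio = n // num_inference_steps
    return [i * step_ratio + offset for i in reversed(range(num_inference_steps))]


betas = scaled_linear_betas()
print(betas[0], betas[-1])        # close to 0.00085 and 0.012
print(leading_timesteps(25)[:3])  # [961, 921, 881]
```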
text_encoder/config.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "architectures": [
+ "CLIPTextModel"
+ ],
+ "attention_dropout": 0.0,
+ "bos_token_id": 0,
+ "dropout": 0.0,
+ "eos_token_id": 2,
+ "hidden_act": "quick_gelu",
+ "hidden_size": 768,
+ "initializer_factor": 1.0,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-05,
+ "max_position_embeddings": 77,
+ "model_type": "clip_text_model",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 1,
+ "projection_dim": 768,
+ "torch_dtype": "float32",
+ "transformers_version": "4.29.2",
+ "vocab_size": 49408
+ }
text_encoder/model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5487ea0eee9c9a9bff8abd097908d4deff3ae1fa87b3b67397f8b9538139d447
+ size 246144864
text_encoder/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8377b1ca9d88fe06ec483dd7b3cfc62e5e8dbf8ddd252f455e79d659fa0553c5
+ size 492265880
text_encoder_2/config.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "architectures": [
+ "CLIPTextModelWithProjection"
+ ],
+ "attention_dropout": 0.0,
+ "bos_token_id": 0,
+ "dropout": 0.0,
+ "eos_token_id": 2,
+ "hidden_act": "gelu",
+ "hidden_size": 1280,
+ "initializer_factor": 1.0,
+ "initializer_range": 0.02,
+ "intermediate_size": 5120,
+ "layer_norm_eps": 1e-05,
+ "max_position_embeddings": 77,
+ "model_type": "clip_text_model",
+ "num_attention_heads": 20,
+ "num_hidden_layers": 32,
+ "pad_token_id": 1,
+ "projection_dim": 1280,
+ "torch_dtype": "float32",
+ "transformers_version": "4.29.2",
+ "vocab_size": 49408
+ }
text_encoder_2/model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d3df577f6e3799c8e1bd9b40e30133710e02e8e25d0ce48cdcc790e7dfe12d6d
+ size 1389382880
text_encoder_2/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b84f413eebecbd049b72874c1df533a516510cb5a2489ae58c7e320209cf0ebe
+ size 2778702976
tokenizer/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "bos_token": {
+ "content": "<|startoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "<|endoftext|>",
+ "unk_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,33 @@
+ {
+ "add_prefix_space": false,
+ "bos_token": {
+ "__type": "AddedToken",
+ "content": "<|startoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "clean_up_tokenization_spaces": true,
+ "do_lower_case": true,
+ "eos_token": {
+ "__type": "AddedToken",
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "errors": "replace",
+ "model_max_length": 77,
+ "pad_token": "<|endoftext|>",
+ "tokenizer_class": "CLIPTokenizer",
+ "unk_token": {
+ "__type": "AddedToken",
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_2/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_2/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "bos_token": {
+ "content": "<|startoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "!",
+ "unk_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer_2/tokenizer_config.json ADDED
@@ -0,0 +1,33 @@
+ {
+ "add_prefix_space": false,
+ "bos_token": {
+ "__type": "AddedToken",
+ "content": "<|startoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "clean_up_tokenization_spaces": true,
+ "do_lower_case": true,
+ "eos_token": {
+ "__type": "AddedToken",
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "errors": "replace",
+ "model_max_length": 77,
+ "pad_token": "!",
+ "tokenizer_class": "CLIPTokenizer",
+ "unk_token": {
+ "__type": "AddedToken",
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer_2/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
unet/config.json ADDED
@@ -0,0 +1,74 @@
+ {
+ "_class_name": "UNet2DConditionModel",
+ "_diffusers_version": "0.19.0.dev0",
+ "act_fn": "silu",
+ "addition_embed_type": "text_time",
+ "addition_embed_type_num_heads": 64,
+ "addition_time_embed_dim": 256,
+ "attention_head_dim": [
+ 5,
+ 10,
+ 20
+ ],
+ "block_out_channels": [
+ 320,
+ 640,
+ 1280
+ ],
+ "center_input_sample": false,
+ "class_embed_type": null,
+ "class_embeddings_concat": false,
+ "conv_in_kernel": 3,
+ "conv_out_kernel": 3,
+ "cross_attention_dim": 2048,
+ "cross_attention_norm": null,
+ "down_block_types": [
+ "DownBlock2D",
+ "CrossAttnDownBlock2D",
+ "CrossAttnDownBlock2D"
+ ],
+ "downsample_padding": 1,
+ "dual_cross_attention": false,
+ "encoder_hid_dim": null,
+ "encoder_hid_dim_type": null,
+ "flip_sin_to_cos": true,
+ "freq_shift": 0,
+ "in_channels": 4,
+ "layers_per_block": 2,
+ "mid_block_only_cross_attention": null,
+ "mid_block_scale_factor": 1,
+ "mid_block_type": "UNetMidBlock2D",
+ "norm_eps": 1e-05,
+ "norm_num_groups": 32,
+ "num_attention_heads": null,
+ "num_class_embeds": null,
+ "only_cross_attention": false,
+ "out_channels": 4,
+ "projection_class_embeddings_input_dim": 2816,
+ "resnet_out_scale_factor": 1.0,
+ "resnet_skip_time_act": false,
+ "resnet_time_scale_shift": "default",
+ "sample_size": 128,
+ "time_cond_proj_dim": null,
+ "time_embedding_act_fn": null,
+ "time_embedding_dim": null,
+ "time_embedding_type": "positional",
+ "timestep_post_act": null,
+ "transformer_layers_per_block": [
+ [1],
+ [2,2],
+ [4,4]
+ ],
+ "reverse_transformer_layers_per_block": [
+ [4,4,10],
+ [2,1,1],
+ 1
+ ],
+ "up_block_types": [
+ "CrossAttnUpBlock2D",
+ "CrossAttnUpBlock2D",
+ "UpBlock2D"
+ ],
+ "upcast_attention": null,
+ "use_linear_projection": true
+ }
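The fields in this UNet config that most visibly reflect the removed layers are `transformer_layers_per_block`, `reverse_transformer_layers_per_block`, and `mid_block_type`. Counting the transformer layers they imply gives a rough picture of the pruning. The sketch below counts only blocks whose type starts with `CrossAttn` (plain `DownBlock2D`/`UpBlock2D` blocks carry no transformer layers) and assumes each inner list gives per-layer transformer depths within a block; the `UNetMidBlock2D` mid block contributes no cross-attention transformer layers. For comparison, and as an assumption based on the public SDXL base UNet config (`transformer_layers_per_block: [1, 2, 10]` plus a cross-attention mid block of depth 10), the same count for SDXL comes to 70 (24 down, 10 mid, 36 up). The function name is hypothetical.

```python
# Values copied from unet/config.json above.
down_block_types = ["DownBlock2D", "CrossAttnDownBlock2D", "CrossAttnDownBlock2D"]
transformer_layers_per_block = [[1], [2, 2], [4, 4]]
up_block_types = ["CrossAttnUpBlock2D", "CrossAttnUpBlock2D", "UpBlock2D"]
reverse_transformer_layers_per_block = [[4, 4, 10], [2, 1, 1], 1]


def count_transformer_blocks(block_types, layers_per_block):
    """Sum transformer depths over cross-attention blocks only."""
    total = 0
    for block_type, layers in zip(block_types, layers_per_block):
        if not block_type.startswith("CrossAttn"):
            continue  # plain ResNet blocks carry no transformer layers
        total += sum(layers) if isinstance(layers, list) else layers
    return total


down = count_transformer_blocks(down_block_types, transformer_layers_per_block)
up = count_transformer_blocks(up_block_types, reverse_transformer_layers_per_block)
print(down, up, down + up)  # 12 22 34
```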
unet/diffusion_pytorch_model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:40d8ea9159f3e875278dacc7879442d58c45850cf13c62f5e26681061c51829a
+ size 2662790608
unet/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:02ed8ebd0ed55aec686fcf20946d7a1659a31f9f8d9c3798cd254ba6b67434ca
+ size 5325459760
vae/config.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "_class_name": "AutoencoderKL",
+ "_diffusers_version": "0.19.0",
+ "act_fn": "silu",
+ "block_out_channels": [
+ 128,
+ 256,
+ 512,
+ 512
+ ],
+ "down_block_types": [
+ "DownEncoderBlock2D",
+ "DownEncoderBlock2D",
+ "DownEncoderBlock2D",
+ "DownEncoderBlock2D"
+ ],
+ "force_upcast": true,
+ "in_channels": 3,
+ "latent_channels": 4,
+ "layers_per_block": 2,
+ "norm_num_groups": 32,
+ "out_channels": 3,
+ "sample_size": 1024,
+ "scaling_factor": 0.13025,
+ "up_block_types": [
+ "UpDecoderBlock2D",
+ "UpDecoderBlock2D",
+ "UpDecoderBlock2D",
+ "UpDecoderBlock2D"
+ ]
+ }
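Two values in this VAE config link image space and latent space. With four entries in `block_out_channels`, the encoder downsamples three times, so each latent pixel covers 2**3 = 8 image pixels per side (Diffusers derives its `vae_scale_factor` the same way). A 1024 x 1024 image therefore becomes a 4 x 128 x 128 latent, which matches the UNet's `sample_size: 128`. `scaling_factor: 0.13025` is the constant that encoded latents are multiplied by before the UNet sees them and divided by before decoding. Separately, `force_upcast: true` tells the SDXL pipeline to decode with this VAE in float32, because the original SDXL VAE can overflow in fp16; this is presumably also why the training commands in the README swap in `madebyollin/sdxl-vae-fp16-fix`. The helper names below are hypothetical.

```python
# Values copied from vae/config.json above.
BLOCK_OUT_CHANNELS = [128, 256, 512, 512]
LATENT_CHANNELS = 4
SCALING_FACTOR = 0.13025

# One 2x downsample between consecutive encoder blocks: 2 ** 3 = 8.
VAE_SCALE_FACTOR = 2 ** (len(BLOCK_OUT_CHANNELS) - 1)


def latent_shape(width, height):
    """(channels, height, width) of the latent for an image of the given size."""
    if width % VAE_SCALE_FACTOR or height % VAE_SCALE_FACTOR:
        raise ValueError(f"width and height must be multiples of {VAE_SCALE_FACTOR}")
    return (LATENT_CHANNELS, height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


def scale_for_unet(latent_value):
    """Encoder output -> UNet input space."""
    return latent_value * SCALING_FACTOR


def unscale_for_decoder(latent_value):
    """UNet output -> decoder input space."""
    return latent_value / SCALING_FACTOR


print(latent_shape(1024, 1024))  # (4, 128, 128)
print(latent_shape(1216, 832))   # (4, 104, 152)
```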
vae/diffusion_pytorch_model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6353737672c94b96174cb590f711eac6edf2fcce5b6e91aa9d73c5adc589ee48
+ size 167335342
vae/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:78f6189c8492013e3cac81637a1f657f790a237387f8a9dfd6bfa5fee28eb646
+ size 334643268