charleselena and ehsanakh committed on
Commit 4df2220
0 parents

Duplicate from playgroundai/playground-v2-1024px-aesthetic

Co-authored-by: Ehsan Akhgari <[email protected]>

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
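The patterns above decide which files in this commit are stored through Git LFS, which is why every weight file further down appears as a three-line pointer. The stdlib-only sketch below approximates that decision with Python's `fnmatch`; it is an approximation, not git's own wildmatch logic. Assumed rules: a pattern without a slash is compared against the file's basename at any depth, a pattern with a slash against the whole path, and `fnmatch`'s `*` also crosses `/`. `is_lfs_tracked` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the .gitattributes above (every one uses filter=lfs).
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy",
    "*.npz", "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl",
    "*.pt", "*.pth", "*.rar", "*.safetensors", "saved_model/**/*",
    "*.tar.*", "*.tar", "*.tflite", "*.tgz", "*.wasm", "*.xz", "*.zip",
    "*.zst", "*tfevents*",
]


def is_lfs_tracked(path: str) -> bool:
    """Approximate whether a repo-relative path falls under an LFS pattern.

    Slash-free patterns are matched against the basename; patterns that
    contain a slash are matched against the full path. This only
    approximates gitattributes semantics.
    """
    p = PurePosixPath(path)
    for pattern in LFS_PATTERNS:
        target = str(p) if "/" in pattern else p.name
        if fnmatchcase(target, pattern):
            return True
    return False


if __name__ == "__main__":
    for name in ["unet/diffusion_pytorch_model.fp16.safetensors", "unet/config.json"]:
        print(name, is_lfs_tracked(name))
```

In this commit the split is simple: `*.safetensors` and `*.bin` route every checkpoint through LFS, while the JSON configs, `merges.txt`, and `vocab.json` stay as ordinary git blobs.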
LICENSE.md ADDED
@@ -0,0 +1,75 @@
+ # Playground v2 Community License
+
+ **Release Date:** December 5, 2023
+
+ “Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Playground Materials set forth herein.
+
+ “Documentation” means the specifications, manuals and documentation accompanying Playground v2 distributed by Playground at [https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic](https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic) or other authorized channel.
+
+ “Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.
+
+ “Playground v2” means the diffusion-based text-to-image generative models and software and algorithms, including checkpoints, trained model weights, and other elements of the foregoing distributed by Playground at [https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic](https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic) or other authorized channel.
+
+ “Playground Materials” means, collectively, Playground v2 and related Documentation (and any portion thereof) made available under this Agreement.
+
+ “Playground” or “we” means Mighty Computing, Inc. dba Playground AI<sup>TM</sup>.
+
+ By using or distributing any portion or element of the Playground Materials, you agree to be bound by this Agreement.
+
+ ## 1. License Rights and Redistribution.
+
+ ### a. Grant of Rights.
+ You are granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Playground’s intellectual property or other rights owned by Playground embodied in the Playground Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Playground Materials. Subject to the restrictions herein, this permissive license is available for free for research and commercial use (by an entity or individual).
+
+ ### b. Redistribution and Use.
+ i. If you distribute or make the Playground Materials, or any derivative works thereof, available to any third party, you shall provide a copy of this Agreement to such third party.
+
+ ii. If you receive Playground Materials, or any derivative works thereof, from an authorized Licensee as part of an integrated end user product, then Section 2 of this Agreement will not apply to you.
+
+ iii. You must retain in all copies of the Playground Materials that you distribute the following attribution notice within a “Notice” text file distributed as a part of such copies: “Playground v2 is licensed under the Playground v2 Community License.”
+
+ iv. Your use of the Playground Materials must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Use Restrictions set forth in Attachment A. You shall require all of your users who use Playground v2 or any derivative works thereof, to comply with the terms of this section and the restrictions in Attachment A.
+
+ v. You will not use the Playground Materials or any output or results of the Playground Materials to improve any other text-to-image generative model (excluding Playground v2 or derivative works thereof).
+
+ ## 2. Additional Commercial Terms.
+ If, at any time, (a) image generation or image editing is a core business or product of Licensee’s and (b) the total monthly unique users (MUU) of the products or services made available by or for Licensee, or Licensee’s affiliates, for such products or services is greater than 1 million MUUs in the preceding calendar month, then immediately thereafter you must request a license from Playground. Playground may grant this license to you in its sole discretion and you are not authorized to exercise any of the rights under this Agreement unless or until Playground otherwise expressly grants you such rights as a Licensee.
+
+ ## 3. Disclaimer of Warranty.
+ UNLESS REQUIRED BY APPLICABLE LAW, THE PLAYGROUND MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE PLAYGROUND MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE PLAYGROUND MATERIALS AND ANY OUTPUT AND RESULTS.
+
+ ## 4. Limitation of Liability.
+ IN NO EVENT WILL PLAYGROUND OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF PLAYGROUND OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
+
+ ## 5. Intellectual Property.
+
+ a. No trademark licenses are granted under this Agreement, and in connection with the Playground Materials, neither Playground nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Playground Materials.
+
+ b. Subject to Playground’s ownership of Playground Materials and derivatives made by or for Playground, with respect to any derivative works and modifications of the Playground Materials that are made by you, as between you and Playground, you are and will be the owner of such derivative works and modifications.
+
+ c. If you institute litigation or other proceedings against Playground or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Playground Materials or Playground v2 outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Playground from and against any claim by any third party arising out of or related to your use or distribution of the Playground Materials.
+
+ ## 6. Term and Termination.
+ The term of this Agreement will commence upon your acceptance of this Agreement or access to the Playground Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Playground may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Playground Materials. Sections 3, 4 and 7 shall survive the termination of this Agreement.
+
+ ## 7. Governing Law and Jurisdiction.
+ This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.
+
+
+ **Attachment A - Use Restrictions**
+
+ You agree not to use Playground v2 or any derivative works thereof:
+
+ <ol type="a">
+ <li>In any way that violates any applicable national, federal, state, local or international law or regulation;</li>
+ <li>For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;</li>
+ <li>To generate or disseminate verifiably false information and/or content with the purpose of harming others;</li>
+ <li>To generate or disseminate personal identifiable information that can be used to harm an individual;</li>
+ <li>To defame, disparage or otherwise harass others;</li>
+ <li>For fully automated decision making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation;</li>
+ <li>For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics;</li>
+ <li>To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;</li>
+ <li>For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories;</li>
+ <li>To provide medical advice and medical results interpretation;</li>
+ <li>To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment (e.g. by text profiling, drawing causal relationships between assertions made in documents, indiscriminate and arbitrarily-targeted use).</li>
+ </ol>
README.md ADDED
@@ -0,0 +1,114 @@
+ ---
+ license: other
+ license_name: playground-v2-community
+ license_link: https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic/blob/main/LICENSE.md
+ tags:
+ - text-to-image
+ - playground
+ inference:
+   parameters:
+     guidance_scale: 3.0
+ ---
+ # Playground v2 – 1024px Aesthetic Model
+
+ This repository contains a model that generates highly aesthetic images at a resolution of 1024x1024. You can use the model with Hugging Face 🧨 Diffusers.
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/63855d851769b7c4b10e1f76/p0up5GNQgO0vVIiJ672K7.png)
+
+
+ **Playground v2** is a diffusion-based text-to-image generative model. The model was trained from scratch by the research team at [Playground](https://playground.com).
+
+ Images generated by Playground v2 are favored **2.5** times more than those produced by Stable Diffusion XL, according to Playground’s [user study](#user-study).
+
+ We are thrilled to release [intermediate checkpoints](#intermediate-base-models) at different training stages, including evaluation metrics, to the community. We hope this will encourage further research into foundational models for image generation.
+
+ Lastly, we introduce a new benchmark, [MJHQ-30K](#mjhq-30k-benchmark), for automatic evaluation of a model’s aesthetic quality.
+
+ Please see our [blog](https://blog.playgroundai.com/playground-v2/) for more details.
+
+ ### Model Description
+
+ - **Developed by:** [Playground](https://playground.com)
+ - **Model type:** Diffusion-based text-to-image generative model
+ - **License:** [Playground v2 Community License](https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic/blob/main/LICENSE.md)
+ - **Summary:** This model generates images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pre-trained text encoders ([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip) and [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main)). It follows the same architecture as [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).
+
+ ### Using the model with 🧨 Diffusers
+
+ Install diffusers >= 0.24.0 and its dependencies:
+ ```
+ pip install "diffusers>=0.24.0" transformers accelerate safetensors
+ ```
+
+ To use the model, run the following snippet.
+
+ **Note**: We recommend using **`guidance_scale=3.0`**.
+
+ ```python
+ from diffusers import DiffusionPipeline
+ import torch
+
+ pipe = DiffusionPipeline.from_pretrained(
+     "playgroundai/playground-v2-1024px-aesthetic",
+     torch_dtype=torch.float16,
+     use_safetensors=True,
+     add_watermarker=False,
+     variant="fp16"
+ )
+ pipe.to("cuda")
+
+ prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
+ image = pipe(prompt=prompt, guidance_scale=3.0).images[0]
+ ```
+
+ ### Using the model with Automatic1111/ComfyUI
+
+ To use the model with software such as Automatic1111 or ComfyUI, use the [`playground-v2.fp16.safetensors`](https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic/blob/main/playground-v2.fp16.safetensors) file.
+
+ ### User Study
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/63855d851769b7c4b10e1f76/8VzBkSYaUU3dt509Co9sk.png)
+
+ According to user studies conducted by Playground, involving over 2,600 prompts and thousands of users, images generated by Playground v2 are favored **2.5** times more than those produced by [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).
+
+ We report user preference metrics on [PartiPrompts](https://github.com/google-research/parti), following standard practice, and on an internal prompt dataset curated by the Playground team. The “Internal 1K” prompt dataset is diverse and covers various categories and tasks.
+
+ During the user study, we instruct users to evaluate image pairs based on both (1) their aesthetic preference and (2) the image-text alignment.
+
+ ### MJHQ-30K Benchmark
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/63855d851769b7c4b10e1f76/o3Bt62qFsTO9DkeX2yLua.png)
+
+ | Model | Overall FID |
+ | ------------------------------------- | ----- |
+ | SDXL-1-0-refiner | 9.55 |
+ | [playground-v2-1024px-aesthetic](https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic) | **7.07** |
+
+ We introduce a new benchmark, [MJHQ-30K](https://huggingface.co/datasets/playgroundai/MJHQ-30K), for automatic evaluation of a model’s aesthetic quality. The benchmark computes FID on a high-quality dataset to gauge aesthetic quality.
+
+ We have curated a high-quality dataset from Midjourney, featuring 10 common categories, with each category containing 3,000 samples. Following common practice, we use aesthetic score and CLIP score to ensure high image quality and high image-text alignment. Furthermore, we take extra care to make the data diverse within each category.
+
+ For Playground v2, we report both the overall FID and per-category FID. All FID metrics are computed at resolution 1024x1024. Our benchmark results show that our model outperforms SDXL-1-0-refiner in overall FID and in every per-category FID, especially in the people and fashion categories. This is in line with the results of the user study, which indicates a correlation between human preference and FID score on the MJHQ-30K benchmark.
+
+ We release this benchmark to the public and encourage the community to adopt it for benchmarking their models’ aesthetic quality.
+
+ ### Intermediate Base Models
+
+ | Model | FID | CLIP Score |
+ | ---------------------------- | ------ | ---------- |
+ | SDXL-1-0-refiner | 13.04 | 32.62 |
+ | [playground-v2-256px-base](https://huggingface.co/playgroundai/playground-v2-256px-base) | 9.83 | 31.90 |
+ | [playground-v2-512px-base](https://huggingface.co/playgroundai/playground-v2-512px-base) | 9.55 | 32.08 |
+
+
+ Apart from [playground-v2-1024px-aesthetic](https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic), we release intermediate checkpoints at different training stages to the community in order to foster foundation model research in pixels. Here, we report the FID score and CLIP score on the MSCOCO14 evaluation set for reference purposes. (Note that our reported numbers may differ from those in SDXL’s published results, as our prompt list may be different.)
+
+ ### How to cite us
+
+ ```
+ @misc{playground-v2,
+   url={https://huggingface.co/playgroundai/playground-v2-1024px-aesthetic},
+   title={Playground v2},
+   author={Li, Daiqing and Kamko, Aleks and Sabet, Ali and Akhgari, Ehsan and Xu, Linmiao and Doshi, Suhail}
+ }
+ ```
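The MJHQ-30K and MSCOCO14 numbers in the README are Fréchet Inception Distances, where lower is better. For reference, FID is the Fréchet distance between two Gaussians fitted to Inception-v3 features: one fitted to the reference images (mean $\mu_r$, covariance $\Sigma_r$) and one to the generated images ($\mu_g$, $\Sigma_g$). The README does not state which Inception implementation or feature layer was used, so nothing about that is assumed here.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

The first term compares the feature means, and the trace term compares the spread of the two feature distributions. This is why FID can reward a model whose outputs match the reference set's diversity, not only its average look.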
model_index.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "_class_name": "StableDiffusionXLPipeline",
+   "_diffusers_version": "0.24.0",
+   "feature_extractor": [
+     null,
+     null
+   ],
+   "force_zeros_for_empty_prompt": true,
+   "image_encoder": [
+     null,
+     null
+   ],
+   "scheduler": [
+     "diffusers",
+     "EulerAncestralDiscreteScheduler"
+   ],
+   "text_encoder": [
+     "transformers",
+     "CLIPTextModel"
+   ],
+   "text_encoder_2": [
+     "transformers",
+     "CLIPTextModelWithProjection"
+   ],
+   "tokenizer": [
+     "transformers",
+     "CLIPTokenizer"
+   ],
+   "tokenizer_2": [
+     "transformers",
+     "CLIPTokenizer"
+   ],
+   "unet": [
+     "diffusers",
+     "UNet2DConditionModel"
+   ],
+   "vae": [
+     "diffusers",
+     "AutoencoderKL"
+   ]
+ }
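`model_index.json` is what lets `DiffusionPipeline.from_pretrained` in the README resolve to `StableDiffusionXLPipeline`. Each entry pairs a component subfolder with the library and class used to load it. The stdlib-only sketch below reads those entries. Reading `[null, null]` as "optional component not shipped in this repo" is our interpretation of the file, and `list_components` is a hypothetical helper.

```python
import json

# Copy of the model_index.json added above (formatting compacted).
MODEL_INDEX = json.loads("""
{
  "_class_name": "StableDiffusionXLPipeline",
  "_diffusers_version": "0.24.0",
  "feature_extractor": [null, null],
  "force_zeros_for_empty_prompt": true,
  "image_encoder": [null, null],
  "scheduler": ["diffusers", "EulerAncestralDiscreteScheduler"],
  "text_encoder": ["transformers", "CLIPTextModel"],
  "text_encoder_2": ["transformers", "CLIPTextModelWithProjection"],
  "tokenizer": ["transformers", "CLIPTokenizer"],
  "tokenizer_2": ["transformers", "CLIPTokenizer"],
  "unet": ["diffusers", "UNet2DConditionModel"],
  "vae": ["diffusers", "AutoencoderKL"]
}
""")


def list_components(index: dict) -> dict:
    """Return {component_name: (library, class_name)} for shipped components.

    Keys starting with "_" are metadata, scalar values such as
    force_zeros_for_empty_prompt are pipeline options, and [null, null]
    is treated as an optional component this repository does not provide.
    """
    components = {}
    for name, value in index.items():
        if name.startswith("_") or not isinstance(value, list):
            continue
        library, class_name = value
        if library is None:
            continue
        components[name] = (library, class_name)
    return components


if __name__ == "__main__":
    for name, (library, class_name) in list_components(MODEL_INDEX).items():
        # In this repo, each name is also the subfolder holding that component.
        print(f"{name:15} {library}.{class_name}")
```

The seven components it returns line up with the seven subfolders (`scheduler/`, `text_encoder/`, `text_encoder_2/`, `tokenizer/`, `tokenizer_2/`, `unet/`, `vae/`) added in the rest of this commit.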
playground-v2.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0411e988479884b1a3ecd184123efe38d051d8d0ef24270585a7d1d57499464a
+ size 6938042488
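This commit stores every weight file as a Git LFS pointer rather than the weights themselves. A pointer holds the spec version, the SHA-256 `oid` of the real content, and its `size` in bytes. The stdlib-only sketch below checks a downloaded file against its pointer. The helper names are hypothetical, and so is any local path you pass in.

```python
import hashlib
from pathlib import Path

# The pointer committed for playground-v2.fp16.safetensors above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:0411e988479884b1a3ecd184123efe38d051d8d0ef24270585a7d1d57499464a
size 6938042488
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo,
            "oid": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a downloaded file's size and hash against its LFS pointer."""
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first: avoids hashing a truncated download
    digest = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Stream in 1 MiB chunks so multi-GB checkpoints never sit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["oid"]
```

Usage, with a hypothetical local download: `matches_pointer(Path("playground-v2.fp16.safetensors"), parse_lfs_pointer(POINTER_TEXT))`. The size comparison runs first because it is free. Hashing a 6.9 GB file takes noticeably longer.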
playground-v2.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5c7d38880d0940e6795158b7608ccef89217272b1f2a9331c5b0a2adffcd82c4
+ size 13875721944
scheduler/scheduler_config.json ADDED
@@ -0,0 +1,21 @@
+ {
+   "_class_name": "EulerAncestralDiscreteScheduler",
+   "_diffusers_version": "0.25.0.dev0",
+   "beta_end": 0.012,
+   "beta_schedule": "scaled_linear",
+   "beta_start": 0.00085,
+   "clip_sample": false,
+   "interpolation_type": "linear",
+   "num_train_timesteps": 1000,
+   "prediction_type": "epsilon",
+   "sample_max_value": 1.0,
+   "set_alpha_to_one": false,
+   "sigma_max": null,
+   "sigma_min": null,
+   "skip_prk_steps": true,
+   "steps_offset": 1,
+   "timestep_spacing": "leading",
+   "timestep_type": "discrete",
+   "trained_betas": null,
+   "use_karras_sigmas": false
+ }
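The scheduler config describes several settings:

- a `scaled_linear` beta schedule over 1000 training steps,
- `leading` timestep spacing,
- a `steps_offset` of 1.

The pure-Python sketch below derives those quantities as we understand diffusers' Euler schedulers to compute them. It is a reading aid for the config, not a replacement for `EulerAncestralDiscreteScheduler`, and it leaves out the ancestral noise-injection step entirely. All function names are hypothetical.

```python
import math

# Values copied from scheduler/scheduler_config.json above.
NUM_TRAIN_TIMESTEPS = 1000
BETA_START = 0.00085
BETA_END = 0.012
STEPS_OFFSET = 1


def scaled_linear_betas(n=NUM_TRAIN_TIMESTEPS, beta_start=BETA_START, beta_end=BETA_END):
    """'scaled_linear': space sqrt(beta) linearly, then square each value."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (n - 1)) ** 2 for i in range(n)]


def sigmas_from_betas(betas):
    """Noise level per training step: sigma_t = sqrt((1 - abar_t) / abar_t)."""
    sigmas, alpha_bar = [], 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta  # running cumulative product of (1 - beta)
        sigmas.append(math.sqrt((1.0 - alpha_bar) / alpha_bar))
    return sigmas


def leading_timesteps(num_inference_steps, n=NUM_TRAIN_TIMESTEPS, offset=STEPS_OFFSET):
    """'leading' spacing: multiples of n // steps, highest first, plus steps_offset."""
    step_ratio = n // num_inference_steps
    return [i * step_ratio + offset for i in reversed(range(num_inference_steps))]


if __name__ == "__main__":
    sigmas = sigmas_from_betas(scaled_linear_betas())
    print(leading_timesteps(10))
    print(f"training sigma range: {sigmas[0]:.4f} .. {sigmas[-1]:.4f}")
```

With `leading` spacing and an offset of 1, ten steps visit 901, 801, ..., 1. The last step therefore lands one step above the clean end of the schedule, not exactly on it.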
text_encoder/config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "architectures": [
+     "CLIPTextModel"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 0,
+   "dropout": 0.0,
+   "eos_token_id": 2,
+   "hidden_act": "quick_gelu",
+   "hidden_size": 768,
+   "initializer_factor": 1.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 77,
+   "model_type": "clip_text_model",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "projection_dim": 768,
+   "torch_dtype": "float32",
+   "transformers_version": "4.35.2",
+   "vocab_size": 49408
+ }
text_encoder/model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:660c6f5b1abae9dc498ac2d21e1347d2abdb0cf6c0c0c8576cd796491d9a6cdd
+ size 246144152
text_encoder/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:778d02eb9e707c3fbaae0b67b79ea0d1399b52e624fb634f2f19375ae7c047c3
+ size 492265168
text_encoder/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4b2a888f98b610f4666b5323f4012475cc752183ce3bbec3ccf25cf32cec03d7
+ size 492306586
text_encoder/pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1aaa9196f44f8283e6549b748927d0d24b91710c1a216be590e08458bb5d615c
+ size 246185562
text_encoder_2/config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "architectures": [
+     "CLIPTextModelWithProjection"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 0,
+   "dropout": 0.0,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_size": 1280,
+   "initializer_factor": 1.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 5120,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 77,
+   "model_type": "clip_text_model",
+   "num_attention_heads": 20,
+   "num_hidden_layers": 32,
+   "pad_token_id": 1,
+   "projection_dim": 1280,
+   "torch_dtype": "float32",
+   "transformers_version": "4.35.2",
+   "vocab_size": 49408
+ }
text_encoder_2/model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ec310df2af79c318e24d20511b601a591ca8cd4f1fce1d8dff822a356bcdb1f4
+ size 1389382176
text_encoder_2/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa5b2e6f4c2efc2d82e4b8312faec1a5540eabfc6415126c9a05c8436a530ef4
+ size 2778702264
text_encoder_2/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:01854181cb926f2b305bae76ba3bbacf9f8f6eff785aeafb8b22a3e8fbe4b9b0
+ size 2778810142
text_encoder_2/pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c61bac6a0e10e9c430b1faeb8338347758f7b5ca98dfbd7abff85c3e2f4305ea
+ size 1389490462
tokenizer/merges.txt ADDED
The diff for this file is too large to render.
 
tokenizer/special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "bos_token": {
+     "content": "<|startoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "49406": {
+       "content": "<|startoftext|>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "49407": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<|startoftext|>",
+   "clean_up_tokenization_spaces": true,
+   "do_lower_case": true,
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "model_max_length": 77,
+   "pad_token": "<|endoftext|>",
+   "tokenizer_class": "CLIPTokenizer",
+   "unk_token": "<|endoftext|>"
+ }
tokenizer/vocab.json ADDED
The diff for this file is too large to render.
 
tokenizer_2/merges.txt ADDED
The diff for this file is too large to render.
 
tokenizer_2/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<|startoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "!",
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer_2/tokenizer_config.json ADDED
@@ -0,0 +1,38 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "!",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "49406": {
+       "content": "<|startoftext|>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "49407": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<|startoftext|>",
+   "clean_up_tokenization_spaces": true,
+   "do_lower_case": true,
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "model_max_length": 77,
+   "pad_token": "!",
+   "tokenizer_class": "CLIPTokenizer",
+   "unk_token": "<|endoftext|>"
+ }
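The two tokenizer configs share their special tokens: `<|startoftext|>` (id 49406) and `<|endoftext|>` (id 49407). Both also cap prompts at `model_max_length` 77. They differ in padding. `tokenizer/` pads with `<|endoftext|>`, while `tokenizer_2/` pads with `!`, which its `added_tokens_decoder` maps to id 0. The sketch below applies both padding schemes to token ids that have already been produced. It assumes BOS/EOS framing with right-padding. The content ids 320 and 1125 are hypothetical placeholders, not real CLIP BPE output; real tokenization should go through `CLIPTokenizer`.

```python
# Special-token ids from the tokenizer configs above.
BOS_ID = 49406  # "<|startoftext|>"
EOS_ID = 49407  # "<|endoftext|>"
MODEL_MAX_LENGTH = 77
PAD_ID = {
    "tokenizer": 49407,  # pad_token "<|endoftext|>"
    "tokenizer_2": 0,    # pad_token "!" (added_tokens_decoder id "0")
}


def frame_and_pad(content_ids, pad_id, max_length=MODEL_MAX_LENGTH):
    """Wrap ids in BOS/EOS, truncate to fit, then right-pad to max_length."""
    body = list(content_ids)[: max_length - 2]  # leave room for BOS and EOS
    ids = [BOS_ID] + body + [EOS_ID]
    return ids + [pad_id] * (max_length - len(ids))


if __name__ == "__main__":
    hypothetical_ids = [320, 1125]  # placeholders, not real BPE ids
    for name, pad_id in PAD_ID.items():
        print(name, frame_and_pad(hypothetical_ids, pad_id)[:6])
```

The same prompt therefore yields identical prefixes from both tokenizers but different tails: a run of 49407 for the first text encoder and a run of 0 for the second.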
tokenizer_2/vocab.json ADDED
The diff for this file is too large to render.
 
unet/config.json ADDED
@@ -0,0 +1,72 @@
+ {
+   "_class_name": "UNet2DConditionModel",
+   "_diffusers_version": "0.24.0",
+   "act_fn": "silu",
+   "addition_embed_type": "text_time",
+   "addition_embed_type_num_heads": 64,
+   "addition_time_embed_dim": 256,
+   "attention_head_dim": [
+     5,
+     10,
+     20
+   ],
+   "attention_type": "default",
+   "block_out_channels": [
+     320,
+     640,
+     1280
+   ],
+   "center_input_sample": false,
+   "class_embed_type": null,
+   "class_embeddings_concat": false,
+   "conv_in_kernel": 3,
+   "conv_out_kernel": 3,
+   "cross_attention_dim": 2048,
+   "cross_attention_norm": null,
+   "down_block_types": [
+     "DownBlock2D",
+     "CrossAttnDownBlock2D",
+     "CrossAttnDownBlock2D"
+   ],
+   "downsample_padding": 1,
+   "dropout": 0.0,
+   "dual_cross_attention": false,
+   "encoder_hid_dim": null,
+   "encoder_hid_dim_type": null,
+   "flip_sin_to_cos": true,
+   "freq_shift": 0,
+   "in_channels": 4,
+   "layers_per_block": 2,
+   "mid_block_only_cross_attention": null,
+   "mid_block_scale_factor": 1,
+   "mid_block_type": "UNetMidBlock2DCrossAttn",
+   "norm_eps": 1e-05,
+   "norm_num_groups": 32,
+   "num_attention_heads": null,
+   "num_class_embeds": null,
+   "only_cross_attention": false,
+   "out_channels": 4,
+   "projection_class_embeddings_input_dim": 2816,
+   "resnet_out_scale_factor": 1.0,
+   "resnet_skip_time_act": false,
+   "resnet_time_scale_shift": "default",
+   "reverse_transformer_layers_per_block": null,
+   "sample_size": 128,
+   "time_cond_proj_dim": null,
+   "time_embedding_act_fn": null,
+   "time_embedding_dim": null,
+   "time_embedding_type": "positional",
+   "timestep_post_act": null,
+   "transformer_layers_per_block": [
+     1,
+     2,
+     10
+   ],
+   "up_block_types": [
+     "CrossAttnUpBlock2D",
+     "CrossAttnUpBlock2D",
+     "UpBlock2D"
+   ],
+   "upcast_attention": false,
+   "use_linear_projection": true
+ }
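Several UNet sizes follow from the text encoder configs, under the usual SDXL conditioning scheme. This wiring is our understanding of `StableDiffusionXLPipeline`, not something this commit states:

- The cross-attention context concatenates the hidden states of both text encoders.
- The `text_time` added embedding concatenates two things: the pooled, projected output of `text_encoder_2`, and six micro-conditioning scalars (original size, crop top-left, target size), each embedded to `addition_time_embed_dim`.

In diffusers, SDXL's `attention_head_dim` list is, for historical reasons, used as the number of heads per block (note `num_attention_heads: null` above). The arithmetic below checks these readings against the config values.

```python
# Values copied from the config.json files above.
TEXT_ENCODER_HIDDEN = 768          # text_encoder: hidden_size
TEXT_ENCODER_2_HIDDEN = 1280       # text_encoder_2: hidden_size
TEXT_ENCODER_2_PROJECTION = 1280   # text_encoder_2: projection_dim
ADDITION_TIME_EMBED_DIM = 256      # unet: addition_time_embed_dim
BLOCK_OUT_CHANNELS = [320, 640, 1280]
ATTENTION_HEAD_DIM = [5, 10, 20]   # read as heads per block (see lead-in)

# Assumption: SDXL-style micro-conditioning with 6 scalars
# (original height/width, crop top/left, target height/width).
NUM_TIME_IDS = 6

cross_attention_dim = TEXT_ENCODER_HIDDEN + TEXT_ENCODER_2_HIDDEN
projection_class_embeddings_input_dim = (
    TEXT_ENCODER_2_PROJECTION + NUM_TIME_IDS * ADDITION_TIME_EMBED_DIM
)
channels_per_head = [c // h for c, h in zip(BLOCK_OUT_CHANNELS, ATTENTION_HEAD_DIM)]

if __name__ == "__main__":
    print(cross_attention_dim)                    # matches unet config: 2048
    print(projection_class_embeddings_input_dim)  # matches unet config: 2816
    print(channels_per_head)                      # [64, 64, 64]
```

Both derived sizes reproduce `cross_attention_dim` (768 + 1280 = 2048) and `projection_class_embeddings_input_dim` (1280 + 6 × 256 = 2816) exactly. Under the heads-per-block reading, every attention block works with 64 channels per head.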
unet/diffusion_pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:44b44f6106e62ab27d03c7a986c72201d3896fc21278d4b965b4c0320aca90eb
+ size 10270604314
unet/diffusion_pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:95dc67cb5960b6c61d3dd3ed60cf19ace14ee7e244931c32fbdbfcff0cb8ead1
+ size 5135669022
unet/diffusion_pytorch_model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fbff8944611b8ef85d5dc1b6527cefe7b7f65560d114e1e1f291ab51067626ce
+ size 5135149760
unet/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:28d1e15c3764365fcb2d32bca5c3158617a06dcaf3d84fb9af4d0aa1f5d1c1f8
+ size 10270077736
vae/config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "_class_name": "AutoencoderKL",
+   "_diffusers_version": "0.24.0",
+   "act_fn": "silu",
+   "block_out_channels": [
+     128,
+     256,
+     512,
+     512
+   ],
+   "down_block_types": [
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D"
+   ],
+   "force_upcast": true,
+   "in_channels": 3,
+   "latent_channels": 4,
+   "layers_per_block": 2,
+   "norm_num_groups": 32,
+   "out_channels": 3,
+   "sample_size": 1024,
+   "scaling_factor": 0.13025,
+   "up_block_types": [
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D"
+   ]
+ }
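The VAE has four encoder blocks, which gives three 2x downsamples. Diffusers-style pipelines therefore use a spatial scale factor of `2 ** (len(block_out_channels) - 1) = 8`. That factor links `vae/config.json` to the UNet's `sample_size` of 128: 128 × 8 = 1024, the model's native resolution. `scaling_factor` is multiplied into latents after encoding and divided out before decoding. The sketch below covers only that shape and scale bookkeeping and performs no actual encoding. Its function names are hypothetical.

```python
# Values copied from vae/config.json and unet/config.json above.
VAE_BLOCK_OUT_CHANNELS = [128, 256, 512, 512]
LATENT_CHANNELS = 4
SCALING_FACTOR = 0.13025
UNET_SAMPLE_SIZE = 128

# One 2x downsample between consecutive encoder blocks.
VAE_SCALE_FACTOR = 2 ** (len(VAE_BLOCK_OUT_CHANNELS) - 1)


def latent_shape(height, width, batch_size=1):
    """Shape (N, C, H, W) of the latent tensor the UNet denoises for an image size."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError(f"height and width must be multiples of {VAE_SCALE_FACTOR}")
    return (batch_size, LATENT_CHANNELS, height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


def scale_for_unet(vae_latent):
    """Applied after encoding; intended to bring latents near unit variance."""
    return vae_latent * SCALING_FACTOR


def unscale_for_vae(unet_latent):
    """Applied before decoding; exact inverse of scale_for_unet."""
    return unet_latent / SCALING_FACTOR


if __name__ == "__main__":
    print(VAE_SCALE_FACTOR, latent_shape(1024, 1024))
```

A 1024x1024 RGB image thus becomes a 4 × 128 × 128 latent. That is 48 times fewer values than the 3 × 1024 × 1024 pixels, which is what makes denoising at this resolution affordable.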
vae/diffusion_pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9512825399e39027a15fd0c7360dd0fb762d6faf87558d94fcf94e041b53e9f9
+ size 334712578
vae/diffusion_pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:814c655ef8cd535c57ae1bef01ecb06526839fede56b8dc6501c02525917fd20
+ size 167404866
vae/diffusion_pytorch_model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bcb60880a46b63dea58e9bc591abe15f8350bde47b405f9c38f4be70c6161e68
+ size 167335342
vae/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8aef7b00195ec3fa8caaa3434e7516eff7d658e1d30eafc9ad6b0e66e9e827e
+ size 334643268
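The fp32 `.safetensors` sizes in the pointers above support a back-of-the-envelope parameter count. At 4 bytes per float32 value, size / 4 overestimates the count only by the small JSON header at the start of each file. The sketch also compares the four component files with the single-file `playground-v2.safetensors`. Their sizes differ by 33,508 bytes. That is consistent with the single file bundling the same fp32 weights, but file sizes alone cannot prove it. The helper name is hypothetical.

```python
# Sizes in bytes of the fp32 *.safetensors files, copied from the LFS pointers above.
COMPONENT_SIZES = {
    "text_encoder": 492265168,
    "text_encoder_2": 2778702264,
    "unet": 10270077736,
    "vae": 334643268,
}
SINGLE_FILE_SIZE = 13875721944  # playground-v2.safetensors
BYTES_PER_FP32 = 4


def approx_param_count(size_bytes, bytes_per_param=BYTES_PER_FP32):
    """Upper-bound estimate: the file also holds a small JSON header."""
    return size_bytes // bytes_per_param


if __name__ == "__main__":
    for name, size in COMPONENT_SIZES.items():
        print(f"{name:15} ~{approx_param_count(size) / 1e6:,.0f}M parameters")
    gap = SINGLE_FILE_SIZE - sum(COMPONENT_SIZES.values())
    print(f"single file minus components: {gap} bytes")
```

By this estimate the UNet holds roughly 2.57 billion parameters and the text encoders roughly 123 million and 695 million. The fp16 files come in at about half the fp32 sizes, as expected for 2-byte weights.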