PseudoTerminal X committed on
Commit
b7843ec
1 Parent(s): d7cd53c

Trained for 0 epochs and 1000 steps.

Trained with datasets ['text-embeds-pixart-filter', 'photo-concept-bucket', 'midjourney-v6-520k-raw', 'sfwbooru', 'nijijourney-v6-520k-raw', 'dalle3']
Learning rate 1e-06, batch size 24, and 1 gradient accumulation steps.
Used DDPM noise scheduler for training with epsilon prediction type and rescaled_betas_zero_snr=False
Using 'trailing' timestep spacing.
Base model: terminusresearch/pixart-900m-1024-ft-v0.6
VAE: madebyollin/sdxl-vae-fp16-fix

README.md ADDED
@@ -0,0 +1,133 @@
+ ---
+ license: creativeml-openrail-m
+ base_model: "terminusresearch/pixart-900m-1024-ft-v0.6"
+ tags:
+ - stable-diffusion
+ - stable-diffusion-diffusers
+ - text-to-image
+ - diffusers
+ - simpletuner
+ - full
+
+ inference: true
+
+ ---
+
+ # pixart-900m-1024-vpred-zsnr
+
+ This is a full rank finetune derived from [terminusresearch/pixart-900m-1024-ft-v0.6](https://huggingface.co/terminusresearch/pixart-900m-1024-ft-v0.6).
+
+
+
+ The main validation prompt used during training was:
+
+ ```
+ ethnographic photography of teddy bear at a picnic, ears tucked behind a cozy hoodie looking darkly off to the stormy picnic skies
+ ```
+
+ ## Validation settings
+ - CFG: `7.5`
+ - CFG Rescale: `0.7`
+ - Steps: `25`
+ - Sampler: `None`
+ - Seed: `42`
+ - Resolutions: `1024x1024,1344x768,916x1152`
+
+ Note: The validation settings are not necessarily the same as the [training settings](#training-settings).
+
+
+
+
+ <Gallery />
+
+ The text encoder **was not** trained.
+ You may reuse the base model text encoder for inference.
+
+
+ ## Training settings
+
+ - Training epochs: 0
+ - Training steps: 1000
+ - Learning rate: 1e-06
+ - Effective batch size: 192
+ - Micro-batch size: 24
+ - Gradient accumulation steps: 1
+ - Number of GPUs: 8
+ - Prediction type: epsilon
+ - Rescaled betas zero SNR: False
+ - Optimizer: AdamW, stochastic bf16
+ - Precision: Pure BF16
+ - Xformers: Not used
+
+
+ ## Datasets
+
+ ### photo-concept-bucket
+ - Repeats: 0
+ - Total number of images: ~567552
+ - Total number of aspect buckets: 1
+ - Resolution: 1.0 megapixels
+ - Cropped: True
+ - Crop style: random
+ - Crop aspect: square
+ ### midjourney-v6-520k-raw
+ - Repeats: 0
+ - Total number of images: ~390912
+ - Total number of aspect buckets: 1
+ - Resolution: 1.0 megapixels
+ - Cropped: True
+ - Crop style: random
+ - Crop aspect: square
+ ### sfwbooru
+ - Repeats: 0
+ - Total number of images: ~233664
+ - Total number of aspect buckets: 1
+ - Resolution: 1.0 megapixels
+ - Cropped: True
+ - Crop style: random
+ - Crop aspect: square
+ ### nijijourney-v6-520k-raw
+ - Repeats: 0
+ - Total number of images: ~415680
+ - Total number of aspect buckets: 1
+ - Resolution: 1.0 megapixels
+ - Cropped: True
+ - Crop style: random
+ - Crop aspect: square
+ ### dalle3
+ - Repeats: 0
+ - Total number of images: ~1121664
+ - Total number of aspect buckets: 1
+ - Resolution: 1.0 megapixels
+ - Cropped: True
+ - Crop style: random
+ - Crop aspect: square
+
+
+ ## Inference
+
+
+ ```python
+ import torch
+ from diffusers import DiffusionPipeline
+
+ model_id = 'pixart-900m-1024-vpred-zsnr'  # replace with the full Hub repository id or a local path
+ pipeline = DiffusionPipeline.from_pretrained(model_id)
+
+ prompt = "ethnographic photography of teddy bear at a picnic, ears tucked behind a cozy hoodie looking darkly off to the stormy picnic skies"
+ negative_prompt = "blurry, cropped, ugly"
+
+ pipeline.to('cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu')
+ image = pipeline(
+     prompt=prompt,
+     negative_prompt=negative_prompt,
+     num_inference_steps=25,
+     generator=torch.Generator(device='cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu').manual_seed(1641421826),
+     width=1152,
+     height=768,
+     guidance_scale=7.5,
+     guidance_rescale=0.7,  # not accepted by every pipeline class; remove if this call raises a TypeError
+ ).images[0]
+ image.save("output.png", format="PNG")
+ ```
+
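The batch-size, step, and dataset figures reported in the README above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming the effective batch size is the product of micro-batch size, gradient accumulation steps, and GPU count (which reproduces the reported 192), and taking the approximate per-dataset image counts at face value. The variable names are illustrative and do not come from the training code.

```python
# Figures copied from the README added in this commit.
micro_batch_size = 24
gradient_accumulation_steps = 1
num_gpus = 8
training_steps = 1000

# Approximate image counts per dataset, as reported in the README.
dataset_images = {
    "photo-concept-bucket": 567552,
    "midjourney-v6-520k-raw": 390912,
    "sfwbooru": 233664,
    "nijijourney-v6-520k-raw": 415680,
    "dalle3": 1121664,
}

# Assumption: effective batch = micro-batch x accumulation steps x GPUs.
effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus

# Number of samples consumed over the whole run, and the share of the
# combined dataset that this represents.
samples_seen = training_steps * effective_batch_size
total_images = sum(dataset_images.values())
fraction_of_one_pass = samples_seen / total_images

print(f"effective batch size: {effective_batch_size}")
print(f"samples seen:         {samples_seen}")
print(f"total images:         {total_images}")
print(f"fraction of one pass: {fraction_of_one_pass:.3f}")
```

Under these assumptions the run consumed 192,000 samples out of roughly 2.73 million images, well short of one full pass over the data, which is consistent with the reported "Training epochs: 0".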
optimizer.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4f9f9a3a3f5451c22635b16e8cc7a837edd9ec41c43c63d460e8bb889a7a3472
+ size 5451415117
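The three added lines above are a Git LFS pointer rather than the binary itself: a spec version, a SHA-256 object id, and the size in bytes. As a rough illustration, the sketch below parses such a pointer into a dictionary. The helper name `parse_lfs_pointer` and the returned field names are hypothetical and chosen only for this example.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the "key value" lines of a Git LFS pointer file (illustrative helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is a key, a single space, then the value.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents for optimizer.bin, copied from the diff above.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:4f9f9a3a3f5451c22635b16e8cc7a837edd9ec41c43c63d460e8bb889a7a3472
size 5451415117
"""

pointer = parse_lfs_pointer(pointer_text)
print(f"{pointer['hash_algorithm']} digest, {pointer['size_bytes'] / 1e9:.2f} GB")
```

The same helper applies unchanged to the other pointer files in this commit, since they share the three-line layout.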
random_states_0.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8f0edfc2c885f730ef911db10373dce0a3e814e4fdbb2de759c691606ecf21e3
+ size 16100
scheduler.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:efff19450f55a9358b76f3e5170171942761d1c5b9128683028d0c09b8a24573
+ size 1000
training_state-dalle3.json ADDED
The diff for this file is too large to render. See raw diff
 
training_state-midjourney-v6-520k-raw.json ADDED
The diff for this file is too large to render. See raw diff
 
training_state-nijijourney-v6-520k-raw.json ADDED
The diff for this file is too large to render. See raw diff
 
training_state-photo-concept-bucket.json ADDED
The diff for this file is too large to render. See raw diff
 
training_state-sfwbooru.json ADDED
The diff for this file is too large to render. See raw diff
 
training_state.json ADDED
@@ -0,0 +1 @@
+ {"global_step": 1000, "epoch_step": 1000, "epoch": 1, "exhausted_backends": [], "repeats": {}}
transformer/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "_class_name": "PixArtTransformer2DModel",
+   "_diffusers_version": "0.30.0.dev0",
+   "_name_or_path": "terminusresearch/pixart-900m-1024-ft-v0.6",
+   "activation_fn": "gelu-approximate",
+   "attention_bias": true,
+   "attention_head_dim": 72,
+   "attention_type": "default",
+   "caption_channels": 4096,
+   "cross_attention_dim": 1152,
+   "double_self_attention": false,
+   "dropout": 0.0,
+   "in_channels": 4,
+   "interpolation_scale": 2,
+   "norm_elementwise_affine": false,
+   "norm_eps": 1e-06,
+   "norm_num_groups": 32,
+   "norm_type": "ada_norm_single",
+   "num_attention_heads": 16,
+   "num_embeds_ada_norm": 1000,
+   "num_layers": 42,
+   "num_vector_embeds": null,
+   "only_cross_attention": false,
+   "out_channels": 8,
+   "patch_size": 2,
+   "sample_size": 128,
+   "upcast_attention": false,
+   "use_additional_conditions": false,
+   "use_linear_projection": false
+ }
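Several values in this config are related to one another, and a few derived quantities help when reading it. The sketch below is illustrative only: it copies a subset of the fields above into a plain dictionary and assumes an 8x spatial downsampling factor for the VAE, which is not stated in the config itself. The names `VAE_SCALE_FACTOR`, `inner_dim`, `num_tokens`, and `pixel_size` are hypothetical.

```python
# Subset of transformer/config.json, copied from the diff above.
config = {
    "attention_head_dim": 72,
    "num_attention_heads": 16,
    "cross_attention_dim": 1152,
    "in_channels": 4,
    "out_channels": 8,
    "patch_size": 2,
    "sample_size": 128,
}

# Assumption: the VAE downsamples each spatial dimension by a factor of 8.
VAE_SCALE_FACTOR = 8

# Hidden width of the transformer: heads x per-head dimension.
inner_dim = config["num_attention_heads"] * config["attention_head_dim"]

# Latent tokens per image: the latent grid is cut into patch_size x patch_size patches.
tokens_per_side = config["sample_size"] // config["patch_size"]
num_tokens = tokens_per_side ** 2

# Pixel resolution that corresponds to the configured latent sample size.
pixel_size = config["sample_size"] * VAE_SCALE_FACTOR

print(f"inner dim:  {inner_dim} (cross_attention_dim is {config['cross_attention_dim']})")
print(f"tokens:     {tokens_per_side} x {tokens_per_side} = {num_tokens}")
print(f"pixel size: {pixel_size} x {pixel_size}")
```

With these numbers the hidden width equals `cross_attention_dim` (1152), and a 128-wide latent maps to 1024 pixels under the assumed scale factor, which lines up with the "1024" in the model name. Note also that `out_channels` is twice `in_channels`. This would fit a model that emits a second set of channels alongside the noise prediction, but that reading is an assumption and is not confirmed by anything in this commit.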
transformer/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5d877393a75fcacd20413a5c27afdd5bfce4ac8f15411d9236ef4ed7ced00081
+ size 1816969728