AlekseyCalvin committed
Commit b1b6907
1 Parent(s): b175bfa

Update README.md

Files changed (1):
  1. README.md +141 -7

README.md CHANGED
@@ -30,12 +30,6 @@ with anxious excitement, his famous bald spot sweatily glistening under warm lig
 
   output:
   url: sd35_4k_ii.jpg
- - text: >-
-   HST style autochrome photo of realistic green-eyed black cat, with prominent
-   regions of white fur, playing a piano and singing, amateur 2004 photograph
-   shot on a cell phone in a Los Angeles apartment kitchen
- output:
-   url: sd35_iv.jpg
  - text: >-
    HST style autochrome photo with title / PILLZAR! THAT IS WHERE YOUR PILLS ARE! /
    of an ether-drugged Pikachu sitting in a white plastic cylindical stacked
@@ -57,7 +51,147 @@ Model trained with [AI Toolkit by Ostris](https://github.com/ostris/ai-toolkit)
 
 ## Trigger words
 
- HST style autochrome photo
+ **HST style autochrome photo**
+
+ ## Config Parameters
+ *Dim: 256, Alpha: 256, Optimizer: Adamw8bit, LR: 4e-5* <br>
+ **I set the config to train only a single block:** <br>
+ namely, MMDiT block 12, using the same config syntax I've repeatedly used for training Flux. <br>
+ *But I'm not sure single-block training worked here, judging by the results and the hefty checkpoint sizes.* <br>
+ **More info/config below!** <br>
+ Fine-tuned using the **Google Colab Notebook** of **ai-toolkit**.<br>
+ I used an A100 via Colab Pro.
+ However, training SD3.5 may well work with free Colab, or with lower VRAM in general,<br>
+ especially if one were to use, say: <br> *a lower rank (try 4 or 8), a smaller dataset (in terms of caching/bucketing/pre-loading impact), a batch size of 1, the Adamw8bit optimizer, 512 resolution, perhaps the /low_vram: true/ argument, and possibly an alternate quantization variant.* <br>
+ Generally, VRAM expenditure for fine-tuning SD3.5 tends to be lower than for Flux.<br>
+ So, try it!<br>
+
+
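The low-VRAM suggestions above map onto the full config further down roughly as follows. This is only an illustrative sketch: the key names mirror the OrderedDict config in this card, and the override values are the ones suggested in the text, not a combination I have tested.

```python
from collections import OrderedDict

# Illustrative low-VRAM overrides; key names mirror the full config below,
# values follow the suggestions in the text (untested as a combination).
low_vram_overrides = OrderedDict([
    ('network', OrderedDict([
        ('type', 'lora'),
        ('linear', 8),        # lower rank: try 4 or 8 instead of 256
        ('linear_alpha', 8),
    ])),
    ('train', OrderedDict([
        ('batch_size', 1),
        ('optimizer', 'adamw8bit'),
        ('gradient_checkpointing', True),  # trades compute for memory
    ])),
    ('model', OrderedDict([
        ('quantize', True),
        ('low_vram', True),   # slower to fuse, but quantizes with less VRAM
    ])),
    # and in the dataset entry: ('resolution', [512]) instead of [1024]
])
```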
+ **To use on Colab**, modify a Flux template Notebook from [here](https://github.com/ostris/ai-toolkit/tree/main/notebooks) with parameters from Ostris' example config for SD3.5 [here](https://github.com/ostris/ai-toolkit/blob/main/config/examples/train_lora_sd35_large_24gb.yaml)! <br>
+ **My Colab config report/example below!** <br> *(Including the version of the block-specification network-arguments syntax that works with ai-toolkit on Colab, at least for Flux...)* <br>
+
+ ```
+ from collections import OrderedDict
+
+ job_to_run = OrderedDict([
+     ('job', 'extension'),
+     ('config', OrderedDict([
+         # this name will be the folder and filename name
+         ('name', 'HSTsd3v'),
+         ('process', [
+             OrderedDict([
+                 ('type', 'sd_trainer'),
+                 # root folder to save training sessions/samples/weights
+                 ('training_folder', '/content/drive/MyDrive/HSTsd3v'),
+                 # log performance stats in the terminal every N steps
+                 ('performance_log_every', 600),
+                 ('device', 'cuda:0'),
+                 # if a trigger word is specified, it will be added to captions of training data if it does not already exist
+                 # alternatively, in your captions you can add [trigger] and it will be replaced with the trigger word
+                 ('trigger_word', 'HST style autochrome photo'),
+                 # ('network', OrderedDict([
+                 #     ('type', 'lora'),
+                 #     ('linear', 64),
+                 #     ('linear_alpha', 64)
+                 # ])),
+
+                 ('network', OrderedDict([
+                     ('type', 'lora'),
+                     ('linear', 256),
+                     ('linear_alpha', 256),
+                     ('network_kwargs', OrderedDict([
+                         ('only_if_contains', "transformer.transformer_blocks.{12}")
+                     ]))
+                 ])),
+                 ('save', OrderedDict([
+                     ('dtype', 'float16'),  # precision to save
+                     ('save_every', 250),  # save every this many steps
+                     ('push_to_hub', True),
+                     ('hf_repo_id', 'AlekseyCalvin/HSTsd3v'),
+                     ('hf_private', False),
+                     ('max_step_saves_to_keep', 10)  # how many intermittent saves to keep
+                 ])),
+                 ('datasets', [
+                     # datasets are a folder of images. captions need to be txt files with the same name as the image
+                     # for instance image2.jpg and image2.txt. Only jpg, jpeg, and png are supported currently
+                     # images will automatically be resized and bucketed into the resolution specified
+                     OrderedDict([
+                         ('folder_path', '/content/dataset'),
+                         ('caption_ext', 'txt'),
+                         ('caption_dropout_rate', 0.05),  # will drop out the caption 5% of the time
+                         ('shuffle_tokens', False),  # shuffle caption order, split by commas
+                         ('cache_latents_to_disk', True),  # leave this True unless you know what you're doing
+                         ('resolution', [1024])
+                     ])
+                 ]),
+                 ('train', OrderedDict([
+                     ('batch_size', 1),
+                     ('steps', 4000),  # total number of steps to train; 500 - 4000 is a good range
+                     ('gradient_accumulation_steps', 1),
+                     ('train_unet', True),
+                     ('train_text_encoder', False),  # may not fully work with SD3 yet
+                     ('gradient_checkpointing', True),  # keep this on unless you have a ton of vram
+                     ('noise_scheduler', 'flowmatch'),  # for training only
+                     ('timestep_type', 'linear'),  # linear or sigmoid
+                     ('optimizer', 'adamw8bit'),
+                     ('lr', 4e-5),
+
+                     # skip the pre-training sample
+                     ('skip_first_sample', True),
+
+                     # uncomment to completely disable sampling
+                     # ('disable_sampling', True),
+
+                     # uncomment to use new bell-curved weighting. Experimental but may produce better results
+                     # ('linear_timesteps', True),
+
+                     # ema will smooth out learning, but could slow it down. Recommended to leave on.
+                     ('ema_config', OrderedDict([
+                         ('use_ema', True),
+                         ('ema_decay', 0.99)
+                     ])),
+
+                     # will probably need this if the gpu supports it; other dtypes may not work correctly
+                     ('dtype', 'bf16')
+                 ])),
+                 ('model', OrderedDict([
+                     # huggingface model name or path
+                     ('name_or_path', 'stabilityai/stable-diffusion-3.5-large'),
+                     ('is_v3', True),
+                     ('quantize', True),  # run 8bit mixed precision
+                     # low_vram is painfully slow to fuse in the adapter; avoid it unless absolutely necessary
+                     # ('low_vram', True),  # uncomment this if the GPU is connected to your monitors. It will use less vram to quantize, but is slower.
+                 ])),
+                 ('sample', OrderedDict([
+                     ('sampler', 'flowmatch'),  # must match train.noise_scheduler
+                     ('sample_every', 200),  # sample every this many steps
+                     ('width', 1024),
+                     ('height', 1024),
+                     ('prompts', [
+                         # you can add [trigger] to the prompts here and it will be replaced with the trigger word
+                         # '[trigger] holding a sign that says \'I LOVE PROMPTS!\'',
+                         'HST style communist poster with title text "JOIN RCA!", under an autochrome color photograph of Vladimir Lenin at a Cabaret in Zurich circa 1916, dancing with a red feathered drunken anarchist, an early conceptual artist. Singing to his new dancing partner, Lenin has a face full of contageous awe, his moderately blemished lined skin with visible pores flushing with anxious excitement, his bald spot sweatily glistening under warm lights. Behind, Krupskaya and Inessa Armand sit side-by-side at a bar stand uproriously laughing at the dancers.',
+                         'HST autochrome style analog dslr award-winning 8k art photo showing a nurse battling a giant scattered pill creature, above text caption of \PILLZAR! WHERE PILLS ARE!\, in a highly realistic modern American medical hospital ',
+                         'HST style photograph of a dark CIA agent Koala leaping at an excited Julian Assange and trying to steal his pills from his pockets, caption /PILLZAR! WHERE YOUR PILLS ARE!/, award-winning art photo',
+                         'HST style photo of realistic green-eyed black and white furred cat playing a piano and singing while pills rain from the sky, large 3d font caption text of /PILLZAR! WHERE YOUR PILLS ARE!/ amateur photo shot on a cell phone',
+                         'HST style autochrome photo poster with 3d title / PILLZAR! THAT IS WHERE YOUR PILLS ARE! / of an ether-drugged Pikachu sitting in a white plastic cylindical stacked medication dispenser with an unscrewable top, while a gowned Marina Tsvetaeva gently pets pikachu on the head, David Lynch and Mucha styles, detailed faces, in a European city circa 1920, lifelike anatomy'
+                     ]),
+                     ('neg', 'wrong, broken, warped, unrealistic, untextured, misspelling, messy, bad quality'),  # not used on flux
+                     ('seed', 42),
+                     ('walk_seed', True),
+                     ('guidance_scale', 4),
+                     ('sample_steps', 25)
+                 ]))
+             ])
+         ])
+     ])),
+     # you can add any additional meta info here. [name] is replaced with config name at top
+     ('meta', OrderedDict([
+         ('name', '[name]'),
+         ('version', '1.0')
+     ]))
+ ])
+ ```
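On the single-block doubt above: one plausible explanation is that the `only_if_contains` filter in `network_kwargs` is applied as a plain substring match against parameter names, and SD3.5 block parameters are named with a bare index (e.g. `transformer.transformer_blocks.12.`), so the Flux-style `{12}` braces would match nothing and training would silently cover all blocks, which fits the hefty checkpoint sizes. Both the matching rule and the example parameter names below are assumptions for illustration, not verified against ai-toolkit internals:

```python
# Assumed: a plain substring match, and SD3.5 parameter names of the form
# "transformer.transformer_blocks.<i>...." -- both are illustrative guesses.
filter_str = "transformer.transformer_blocks.{12}"

example_params = [
    "transformer.transformer_blocks.11.attn.to_q",
    "transformer.transformer_blocks.12.attn.to_q",
    "transformer.transformer_blocks.12.ff.net.0.proj",
]

# The braces never occur literally in the example names, so nothing matches.
matched = [p for p in example_params if filter_str in p]
print(matched)  # -> []

# Dropping the braces behaves as intended under substring matching.
matched_plain = [p for p in example_params
                 if "transformer.transformer_blocks.12." in p]
print(len(matched_plain))  # -> 2
```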
 
 ## Download model and use it with ComfyUI, AUTOMATIC1111, SD.Next, Invoke AI, etc.