What is this?
This is not a permanently released model yet; it's more of an open alpha work in progress.
Latest News: 2025/03/07
(Currently up to epoch 11)
SD1.5 base, with the SDXL VAE tacked on, and then retrained to actually WORK.
I think this has reached parity with the existing SD1.5 base for human output. I'm going to keep training, but I wanted to share what I have so far.
How to use
Use it like any other SD1.5 model.
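Since it behaves like a standard SD1.5 checkpoint, it should work with the usual tooling. Here is a minimal sketch using the diffusers library; the repo id below is a placeholder, so point it at wherever you actually have the weights:

```python
# Minimal sketch: load this model like any other SD1.5 checkpoint via diffusers.
# NOTE: the repo id below is a placeholder, not the published model name.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "your-org/xlsd-checkpoint",  # placeholder -- substitute the real location
    torch_dtype=torch.float32,   # the weights are full fp32
).to("cuda")

image = pipe("photograph of a person standing in a garden").images[0]
image.save("sample.png")
```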
How it was trained
This is an fp32 model, trained at full fp32 precision and finetuned on a single RTX 4090 (starting from the SD1.5 base + SDXL VAE).
The current version was trained on:
- opendiffusionai/laion2b-45ish-1120px, moondream captions
- opendiffusionai/laion2b-45ish-1120px, wd14 captions
- opendiffusionai/laion2b-squareish-1024px, moondream captions
- opendiffusionai/laion2b-squareish-1536px, wd14 captions
The 1024px dataset was a mistake: I meant to use the 1536px one. The 1024px dataset is 3x as large!! But by the time I noticed, the run was already two days in, and I thought "what the heck, let it run".
Recreating the model
All the datasets are mentioned above; use img2dataset to download them. When you want to duplicate the 45ish images with wd14 captions, you don't need to redownload: just duplicate the directories using the method of your choice (I like using 'lndir' on Linux) and then remove the redundant txt files.
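If you don't have lndir handy, a rough Python equivalent of that duplicate-and-prune step is sketched below; the directory names are placeholders for wherever your downloads live:

```python
# Sketch: hardlink-duplicate an img2dataset tree, skipping the old .txt
# captions so the wd14 captions can be dropped in. Paths are placeholders.
import os

SRC = "laion2b-45ish-moondream"  # existing download (placeholder path)
DST = "laion2b-45ish-wd14"       # duplicate tree for the wd14 captions

for root, dirs, files in os.walk(SRC):
    rel = os.path.relpath(root, SRC)
    os.makedirs(os.path.join(DST, rel), exist_ok=True)
    for name in files:
        if name.endswith(".txt"):
            continue  # leave out the redundant caption files
        os.link(os.path.join(root, name), os.path.join(DST, rel, name))
```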
Once that is done, you can use https://github.com/ppbrown/vlm-utils/blob/main/dataset_scripts/extracttxtfromjsonl.py to extract the txt files from the jsonl.gz file.
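For reference, the core of that step is just walking the jsonl.gz and writing one txt file per record. The sketch below illustrates the idea; the "key" and "caption" field names are assumptions, so defer to the linked script for the real logic:

```python
# Illustration only: emit one caption .txt per record in a .jsonl.gz file.
# Field names ("key", "caption") are assumptions; see extracttxtfromjsonl.py.
import gzip
import json

with gzip.open("metadata.jsonl.gz", "rt", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        with open(record["key"] + ".txt", "w", encoding="utf-8") as out:
            out.write(record["caption"])
```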
Finally, copy in the OneTrainer-XLsd32-phase1-LaionPlusWD.json config file for OneTrainer, define the "concept" files in OneTrainer, and start the training session.
Training samples
Here are some training samples.
I'm taking samples every 2000 steps. The samples here give the impression of a fairly linear progression, but it is definitely NOT. Within a single epoch, the output tends to cycle between various aspects of the dataset, so I have deliberately cherry-picked samples that loop back to a common root image.
Epoch 0 step 0
What the straight merge looks like, with no training: