What is this?
This is not a permanently released model yet; it's more of an open alpha work in progress.
Latest News: 2025/03/07
(Currently up to epoch 11)
SD1.5 base, with the SDXL VAE tacked on, and then retrained to actually WORK.
I think this has reached parity with the existing SD1.5 base for human output. I'm going to keep training, but I wanted to share what I have so far.
How to use
Use it like any other SD1.5 model.
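Since it behaves like a standard SD1.5 checkpoint, it should work with the usual tooling. Here is a minimal sketch using the diffusers library; the repo id below is a placeholder, so point it at wherever you actually have the weights:

```python
# Minimal sketch: load this model like any other SD1.5 checkpoint via diffusers.
# NOTE: the repo id below is a placeholder, not the published model name.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "your-org/xlsd-checkpoint",  # placeholder -- substitute the real location
    torch_dtype=torch.float32,   # the weights are full fp32
).to("cuda")

image = pipe("photograph of a person standing in a garden").images[0]
image.save("sample.png")
```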
How it was trained
This is an fp32 model, trained at full fp32 precision and finetuned on a single RTX 4090 (starting from the SD1.5 base + SDXL VAE).
The current version was trained on:
- opendiffusionai/laion2b-45ish-1120px, moondream captions
- opendiffusionai/laion2b-45ish-1120px, wd14 captions
- opendiffusionai/laion2b-squareish-1024px, moondream captions
- opendiffusionai/laion2b-squareish-1536px, wd14 captions
The 1024px dataset was a mistake: I meant to use the 1536px one. The 1024px dataset is 3x as large!! But by the time I noticed, the run was already two days in, and I thought "what the heck, let it run".
Recreating the model
All the datasets are mentioned above; use img2dataset to download them. When you want to duplicate the 45ish images with wd14 captions, you don't need to redownload: just duplicate the directories using the method of your choice (I like using 'lndir' on Linux) and then remove the redundant txt files.
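If you don't have lndir handy, a rough Python equivalent of that duplicate-and-prune step is sketched below; the directory names are placeholders for wherever your downloads live:

```python
# Sketch: hardlink-duplicate an img2dataset tree, skipping the old .txt
# captions so the wd14 captions can be dropped in. Paths are placeholders.
import os

SRC = "laion2b-45ish-moondream"  # existing download (placeholder path)
DST = "laion2b-45ish-wd14"       # duplicate tree for the wd14 captions

for root, dirs, files in os.walk(SRC):
    rel = os.path.relpath(root, SRC)
    os.makedirs(os.path.join(DST, rel), exist_ok=True)
    for name in files:
        if name.endswith(".txt"):
            continue  # leave out the redundant caption files
        os.link(os.path.join(root, name), os.path.join(DST, rel, name))
```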
Once that is done, you can use https://github.com/ppbrown/vlm-utils/blob/main/dataset_scripts/extracttxtfromjsonl.py to extract the txt files from the jsonl.gz file.
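For reference, the core of that step is just walking the jsonl.gz and writing one txt file per record. The sketch below illustrates the idea; the "key" and "caption" field names are assumptions, so defer to the linked script for the real logic:

```python
# Illustration only: emit one caption .txt per record in a .jsonl.gz file.
# Field names ("key", "caption") are assumptions; see extracttxtfromjsonl.py.
import gzip
import json

with gzip.open("metadata.jsonl.gz", "rt", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        with open(record["key"] + ".txt", "w", encoding="utf-8") as out:
            out.write(record["caption"])
```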
Finally, copy in the OneTrainer-XLsd32-phase1-LaionPlusWD.json config file for OneTrainer, define the "concept" files in OneTrainer, and start the training session.
Training samples
Here are some training samples.
I'm taking samples every 2000 steps. The samples here give the impression of a fairly linear progression, but it is definitely NOT. Within a single epoch, the output tends to cycle between various aspects of the dataset, so I have deliberately cherry-picked samples that loop back to a common root image.
Epoch 0 step 0
What the straight merge looks like, with no training: