---
license: creativeml-openrail-m
tags:
- computer vision
- stable-diffusion
- stable-diffusion-2-1
- photography
- photoreal
---

# Deprecation notice

This model was a research project and is now deprecated in favour of ptx0/pseudo-flex-base.

# Capabilities

This model is capable of producing photorealistic images of people.

It retains much of the base 2.1-v model's knowledge, as its text encoder was only minimally tuned.

# Limitations

This model does not produce perfect results every time.

This model cannot reproduce most real people. Instead, it makes "Derp-a-Like" equivalents of real people, which I prefer.

This model is not great at abstract imagery or digital art, though it can still produce a variety of amazing art styles.

# Dataset

* cushman (8,000 Kodachrome slides from 1939 to 1969)
* midjourney v5.1-filtered (about 22,000 upscaled v5.1 images)
* national geographic (about 3,000-4,000 images of animals, wildlife, landscapes and history, all larger than 1024x768)
* a small dataset of stock images of people vaping or smoking

# Training parameters

* a polynomial learning rate schedule shared between the text encoder and U-Net, starting at 4e-8 and decaying to 1e-8
* batch size 15 with 10 gradient accumulation steps, for an effective batch size of 150
* a target of 30,000 steps, though training will likely stop sooner
* noise schedule betas rescaled to enforce zero terminal SNR
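
The learning rate schedule and batch arithmetic listed above can be sketched in plain Python. This is a minimal sketch and not the training code used for this model. The decay power is not stated in this card, so `POWER = 1.0` (linear decay) is an assumption. No warmup is modelled. Names such as `polynomial_lr` and `LR_START` are hypothetical. The decay uses the usual polynomial form `lr_end + (lr_start - lr_end) * (1 - step / total_steps) ** power`.

```python
# Values stated in the model card.
LR_START = 4e-8        # initial learning rate, shared by the text encoder and U-Net
LR_END = 1e-8          # final learning rate after decay
TARGET_STEPS = 30_000  # planned number of optimizer steps
BATCH_SIZE = 15        # samples per forward/backward pass
GRAD_ACCUM_STEPS = 10  # passes accumulated before each optimizer step

# Assumption: the decay power is not stated in the card; 1.0 means linear decay.
POWER = 1.0


def polynomial_lr(step: int, total_steps: int = TARGET_STEPS,
                  lr_start: float = LR_START, lr_end: float = LR_END,
                  power: float = POWER) -> float:
    """Polynomially decay from lr_start to lr_end, then hold lr_end."""
    step = min(max(step, 0), total_steps)
    remaining = 1.0 - step / total_steps
    return (lr_start - lr_end) * remaining ** power + lr_end


# Each optimizer step accumulates gradients over BATCH_SIZE * GRAD_ACCUM_STEPS samples.
effective_batch_size = BATCH_SIZE * GRAD_ACCUM_STEPS
samples_per_full_run = effective_batch_size * TARGET_STEPS

if __name__ == "__main__":
    print("effective batch size:", effective_batch_size)
    print("samples in a full run:", samples_per_full_run)
    for s in (0, 1_650, 15_000, 30_000):
        print(f"step {s:>6}: lr = {polynomial_lr(s):.3e}")
```

Under the linear-decay assumption, the learning rate is 2.5e-8 halfway through a full run. A complete 30,000-step run would process 150 * 30,000 = 4,500,000 samples.
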

# Training goals

* explore the effects of terminal SNR scheduling
* improve faces, especially "at a distance"
* improve composition, e.g. the completeness of the resulting image
* improve prompt comprehension, e.g. "do what I want, even if it is weird"
* retain or introduce a slightly colourful flavour from the Midjourney data
* enhance understanding of the past through the Cushman collection
* retain the ability to produce natural landscapes and animals via National Geographic
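
This card lists terminal SNR scheduling both as a goal and as a training parameter ("terminal SNR enforced betas"). The term usually refers to rescaling the noise schedule so that the final timestep has a signal-to-noise ratio of exactly zero, as described in Lin et al., "Common Diffusion Noise Schedules and Sample Steps are Flawed". The sketch below follows that paper's rescaling procedure in plain Python. Whether this model used exactly this procedure is an assumption. The base schedule is also an assumption: it uses the scaled-linear betas (0.00085 to 0.012 over 1,000 steps) that Stable Diffusion 2.x checkpoints commonly use.

```python
import math


def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Betas spaced linearly in sqrt space, then squared (assumed SD 2.x defaults)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def alphas_cumprod(betas):
    """Cumulative product of (1 - beta_t), i.e. alpha_bar_t for each timestep."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def snr(alpha_bar):
    """Signal-to-noise ratio alpha_bar / (1 - alpha_bar) of a noised sample."""
    return alpha_bar / (1.0 - alpha_bar)


def enforce_zero_terminal_snr(betas):
    """Rescale betas so the last timestep has SNR == 0 (after Lin et al., Algorithm 1)."""
    sqrt_ab = [math.sqrt(a) for a in alphas_cumprod(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    # Shift so the final sqrt(alpha_bar) is 0, then scale so the first is unchanged.
    sqrt_ab = [(s - last) * first / (first - last) for s in sqrt_ab]
    ab = [s * s for s in sqrt_ab]
    # Convert the rescaled cumulative products back into per-step alphas and betas.
    alphas = [ab[0]] + [ab[t] / ab[t - 1] for t in range(1, len(ab))]
    return [1.0 - a for a in alphas]


if __name__ == "__main__":
    original = scaled_linear_betas()
    rescaled = enforce_zero_terminal_snr(original)
    print("original terminal SNR:", snr(alphas_cumprod(original)[-1]))
    print("rescaled terminal SNR:", snr(alphas_cumprod(rescaled)[-1]))
```

Lin et al. pair the zero-terminal-SNR schedule with v-prediction, because epsilon-prediction becomes ill-posed at SNR 0. The 2.1-v base model is already a v-prediction model.
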

# Observations

* at 1650 steps, we still haven't cracked the code on faces
* at 250 steps, we had amazing photoreal Mars landscapes, which have mostly carried forward to 1650 steps
* lighting and composition are at their best

# Future work

This model inspired a search for a solution to the proliferation issue. That search led me to ttj/flex-diffusion-2-1, which in turn led to the creation of ptx0/pseudo-flex-base, another photoreal model with multiple aspect ratio support.

This model was trained **purely** on 768x768 square images, which were randomly resized and cropped. It can produce some higher-resolution landscapes, but it cannot reliably render higher-resolution subjects without deformities.
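
The square-only preprocessing described above can be sketched as a random resize followed by a random 768x768 crop. This is an illustrative sketch only. The card does not say how the resize scale or crop offset were sampled, so the `max_extra_scale` range and the uniform offsets are assumptions. `random_resize_and_crop_box` is a hypothetical helper that only computes the geometry.

```python
import random

TARGET = 768  # square training resolution stated in the model card


def random_resize_and_crop_box(width, height, target=TARGET,
                               max_extra_scale=1.25, rng=random):
    """Return (new_w, new_h, left, top) for a random resize plus square crop.

    Hypothetical helper. Assumptions not taken from the card: the shorter
    side is resized to a random length in [target, target * max_extra_scale],
    and the crop offset is sampled uniformly inside the resized image.
    """
    short_side = min(width, height)
    new_short = rng.randint(target, int(target * max_extra_scale))
    scale = new_short / short_side
    # Guard against rounding pushing a side just below the crop size.
    new_w = max(target, round(width * scale))
    new_h = max(target, round(height * scale))
    left = rng.randint(0, new_w - target)
    top = rng.randint(0, new_h - target)
    return new_w, new_h, left, top


if __name__ == "__main__":
    rng = random.Random(0)
    for size in [(1024, 768), (768, 1024), (3000, 2000), (800, 800)]:
        print(size, "->", random_resize_and_crop_box(*size, rng=rng))
```

With Pillow, the returned values could be applied as `img.resize((new_w, new_h)).crop((left, top, left + 768, top + 768))`. Every training sample ends up as a 768x768 square, so the model never sees non-square or larger framings of a subject. This is consistent with the deformities noted above at higher resolutions.
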