---
license: creativeml-openrail-m
tags:
- computer vision
- stable-diffusion
- stable-diffusion-2-1
- photography
- photoreal
---
# Deprecation notice
This model was a research project and is deprecated in favour of ptx0/pseudo-flex-base.
# Capabilities
This model is capable of producing photorealistic images of people.
It retains much of the base 2.1-v model knowledge, as its text encoder is minimally tuned.
# Limitations
This model does not produce perfect results every time.
This model cannot reproduce most real people. Instead, it makes "Derp-a-Like" equivalents to real people, which I prefer.
This model is not great at abstract imagery or digital art, though it certainly can produce a variety of amazing art styles.
# Dataset
* Cushman (8,000 Kodachrome slides from 1939 to 1969)
* Midjourney v5.1-filtered (about 22,000 upscaled v5.1 images)
* National Geographic (about 3,000-4,000 images larger than 1024x768 of animals, wildlife, landscapes, history)
* a small dataset of stock images of people vaping / smoking
# Training parameters
* polynomial learning rate scheduler shared between the text encoder (TE) and U-Net, starting at 4e-8 and decaying to 1e-8
* batch size 15, gradient accumulation steps 10 => effective batch size of 150
* target is 30,000 steps but will likely stop sooner
* terminal SNR enforced betas
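
The learning-rate schedule and the terminal SNR betas listed above can be sketched in plain Python. This is a minimal illustration, not the training code used for this model: the function names are hypothetical, the decay power of 1.0 and the absence of warmup are assumptions, and the base schedule is assumed to be the usual Stable Diffusion "scaled linear" betas (1000 steps, 0.00085 to 0.012). The rescaling shifts and scales the square root of the cumulative alphas so that the final timestep has a signal-to-noise ratio of zero.

```python
import math

# Values from the training parameters above.
BATCH_SIZE = 15
GRAD_ACCUM_STEPS = 10
EFFECTIVE_BATCH_SIZE = BATCH_SIZE * GRAD_ACCUM_STEPS  # 150


def polynomial_lr(step, total_steps=30_000, lr_start=4e-8, lr_end=1e-8, power=1.0):
    """Polynomial decay from lr_start to lr_end (power=1.0 is an assumption)."""
    step = min(max(step, 0), total_steps)
    remaining = 1.0 - step / total_steps
    return (lr_start - lr_end) * remaining ** power + lr_end


def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Assumed base schedule: linear in sqrt(beta), then squared."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta), i.e. alpha_bar per timestep."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def enforce_zero_terminal_snr(betas):
    """Rescale betas so the last timestep has alpha_bar == 0 (SNR == 0)."""
    sqrt_ab = [math.sqrt(a) for a in cumulative_alphas(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    # Shift so the last value is zero, then scale so the first is unchanged.
    scale = first / (first - last)
    alphas_bar = [((s - last) * scale) ** 2 for s in sqrt_ab]
    # Recover per-step alphas from the rescaled cumulative products.
    alphas = [alphas_bar[0]] + [
        alphas_bar[i] / alphas_bar[i - 1] for i in range(1, len(alphas_bar))
    ]
    return [1.0 - a for a in alphas]


base_betas = scaled_linear_betas()
terminal_betas = enforce_zero_terminal_snr(base_betas)
```

With the unmodified schedule the final cumulative alpha stays slightly above zero, so the noisiest training step still leaks some signal; after rescaling, the final beta is 1 and the last timestep is pure noise.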
# Training goals
* explore the effects of terminal SNR scheduling
* improve faces, especially "at a distance"
* improve composition, e.g. completeness of the resulting image
* improve prompt comprehension, e.g. "do what I want, even if it is weird"
* retain / introduce a slightly colourful flavour due to the midjourney data
* enhance understanding of the past, through the Cushman collection
* retain the ability to produce natural landscapes and animals via National Geographic
# Observations
* at 1650 steps, we still haven't cracked the code on faces.
* at 250 steps, we had amazing photoreal Mars landscapes that have carried forward mostly to 1650 steps
* lighting and composition are at their best
# Future work
This model inspired the search for a solution to the proliferation issue that led me to ttj/flex-diffusion-2-1, which led to the creation of ptx0/pseudo-flex-base, another photoreal model with multiple aspect support.
This model was trained **purely** on 768x768 square images, which were randomly resized and cropped. It can produce some higher resolution landscapes, but it cannot reliably do higher resolution subjects without deformities.