Terminus XL Gamma
Model Details
Model Description
Terminus XL Gamma is a new state-of-the-art latent diffusion model that uses a zero-terminal SNR noise schedule and a velocity-prediction objective at both training and inference time.
Terminus uses the same architecture and layout as SDXL. It was trained for fewer steps than SDXL, on data with very high-quality captions drawn from COCO and Midjourney.
This model covers fewer concepts than SDXL, and some subjects will simply look very bad.
The objective of this model was to use min-SNR gamma loss to efficiently train a full model on a single A100-80G.
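The zero-terminal SNR schedule named in the description can be illustrated with a short sketch. It follows the commonly published rescaling recipe (shift and scale the square root of the cumulative alphas so the last timestep carries no signal), not necessarily the exact code used to train this model. The function names and the beta range (0.00085 to 0.012 over 1000 steps, the usual SDXL scheduler defaults) are assumptions, not values stated in this card.

```python
import math

def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """'Scaled linear' betas (linear in sqrt(beta)). Defaults are assumed, not from this card."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]

def alphas_cumprod(betas):
    """Cumulative product of (1 - beta), i.e. alpha_bar at each timestep."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def rescale_zero_terminal_snr(betas):
    """Rescale betas so the final timestep has alpha_bar = 0 (SNR = 0)."""
    sqrt_ab = [math.sqrt(a) for a in alphas_cumprod(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    # Shift so the last value becomes 0, then scale so the first value is unchanged.
    sqrt_ab = [(a - last) * first / (first - last) for a in sqrt_ab]
    ab = [a * a for a in sqrt_ab]
    # Recover per-step alphas from the cumulative product, then convert to betas.
    alphas = [ab[0]] + [ab[i] / ab[i - 1] for i in range(1, len(ab))]
    return [1.0 - a for a in alphas]

def terminal_snr(betas):
    """SNR = alpha_bar / (1 - alpha_bar) at the last timestep."""
    ab = alphas_cumprod(betas)[-1]
    return ab / (1.0 - ab)

betas = scaled_linear_betas()
rescaled = rescale_zero_terminal_snr(betas)
print("terminal SNR before:", terminal_snr(betas))
print("terminal SNR after: ", terminal_snr(rescaled))
```

With the unmodified schedule the last timestep still leaks a little signal, which is why a model trained that way never sees pure noise. After rescaling, the last timestep is pure noise, and an epsilon-prediction target carries no information there, which is one reason such schedules are paired with velocity prediction.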
- Developed by: pseudoterminal X (@bghira)
- Funded by: pseudoterminal X (@bghira)
- Model type: Latent Diffusion
- License: openrail++
- Architecture: SDXL
Model Sources
- Repository: https://github.com/bghira/SimpleTuner
Uses
Direct Use
Terminus XL Gamma can be used for generating high-quality images given text prompts. It should particularly excel at inpainting tasks, where a zero-terminal SNR noise schedule allows it to more effectively retain contrast.
The model can be utilized in creative industries such as art, advertising, and entertainment to create visually appealing content.
Downstream Use
Terminus XL Gamma can be fine-tuned for specific tasks such as image super-resolution, style transfer, and more.
Out-of-Scope Use
The model is not designed for tasks outside of image generation. It should not be used to produce harmful content or to deceive others. Please use common sense.
Bias, Risks, and Limitations
The model might exhibit biases present in the training data. The generated images should be carefully reviewed to ensure they meet ethical and societal standards.
Recommendations
Users should be cautious of potential biases in the generated images and thoroughly review them before use.
Training Details
Training Data
This model's success largely depended on a somewhat small collection of very high quality data samples.
- LAION-HD, filtered down to EXIF samples without watermarks. Luminance value of samples capped to 100 (.5).
- Midjourney 5.2 dataset ptx0/mj-general, with zero filtration.
Training Procedure
Preprocessing
Preprocessing followed SDXL's pretraining procedure: the model received crop-conditioning inputs, and images were centre-cropped with their full size passed as the size input.
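As a rough illustration of crop conditioning with centre crops, the sketch below computes the crop offset and assembles the six values that SDXL-style micro-conditioning usually carries: original size, crop top-left, and target size, each in (height, width) order. The function name and the example sizes are hypothetical, and the exact values fed to this model during training are not documented here.

```python
def centre_crop_conditioning(original_hw, resized_hw, crop_hw):
    """Return (orig_h, orig_w, crop_top, crop_left, target_h, target_w).

    original_hw: size of the source image before any resizing
    resized_hw:  size after aspect-preserving downsampling
    crop_hw:     size of the centre crop taken from the resized image
    """
    orig_h, orig_w = original_hw
    res_h, res_w = resized_hw
    crop_h, crop_w = crop_hw
    if crop_h > res_h or crop_w > res_w:
        raise ValueError("crop must fit inside the resized image")
    # A centre crop removes an equal margin from each side.
    top = (res_h - crop_h) // 2
    left = (res_w - crop_w) // 2
    return (orig_h, orig_w, top, left, crop_h, crop_w)

# Hypothetical example: a 2048x1536 (w x h) photo resized to 1024x768, then cropped to 768x768.
cond = centre_crop_conditioning((1536, 2048), (768, 1024), (768, 768))
print(cond)
```

At inference time, passing a crop offset of (0, 0) asks the model for an uncropped, centred composition.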
Training started at 512x512, moved to 768x768, and finished with ~1 megapixel multi-aspect training for the remainder of the run.
Images were downsampled while maintaining aspect ratio and cropped to 64-pixel increments. Many aspect ratios were trained, but only a few are likely to work fully.
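The multi-aspect step can be sketched as follows: scale the image down to roughly one megapixel while keeping its aspect ratio, then crop each side down to the nearest multiple of 64. This is a minimal sketch under assumptions: the function name, the 1024x1024 pixel budget, and the choice to round down are illustrative and may not match the training code.

```python
import math

def bucket_size(width, height, target_pixels=1024 * 1024, step=64):
    """Return the (width, height) an image would be cropped to after downsampling.

    Images are only ever downsampled, never upsampled, so small images keep
    their size apart from the final crop to a multiple of `step`.
    """
    scale = min(1.0, math.sqrt(target_pixels / (width * height)))
    scaled_w, scaled_h = int(width * scale), int(height * scale)
    # Round each side down to a multiple of `step` so the crop never needs padding.
    crop_w = max(step, scaled_w // step * step)
    crop_h = max(step, scaled_h // step * step)
    return crop_w, crop_h

print(bucket_size(4000, 3000))   # 4:3 landscape photo
print(bucket_size(1024, 1024))   # already at the pixel budget
```

Sizes produced this way are a reasonable starting point when choosing a resolution for inference, though only some of them are likely to work well.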
Training Hyperparameters
- Training regime: bf16 mixed precision
- Learning rate: 4e-7 to 8e-7, cosine schedule
- Epochs: 60
- Batch size: 24 * 15 = 360
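For readers unfamiliar with cosine schedules, here is a minimal sketch of one. The card does not say how the two learning rates relate to the schedule, so this example assumes, purely for illustration, a single cosine decay from 8e-7 down to 4e-7 with no warmup; the function name and step counts are hypothetical.

```python
import math

def cosine_lr(step, total_steps, lr_max=8e-7, lr_min=4e-7):
    """Cosine decay from lr_max at step 0 to lr_min at total_steps (assumed shape)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

for s in (0, 250, 500, 750, 1000):
    print(s, cosine_lr(s, total_steps=1000))
```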
Speeds, Sizes, Times
[More Information Needed]
Evaluation
Testing Data, Factors & Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
[More Information Needed]
Environmental Impact
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
Technical Specifications
Model Architecture and Objective
The model uses an SDXL-compatible latent diffusion architecture with a unique min-SNR augmented velocity objective.
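To make the objective more concrete, the sketch below shows the standard velocity target together with the min-SNR weighting as it is commonly adapted for velocity prediction, min(SNR, gamma) / (SNR + 1). This is the widely used textbook form; the variant used for this model is described as unique and may differ. The function names and gamma = 5 are assumptions.

```python
import math

def velocity_target(x0, noise, alpha_bar):
    """v = sqrt(alpha_bar) * noise - sqrt(1 - alpha_bar) * x0, elementwise."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * n - s * x for x, n in zip(x0, noise)]

def min_snr_weight_v(alpha_bar, gamma=5.0):
    """Min-SNR loss weight for velocity prediction; expects 0 <= alpha_bar < 1."""
    snr = alpha_bar / (1.0 - alpha_bar)
    return min(snr, gamma) / (snr + 1.0)

def weighted_v_loss(v_pred, x0, noise, alpha_bar, gamma=5.0):
    """Mean squared error against the velocity target, scaled by the min-SNR weight."""
    v = velocity_target(x0, noise, alpha_bar)
    mse = sum((p - t) ** 2 for p, t in zip(v_pred, v)) / len(v)
    return min_snr_weight_v(alpha_bar, gamma) * mse

# Toy latents: low-noise steps (high SNR) are down-weighted once SNR exceeds gamma.
x0, noise = [0.5, -0.25], [1.0, 0.0]
for ab in (0.99, 0.5, 0.01):
    print(ab, min_snr_weight_v(ab), velocity_target(x0, noise, ab))
```

In this textbook form the weight reaches zero where the SNR is zero, so the pure-noise timestep of a zero-terminal SNR schedule would contribute no loss. Training setups handle that case in different ways, and this card does not specify which one applies here.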
Compute Infrastructure
[More Information Needed]
Hardware
[More Information Needed]
Software
[More Information Needed]
Citation
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary
[More Information Needed]
More Information
[More Information Needed]
Model Card Authors
[More Information Needed]
Model Card Contact
[More Information Needed]