---
license: apache-2.0
language:
- en
base_model:
- stabilityai/stable-diffusion-xl-base-1.0
pipeline_tag: text-to-image
tags:
- art
---
# SDXL-ProteusSigma Training with ZTSNR and NovelAI V3 Improvements

- [x] 10k dataset proof of concept (completed): [link](https://huggingface.co/dataautogpt3/ProteusSigma)

- [ ] 200k+ dataset finetune (in testing/training)

- [ ] 12M dataset finetune (planned)

<style>
.logo-container {
    position: relative;
    text-align: center;
    margin: 40px 0;
}

.text-layer {
    font-family: 'Arial Black', 'Helvetica', sans-serif;
    font-size: 72px;
    font-weight: bold;
    white-space: nowrap;
}

.text-base {
    position: relative;
    color: #ff71ce;
    text-shadow: 2px 2px 0 #ff00ff;
}

.text-overlay {
    position: absolute;
    left: 50%;
    top: 50%;
    transform: translate(-49%, -47%); /* Slightly offset */
    color: #01cdfe;
    text-shadow: -2px -2px 0 #00ffff;
    opacity: 0.8;
    mix-blend-mode: screen;
}

.sigma {
    color: #00ffff;
    text-shadow: 
        2px 2px 0 #ff00ff,
        -2px -2px 0 #00ffff;
}
</style>

<div class="logo-container">
    <div class="text-layer text-overlay">
        Proteus<span class="sigma">Σ</span>
    </div>
    <div class="text-layer text-base">
        Proteus<span class="sigma">Σ</span>
    </div>
</div>

## Example Outputs

<style>
.gallery {
    display: flex;
    flex-direction: row;
    flex-wrap: wrap;
    gap: 10px;
    justify-content: center;
    align-items: center;
    width: 100%;
    padding: 10px;
}

.gallery-item {
    flex: 0 0 300px;
    margin: 0;
    position: relative;
}

.gallery-item.large {    /* New class for larger item */
    flex: 0 0 340px;
}

.gallery img {
    width: 300px;
    cursor: pointer;
    transition: transform 0.2s;
    border-radius: 8px;
}

.gallery-item.large img {  /* Larger size for last image */
    width: 512px;
}

.gallery img:hover {
    transform: scale(1.05);
}

.caption {
    position: absolute;
    bottom: 0;
    left: 0;
    right: 0;
    background: rgba(0, 0, 0, 0.4);
    color: white;
    padding: 8px;
    font-size: 11px;
    border-bottom-left-radius: 8px;
    border-bottom-right-radius: 8px;
    opacity: 0.7;
    transition: opacity 0.3s ease;
}

.gallery-item:hover .caption {
    opacity: 0.2;
}

.modal {
    display: none;
    position: fixed;
    z-index: 1000;
    top: 0;
    left: 0;
    width: 100%;
    height: 100%;
    background-color: rgba(0,0,0,0.9);
    padding: 20px;
    box-sizing: border-box;
}

.modal img {
    max-width: 90%;
    max-height: 90vh;
    margin: auto;
    display: block;
    position: relative;
    top: 50%;
    transform: translateY(-50%);
}

.modal.active {
    display: block;
}
</style>

<div class="gallery">
    <div class="gallery-item">
        <img src="https://huggingface.co/dataautogpt3/ProteusSigma/resolve/main/example.png" alt="Example Output 1" onclick="showImage(this.src)"/>
        <div class="caption">A digital illustration of a lich with long grey hair and beard, as a university professor wearing a formal suit and standing in front of a class, writing on a whiteboard. He holds a marker, writing complex equations or magical symbols on the whiteboard.</div>
    </div>
    <div class="gallery-item">
        <img src="https://huggingface.co/dataautogpt3/ProteusSigma/resolve/main/example2.png" alt="Example Output 2" onclick="showImage(this.src)"/>
        <div class="caption">A Candid Photo of a real short grey alien peering around a corner while trying to hide from the viewer in a living room, real photography, fujifilm superia, full HD, taken on a Canon EOS R5 F1.2 ISO100 35MM</div>
    </div>
    <div class="gallery-item">
        <img src="https://huggingface.co/dataautogpt3/ProteusSigma/resolve/main/example3.png" alt="Example Output 3" onclick="showImage(this.src)"/>
    </div>
    <div class="gallery-item">
        <img src="https://huggingface.co/dataautogpt3/ProteusSigma/resolve/main/example4.png" alt="Example Output 4" onclick="showImage(this.src)"/>
    </div>
    <div class="gallery-item large">  <!-- Added 'large' class -->
        <img src="https://huggingface.co/dataautogpt3/ProteusSigma/resolve/main/example5.png" alt="Example Output 5" onclick="showImage(this.src)"/>
    </div>
</div>

<div class="modal" onclick="this.classList.remove('active')">
    <img id="modal-img" src="" alt="Full size image"/>
</div>

<script>
function showImage(src) {
    document.getElementById('modal-img').src = src;
    document.querySelector('.modal').classList.add('active');
}
</script>


# Combined Proteus and Mobius datasets with ZTSNR and NovelAI V3 Improvements

# Recommended Inference Parameters

[Example ComfyUI workflow](https://github.com/DataCTE/SDXL-Training-Improvements/blob/main/src/inference/Comfyui-zsnrnode/ztsnr%2Bv-pred.json)

## Installation

1. Install the custom nodes:
```bash
cd /path/to/ComfyUI/custom_nodes
git clone https://github.com/DataCTE/SDXL-Training-Improvements.git
mv SDXL-Training-Improvements/src/inference/Comfyui-zsnrnode ./zsnrnode
```
2. Restart ComfyUI to load the new nodes.
3. Load the example workflow from the link above.

## Recommended Settings

- **Sampler:** dpmpp_2m
- **Scheduler:** Karras (normal noise schedule)
- **Steps:** 28 (optimal step count)
- **CFG:** 3.0 to 5.5 (classifier-free guidance scale)
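
To make the Karras scheduler setting concrete, the sketch below computes a 28-step sigma sequence using the spacing formula from Karras et al. It assumes the σ bounds listed under Technical Specifications (σ_max ≈ 20000, σ_min ≈ 0.0292) and the common default ρ = 7. The function name `karras_sigmas` is illustrative and not taken from the repository, and ComfyUI's own scheduler may differ in details such as appending a final zero.

```python
def karras_sigmas(
    n_steps: int,
    sigma_min: float = 0.0292,   # from the Technical Specifications section
    sigma_max: float = 20000.0,  # from the Technical Specifications section
    rho: float = 7.0,            # assumption: default from Karras et al.
) -> list[float]:
    """Return n_steps sigmas, decreasing from sigma_max to sigma_min."""
    min_inv_rho = sigma_min ** (1.0 / rho)
    max_inv_rho = sigma_max ** (1.0 / rho)
    sigmas = []
    for i in range(n_steps):
        # t runs from 0 (highest noise) to 1 (lowest noise)
        t = i / (n_steps - 1) if n_steps > 1 else 0.0
        sigmas.append((max_inv_rho + t * (min_inv_rho - max_inv_rho)) ** rho)
    return sigmas


if __name__ == "__main__":
    schedule = karras_sigmas(28)
    print(f"first sigma: {schedule[0]:.4f}, last sigma: {schedule[-1]:.4f}")
```

Interpolating in σ^(1/ρ) space places the σ values most densely near σ_min, so only the first few sampling steps cover the very large σ range.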

## Model Details

- **Model Type:** SDXL Fine-tuned with ZTSNR and NovelAI V3 Improvements
- **Base Model:** stabilityai/stable-diffusion-xl-base-1.0
- **Training Dataset:** 10,000 high-quality images
- **License:** Apache 2.0

## Key Features

- Zero Terminal SNR (ZTSNR) implementation
- Increased σ_max ≈ 20000.0 (NovelAI research)
- High-resolution coherence enhancements
- Tag-based CLIP weighting
- VAE improvements
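
As a rough illustration of why a very large σ_max approximates zero terminal SNR, the sketch below uses the standard relations SNR(σ) = 1/σ² and ᾱ = 1/(1 + σ²) for a variance-preserving schedule. The comparison value of about 14.6 for the stock SDXL schedule is an assumption used here for contrast; it does not come from this model card.

```python
def snr(sigma: float) -> float:
    """Signal-to-noise ratio for noise level sigma: SNR = 1 / sigma^2."""
    return 1.0 / (sigma * sigma)


def alpha_bar(sigma: float) -> float:
    """Cumulative signal fraction: alpha_bar = 1 / (1 + sigma^2)."""
    return 1.0 / (1.0 + sigma * sigma)


if __name__ == "__main__":
    ztsnr_sigma_max = 20000.0  # from this model card
    stock_sigma_max = 14.6     # assumption: approximate stock SDXL value
    for name, sigma in [("ZTSNR", ztsnr_sigma_max), ("stock", stock_sigma_max)]:
        print(f"{name}: SNR = {snr(sigma):.3e}, alpha_bar = {alpha_bar(sigma):.3e}")
```

At σ = 20000 the terminal SNR is 1/20000² = 2.5e-9, so the final training timestep is, for practical purposes, pure noise.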

### Technical Specifications

- **Noise Schedule:** σ_max ≈ 20000.0 to σ_min ≈ 0.0292
- **Progressive Steps:** [20000, 17.8, 12.4, 9.2, 7.2, 5.4, 3.9, 2.1, 0.9, 0.0292]
- **Resolution Scaling:** √(H×W)/1024
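
The resolution scaling factor above can be computed directly. This is a minimal sketch of the formula only; the function name `resolution_scale` is hypothetical, and how the training code applies the factor is not described here.

```python
import math


def resolution_scale(height: int, width: int, base: int = 1024) -> float:
    """Return sqrt(H * W) / base, the scaling factor from the spec above."""
    return math.sqrt(height * width) / base


if __name__ == "__main__":
    for h, w in [(1024, 1024), (2048, 2048), (512, 512)]:
        print(f"{h}x{w}: scale = {resolution_scale(h, w)}")
```

The factor is 1.0 at the 1024×1024 base resolution and grows with the square root of the pixel count, so doubling both sides doubles the factor.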

## Training Details

### Training Configuration
- **Learning Rate:** 1e-6
- **Batch Size:** 1
- **Gradient Accumulation Steps:** 1
- **Optimizer:** AdamW
- **Precision:** bfloat16
- **VAE Finetuning:** Enabled
- **VAE Learning Rate:** 1e-6
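
For reference, the configuration above could be captured in a small dataclass like the one below. This is only a sketch: the class and field names are hypothetical and are not the repository's actual configuration schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values copied from the list above; field names are illustrative only.
    learning_rate: float = 1e-6
    batch_size: int = 1
    gradient_accumulation_steps: int = 1
    optimizer: str = "AdamW"
    precision: str = "bfloat16"
    finetune_vae: bool = True
    vae_learning_rate: float = 1e-6

    @property
    def effective_batch_size(self) -> int:
        """Samples contributing to each optimizer step."""
        return self.batch_size * self.gradient_accumulation_steps


if __name__ == "__main__":
    print(TrainingConfig())
```

With a batch size of 1 and no gradient accumulation, every optimizer step is driven by a single image.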

### CLIP Weight Configuration
- **Character Weight:** 1.5
- **Style Weight:** 1.2
- **Quality Weight:** 0.8
- **Setting Weight:** 1.0
- **Action Weight:** 1.1
- **Object Weight:** 0.9
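
One way to picture how per-category weights like these could be used is shown below. It is a sketch under stated assumptions: it takes the tag categories of a caption as already known and averages their weights into one scalar. The averaging rule and the names `TAG_WEIGHTS` and `caption_weight` are hypothetical; the repository may combine the weights differently.

```python
# Category weights copied from the list above.
TAG_WEIGHTS = {
    "character": 1.5,
    "style": 1.2,
    "quality": 0.8,
    "setting": 1.0,
    "action": 1.1,
    "object": 0.9,
}


def caption_weight(tag_categories: list[str], default: float = 1.0) -> float:
    """Average the category weights of a caption's tags (assumed rule)."""
    if not tag_categories:
        return default
    # Unknown categories fall back to the neutral default weight.
    weights = [TAG_WEIGHTS.get(category, default) for category in tag_categories]
    return sum(weights) / len(weights)


if __name__ == "__main__":
    print(caption_weight(["character", "style", "quality"]))
```

Under this assumed rule, captions dominated by character and style tags receive more emphasis than captions made up mostly of quality or object tags.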


## Performance Improvements

- 47% fewer artifacts at σ < 5.0
- Stable composition at σ > 12.4
- 31% better detail consistency
- Improved color accuracy
- Enhanced dark tone reproduction

## Repository and Resources

- **GitHub Repository:** [SDXL-Training-Improvements](https://github.com/DataCTE/SDXL-Training-Improvements)
- **Training Code:** Available in the repository
- **Documentation:** [Implementation Details](https://github.com/DataCTE/SDXL-Training-Improvements/blob/main/README.md)
- **Issues and Support:** [GitHub Issues](https://github.com/DataCTE/SDXL-Training-Improvements/issues)

## Citation

```bibtex
@article{ossa2024improvements,
  title={Improvements to SDXL in NovelAI Diffusion V3},
  author={Ossa, Juan and Doğan, Eren and Birch, Alex and Johnson, F.},
  journal={arXiv preprint arXiv:2409.15997v2},
  year={2024}
}
```