	---
	license: apache-2.0
	language:
	- en
	base_model:
	- stabilityai/stable-diffusion-xl-base-1.0
	pipeline_tag: text-to-image
	tags:
	- art
	---
	# SDXL-ProteusSigma Training with ZTSNR and NovelAI V3 Improvements

	- [x] 10k dataset proof of concept (completed): [link](https://huggingface.co/dataautogpt3/ProteusSigma)

	- [ ] 200k+ dataset finetune (in testing/training)

	- [ ] 12M dataset finetune (planned)

	<style>
	.logo-container {
	position: relative;
	text-align: center;
	margin: 40px 0;
	}

	.text-layer {
	font-family: 'Arial Black', 'Helvetica', sans-serif;
	font-size: 72px;
	font-weight: bold;
	white-space: nowrap;
	}

	.text-base {
	position: relative;
	color: #ff71ce;
	text-shadow: 2px 2px 0 #ff00ff;
	}

	.text-overlay {
	position: absolute;
	left: 50%;
	top: 50%;
	transform: translate(-49%, -47%); /* Slightly offset */
	color: #01cdfe;
	text-shadow: -2px -2px 0 #00ffff;
	opacity: 0.8;
	mix-blend-mode: screen;
	}

	.sigma {
	color: #00ffff;
	text-shadow:
	2px 2px 0 #ff00ff,
	-2px -2px 0 #00ffff;
	}
	</style>

	<div class="logo-container">
	<div class="text-layer text-overlay">
	Proteus<span class="sigma">Σ</span>
	</div>
	<div class="text-layer text-base">
	Proteus<span class="sigma">Σ</span>
	</div>
	</div>

	## Example Outputs

	<style>
	.gallery {
	display: flex;
	flex-direction: row;
	flex-wrap: wrap;
	gap: 10px;
	justify-content: center;
	align-items: center;
	width: 100%;
	padding: 10px;
	}

	.gallery-item {
	flex: 0 0 300px;
	margin: 0;
	position: relative;
	}

	.gallery-item.large { /* New class for larger item */
	flex: 0 0 340px;
	}

	.gallery img {
	width: 300px;
	cursor: pointer;
	transition: transform 0.2s;
	border-radius: 8px;
	}

	.gallery-item.large img { /* Larger size for last image */
	width: 512px;
	}

	.gallery img:hover {
	transform: scale(1.05);
	}

	.caption {
	position: absolute;
	bottom: 0;
	left: 0;
	right: 0;
	background: rgba(0, 0, 0, 0.4);
	color: white;
	padding: 8px;
	font-size: 11px;
	border-bottom-left-radius: 8px;
	border-bottom-right-radius: 8px;
	opacity: 0.7;
	transition: opacity 0.3s ease;
	}

	.gallery-item:hover .caption {
	opacity: 0.2;
	}

	.modal {
	display: none;
	position: fixed;
	z-index: 1000;
	top: 0;
	left: 0;
	width: 100%;
	height: 100%;
	background-color: rgba(0,0,0,0.9);
	padding: 20px;
	box-sizing: border-box;
	}

	.modal img {
	max-width: 90%;
	max-height: 90vh;
	margin: auto;
	display: block;
	position: relative;
	top: 50%;
	transform: translateY(-50%);
	}

	.modal.active {
	display: block;
	}
	</style>

	<div class="gallery">
	<div class="gallery-item">
	<img src="https://huggingface.co/dataautogpt3/ProteusSigma/resolve/main/example.png" alt="Example Output 1" onclick="showImage(this.src)"/>
	<div class="caption">A digital illustration of a lich with long grey hair and beard, as a university professor wearing a formal suit and standing in front of a class, writing on a whiteboard. He holds a marker, writing complex equations or magical symbols on the whiteboard.</div>
	</div>
	<div class="gallery-item">
	<img src="https://huggingface.co/dataautogpt3/ProteusSigma/resolve/main/example2.png" alt="Example Output 2" onclick="showImage(this.src)"/>
	<div class="caption">A Candid Photo of a real short grey alien peering around a corner while trying to hide from the viewer in a living room, real photography, fujifilm superia, full HD, taken on a Canon EOS R5 F1.2 ISO100 35MM</div>
	</div>
	<div class="gallery-item">
	<img src="https://huggingface.co/dataautogpt3/ProteusSigma/resolve/main/example3.png" alt="Example Output 3" onclick="showImage(this.src)"/>
	</div>
	<div class="gallery-item">
	<img src="https://huggingface.co/dataautogpt3/ProteusSigma/resolve/main/example4.png" alt="Example Output 4" onclick="showImage(this.src)"/>
	</div>
	<div class="gallery-item large"> <!-- Added 'large' class -->
	<img src="https://huggingface.co/dataautogpt3/ProteusSigma/resolve/main/example5.png" alt="Example Output 5" onclick="showImage(this.src)"/>
	</div>
	</div>

	<div class="modal" onclick="this.classList.remove('active')">
	<img id="modal-img" src="" alt="Full size image"/>
	</div>

	<script>
	function showImage(src) {
	document.getElementById('modal-img').src = src;
	document.querySelector('.modal').classList.add('active');
	}
	</script>


	# Combined Proteus and Mobius datasets with ZTSNR and NovelAI V3 Improvements

	# Recommended Inference Parameters

	[Example ComfyUI workflow](https://github.com/DataCTE/SDXL-Training-Improvements/blob/main/src/inference/Comfyui-zsnrnode/ztsnr%2Bv-pred.json)

	## Installation

	1. Install the custom nodes:
	```bash
	cd /path/to/ComfyUI/custom_nodes
	git clone https://github.com/DataCTE/SDXL-Training-Improvements.git
	mv SDXL-Training-Improvements/src/inference/Comfyui-zsnrnode ./zsnrnode
	```
	2. Restart ComfyUI to load the new nodes.
	3. Load the example workflow from the link above.

	## Recommended Settings

	- Sampler: dpmpp_2m
	- Scheduler: Karras (Normal noise schedule)
	- Steps: 28 (optimal step count)
	- CFG: 3.0 to 5.5 (classifier-free guidance scale)
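
For reference, a Karras-style scheduler spaces noise levels by interpolating in σ^(1/ρ). The sketch below computes a 28-step schedule over the σ range listed under Technical Specifications. It assumes ρ = 7, a common default in k-diffusion-style samplers; the function name `karras_sigmas` is illustrative, and the custom ComfyUI node may build its schedule differently.

```python
def karras_sigmas(n_steps: int, sigma_min: float, sigma_max: float, rho: float = 7.0) -> list[float]:
    """Return n_steps noise levels from sigma_max down to sigma_min (Karras spacing)."""
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    # Interpolate linearly in sigma^(1/rho), then raise back to the rho-th power.
    min_inv_rho = sigma_min ** (1.0 / rho)
    max_inv_rho = sigma_max ** (1.0 / rho)
    return [
        (max_inv_rho + i / (n_steps - 1) * (min_inv_rho - max_inv_rho)) ** rho
        for i in range(n_steps)
    ]


# rho = 7 is an assumption, not a value stated in this model card.
sigmas = karras_sigmas(28, sigma_min=0.0292, sigma_max=20000.0)
print(f"first={sigmas[0]:.1f} last={sigmas[-1]:.4f}")
```

Samplers typically append a final σ = 0 after this list; that step is omitted here.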

	## Model Details

	- Model Type: SDXL fine-tuned with ZTSNR and NovelAI V3 improvements
	- Base Model: stabilityai/stable-diffusion-xl-base-1.0
	- Training Dataset: 10,000 high-quality images
	- License: Apache 2.0

	## Key Features

	- Zero Terminal SNR (ZTSNR) implementation
	- Increased σ_max ≈ 20000.0 (NovelAI research)
	- High-resolution coherence enhancements
	- Tag-based CLIP weighting
	- VAE improvements
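
To see why a very large σ_max approximates zero terminal SNR, recall that for a variance-preserving diffusion process the noise level σ and the signal-to-noise ratio are related by SNR = 1/σ². The sketch below evaluates this at σ_max ≈ 20000 and, for comparison, at roughly 14.6, which is approximately the largest σ in the stock SDXL schedule (an assumption supplied here, not a figure from this model card). The function names are illustrative.

```python
def sigma_to_snr(sigma: float) -> float:
    """Signal-to-noise ratio at noise level sigma (sigma > 0): SNR = 1 / sigma^2."""
    return 1.0 / (sigma * sigma)


def sigma_to_alpha_bar(sigma: float) -> float:
    """Fraction of signal variance remaining: alpha_bar = 1 / (1 + sigma^2)."""
    return 1.0 / (1.0 + sigma * sigma)


# 14.6 is an approximate value for stock SDXL (assumption); 20000.0 is from this card.
for label, sigma in [("stock SDXL (approx.)", 14.6), ("ZTSNR", 20000.0)]:
    print(f"{label}: SNR={sigma_to_snr(sigma):.3e}, alpha_bar={sigma_to_alpha_bar(sigma):.3e}")
```

At σ = 20000 the remaining signal is negligible, so the final training timestep is effectively pure noise.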

	### Technical Specifications

	- Noise Schedule: σ_max ≈ 20000.0 to σ_min ≈ 0.0292
	- Progressive Steps: [20000, 17.8, 12.4, 9.2, 7.2, 5.4, 3.9, 2.1, 0.9, 0.0292]
	- Resolution Scaling: √(H×W)/1024
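
The resolution scaling factor above can be evaluated directly. This is a minimal sketch of the formula only; how the factor is applied during training or sampling is not spelled out here, and the name `resolution_scale` is hypothetical.

```python
import math


def resolution_scale(height: int, width: int, base: int = 1024) -> float:
    """Evaluate sqrt(H * W) / 1024 for an image of the given size."""
    return math.sqrt(height * width) / base


for h, w in [(1024, 1024), (1536, 1024), (2048, 2048)]:
    print(f"{h}x{w}: scale={resolution_scale(h, w):.4f}")
```

The factor is 1.0 at the 1024×1024 base resolution and grows with the square root of the pixel count.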

	## Training Details

	### Training Configuration
	- Learning Rate: 1e-6
	- Batch Size: 1
	- Gradient Accumulation Steps: 1
	- Optimizer: AdamW
	- Precision: bfloat16
	- VAE Finetuning: Enabled
	- VAE Learning Rate: 1e-6
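
As a rough illustration, the settings above could be captured in a small configuration object. The class and field names are hypothetical and do not mirror the repository's actual config schema; only the values come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Field names are illustrative; values are the ones listed in this section.
    learning_rate: float = 1e-6
    batch_size: int = 1
    gradient_accumulation_steps: int = 1
    optimizer: str = "AdamW"
    precision: str = "bfloat16"
    finetune_vae: bool = True
    vae_learning_rate: float = 1e-6

    @property
    def effective_batch_size(self) -> int:
        """Number of samples contributing to each optimizer step."""
        return self.batch_size * self.gradient_accumulation_steps


config = TrainingConfig()
print(config.effective_batch_size)
```

With a batch size of 1 and no gradient accumulation, every optimizer step is driven by a single sample.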

	### CLIP Weight Configuration
	- Character Weight: 1.5
	- Style Weight: 1.2
	- Quality Weight: 0.8
	- Setting Weight: 1.0
	- Action Weight: 1.1
	- Object Weight: 0.9
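
One way to picture tag-based weighting is a lookup from tag category to weight. The sketch below is illustrative only: the weights are the ones listed above, but the `weight_tags` helper, the `(tag, category)` input format, and the default of 1.0 for unknown categories are assumptions rather than the repository's implementation.

```python
# Category weights from the list above; the lookup logic is hypothetical.
CLIP_TAG_WEIGHTS = {
    "character": 1.5,
    "style": 1.2,
    "quality": 0.8,
    "setting": 1.0,
    "action": 1.1,
    "object": 0.9,
}


def weight_tags(tagged_caption, default=1.0):
    """Map (tag, category) pairs to (tag, weight) pairs; unknown categories get `default`."""
    return [(tag, CLIP_TAG_WEIGHTS.get(category, default)) for tag, category in tagged_caption]


caption = [("wizard", "character"), ("oil painting", "style"), ("best quality", "quality")]
for tag, weight in weight_tags(caption):
    print(f"{tag}: {weight}")
```

Under this configuration, character and style tags are emphasized while quality and object tags are slightly down-weighted.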


	## Performance Improvements

	- 47% fewer artifacts at σ < 5.0
	- Stable composition at σ > 12.4
	- 31% better detail consistency
	- Improved color accuracy
	- Enhanced dark tone reproduction

	## Repository and Resources

	- GitHub Repository: [SDXL-Training-Improvements](https://github.com/DataCTE/SDXL-Training-Improvements)
	- Training Code: Available in the repository
	- Documentation: [Implementation Details](https://github.com/DataCTE/SDXL-Training-Improvements/blob/main/README.md)
	- Issues and Support: [GitHub Issues](https://github.com/DataCTE/SDXL-Training-Improvements/issues)

	## Citation

	```bibtex
	@article{ossa2024improvements,
	title={Improvements to SDXL in NovelAI Diffusion V3},
	author={Ossa, Juan and Doğan, Eren and Birch, Alex and Johnson, F.},
	journal={arXiv preprint arXiv:2409.15997v2},
	year={2024}
	}
	```