	---
	license: apache-2.0
	language:
	- en
	base_model:
	- stabilityai/stable-diffusion-xl-base-1.0
	pipeline_tag: text-to-image
	tags:
	- art
	---
	# SDXL-ProteusSigma Training with ZTSNR and NovelAI V3 Improvements

	- [x] 10k dataset proof of concept (completed): [link](https://huggingface.co/dataautogpt3/ProteusSigma)

	- [ ] 200k+ dataset finetune (in testing/training)

	- [ ] 12M dataset finetune (planned)

	<style>
	.logo {
	width: 600px;
	margin: 20px auto;
	display: block;
	}

	.logo-text {
	font-family: 'Arial', sans-serif;
	font-weight: bold;
	fill: none;
	stroke: #00ffff;
	stroke-width: 2;
	stroke-linejoin: round;
	stroke-dasharray: 1000;
	stroke-dashoffset: 1000;
	animation: draw 3s ease forwards, glow 2s ease-in-out infinite alternate;
	}

	@keyframes draw {
	to {
	stroke-dashoffset: 0;
	}
	}

	@keyframes glow {
	from {
	filter: drop-shadow(0 0 2px #00ffff) drop-shadow(0 0 4px #00ffff);
	}
	to {
	filter: drop-shadow(0 0 4px #00ffff) drop-shadow(0 0 8px #00ffff);
	}
	}
	</style>

	<svg class="logo" viewBox="0 0 800 100" xmlns="http://www.w3.org/2000/svg">
	<defs>
	<linearGradient id="gradient" x1="0%" y1="0%" x2="100%" y2="0%">
	<stop offset="0%" style="stop-color:#00ffff;stop-opacity:1" />
	<stop offset="100%" style="stop-color:#0099ff;stop-opacity:1" />
	</linearGradient>
	</defs>
	<text x="50%" y="50%" text-anchor="middle" class="logo-text" dominant-baseline="middle" font-size="60px">
	Proteus<tspan fill="url(#gradient)" stroke="none">Σ</tspan>igma
	</text>
	<text x="50%" y="75%" text-anchor="middle" class="logo-text" dominant-baseline="middle" font-size="20px" stroke-width="1">
	STABLE DIFFUSION XL
	</text>
	</svg>

	## Example Outputs

	<style>
	.gallery {
	display: flex;
	flex-direction: row;
	flex-wrap: wrap;
	gap: 10px;
	justify-content: center;
	align-items: center;
	width: 100%;
	padding: 10px;
	}

	.gallery-item {
	flex: 0 0 300px;
	margin: 0;
	position: relative;
	}

	.gallery-item.large { /* New class for larger item */
	flex: 0 0 340px;
	}

	.gallery img {
	width: 300px;
	cursor: pointer;
	transition: transform 0.2s;
	border-radius: 8px;
	}

	.gallery-item.large img { /* Larger size for last image */
	width: 512px;
	}

	.gallery img:hover {
	transform: scale(1.05);
	}

	.caption {
	position: absolute;
	bottom: 0;
	left: 0;
	right: 0;
	background: rgba(0, 0, 0, 0.4);
	color: white;
	padding: 8px;
	font-size: 11px;
	border-bottom-left-radius: 8px;
	border-bottom-right-radius: 8px;
	opacity: 0.7;
	transition: opacity 0.3s ease;
	}

	.gallery-item:hover .caption {
	opacity: 0.2;
	}

	.modal {
	display: none;
	position: fixed;
	z-index: 1000;
	top: 0;
	left: 0;
	width: 100%;
	height: 100%;
	background-color: rgba(0,0,0,0.9);
	padding: 20px;
	box-sizing: border-box;
	}

	.modal img {
	max-width: 90%;
	max-height: 90vh;
	margin: auto;
	display: block;
	position: relative;
	top: 50%;
	transform: translateY(-50%);
	}

	.modal.active {
	display: block;
	}
	</style>

	<div class="gallery">
	<div class="gallery-item">
	<img src="https://huggingface.co/dataautogpt3/ProteusSigma/resolve/main/example.png" alt="Example Output 1" onclick="showImage(this.src)"/>
	<div class="caption">A digital illustration of a lich with long grey hair and beard, as a university professor wearing a formal suit and standing in front of a class, writing on a whiteboard. He holds a marker, writing complex equations or magical symbols on the whiteboard.</div>
	</div>
	<div class="gallery-item">
	<img src="https://huggingface.co/dataautogpt3/ProteusSigma/resolve/main/example2.png" alt="Example Output 2" onclick="showImage(this.src)"/>
	<div class="caption">A Candid Photo of a real short grey alien peering around a corner while trying to hide from the viewer in a living room, real photography, fujifilm superia, full HD, taken on a Canon EOS R5 F1.2 ISO100 35MM</div>
	</div>
	<div class="gallery-item">
	<img src="https://huggingface.co/dataautogpt3/ProteusSigma/resolve/main/example3.png" alt="Example Output 3" onclick="showImage(this.src)"/>
	</div>
	<div class="gallery-item">
	<img src="https://huggingface.co/dataautogpt3/ProteusSigma/resolve/main/example4.png" alt="Example Output 4" onclick="showImage(this.src)"/>
	</div>
	<div class="gallery-item large"> <!-- Added 'large' class -->
	<img src="https://huggingface.co/dataautogpt3/ProteusSigma/resolve/main/example5.png" alt="Example Output 5" onclick="showImage(this.src)"/>
	</div>
	</div>

	<div class="modal" onclick="this.classList.remove('active')">
	<img id="modal-img" src="" alt="Full size image"/>
	</div>

	<script>
	function showImage(src) {
	document.getElementById('modal-img').src = src;
	document.querySelector('.modal').classList.add('active');
	}
	</script>


	Training data: combined Proteus and Mobius datasets.

	## Recommended Inference Parameters


	[ComfyUI workflow](https://huggingface.co/dataautogpt3/sdxl-ztsnr-sigma-10k/blob/main/ComfyUI-test10k.json)

	- `sampler`: `euler_ancestral` (best results with Euler Ancestral)
	- `scheduler`: `normal` (normal noise schedule)
	- `steps`: 28 (recommended step count)
	- `cfg`: 7.5 (classifier-free guidance scale)

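For use outside ComfyUI, the settings above can be mapped onto a `diffusers` call. The following is a minimal sketch, not a tested recipe: it assumes the `dataautogpt3/ProteusSigma` repository loads as a standard SDXL `diffusers` pipeline, and the `generate` helper and `COMFY_SETTINGS` name are hypothetical. It also does not try to reproduce the extended σ range, so the ComfyUI workflow linked above remains the reference setup.

```python
# Recommended settings from this model card, keyed by their ComfyUI names.
COMFY_SETTINGS = {
    "sampler": "euler_ancestral",
    "scheduler": "normal",
    "steps": 28,
    "cfg": 7.5,
}


def to_diffusers_kwargs(settings):
    """Map the ComfyUI-style names onto diffusers pipeline call arguments."""
    return {
        "num_inference_steps": settings["steps"],
        "guidance_scale": settings["cfg"],
    }


def generate(prompt, repo_id="dataautogpt3/ProteusSigma"):
    """Hypothetical helper; assumes the repo loads as a diffusers SDXL pipeline."""
    # Imported lazily so the mapping above works without torch installed.
    import torch
    from diffusers import EulerAncestralDiscreteScheduler, StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
    # Euler Ancestral is the diffusers counterpart of "euler_ancestral".
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
    pipe = pipe.to("cuda")
    return pipe(prompt, **to_diffusers_kwargs(COMFY_SETTINGS)).images[0]
```

Calling `generate("a lich teaching a university class")` would return a PIL image if the assumptions hold; `to_diffusers_kwargs` can be used on its own to check the mapping.
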
	## Model Details

	- Model Type: SDXL Fine-tuned with ZTSNR and NovelAI V3 Improvements
	- Base Model: stabilityai/stable-diffusion-xl-base-1.0
	- Training Dataset: 10,000 high-quality images
	- License: Apache 2.0

	## Key Features

	- Zero Terminal SNR (ZTSNR) implementation
	- Increased σ_max ≈ 20000.0 (NovelAI research)
	- High-resolution coherence enhancements
	- Tag-based CLIP weighting
	- VAE improvements
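
To illustrate why a larger σ_max matters for Zero Terminal SNR, the sketch below computes the sigmas of the stock SDXL base schedule (scaled-linear betas from 0.00085 to 0.012 over 1000 steps) and compares the terminal signal-to-noise ratio with the one at σ = 20000. It uses the standard relation SNR = 1/σ² for a sample of the form `x0 + sigma * noise`. The function names are illustrative and not taken from the training code.

```python
import math


def sdxl_sigmas(beta_start=0.00085, beta_end=0.012, num_steps=1000):
    """Sigmas of the scaled-linear beta schedule used by SDXL base 1.0."""
    sigmas = []
    alpha_bar = 1.0
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    for i in range(num_steps):
        # Scaled-linear: interpolate sqrt(beta) linearly, then square it.
        beta = (lo + (hi - lo) * i / (num_steps - 1)) ** 2
        alpha_bar *= 1.0 - beta
        sigmas.append(math.sqrt((1.0 - alpha_bar) / alpha_bar))
    return sigmas


def snr(sigma):
    """Signal-to-noise ratio for x = x0 + sigma * noise with unit-variance noise."""
    return 1.0 / sigma**2


sigmas = sdxl_sigmas()
print(f"stock schedule: sigma_max = {sigmas[-1]:.2f}, terminal SNR = {snr(sigmas[-1]):.2e}")
print(f"this model:     sigma_max = 20000, terminal SNR = {snr(20000.0):.2e}")
```

The stock schedule stops at a σ of roughly 14.6, where some signal still leaks through at the final training timestep. At σ = 20000 the SNR is small enough to be treated as zero in practice.
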

	### Technical Specifications

	- Noise Schedule: σ_max ≈ 20000.0 to σ_min ≈ 0.0292
	- Progressive Steps: [20000, 17.8, 12.4, 9.2, 7.2, 5.4, 3.9, 2.1, 0.9, 0.0292]
	- Resolution Scaling: √(H×W)/1024
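
As a small worked example of the resolution scaling factor, the sketch below evaluates √(H×W)/1024 for a few image sizes and stores the progressive sigma values listed above. How the factor is applied during training is not stated in this card, so the code only computes it; `resolution_scale` and `PROGRESSIVE_SIGMAS` are illustrative names.

```python
import math

# Sigma values listed under "Progressive Steps" in this card.
PROGRESSIVE_SIGMAS = [20000, 17.8, 12.4, 9.2, 7.2, 5.4, 3.9, 2.1, 0.9, 0.0292]


def resolution_scale(height, width, base=1024):
    """Return sqrt(H * W) / base, the scaling factor quoted in the card."""
    return math.sqrt(height * width) / base


for h, w in [(1024, 1024), (1536, 1024), (2048, 2048)]:
    print(f"{h}x{w}: scale = {resolution_scale(h, w):.3f}")
```

A 1024×1024 image gives a factor of exactly 1, and doubling both sides doubles the factor, so the scale tracks the side length of an equivalent square image.
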

	## Training Details

	### Training Configuration
	- Learning Rate: 1e-6
	- Batch Size: 1
	- Gradient Accumulation Steps: 1
	- Optimizer: AdamW
	- Precision: bfloat16
	- VAE Finetuning: Enabled
	- VAE Learning Rate: 1e-6
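
The configuration above could be expressed in code roughly as follows. This is a sketch under assumptions: the `TrainingConfig` class and `build_optimizer` helper are hypothetical, and putting the UNet and VAE into one AdamW optimizer with separate parameter groups is a guess at the setup, not a description of the repository's training loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Values copied from the Training Configuration list in this card."""
    learning_rate: float = 1e-6
    batch_size: int = 1
    gradient_accumulation_steps: int = 1
    precision: str = "bfloat16"
    finetune_vae: bool = True
    vae_learning_rate: float = 1e-6

    @property
    def effective_batch_size(self):
        # Samples contributing to each optimizer step.
        return self.batch_size * self.gradient_accumulation_steps


def build_optimizer(unet_params, vae_params, cfg):
    """Hypothetical helper: one AdamW optimizer with a group per sub-model."""
    import torch  # Imported lazily so the config works without torch installed.

    groups = [{"params": list(unet_params), "lr": cfg.learning_rate}]
    if cfg.finetune_vae:
        groups.append({"params": list(vae_params), "lr": cfg.vae_learning_rate})
    return torch.optim.AdamW(groups)


cfg = TrainingConfig()
print(cfg.effective_batch_size)
```

With a batch size of 1 and no gradient accumulation, every optimizer step sees a single image, which is consistent with the very small learning rate.
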

	### CLIP Weight Configuration
	- Character Weight: 1.5
	- Style Weight: 1.2
	- Quality Weight: 0.8
	- Setting Weight: 1.0
	- Action Weight: 1.1
	- Object Weight: 0.9
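
One way to picture the tag weights is as a lookup from tag class to multiplier. The sketch below averages the class weights over the tags of a caption to get a single per-caption weight. This card does not say how the weights enter training, so the averaging rule, the `caption_weight` helper, and the example tag classes are all assumptions made for illustration.

```python
# Class weights copied from the CLIP Weight Configuration list in this card.
TAG_CLASS_WEIGHTS = {
    "character": 1.5,
    "style": 1.2,
    "quality": 0.8,
    "setting": 1.0,
    "action": 1.1,
    "object": 0.9,
}


def caption_weight(tagged, default=1.0):
    """Average the class weights over (tag, tag_class) pairs.

    Unknown classes fall back to `default`; an empty caption gets `default`.
    """
    if not tagged:
        return default
    weights = [TAG_CLASS_WEIGHTS.get(tag_class, default) for _, tag_class in tagged]
    return sum(weights) / len(weights)


# Hypothetical classification of tags from the first example caption.
example = [
    ("lich", "character"),
    ("digital illustration", "style"),
    ("whiteboard", "object"),
]
print(round(caption_weight(example), 3))
```

Under this rule, captions dominated by character and style tags receive a weight above 1, while captions made up mostly of quality and object tags are weighted down.
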


	## Performance Improvements

	- 47% fewer artifacts at σ < 5.0
	- Stable composition at σ > 12.4
	- 31% better detail consistency
	- Improved color accuracy
	- Enhanced dark tone reproduction

	## Repository and Resources

	- GitHub Repository: [SDXL-Training-Improvements](https://github.com/DataCTE/SDXL-Training-Improvements)
	- Training Code: Available in the repository
	- Documentation: [Implementation Details](https://github.com/DataCTE/SDXL-Training-Improvements/blob/main/README.md)
	- Issues and Support: [GitHub Issues](https://github.com/DataCTE/SDXL-Training-Improvements/issues)

	## Citation

	```bibtex
	@article{ossa2024improvements,
	title={Improvements to SDXL in NovelAI Diffusion V3},
	author={Ossa, Juan and Doğan, Eren and Birch, Alex and Johnson, F.},
	journal={arXiv preprint arXiv:2409.15997v2},
	year={2024}
	}
	```