	---
	pipeline_tag: text-to-image
	widget:
	- text: >-
	    The image features an older man, a long white beard and mustache, He has a
	    stern expression, giving the impression of a wise and experienced
	    individual. The mans beard and mustache are prominent, adding to his
	    distinguished appearance. The close-up shot of the mans face emphasizes his
	    facial features and the intensity of his gaze.
	  output:
	    url: assets/oldman.png
	- text: >-
	    Super Closeup Portrait, action shot, Profoundly dark whitish meadow, glass
	    flowers, Stains, space grunge style, Jeanne d'Arc wearing White Olive green
	    used styled Cotton frock, Wielding thin silver sword, Sci-fi vibe, dirty,
	    noisy, Vintage monk style, very detailed, hd
	  output:
	    url: assets/swordwoman.png
	- text: >-
	    cinematic film still of Kodak Motion Picture Film: (Sharp Detailed Image) An
	    Oscar winning movie for Best Cinematography a woman in a kimono standing on
	    a subway train in Japan Kodak Motion Picture Film Style, shallow depth of
	    field, vignette, highly detailed, high budget, bokeh, cinemascope, moody,
	    epic, gorgeous, film grain, grainy
	  output:
	    url: assets/japanesewoman.png
	- text: ("Proteus" text logo) powerful aura, swirling power, cinematic, masterpiece, award-winning
	  output:
	    url: assets/logo.png
	language:
	- en
	base_model:
	- stabilityai/stable-diffusion-xl-base-1.0
	tags:
	- art
	---
	<Gallery />

	# Proteus v0.6

	I'm excited to introduce Proteus v0.6, a complete rebuild of my AI image generation model. This first version of the rework focuses entirely on photorealism. It doesn't aim to be state-of-the-art, but I believe it's a good step forward in image quality. Please note that this is a preliminary version, not the final, fully featured checkpoint; more improvements and features will come in future updates.

	## Overview

	Proteus v0.6 is a total rework from the ground up. In previous versions, combining different training methods and learning rates caused the model to become unstable during large-scale training. Learning from those experiences, I've retrained the model using only the photorealism aspects of the Proteus dataset.

	For now, I'm calling this new training technique Multi-Perspective Fusion.

	### Multi-Perspective Fusion

	This approach involves:

	- Training Multiple LoRAs and Full-Parameter Checkpoints: I trained several Low-Rank Adaptation (LoRA) modules and full-parameter checkpoints on the same dataset multiple times to capture different "perspectives" of the data.
	- Integrating into an Overarching Framework: These varied models are then combined within a larger framework to enhance overall performance.

	I'm hoping this method will be interesting to data scientists exploring advanced training techniques.
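
This README does not say how the separately trained models are combined, so the following is only a minimal sketch of one common merging technique, weighted parameter averaging, and not the actual Multi-Perspective Fusion procedure. The `merge_checkpoints` helper, the dict-of-lists checkpoint format, and the toy values are all hypothetical and chosen to keep the example dependency-free.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of floats.
# Real checkpoints hold tensors; plain lists keep this sketch dependency-free.
Checkpoint = Dict[str, List[float]]


def merge_checkpoints(checkpoints: Sequence[Checkpoint],
                      weights: Sequence[float]) -> Checkpoint:
    """Return the weighted average of checkpoints that share the same keys."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if len(checkpoints) != len(weights):
        raise ValueError("one weight per checkpoint is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalize so the weights sum to 1

    reference_keys = set(checkpoints[0])
    for ckpt in checkpoints[1:]:
        if set(ckpt) != reference_keys:
            raise ValueError("all checkpoints must share the same parameter names")

    merged: Checkpoint = {}
    for name in checkpoints[0]:
        length = len(checkpoints[0][name])
        if any(len(c[name]) != length for c in checkpoints):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across all checkpoints.
        merged[name] = [
            sum(w * c[name][i] for w, c in zip(norm, checkpoints))
            for i in range(length)
        ]
    return merged


# Three toy "perspectives" of the same model (values are made up).
run_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
run_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
run_c = {"layer.weight": [5.0, 6.0], "layer.bias": [2.0]}

merged = merge_checkpoints([run_a, run_b, run_c], weights=[1.0, 1.0, 2.0])
```

Normalizing the weights keeps the merged parameters on the same scale as the inputs, whatever raw weights are passed in.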

	## Key Improvements in v0.6

	- Total Rebuild: Constructed entirely from scratch to address previous issues.
	- Enhanced Photorealism: Focused on producing good-quality photorealistic images.
	- Stable Training Process: Refined training methods to prevent the model from falling apart during large-scale training.
	- Preliminary Version: This is the first version of the rework; expect more features and improvements in future releases.

	## Limitations

	- No Illustrations or Anime: Currently, the model can't generate illustrations or anime-style images because it's only been trained on photorealistic data.
	- Not State-of-the-Art: The model performs well, but I'm not claiming it's state-of-the-art, only that it's a good starting point.
	- Work in Progress: This is not the final, fully-featured checkpoint. More updates are planned.

	## Usage

	### Recommended Settings

	- Clip Skip: 1
	- CFG Scale: 7
	- Steps: 25 - 50
	- Sampler: DPM++ 2M SDE
	- Scheduler: Karras
	- Resolution: 1024x1024

	### Use it with 🧨 diffusers

	Here's how you can use Proteus v0.6 with the Hugging Face 🧨 diffusers library:

	```python
	import torch
	from diffusers import (
	    StableDiffusionXLPipeline,
	    KDPM2AncestralDiscreteScheduler,
	    AutoencoderKL,
	)

	# Load the fp16-safe SDXL VAE
	vae = AutoencoderKL.from_pretrained(
	    "madebyollin/sdxl-vae-fp16-fix",
	    torch_dtype=torch.float16,
	)

	# Load the pipeline with that VAE, swap the scheduler, and move it to the GPU
	pipe = StableDiffusionXLPipeline.from_pretrained(
	    "dataautogpt3/Proteus-v0.6",
	    vae=vae,
	    torch_dtype=torch.float16,
	)
	pipe.scheduler = KDPM2AncestralDiscreteScheduler.from_config(pipe.scheduler.config)
	pipe.to("cuda")

	# Define prompts and generate an image
	prompt = "a cat wearing sunglasses on the beach"
	negative_prompt = ""

	image = pipe(
	    prompt,
	    negative_prompt=negative_prompt,
	    width=1024,
	    height=1024,
	    guidance_scale=7,
	    num_inference_steps=50,
	).images[0]

	image.save("generated_image.png")
	```
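
The example above uses `KDPM2AncestralDiscreteScheduler`, while the Recommended Settings list names DPM++ 2M SDE with the Karras schedule. As a hedged sketch, the snippet below shows how those settings might be expressed in diffusers. It assumes that `DPMSolverMultistepScheduler` with `algorithm_type="sde-dpmsolver++"` and `use_karras_sigmas=True` is the matching configuration. The constant names and the `use_recommended_scheduler` helper are hypothetical, and the step count of 30 is an arbitrary pick from the 25-50 range.

```python
# Generation arguments taken from the Recommended Settings list.
RECOMMENDED_CALL_KWARGS = {
    "guidance_scale": 7.0,       # CFG Scale: 7
    "num_inference_steps": 30,   # arbitrary pick from the recommended 25-50 range
    "width": 1024,               # Resolution: 1024x1024
    "height": 1024,
}

# Assumed diffusers equivalent of "DPM++ 2M SDE" with the "Karras" schedule.
RECOMMENDED_SCHEDULER_KWARGS = {
    "algorithm_type": "sde-dpmsolver++",
    "use_karras_sigmas": True,
}


def use_recommended_scheduler(pipe):
    """Swap the pipeline's scheduler for DPM++ 2M SDE Karras (assumed mapping)."""
    # Imported lazily so the constants above can be reused without diffusers.
    from diffusers import DPMSolverMultistepScheduler

    pipe.scheduler = DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config, **RECOMMENDED_SCHEDULER_KWARGS
    )
    return pipe
```

With a loaded pipeline, this could be used as `pipe = use_recommended_scheduler(pipe)` followed by `pipe(prompt, **RECOMMENDED_CALL_KWARGS).images[0]`. Clip Skip is not set here and stays at the pipeline default.
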
	## Future Plans

	Following the approach from the first version, I plan to gradually introduce new concepts and visual styles by adding one large training batch at a time. This incremental method aims to expand the model's capabilities while keeping it stable.

	## Collaborations

	If anyone is interested, I'd be open to collaborating on papers about this work. I'm looking for a team to help me publish, but I'm new to this and would appreciate any guidance.

	## License

	License Options:

	My goal is to allow personal use, and commercial use up to a certain revenue threshold, while requiring larger entities to contact me for a separate agreement. With that in mind, I'm considering the following existing licenses:

	### Polyform Small Business License 1.0.0

	- Permits: Use by individuals and entities with annual gross revenues under a specified amount (e.g., $5 million USD).
	- Requires: Entities exceeding the revenue threshold to obtain a commercial license from me.

	For more details, see the [Polyform Small Business License](https://polyformproject.org/licenses/small-business/1.0.0/).


	## Acknowledgments

	This is a personal project developed solely by me.

	---

	## Citation

	If you use Proteus v0.6 in your work, please cite it as:

	Alexander Rafael Izquierdo, "Proteus v0.6: Multi-Perspective Fusion," 2024.

	---