---
pipeline_tag: text-to-image
widget:
- text: >-
    The image features an older man, a long white beard and mustache, He has a
    stern expression, giving the impression of a wise and experienced
    individual. The mans beard and mustache are prominent, adding to his
    distinguished appearance. The close-up shot of the mans face emphasizes his
    facial features and the intensity of his gaze.
  output:
    url: assets/oldman.png
- text: >-
    Super Closeup Portrait, action shot, Profoundly dark whitish meadow, glass
    flowers, Stains, space grunge style, Jeanne d'Arc wearing White Olive green
    used styled Cotton frock, Wielding thin silver sword, Sci-fi vibe, dirty,
    noisy, Vintage monk style, very detailed, hd
  output:
    url: assets/swordwoman.png
- text: >-
    cinematic film still of Kodak Motion Picture Film: (Sharp Detailed Image) An
    Oscar winning movie for Best Cinematography a woman in a kimono standing on
    a subway train in Japan Kodak Motion Picture Film Style, shallow depth of
    field, vignette, highly detailed, high budget, bokeh, cinemascope, moody,
    epic, gorgeous, film grain, grainy
  output:
    url: assets/japanesewoman.png
- text: ("Proteus" text logo) powerful aura, swirling power, cinematic, masterpiece, award-winning
  output:
    url: assets/logo.png
language:
- en
base_model:
- stabilityai/stable-diffusion-xl-base-1.0
tags:
- art
---
<Gallery />
# Proteus v0.6
I'm excited to introduce **Proteus v0.6**, a complete rebuild of my AI image generation model. This is the **first version of the rework**, and it focuses entirely on photorealism. It isn't aiming to be state-of-the-art, but I believe it's a good step forward in producing high-quality images. Please note that this is a **preliminary version**, not the final, fully featured checkpoint; more improvements and features will come in future updates.
## Overview
Proteus v0.6 is a total rework from the ground up. In previous versions, combining different training methods and learning rates caused the model to become unstable during large-scale training. Learning from those experiences, I've retrained the model using only the photorealism aspects of the Proteus dataset.
For now, I'm calling this new training technique **Multi-Perspective Fusion**.
### Multi-Perspective Fusion
This approach involves:
- **Training Multiple LoRAs and Full-Parameter Checkpoints**: I trained several Low-Rank Adaptation (LoRA) modules and full-parameter checkpoints on the same dataset multiple times to capture different "perspectives" of the data.
- **Integrating into an Overarching Framework**: These varied models are then combined within a larger framework to enhance overall performance.
I'm hoping this method will be interesting to data scientists exploring advanced training techniques.
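I haven't written up the exact fusion procedure here, so the following is only a minimal sketch of one common way to combine several runs trained on the same data: a weighted average of their parameters. It is an illustrative assumption, not the actual Multi-Perspective Fusion implementation. The function name `fuse_checkpoints`, the toy parameter names, and the use of plain Python lists in place of real tensors are all hypothetical.

```python
from typing import Dict, List, Sequence

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
# Real checkpoints would hold tensors; plain lists keep the sketch dependency-free.
Params = Dict[str, List[float]]


def fuse_checkpoints(checkpoints: Sequence[Params], weights: Sequence[float]) -> Params:
    """Weighted average of checkpoints that share parameter names and shapes.

    Hypothetical helper for illustration only; not the author's fusion method.
    """
    if not checkpoints or len(checkpoints) != len(weights):
        raise ValueError("provide exactly one weight per checkpoint")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalize so the weights sum to 1

    fused: Params = {}
    for name, reference in checkpoints[0].items():
        fused[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(norm, checkpoints))
            for i in range(len(reference))
        ]
    return fused


# Two "perspectives" of the same tiny model, fused with equal weight.
run_a = {"unet.w": [1.0, 2.0], "unet.b": [0.0]}
run_b = {"unet.w": [3.0, 4.0], "unet.b": [1.0]}
fused = fuse_checkpoints([run_a, run_b], weights=[1.0, 1.0])
print(fused)
```

Unequal weights would let one run dominate the result, which is one simple way to favor a "perspective" that performs better on a validation set.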
## Key Improvements in v0.6
- **Total Rebuild**: Constructed entirely from scratch to address previous issues.
- **Enhanced Photorealism**: Focused on producing good-quality photorealistic images.
- **Stable Training Process**: Refined training methods to prevent the model from falling apart during large-scale training.
- **Preliminary Version**: This is the first version of the rework; expect more features and improvements in future releases.
## Limitations
- **No Illustrations or Anime**: Currently, the model can't generate illustrations or anime-style images because it's only been trained on photorealistic data.
- **Not State-of-the-Art**: While the model performs well, I'm not claiming it's state-of-the-art, only that it's a good starting point.
- **Work in Progress**: This is not the final, fully-featured checkpoint. More updates are planned.
## Usage
### Recommended Settings
- **Clip Skip**: 1
- **CFG Scale**: 7
- **Steps**: 25 - 50
- **Sampler**: DPM++ 2M SDE
- **Scheduler**: Karras
- **Resolution**: 1024x1024
### Use it with 🧨 diffusers
Here's how you can use Proteus v0.6 with the Hugging Face 🧨 diffusers library:
```python
import torch
from diffusers import (
    StableDiffusionXLPipeline,
    KDPM2AncestralDiscreteScheduler,
    AutoencoderKL,
)

# Load VAE component
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16,
)

# Configure the pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "dataautogpt3/Proteus-v0.6",
    vae=vae,
    torch_dtype=torch.float16,
)
pipe.scheduler = KDPM2AncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Define prompts and generate image
prompt = "a cat wearing sunglasses on the beach"
negative_prompt = ""

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=7,
    num_inference_steps=50,
).images[0]

image.save("generated_image.png")
```
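The snippet above uses `KDPM2AncestralDiscreteScheduler`, while the recommended settings list DPM++ 2M SDE with the Karras schedule. In diffusers, that combination is usually expressed as `DPMSolverMultistepScheduler` with `algorithm_type="sde-dpmsolver++"` and `use_karras_sigmas=True`. The sketch below shows one way to apply it. The helper name `apply_recommended_scheduler` and the two constant names are hypothetical, and the step count is just one value from the recommended 25 to 50 range.

```python
# Recommended settings from this card, expressed as diffusers arguments.
SCHEDULER_KWARGS = {
    "algorithm_type": "sde-dpmsolver++",  # DPM++ 2M SDE
    "use_karras_sigmas": True,            # Karras sigma schedule
}

GENERATION_KWARGS = {
    "width": 1024,
    "height": 1024,
    "guidance_scale": 7.0,       # CFG Scale
    "num_inference_steps": 50,   # anywhere in the recommended 25-50 range
}


def apply_recommended_scheduler(pipe):
    """Swap the pipeline's scheduler for DPM++ 2M SDE Karras (hypothetical helper)."""
    # Imported lazily so the constants above can be reused without diffusers installed.
    from diffusers import DPMSolverMultistepScheduler

    pipe.scheduler = DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config, **SCHEDULER_KWARGS
    )
    return pipe
```

With the `pipe`, `prompt`, and `negative_prompt` from the earlier example, this would be used as `pipe = apply_recommended_scheduler(pipe)` followed by `pipe(prompt, negative_prompt=negative_prompt, **GENERATION_KWARGS).images[0]`.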
## Future Plans
Following the approach from the first version, I plan to gradually introduce new concepts and visual styles by adding one large training batch at a time. This incremental method aims to expand the model's capabilities while keeping it stable.
## Collaborations
If anyone is interested, I'd be open to collaborating on papers about this work. I'm looking for a team to help me publish, but I'm new to this and would appreciate any guidance.
## License
**License Options:**
My goal is to allow personal use, and commercial use up to a certain revenue threshold, while requiring larger entities to contact me for a separate agreement. With that in mind, I'm considering the following existing license:
### Polyform Small Business License 1.0.0
- **Permits**: Use by individuals and entities with annual gross revenues under a specified amount (e.g., $5 million USD).
- **Requires**: Entities exceeding the revenue threshold to obtain a commercial license from me.
For more details, see the [Polyform Small Business License](https://polyformproject.org/licenses/small-business/1.0.0/).
## Acknowledgments
This is a personal project developed solely by me.
---
**Citation**
If you use Proteus v0.6 in your work, please cite it as:
Alexander Rafael Izquierdo, "Proteus v0.6: Multi-Perspective Fusion," 2024.
---