---
pipeline_tag: text-to-image
widget:
- text: >-
    The image features an older man, a long white beard and mustache, He has a
    stern expression, giving the impression of a wise and experienced
    individual. The mans beard and mustache are prominent, adding to his
    distinguished appearance. The close-up shot of the mans face emphasizes his
    facial features and the intensity of his gaze.
  output:
    url: assets/oldman.png
- text: >-
    Super Closeup Portrait, action shot, Profoundly dark whitish meadow, glass
    flowers, Stains, space grunge style, Jeanne d'Arc wearing White Olive green
    used styled Cotton frock, Wielding thin silver sword, Sci-fi vibe, dirty,
    noisy, Vintage monk style, very detailed, hd
  output:
    url: assets/swordwoman.png
- text: >-
    cinematic film still of Kodak Motion Picture Film: (Sharp Detailed Image) An
    Oscar winning movie for Best Cinematography a woman in a kimono standing on
    a subway train in Japan Kodak Motion Picture Film Style, shallow depth of
    field, vignette, highly detailed, high budget, bokeh, cinemascope, moody,
    epic, gorgeous, film grain, grainy
  output:
    url: assets/japanesewoman.png
- text: ("Proteus" text logo) powerful aura, swirling power, cinematic, masterpiece, award-winning
  output:
    url: assets/logo.png
language:
- en
base_model:
- stabilityai/stable-diffusion-xl-base-1.0
tags:
- art
---

<Gallery />

# Proteus v0.6

I'm excited to introduce **Proteus v0.6**, a complete rebuild of my AI image generation model. This is the **first version of the rework**, and it focuses entirely on photorealism. It does not aim to be state-of-the-art, but I believe it is a good step forward in producing high-quality images. Please note that this is a **preliminary version**, not the final, fully featured checkpoint; more improvements and features will come in future updates.

## Overview

Proteus v0.6 is a total rework from the ground up. In previous versions, combining different training methods and learning rates made the model unstable during large-scale training. Learning from that experience, I retrained the model using only the photorealistic portion of the Proteus dataset.

For now, I'm calling this new training technique **Multi-Perspective Fusion**.

### Multi-Perspective Fusion

This approach involves:

- **Training Multiple LoRAs and Full-Parameter Checkpoints**: I trained several Low-Rank Adaptation (LoRA) modules and full-parameter checkpoints on the same dataset multiple times to capture different "perspectives" of the data.
- **Integrating into an Overarching Framework**: These varied models are then combined within a larger framework to enhance overall performance.

I'm hoping this method will be interesting to data scientists exploring advanced training techniques.

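This card does not say how the separately trained models are combined, so the following is only an illustration of the general idea, not the Multi-Perspective Fusion procedure itself. It shows one common way to merge checkpoints that share an architecture: weighted averaging of their parameters. The function name `average_checkpoints`, the `StateDict` alias, and the toy parameter names are all hypothetical, and plain Python lists stand in for real tensors so the sketch runs without any dependencies.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a real state dict: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def average_checkpoints(
    checkpoints: List[StateDict],
    weights: Optional[List[float]] = None,
) -> StateDict:
    """Blend checkpoints that share the same parameter names and shapes.

    This is plain weighted parameter averaging, used here only as an
    assumed example of combining several training runs.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if weights is None:
        weights = [1.0] * len(checkpoints)  # uniform average by default
    if len(weights) != len(checkpoints):
        raise ValueError("one weight per checkpoint is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    merged: StateDict = {}
    for name, values in checkpoints[0].items():
        # Every checkpoint must provide this parameter with the same length.
        for ckpt in checkpoints[1:]:
            if name not in ckpt or len(ckpt[name]) != len(values):
                raise ValueError(f"checkpoint mismatch for parameter {name!r}")
        merged[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(weights, checkpoints)) / total
            for i in range(len(values))
        ]
    return merged


# Two toy "runs" over the same (hypothetical) parameters.
run_a = {"unet.weight": [1.0, 2.0], "unet.bias": [0.0]}
run_b = {"unet.weight": [3.0, 4.0], "unet.bias": [1.0]}
merged = average_checkpoints([run_a, run_b])
```

With real models the same loop would run over tensors rather than lists, and LoRA modules would typically be merged into their base weights before any averaging.
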
## Key Improvements in v0.6

- **Total Rebuild**: Constructed entirely from scratch to address previous issues.
- **Enhanced Photorealism**: Focused on producing good-quality photorealistic images.
- **Stable Training Process**: Refined training methods to prevent the model from falling apart during large-scale training.
- **Preliminary Version**: This is the first version of the rework; expect more features and improvements in future releases.

## Limitations

- **No Illustrations or Anime**: Currently, the model can't generate illustrations or anime-style images because it has only been trained on photorealistic data.
- **Not State-of-the-Art**: The model performs well, but I'm not claiming it's state-of-the-art, only that it's a good starting point.
- **Work in Progress**: This is not the final, fully featured checkpoint. More updates are planned.

## Usage

### Recommended Settings

- **Clip Skip**: 1
- **CFG Scale**: 7
- **Steps**: 25-50
- **Sampler**: DPM++ 2M SDE
- **Scheduler**: Karras
- **Resolution**: 1024x1024

### Use it with 🧨 diffusers

Here's how you can use Proteus v0.6 with the Hugging Face 🧨 diffusers library:

```python
import torch
from diffusers import (
    StableDiffusionXLPipeline,
    KDPM2AncestralDiscreteScheduler,
    AutoencoderKL,
)

# Load VAE component
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16,
)

# Configure the pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "dataautogpt3/Proteus-v0.6",
    vae=vae,
    torch_dtype=torch.float16,
)
# Swap in the KDPM2 ancestral scheduler, reusing the pipeline's scheduler config
pipe.scheduler = KDPM2AncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Define prompts and generate image
prompt = "a cat wearing sunglasses on the beach"
negative_prompt = ""

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=7,
    num_inference_steps=50,
).images[0]

image.save("generated_image.png")
```

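The example above uses `KDPM2AncestralDiscreteScheduler`, while the recommended settings name the DPM++ 2M SDE sampler with Karras sigmas. As a hedged sketch, the helpers below show one way to apply the recommended settings in diffusers, assuming that this sampler corresponds to `DPMSolverMultistepScheduler` with `algorithm_type="sde-dpmsolver++"` and `use_karras_sigmas=True`. The helper names and the default of 30 steps are illustrative choices, not part of this model or of diffusers.

```python
from typing import Any, Dict

# Assumed diffusers equivalent of the "DPM++ 2M SDE" sampler with the "Karras" schedule.
RECOMMENDED_SCHEDULER_KWARGS: Dict[str, Any] = {
    "algorithm_type": "sde-dpmsolver++",
    "use_karras_sigmas": True,
}


def recommended_call_kwargs(steps: int = 30) -> Dict[str, Any]:
    """Return pipeline call arguments matching the recommended settings."""
    if not 25 <= steps <= 50:
        raise ValueError("the recommended range is 25 to 50 steps")
    return {
        "width": 1024,
        "height": 1024,
        "guidance_scale": 7.0,
        "num_inference_steps": steps,
    }


def apply_recommended_scheduler(pipe):
    """Replace the pipeline's scheduler with the assumed DPM++ 2M SDE Karras setup."""
    # Imported lazily so the settings helpers can be used without diffusers installed.
    from diffusers import DPMSolverMultistepScheduler

    pipe.scheduler = DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config, **RECOMMENDED_SCHEDULER_KWARGS
    )
    return pipe
```

With a pipeline loaded as in the example above, calling `apply_recommended_scheduler(pipe)` and then `pipe(prompt, **recommended_call_kwargs()).images[0]` would generate an image with these settings.
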
## Future Plans

Following the approach from the first version, I plan to introduce new concepts and visual styles gradually, adding one large training batch at a time. This incremental method aims to expand the model's capabilities while keeping it stable.

## Collaborations

If anyone is interested, I'd be open to collaborating on papers about this work. I'm looking for a team to help me publish; I'm new to this and would appreciate any guidance.

## License

**License Options:**

My goal is to allow personal use, and commercial use up to a certain revenue threshold, while requiring larger entities to contact me for a separate agreement. With that in mind, I'm considering the following existing license:

### Polyform Small Business License 1.0.0

- **Permits**: Use by individuals and entities with annual gross revenues under a specified amount (e.g., $5 million USD).
- **Requires**: Entities exceeding the revenue threshold to obtain a commercial license from me.

For more details, see the [Polyform Small Business License](https://polyformproject.org/licenses/small-business/1.0.0/).

## Acknowledgments

This is a personal project developed solely by me.

---

**Citation**

If you use Proteus v0.6 in your work, please cite it as:

Alexander Rafael Izquierdo, "Proteus v0.6: Multi-Perspective Fusion," 2024.

---