|
--- |
|
license: openrail |
|
pipeline_tag: text-to-image |
|
tags: |
|
- Stable Diffusion |
|
- photorealistic |
|
- sd_1.5 |
|
--- |
|
|
|
Original model page on Civitai: https://civitai.com/models/597300/boltmonkey-photoreal?modelVersionId=667353
|
|
|
|
|
This is an extremely high-quality photorealistic SD1.5 model that I created as an offshoot of a business project I work on in my spare time. I believe in the open-source nature of AI and am gradually releasing some of my work that I do not intend to use for my ongoing project. I have been slowly developing this model for roughly a year.
|
|
|
I have labelled this model as a merge, but it is already 30+ iterations deep, including a substantial number of block merges and multiple fine-tunes along the way.
|
|
|
The model is very realistic, especially for SD1.5. |
|
|
|
Hands are generally five-fingered and not mangled, but overly complex prompts or a poor prompting style can result in amputations or distortions.
|
|
|
Most textures are well-rendered, but I have found that extremely dusty environments (such as in a mine tunnel) look a bit too generic for my liking. |
|
|
|
Lighting and shadows are a strong point of the model. In particular, volumetric lighting (such as light rays through a misty or dusty atmosphere) is well-rendered.
|
|
|
Most of my showcase uses animals, but the model is adept at generating humans, architecture, natural environments, food, etc. That said, I find that I have not trained enough on most forms of transport.
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/T8-3LROcMo6hSihTE4ddx.png) |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/8mEs5A5HZp2el7i5nGhQA.png) |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/2iKSk1Zx_muykrlepHiWL.png) |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/PMBxD5xypOIMg4z1wHRZK.png) |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/tb_zJ_Te6OvrMPNZuxOZC.png) |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/j7v47skcSFbOCASEsGSOL.png) |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/LAJwc4Bjj5zjCl3Nm545z.png) |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/Ib2RGzy-YfxIiHEG-_9ew.png) |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/L6h67yde-8cforZbRoPq7.png) |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/w4QHbzjBedEL3SWCUeqti.png) |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/pdjf16jNDgk00HCS7FEaz.png) |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/SrIJbQTJ0ajNQPLuMJXWU.png) |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/3dEkBxiJ6k9dJLC6jDD5f.png) |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/_2xeTm1D1EiJMsB9T4_t9.png) |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/A7b2RLXOLkluqSfxtIsKh.png) |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/WXilW7ACN6ivAiAOFUza0.png) |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/AaQlA3LufIfecXJoBwelr.png) |
|
|
|
## Suggestions for use
|
|
|
I have a lot to learn about prompting from the collective CivitAI userbase, but here are a few things that I have found work well:
|
|
|
TL;DR: |
|
|
|
DDIM, 15-40 steps, CFG ~2-10, clip skip 1-4 (depending on use); LoRAs work well.
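
If you use diffusers, a minimal sketch of those settings might look like the following. The checkpoint filename is a placeholder for wherever you saved the model's .safetensors file, and the exact values are just sensible midpoints of the ranges above.

```python
# Minimal sketch of the TL;DR settings (diffusers).
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_single_file(
    "boltmonkey-photoreal.safetensors",  # placeholder: your local checkpoint path
    torch_dtype=torch.float16,
).to("cuda")

# DDIM sampler, as recommended above.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "cat",                    # even minimal prompts work well
    num_inference_steps=30,   # 15-40 is the suggested range; 25-40 typical
    guidance_scale=4.0,       # CFG ~2-10; 4 is a good starting point
    height=768,
    width=768,                # 768px+ works best (see the resolution notes below)
).images[0]
image.save("cat.png")
```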
|
|
|
|
|
|
|
This model works well with square and rectangular aspect ratios. Resolutions of 768px and above work best, but will sometimes result in duplications around 1024px. Having said that, 512px+ will still produce good images.
|
|
|
The quality of this model's output is very realistic even with minimal prompting, but is exceptional with well-structured prompts. Moreover, this model works very well with LoRAs, so long as you are cognisant of the LoRA's training resolution (768px+ works best). I don't use anime LoRAs, so I can't offer any suggestions there, but I will be interested in your results if you try them.
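
As a rough sketch of LoRA use with the pipeline above (the LoRA filename and strength are placeholders, not recommendations):

```python
# Sketch: attaching a LoRA to the pipeline defined earlier.
pipe.load_lora_weights("your_lora.safetensors")  # placeholder: your LoRA file

image = pipe(
    "cat",
    num_inference_steps=30,
    guidance_scale=4.0,
    height=768,
    width=768,  # match the LoRA's training resolution where you can
    cross_attention_kwargs={"scale": 0.8},  # LoRA strength; tune to taste
).images[0]
```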
|
|
|
Good-quality photorealistic images will result from extremely simple prompts (e.g., "cat"), but the model also responds very well to quality-guidance prompts and some more complex prompting.
|
|
|
The following prompt is my go-to:
|
"ultrarealistic photography, 32k UHD, absurdres, natural light and shadows, volumetric lighting, natural skin textures, accurate attention to details, depth of field, sharp focus" |
|
|
|
Typically, I would use DPM++_3M_SDE_GPU as my sampler with the SGM_uniform noise schedule, but I find that this model works best (to my taste) with the DDIM sampler and the DDIM_uniform noise schedule.
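
For diffusers users, both setups can be sketched roughly as below. The timestep_spacing values are my best guesses at equivalents of ComfyUI's SGM_uniform/DDIM_uniform schedules, and diffusers' SDE DPM-Solver++ only supports orders up to 2, so the first block approximates DPM++_3M_SDE rather than matching it exactly.

```python
from diffusers import DDIMScheduler, DPMSolverMultistepScheduler

# Approximation of DPM++ SDE with SGM_uniform-style spacing (my usual
# choice elsewhere). diffusers' sde-dpmsolver++ supports solver_order <= 2,
# so this is not an exact match for ComfyUI's dpmpp_3m_sde.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    algorithm_type="sde-dpmsolver++",
    solver_order=2,
    timestep_spacing="trailing",  # best guess at sgm_uniform; unverified
)

# DDIM with a uniform stride -- what works best with this model, to my taste.
pipe.scheduler = DDIMScheduler.from_config(
    pipe.scheduler.config,
    timestep_spacing="leading",  # the DDIM paper's uniform stride
)
```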
|
|
|
15 steps is enough to get good images most of the time, but I typically use 25-40. I have run a few generations at ComfyUI's maximum of 999 steps just to see how it fares. Obviously the results look great, but I see no real need to take it past 50 at the most.
|
|
|
CFG is a difficult one to give a value for. A CFG of 2-4 works well, but sometimes I will take it as far as 10 depending on what I am generating. I suggest starting with a value of 4 and gauging it for yourself; a fixed-seed sweep (sketched below) makes this easy to compare. Obviously, lower values give the model more freedom.
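
A minimal sketch of such a sweep, reusing the pipeline from earlier:

```python
# Sketch: sweep CFG on a fixed seed to gauge the right value for a prompt.
import torch

for cfg in (2.0, 4.0, 7.0, 10.0):
    generator = torch.Generator("cuda").manual_seed(42)  # same seed every run
    image = pipe(
        "cat",
        num_inference_steps=30,
        guidance_scale=cfg,
        generator=generator,
    ).images[0]
    image.save(f"cat_cfg_{cfg:.0f}.png")
```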
|
|
|
This model works well without clip skipping, but if you are merging several disparate concepts into one image, it may pay to use a clip skip of 2 or 3 to give some fluidity to the concepts.
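
Recent diffusers releases expose this as a call argument. Note that, to my understanding, diffusers counts clip skip differently from most UIs: clip_skip=1 in diffusers uses the penultimate CLIP layer, which corresponds to "clip skip 2" in A1111-style UIs. A sketch (the prompt is just an example concept mash-up):

```python
# Sketch: clip skip when blending disparate concepts into one image.
# diffusers' clip_skip=1 ~ UI "clip skip 2" (penultimate CLIP layer), so
# "skip 2 or 3" above maps to clip_skip=1 or 2 here -- my understanding;
# double-check against your UI.
image = pipe(
    "a lighthouse carved from a single iceberg, aurora overhead",
    num_inference_steps=30,
    guidance_scale=4.0,
    clip_skip=1,
).images[0]
```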