---
license: openrail
pipeline_tag: text-to-image
tags:
- Stable Diffusion
- photorealistic
- sd_1.5
---
https://civitai.com/models/597300/boltmonkey-photoreal?modelVersionId=667353

This is an extremely high-quality photorealistic SD1.5 model that I created as an offshoot of a business project that I work on in my spare time. I believe in the open-source nature of AI and am gradually releasing some of the work that I do not intend to use for my ongoing project.

I have been slowly developing this model for roughly a year. I have labelled it as a merge, but it is already 30+ iterations deep, including a substantial number of block merges and multiple fine-tunes along the way.

The model is very realistic, especially for SD1.5. Hands are generally five-fingered and not mangled, but overly complex or poorly structured prompts can result in amputations or distortions. Most textures are well rendered, but I have found that extremely dusty environments (such as a mine tunnel) look a bit too generic for my liking.

Lighting and shadows are a strong point of the model. In particular, volumetric lighting (such as light rays through a misty or dusty atmosphere) is well rendered.

Most of my showcase uses animals, but the model is adept at generating humans, architecture, natural environments, food, etc. However, I find that I have not trained it enough on most forms of transport.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/T8-3LROcMo6hSihTE4ddx.png) ![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/8mEs5A5HZp2el7i5nGhQA.png) ![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/2iKSk1Zx_muykrlepHiWL.png) ![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/PMBxD5xypOIMg4z1wHRZK.png) ![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/tb_zJ_Te6OvrMPNZuxOZC.png) ![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/j7v47skcSFbOCASEsGSOL.png) ![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/LAJwc4Bjj5zjCl3Nm545z.png) ![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/Ib2RGzy-YfxIiHEG-_9ew.png) ![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/L6h67yde-8cforZbRoPq7.png) ![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/w4QHbzjBedEL3SWCUeqti.png) ![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/pdjf16jNDgk00HCS7FEaz.png) ![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/SrIJbQTJ0ajNQPLuMJXWU.png) ![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/3dEkBxiJ6k9dJLC6jDD5f.png) ![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/_2xeTm1D1EiJMsB9T4_t9.png) ![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/A7b2RLXOLkluqSfxtIsKh.png) ![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/WXilW7ACN6ivAiAOFUza0.png) 
![image/png](https://cdn-uploads.huggingface.co/production/uploads/641e4b539128fb5692c61524/AaQlA3LufIfecXJoBwelr.png)

Suggestions for use

I have a lot to learn about prompting from the collective CivitAI userbase, but here are a couple of things that I have found work well:

TL;DR: DDIM, 15-40 steps, CFG ~2-10, Clipskip 1-4 (depending on use), LoRAs work well.

This model works well with square and rectangular aspects. Resolutions of 768px and above work best, although they will sometimes produce duplications at around 1024px. Having said that, 512px and above will still produce good images.

The output is very realistic even with minimal prompting, but it is exceptional with well-structured prompts. Moreover, this model works very well with LoRAs so long as you are cognisant of the LoRA's training resolution (768+ works best). I don't use anime LoRAs so I can't offer any suggestions there, but I will be interested in your results if you try them.

Good-quality photorealistic images will result from extremely simple prompts (e.g., "cat"), but the model also responds very well to quality guidance prompts and to some more complex prompting. The following prompts are my go-to:

"ultrarealistic photography, 32k UHD, absurdres, natural light and shadows, volumetric lighting, natural skin textures, accurate attention to details, depth of field, sharp focus"

Typically, I would use DPM++_3m_SDE_GPU as my sampler with the SGM_uniform noise schedule, but I find that this model works best (to my taste) with the DDIM sampler and the DDIM_uniform noise schedule.

15 steps is enough to get good images most of the time, but I typically use 25-40. I have run a few generations with ComfyUI's maximum of 999 steps just to see how it fares. The results look great, but I see no real need to take it past 50 at most.

CFG is a difficult one to give a value for. A CFG of 2-4 works well, but sometimes I will take it as far as 10 depending on what I am generating. I suggest starting with a value of 4 and gauging it for yourself. Lower values give the model more freedom.

This model works well without clip skipping, but if you are merging several disparate concepts into one image then it may pay to skip 2 or 3 to give some fluidity to the concepts.
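
For anyone who prefers scripting to a UI, the suggestions above could be collected into a small Python helper along the following lines. This is only a sketch under assumptions: my own workflow is ComfyUI, the use of the `diffusers` library is an assumption, and the names `GenSettings`, `build_prompt`, `to_diffusers_kwargs` and `generate`, as well as the checkpoint path you pass in, are hypothetical. The defaults (25 steps, CFG 4, 768x768, no clip skip) simply mirror the starting values suggested above. As far as I understand, `diffusers` counts `clip_skip` one lower than most UIs (a UI value of 2 corresponds to `clip_skip=1`), and its `DDIMScheduler` may not match ComfyUI's DDIM_uniform schedule exactly, so treat the output as an approximation.

```python
from dataclasses import dataclass

# Quality guidance tags quoted from the suggestions above.
QUALITY_TAGS = (
    "ultrarealistic photography, 32k UHD, absurdres, natural light and shadows, "
    "volumetric lighting, natural skin textures, accurate attention to details, "
    "depth of field, sharp focus"
)


@dataclass(frozen=True)
class GenSettings:
    """Hypothetical container for the suggested starting values."""
    steps: int = 25       # 15 is usually enough; 25-40 is the typical range
    cfg: float = 4.0      # suggested starting point; 2-10 depending on subject
    width: int = 768      # 768+ works best; duplications may appear near 1024
    height: int = 768
    clip_skip: int = 1    # UI convention: 1 means no layers are skipped

    def __post_init__(self):
        # SD1.5 pipelines expect image dimensions divisible by 8.
        if self.width % 8 or self.height % 8:
            raise ValueError("width and height must be multiples of 8")
        if self.clip_skip < 1:
            raise ValueError("clip_skip uses the UI convention and starts at 1")


def build_prompt(subject: str) -> str:
    """Append the quality tags to a simple subject such as 'cat'."""
    return f"{subject}, {QUALITY_TAGS}"


def to_diffusers_kwargs(settings: GenSettings) -> dict:
    """Translate the settings into keyword arguments for a diffusers pipeline call."""
    kwargs = {
        "num_inference_steps": settings.steps,
        "guidance_scale": settings.cfg,
        "width": settings.width,
        "height": settings.height,
    }
    if settings.clip_skip > 1:
        # Assumption: diffusers' clip_skip is one lower than the UI value.
        kwargs["clip_skip"] = settings.clip_skip - 1
    return kwargs


def generate(subject: str, checkpoint_path: str, settings: GenSettings = GenSettings()):
    """Load a local checkpoint and render one image with a DDIM scheduler.

    The imports are local so the helpers above work without torch or diffusers.
    checkpoint_path is whatever .safetensors file you downloaded.
    """
    import torch
    from diffusers import DDIMScheduler, StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_single_file(
        checkpoint_path, torch_dtype=torch.float16
    )
    # Swap the default scheduler for DDIM, reusing the existing config.
    pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
    pipe = pipe.to("cuda")
    return pipe(build_prompt(subject), **to_diffusers_kwargs(settings)).images[0]
```

Calling `generate("cat", "path/to/checkpoint.safetensors")` would then return a PIL image, assuming a CUDA GPU is available. Keeping the settings in one object makes it easy to try, say, `GenSettings(cfg=2.5, steps=40, clip_skip=2)` and compare the results for yourself.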