DavidAU/Gemma-3-12b-it-MAX-HORROR-Imatrix-GGUF

I've run some creative writing tests with the model to test it's capabilities. My previous daily driver for writing was the DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-DARKNESS-GGUF as my GPU only has 12GBs of VRAM - so I will be comparing it a little bit to that model. I run these models on LM studio currently. I would like to be able to use something that can utilize the models to their full potential but oobabooga and kobold runs significantly slower on my system and I haven't found a good alternative (am on windows).

The prose is significantly improved in a lot of aspects over Lyra I find (at least in my limited testing). It is more artistic in the way it describes things and presents beautiful imagery when utilizing the same prompt and settings. It can really breathe life into characters and make them feel multi-faceted.

Prompt adherence is generally very good. Sometimes it will forget to follow certain instructions if the prompt is longer and more complex. But it accepts guidance well and will always adjust things as needed once it is brought to attention. I even enjoy that it is more collaborative. It will ask questions (at the end of the generation) if it doesn't fully understand something and will offer ideas and reflect on if there is something that can be improved. This makes me prefer it significantly over Lyra, as I am not the best at prompting or guiding AI properly. It will write about darker topics and will also write NSFW scenes HOWEVER it does not respond well to instructions with graphic / explicit wording. Using words to subtly imply an NSFW scene works wonders + guiding it to what you're looking for can help significantly. In my limited testing, it also will not describe things with explicit words but more artistically (I happen to prefer this). It does give content warnings, but it doesn't matter that much and doesn't impact the quality of generation.

Some issues I've noticed are that it will reuse the same exact names (of people and places) each time regardless of what kind of story you are writing. This isn't a huge issue for me, as I give my characters and places their own names but it is something to consider. Sometimes it will reuse the same way of describing something, though I don't find issue with this so much since it can vary enough to not be entirely repetitive. And humans also utilize the same way of describing things often so I find it fairly realistic.

Pacing is something it struggles with at times. But being more specific about the kind of pacing you are looking for can improve the generation greatly.

Overall, it is a very impressive model and will likely be the one I use most often for creative writing alongside Lyra. Despite this having a horror tint, I didn't use it for that purpose and it preformed extremely well at other genres, even romance. So far it is my favorite model to work with but I am always testing your models to see how they do. I love the work you do as always, It is nice to see you outdoing yourself each and every time. You are an inspiration to the community.

Below is the generic system prompts I use regardless of the model, as I found these work best:
Simpler (can introduce pacing issues) - Write with vivid detail and sensory language, focusing on showing rather than telling. Use rich and evocative descriptions to create an immersive experience.

More detailed - Craft a compelling narrative for a novel that prioritizes showing over telling and creating a palpable sense of atmosphere. Immerse the reader in the scene through vivid sensory details – describe textures as if they can be felt, sounds as if they are heard directly, etc. Employ figurative language to evoke emotions and create immediacy. After establishing each key detail or action, pause—allow the implications of that event to linger with the reader before moving on. Vary sentence length strategically; use shorter sentences during moments of urgent action, but prioritize longer sentences rich in description when exploring atmosphere and character thoughts.

My settings in LM studio:
Flash attention is used with Q4_K_M quants
Template is set to the default always (Jinja) as I found it works best for my usage
Temperature - 0.8 or 0.9 depending
Top k sampling - 75 (sometimes I will adjust this to 40)
Repeat penalty - 1.1
Top P sampling: 0.95
Min P Sampling - 0.02

DavidAU
/

Gemma-3-12b-it-MAX-HORROR-Imatrix-GGUF

Thoughts on the model