Best model when looking for versatility / why you should always give models a second chance
The first time I tried this model, I was somewhat disappointed. It seemed interesting at first, but then all the characters in my story started to act weird. In each scene and every reply to my prompts, they would all have that one sentence that was their favorite - and they'd drop it all the damn time. A single dialogue with that one sentence popping up half a dozen times. The responses also seemed simplistic in general, with each character following the same basic script. I tried several other prompts, but nothing changed.
So I was about to uninstall this model when it happened - all I did was reload it in LM Studio, change the context length and activate Flash Attention (which I'd never done before). I used the same prompts for the setting and the character descriptions, but suddenly I found myself in a different world. The characters started to have conversations like actual people would; their persona, age and socialization were reflected in their choice of words. The ones I had given specific origins, based on individual regions of the world, were using local slang.
And most of all, they acted and made choices according to the details I had provided in their character descriptions.
Overall, the quality of the output was impressive. No more drawn-out descriptions or observations of unimportant details. The AI's responses were much shorter, but at the same time the content found within them had actually increased. Unlike the other models I've played around with, this one even managed to actually follow my instructions and only used "modern" language in its responses. The others will still, at times, impersonate a dreamy, late-18th-century amateur poet even when they're told not to.
I don't know why my second attempt delivered such a brutally different experience. As I said, I merely reloaded the model and activated Flash Attention. The prompts I used were identical to the ones during my first try.
Anyways, looking at every model I've toyed around with recently, this one is by far the best choice when you're looking for versatility. There might be better options for specific genres and sub-genres, but if you're looking for a model that can handle everything, this is probably it.
So make sure to always give models that second chance. Don't be like DiCaprio, throwing your new model away if it initially disappoints you.
(and no, the context length was sufficient during my first attempt, so the fact that I eventually changed it can't be the sole reason for the replies being that different)
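In case it helps anyone reproduce this outside of LM Studio's GUI, here's a minimal llama-cpp-python sketch of the settings I changed - the model filename and context size are placeholders, not my exact setup:

```python
# Rough equivalent of the LM Studio settings (context length + Flash Attention),
# using llama-cpp-python. Filename and values below are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.2-8X3B-MOE.Q4_K_M.gguf",  # placeholder filename
    n_ctx=8192,        # context length - raise to whatever your RP needs
    flash_attn=True,   # the Flash Attention toggle
    n_gpu_layers=-1,   # offload all layers to GPU if it fits
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Describe the tavern scene."}],
    max_tokens=300,
)
print(out["choices"][0]["message"]["content"])
```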
Thank you; I am not a "fan" of flash attention, and frankly never used it during testing.
That just changed.
I am going to add a recommendation to the repo page, and then do a lot more testing.
Thank you again for the detailed feedback; this will really help.
Little update:
It can still be hit & miss, regardless of whether Flash Attention is used. I've gotten it to work out great, and at other times it struggled. It sometimes confused characters, switched roles during RP, and invented events that were never in the story. What happened a lot was that it didn't manage to RP as a new character after playing a different person in a previous scene. It would combine the characters and refer to experiences and events that actually involved the character it had been roleplaying as in the previous scene. At times, it also switched to the character played by me. It generally struggled to understand the difference between two different scenes in which it was playing different roles. (this all happened with a more than generous context length, so I'm assuming there were other factors causing issues)
Whenever it struggled to separate characters and storylines, I spent a lot of time asking the model why it would assume something happened, or why a certain character would say specific things or think in a certain way. Once, it even admitted to me that it is incapable of creating stories that involve more than two people.
It sometimes still struggled even after I explained the different characters and who was present in which scene - at times even after it promised me that it had now understood everything. There were a few moments where the model told me that it should, in theory, be able to access the information from past scenes/posts, but that it was having a hard time.
It also admitted to lying to me several times, which was a first for me with models like this. It surprised me, because during that specific try I had actually started by asking it whether it is capable of lying or "messing" with people, which it denied.
In general, I'd suggest avoiding back-to-back RP scenarios. Maybe squeeze some general story content in between those. The good news: it will kinda shit itself when you accuse it of lying and threaten to uninstall it, which can be entertaining.
This one might be a better match - all uncensored models:
https://huggingface.co/DavidAU/Llama-3.2-4X3B-MOE-Hell-California-Uncensored-10B-GGUF
OR
(only instruct, but PG rated)
https://huggingface.co/DavidAU/Llama-3.2-4X3B-MOE-Ultra-Instruct-10B-GGUF
The other issue(s) that may be affecting generation: Number of experts active.
The default is 2.
The two smaller MOEs above are more focused, and they react more strongly as you activate more experts.
They are also better matched as a group.
The Ultra Instruct's problem-solving abilities go up at 3 experts.
The 8X3B you are using may have too many (different) cooks in the kitchen, which may be the cause of some issues.
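If you want to experiment with the expert count outside of LM Studio, a rough llama-cpp-python sketch is below - the metadata key is an assumption and depends on the architecture string stored in your GGUF, so check the file's header first:

```python
# Sketch only: raising the number of active experts via a GGUF metadata override.
# "llama.expert_used_count" is an assumed key - the prefix depends on the
# architecture string in your GGUF file, so verify it against the actual header.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.2-4X3B-MOE-Ultra-Instruct-10B.Q4_K_M.gguf",  # placeholder filename
    n_ctx=8192,
    kv_overrides={"llama.expert_used_count": 3},  # default is 2
)
```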
SIDE NOTE: A dedicated RP-only MOE (or several) is next up.
Thanks man! Already downloaded the California one.
If I may ask, how the hell do you find the time to do all of this? You've got a ton of models available here. I'm starting to think you're a damn AI from the future and you decided to visit this timeline to mess with us. Humans probably no longer exist in your time, and you're laughing your ass off because we have no idea what's coming for us.
If I'm right, I only have one question: In the future, do virtual titties feel real? If the answer is "yes", that's all the info I need and I'm willing to accept humanity's impending downfall.
50% of my DNA... is from Skynet.
Beware of mixing Gemma 6 with Llama 4... that is where it all starts.
This is a passion project, and also part of a much larger project;
In answer to your final question... jury is still out.