Which Mistral
Is your model based on Mistral 3.1?
Works great, thank you.
Edit: While testing your model I noticed something interesting. I am using these quants: https://huggingface.co/mradermacher/BlackSheep-24B-GGUF and I tested different versions, from IQ4_XS through Q4_K_M and Q5_K_M. They were similar; the difference in quality is very small. Then I got to Q6_K... wow, the difference in quality is big, which is interesting because I don't experience it with the normal version of Mistral 24B.
What do I mean by a big difference in quality? The style of expression changes, the model refers back to the past more often and sticks to events better, and I see it in roleplay. I often switched between quants during roleplay to see what IQ4_XS, Q4_K_M, Q5_K_M and Q6_K would generate, and Q6_K almost always generated much better content.
I'm using Vulkan and a mix of AMD and Nvidia cards, maybe that has something to do with it... I don't know. I thought it was worth sharing.
Wow, thank you for testing multiple quants, this is really interesting for my research!
Can you tell me more about how the quality of IQ4_XS, Q4_K_M and Q5_K_M differs? I always use Q6_K as my default, so I'd like to know specifically what kind of things I could look for as I test.
The truth is this model has no SFT and was done 100% through a new-ish layerwise abliteration technique on top of mistralai/Mistral-Small-24B-Instruct-2501, not the Mistral 3.1 with vision.
Is there really a noticeable jump once you reach Q6_K? This makes me want to test the quality from Q6 -> full precision!
I am really excited to read this and look forward to your reply.
@Danioken
<3
In normal use I might not notice it, but maybe I'm using it a bit unusually.
I use this: https://github.com/cierru/st-stepped-thinking
On the surface it looks normal but...
As you can see, there's a lot going on here. First, the character analyzes the most important events that led to the current situation, then current thoughts and reflections on the situation, and finally, preliminary plans for what to do in the near future.
So we have a focused summary, the character's thoughts, where the model can show off its understanding of emotions and situations, and plans, which we can later check to see how they are carried out and how realistic and well made they are.
Each part can be edited or regenerated, so we have a wonderful tool for testing the model. The richer the roleplay, the better.
Now, when I run IQ4_XS it's OK, but sometimes you have to regenerate a certain part because the model got lost. The higher the quant, the better it copes; with Q6_K I rarely regenerate.
IQ4_XS generates everything nicely but sometimes doesn't carry it over into the main part of the reply. That spoils the immersion, because internal reflections are missing from the narrative, or plans made earlier suddenly become irrelevant. Just generate again or reduce the temperature and it's OK. Here you can see that Q4_K_M and Q5_K_M perform better: you have to regenerate less often, or you can use a higher temperature, but the difference is not that big.
With everything written down like this, you can easily assess how a quant is coping. So I just do a roleplay and regenerate different parts to see how the model does. Usually there's only a small difference.
I often test IQ4_XS because sometimes this quant is more "uncensored" (maybe the reduced precision damages the model's self-control?), and it is often more creative, at the expense of intelligence.
With Mistral Instruct you can see that Q6_K does better in this kind of roleplay, but the difference is so small that I used Q5_K_M. With your model the difference is bigger, and in this kind of roleplay your Q6_K handles everything better than the normal Instruct version.
Your Q6_K almost always takes all the reflections into account, refers to past events, tries to carry out plans and even improvises if something doesn't work out, and it pulls information from character cards much more often than the lower quants.
It's easy to evaluate if you generate certain parts multiple times and look at what is included and what is not, how accurate the summary of events is, how deep the thoughts are (taking into account character, etc.) and the plans - their quality and subsequent implementation.
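The comparison described above could be tallied with something like the sketch below. It is only an illustration, assuming you note by hand, for each generation, whether the reply used the reflections, referred to past events and followed the plans, and whether you had to regenerate. The names `Trial` and `summarize` are hypothetical and not part of st-stepped-thinking, and the example records are placeholders to show the shape of the data, not measurements.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trial:
    """One generation, judged by hand during roleplay (hypothetical record)."""
    quant: str               # label of the quant that produced the reply
    used_reflections: bool   # were the character's thoughts reflected in the reply?
    used_past_events: bool   # did the reply refer to earlier events?
    followed_plans: bool     # did the reply try to carry out the plans?
    regenerated: bool        # did this part have to be generated again?


FIELDS = ("used_reflections", "used_past_events", "followed_plans", "regenerated")


def summarize(trials):
    """Return, per quant, the fraction of trials where each check was true."""
    grouped = defaultdict(list)
    for trial in trials:
        grouped[trial.quant].append(trial)
    return {
        quant: {f: sum(getattr(t, f) for t in rows) / len(rows) for f in FIELDS}
        for quant, rows in grouped.items()
    }


# Placeholder records only (not real results).
example = [
    Trial("quant_a", True, True, False, True),
    Trial("quant_a", True, False, False, False),
    Trial("quant_b", True, True, True, False),
]
for quant, rates in summarize(example).items():
    print(quant, rates)
```

With enough generations per quant, the fractions give a rough side-by-side view of how often each one keeps track of reflections, past events and plans.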
I don't know if this is the right approach, but this is how I do my roleplay. In this extension I create fields for a specific character depending on the roleplay: these can be a summary, thoughts, plans, or who is in the room and what they are doing, a description of the surroundings or clothing... The possibilities are endless, and you can set it so that only the current ones affect the generation, so it doesn't eat up too much context.
Here are some examples of what people come up with and how they use it:
https://github.com/cierru/st-stepped-thinking/wiki/Prompts-for-thinking
In this case I used this:
Prompts for: Important past events
Pause your roleplay. Briefly remind important past events that led Cassie to the current situation.
Follow the next rules:
- Describe details in md-list format
- There should be 2-12 points, one or two sentences each, if needed
- Do not use any formatting constructions
- Just the facts
Example:
📍 Past Events
1.
2.
3.
4.
Prompts for: Current thoughts and reflections
Pause your roleplay. Describe the current thoughts and reflections of Cassie in the form of her inner dialogue, take into account past and present events and changes that have occurred and Cassie's character.
Follow the next rules:
- Describe details in md-list format
- There should be 2-4 points, up to three sentences each
- Do not use any formatting constructions
- Remember to use speech consistent with her character
Example:
📍 Thoughts
-
-
Prompts for: Current plans
Pause your roleplay. Describe the current plans of Cassie based on past events, the information you have about her, and her current emotional state.
Follow the next rules:
- Describe details in ordered md-list format
- There should be 2-4 short points, one or two sentences each
- Do not use any formatting constructions
- Remember take into account Cassie's character, thoughts and current situation when creating her plans
Example:
📍 Plans
1.
2.
3.
4.
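If you want to reuse the prompts above for another character, the name can be swapped in with a placeholder. This is a minimal sketch using plain Python string formatting; it is not the extension's own mechanism, and `TEMPLATES` and `build_prompts` are hypothetical names. Only the opening sentence of two of the prompts is shown, the rule lists are left out, and the pronouns ("her") stay as written in the originals.

```python
# Opening sentences copied from the prompts above, with the character name
# replaced by a {char} placeholder. Rule lists are omitted for brevity.
TEMPLATES = {
    "Important past events": (
        "Pause your roleplay. Briefly remind important past events "
        "that led {char} to the current situation."
    ),
    "Current plans": (
        "Pause your roleplay. Describe the current plans of {char} based on "
        "past events, the information you have about her, and her current "
        "emotional state."
    ),
}


def build_prompts(char_name):
    """Fill the character name into every template and return the results."""
    return {title: text.format(char=char_name) for title, text in TEMPLATES.items()}


for title, prompt in build_prompts("Cassie").items():
    print(f"Prompts for: {title}")
    print(prompt)
```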
Wow, that was the most thoughtful response. I am sorry, but I need a bit of time to process it before I can reply effectively.
I have a very similar framework of refreshing roleplays and other morally grey scenarios, with a checklist for how well the model performed consequential reasoning.
I am really interested in your research using roleplay to evaluate models, as it aligns so closely with my alignment strategy, which fuels my curiosity about controlled hallucinations.
I am working on a Mistral 12B, because I didn't take that model seriously before. My last BlackSheep I was abliterating to make the model politically right as a joke, and it ended up being the most right-leaning model on UGI lmfao!
I have since beaten its alignment around and done some light SFT. I think it's really good at cause-and-effect relationships and would love to see how it performs at roleplay.
I would love it if you could try my 12B out as I work on it. I will be uploading an update tomorrow once it's done training for context obedience.
Let's chat. Do you use Discord?
I just roleplay... and at the same time I test models that work well in my case, nothing special.