Context size suggestions

#3 by GhostGate - opened

Hi!

Would it be possible to get some information on the model's coherence limit? As we all know, most models don't really work all the way to the advertised limit, for example a 100k context limit. I have found that coherence begins dropping around the 20k mark, with more and more degradation after that. However, some models stay strong even past 20k, with only a very slight degradation in coherence and logic, while others begin to fall apart after 12k. Regarding your model here, what is the limit of context that you have tested? Can it take context and still follow instructions past 20k?

I would say "safe" would be 16k to 32k.
This is based on issues which can occur with fine-tuned models (i.e., what context length the fine-tuning was done at).
Likewise, this model specifically contains TWO Llama 3.1 and TWO Llama 3 models (128k and 8k context respectively).
The merge method (and choices) used will, roughly speaking, balance out the context levels.
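
If you want to measure the practical limit yourself, a common approach is a needle-in-a-haystack probe. Here is a minimal sketch under the usual transformers API; this is not an official eval for this model, and the model path, needle string, and token targets are all placeholders you would swap for your own:

```python
# A minimal needle-in-a-haystack sketch: bury a fact at the midpoint of
# ever-longer contexts and check whether the model can still retrieve it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "path/to/your-model"  # placeholder: substitute the checkpoint you are testing

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto"  # device_map needs `accelerate`
)

needle = "The secret passphrase is 'cobalt-heron-42'."
filler = "The quick brown fox jumps over the lazy dog. " * 400  # a few thousand tokens

for target in (8_000, 16_000, 24_000, 32_000):
    # Grow the padding until it covers the target length, then cut to size.
    pad_ids = tok(filler, add_special_tokens=False).input_ids
    while len(pad_ids) < target:
        pad_ids += pad_ids
    pad_ids = pad_ids[:target]
    mid = len(pad_ids) // 2
    context = tok.decode(pad_ids[:mid]) + "\n" + needle + "\n" + tok.decode(pad_ids[mid:])
    prompt = context + "\n\nQuestion: What is the secret passphrase? Answer:"
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    answer = tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
    verdict = "PASS" if "cobalt-heron-42" in answer else "FAIL"
    print(f"{target:>6} tokens: {verdict} -> {answer.strip()!r}")
```

Retrieval of a single fact is a weaker bar than staying coherent in RP, but the length at which it starts failing is a decent proxy for where instruction-following will degrade.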

Interesting! I will give this a shot later this week. I have been using Nitral-AI's Captain_BMO-12B, even though I have a huge backlog of models to test and review. I did notice that in its case, as an example, my RP is at 22k context and going strong (limit set at 32k). The model doesn't flinch even when I impersonate multiple characters (for the sake of simplicity I left the model to play only {{char}} and no other NPCs). I have a feeling that my world info and author's notes are keeping it on track. I hope I can get the same experience out of yours as well :)
One of the main reasons I have been asking around about the context limits of different models is that I have quite the large setup (world info, character sheet, example dialogue, scenario, instructions, etc.), starting right off the bat at 8k with only the intro message. I keep the model in check with a nice little addition to the instructions from the people at BeaverAI's Discord channel, so it stays aware of the passage of time and doesn't spin out of control with a long generation of three questions and five actions before I get a word in.
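
As an aside, you can tally what a setup like that costs in tokens before the chat even starts. A rough sketch below; the file names and model path are placeholders for your own prompt components, not anything from this thread:

```python
# Rough token budget for a large RP setup: count each prompt component
# with the tokenizer of the model you actually run.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/your-model")  # placeholder

parts = {
    "world info": "world_info.txt",
    "character sheet": "character_sheet.txt",
    "example dialogue": "example_dialogue.txt",
    "scenario": "scenario.txt",
    "instructions": "instructions.txt",
    "intro message": "intro_message.txt",
}

budget = 32_000  # the context limit you plan to run at
total = 0
for name, path in parts.items():
    with open(path, encoding="utf-8") as f:
        n = len(tok(f.read(), add_special_tokens=False).input_ids)
    total += n
    print(f"{name:>16}: {n:>6} tokens")
print(f"{'total':>16}: {total:>6} tokens ({budget - total} left for the actual chat)")
```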
