Can't stop output
6.0 bpw quant, exllamav2 + tabbyAPI (latest)
Windows 11, CUDA 12.4, torch 2.4.1, flash-attn 2.6.3
Weird, I'll look into it in a few hours to see if the quant is busted.
Did you try to use a lower BPW quantization to see if it works?
tabbyAPI: f20857c
exllama: 0.2.3 (dev commit: 2616fd7)
34 swipes, all end properly. Please try a fresh install of tabby, as your environment might have outdated dependencies or an outdated exllama.
If you're using SillyTavern, you can also grab the SillyTavern templates from here: https://huggingface.co/anthracite-org/magnum-v3-9b-chatml to rule out any ChatML template issues.
I'm using RisuAI, and the problem still occurs even though I specify the template as ChatML in tabbyAPI.
In any case, I will try various methods to troubleshoot this error.
Thanks for your support
Try out SillyTavern; it seems to work best there, but I hope your issue with Risu resolves itself too.
@lucyknada After testing, I found that this problem has nothing to do with the front end. Even common front ends not used for role-playing show the same problem.
I have tried compiling from source, installing from prebuilt binary packages, and letting pip JIT-compile automatically, but nothing works.
I also tried turning off cache quantization and all experimental options, and reducing the context length, but it still doesn't work.
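For reference, one way to reproduce this outside any front end is to call exllamav2 directly. A minimal sketch, assuming the exllamav2 0.2.x Python API; the model path is a placeholder:

```python
# Minimal repro sketch: load the quant with exllamav2 directly (no front end, no tabbyAPI).
# Assumes exllamav2 0.2.x; the model path below is a placeholder.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("/path/to/magnum-v3-9b-chatml-exl2-6.0bpw")
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)   # plain FP16 cache, no cache quantization
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

prompt = "<|im_start|>user\nHi there!<|im_end|>\n<|im_start|>assistant\n"
output = generator.generate(prompt=prompt, max_new_tokens=300, add_bos=True)
print(output)
# If the reply runs past the assistant turn instead of stopping at <|im_end|>,
# the EOS token configured for the quant is the likely culprit rather than the front end.
```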
Try setting the EOS token in the generation config to 15, then restart your inference engine.
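A minimal sketch of that change, assuming the quant ships a generation_config.json next to its weights (the path is a placeholder); restart tabbyAPI afterwards:

```python
# Sketch: set eos_token_id to 15 in the quant's generation_config.json.
# The model directory path is a placeholder.
import json
from pathlib import Path

cfg_path = Path("/path/to/magnum-v3-9b-chatml-exl2-6.0bpw/generation_config.json")
cfg = json.loads(cfg_path.read_text()) if cfg_path.exists() else {}
cfg["eos_token_id"] = 15
cfg_path.write_text(json.dumps(cfg, indent=2))
print("eos_token_id set to", cfg["eos_token_id"])
```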
Solved, that was the problem.
Thank you.
Glad it worked!