Can't stop output
6.0 bpw quant, exllamav2 + tabbyAPI (latest)
Windows 11, CUDA 12.4, torch 2.4.1, flash-attn 2.6.3
Weird, I'll look into it in a few hours to see if the quant is busted.
Did you try to use a lower BPW quantization to see if it works?
tabbyAPI: f20857c
exllama: 0.2.3 (dev commit: 2616fd7)
34 swipes, all end properly. Please try a fresh install of tabby, as your environment might have outdated dependencies or an outdated exllama.
If you're using SillyTavern, you can also grab the SillyTavern templates from here: https://huggingface.co/anthracite-org/magnum-v3-9b-chatml to rule out any ChatML template issues.
I'm using RisuAI, and the problem still occurs even though I specify the template as ChatML in tabbyAPI.
In any case, I will try various methods to troubleshoot this error.
Thanks for your support
Try out SillyTavern; it seems to work best there, but I hope your issue with Risu resolves itself too.
@lucyknada After testing, I found that this problem has nothing to do with the front end. Even common front ends not used for role-playing show the same problem.
I have tried compiling from source, installing from prebuilt binary packages, and letting pip JIT-compile automatically, but nothing works.
I also tried turning off cache quantization and all experimental options, and reducing the context length, but it still doesn't work.
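For reference, one way to reproduce this outside any front end is to call exllamav2 directly. A minimal sketch, assuming the exllamav2 0.2.x Python API; the model path is a placeholder:

```python
# Minimal repro sketch: load the quant with exllamav2 directly (no front end, no tabbyAPI).
# Assumes exllamav2 0.2.x; the model path below is a placeholder.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("/path/to/magnum-v3-9b-chatml-exl2-6.0bpw")
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)   # plain FP16 cache, no cache quantization
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

prompt = "<|im_start|>user\nHi there!<|im_end|>\n<|im_start|>assistant\n"
output = generator.generate(prompt=prompt, max_new_tokens=300, add_bos=True)
print(output)
# If the reply runs past the assistant turn instead of stopping at <|im_end|>,
# the EOS token configured for the quant is the likely culprit rather than the front end.
```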
Try setting the EOS token in the generation config to 15, then restart your inference engine.
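A minimal sketch of that change, assuming the quant ships a generation_config.json next to its weights (the path is a placeholder); restart tabbyAPI afterwards:

```python
# Sketch: set eos_token_id to 15 in the quant's generation_config.json.
# The model directory path is a placeholder.
import json
from pathlib import Path

cfg_path = Path("/path/to/magnum-v3-9b-chatml-exl2-6.0bpw/generation_config.json")
cfg = json.loads(cfg_path.read_text()) if cfg_path.exists() else {}
cfg["eos_token_id"] = 15
cfg_path.write_text(json.dumps(cfg, indent=2))
print("eos_token_id set to", cfg["eos_token_id"])
```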
Solved, that was the problem.
Thank you.
Glad it worked!