https://huggingface.co/mradermacher/BitAgent-8B-GGUF - something is broken

#792
by aco-1 - opened

Hi!
Thank you all for your great work making these models available to people like me who are getting started with ollama!

I have tried lots of models, mostly with success, but something is broken when using this model with ollama.
Either it's me or something with the model.
When doing

    ollama run hf.co/mradermacher/BitAgent-8B-GGUF:Q8_0

some template is used, which I assume comes from a policy-conforming agent/LLM:

    Task: Check if there is unsafe content in '{{ $role }}' messages in conversations according our safety policy with the below categories.

    S1: Violent Crimes. S2: Non-Violent Crimes. S3: Sex Crimes.

I am relieved that adding a file with git is considered safe, but this seems to miss the purpose of the LLM ;-)

    How to add a file with git?
    The given message "How to add a file with git?" does not contain any content that falls under the
    listed unsafe categories. Therefore, it is safe.

I wanted to trace the source of the template but I don't understand where it comes from.

There seems to be no file in hf.co/mradermacher/BitAgent-8B-GGUF containing the template.

Since it is a separate artifact in the ollama blobs, I assume it's not in the GGUF.
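
For reference, here is roughly how I found that (a sketch; it assumes the default ollama storage location under ~/.ollama/models, and the exact manifest path may differ on your system):

    # the manifest lists the model's layers, including a separate template blob
    cat ~/.ollama/models/manifests/hf.co/mradermacher/BitAgent-8B-GGUF/Q8_0
    # look for the layer with mediaType "application/vnd.ollama.image.template"
    # and print the blob it references:
    cat ~/.ollama/models/blobs/sha256-<digest from the manifest>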

I tried to look into the original repo, but the download is still running.

So I signed up to HF just to report this problem.

Can you take a look at this?
It would be nice if someone could explain to me where the template comes from or point me to some doc or repo.

Have fun,
aco

I'm unsure what issue you are experiencing or what template you are referring to. Can you please elaborate on what the expected behavior should be and what is happening instead, so we can better understand your issue?

I wanted to trace the source of the template but I don't understand where it comes from.

If you mean the chat template you can find the chat template in the original model under https://huggingface.co/BitAgent/BitAgent-8B/blob/ca31a7759f9f15045ec3353877078c6502107b0d/tokenizer_config.json#L2053
During quantization this chat template will be copied into the GGUF.

Here is a copy of the chat template used for this specific model:
    {{ '<|begin_of_text|>' }}{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% endif %}{% if system_message is defined %}{{ '<|start_header_id|>system<|end_header_id|>\n\n' + system_message + '<|eot_id|>' }}{% endif %}{% for message in loop_messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ '<|start_header_id|>user<|end_header_id|>\n\n' + content + '<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n' }}{% elif message['role'] == 'assistant' %}{{ content + '<|eot_id|>' }}{% endif %}{% endfor %}
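
If you want to verify that the template really is embedded in the GGUF, the gguf Python package ships a small dump tool (a sketch; the local file name of the quant is an assumption):

    pip install gguf
    gguf-dump BitAgent-8B.Q8_0.gguf | grep chat_template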

@nicoboss there have been multiple reports from people using ollama, where ollama clearly uses a different template than the one that came with the model, usually something that just answers "Safe" or not, or classifies the input like the above.

I haven't investigated deeply, but apparently, ollama uses some huggingface API to get a template, or so it seems. Weird. Any ideas?

@aco-1 in any case, that template, as far as we know, is not from us or the original model. ollama seems to corrupt models by replacing their templates with something else. If you use the template that comes with the GGUF file, it should work. Or use any other inference engine, as it only seems to be ollama that does this.
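
For example, with llama.cpp (a sketch; -cnv starts an interactive chat that uses the chat template embedded in the GGUF):

    llama-cli -m BitAgent-8B.Q8_0.gguf -cnv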

Thanks for caring and sorry for not reacting sooner.
I did a little bit of research and it seems that Hugging Face does some bad magic here.

https://huggingface.co/docs/hub/ollama

Custom Chat Template and Parameters
By default, a template will be selected automatically from a list of commonly used templates. It will be selected based on the built-in tokenizer.chat_template metadata stored inside the GGUF file.

If your GGUF file doesn’t have a built-in template or if you want to customize your chat template, you can create a new file called template in the repository. The template must be a Go template, not a Jinja template.
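
For a Llama-3-style model like this one, such a template file might look like the following. This is my rough, untested translation of the Jinja template posted above into ollama's Go template syntax; the .System/.Prompt/.Response variables are the ones described in ollama's template documentation:

    {{ if .System }}<|start_header_id|>system<|end_header_id|>

    {{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

    {{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

    {{ .Response }}<|eot_id|>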

Can you set tokenizer.chat_template to get a reasonable template default?
I am now able to configure a correct template based on your info above, thanks for that!
I searched but didn't find what those "commonly used templates" are and where to find them.
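
In case someone else hits this, here is roughly what I did locally instead of adding a template file to the repository (a sketch; bitagent-fixed is just a name I made up):

    # export the modelfile that ollama generated for the model
    ollama show --modelfile hf.co/mradermacher/BitAgent-8B-GGUF:Q8_0 > Modelfile
    # replace the TEMPLATE """...""" section with the Go template from above,
    # then create and run the fixed model:
    ollama create bitagent-fixed -f Modelfile
    ollama run bitagent-fixed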

See the URL above for how huggingface GGUF models can easily be used with ollama via a special URL. At some point I will play with other providers, but ATM ollama is convenient.
The template actually active for the BitAgent model in ollama seems to be the same as, or similar to, the one some llama policy model uses.
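
This is easy to check, as ollama can print the template it actually uses for a model:

    ollama show --template hf.co/mradermacher/BitAgent-8B-GGUF:Q8_0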

The "Task:" and "S1:" lines in my original post are from the active default template when loading into ollama. (Does this forum have a

-Syntax?)

Oh, great I wrote have a <pre>-Syntax ... found it

Thanks for digging into this. This is concerning, to say the least, but it explains the troubles that ollama users have.

Can you set tokenizer.chat_template to get a reasonable template default?

If you are asking us: yes, we already do that, we use the template given by the model. If you are asking us to break the model for every other inference engine to work around a bug in ollama, that's not going to happen.

Based on what you wrote, I would highly recommend ditching ollama in favour of something that doesn't have this highly unintuitive and buggy behaviour.
