Tokenizer Chat Template
Why does the model have the default Hugging Face chat template and not the Llama 3 special template?
In the configs it is given as:

```jinja
{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}
```

Instead of:

```jinja
{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}
```
Also, I'm not sure about `<|eot_id|>`. It seems everything got mixed up.
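For what it's worth, a quick way to see which template is actually applied is to render a conversation without tokenizing (a minimal sketch using the standard `transformers` API; the model id is just an example, substitute whichever Llama 3 checkpoint you have access to):

```python
from transformers import AutoTokenizer

# Assumption: you have access to a Llama 3 instruct checkpoint.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {"role": "user", "content": "Hello!"},
]

# Render the prompt as text to inspect which special tokens the template emits.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# The Llama 3 template should produce <|start_header_id|>...<|eot_id|> markers;
# if you see <|im_start|>/<|im_end|> instead, the ChatML-style default is in use.
```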
I don't really know whether this issue has been fixed. (I still do not have access to the original repo.)
Correct. We don't know yet whether this issue is fixed or not. We need communication from Meta.
Use `chat_template` instead of `default_chat_template`.
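As a stopgap until the distributed config is fixed, one can assign the Llama 3 template (the one quoted above) to the tokenizer's `chat_template` attribute directly. A sketch, again with a placeholder model id:

```python
from transformers import AutoTokenizer

# Assumption: substitute the checkpoint you actually use.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Assign the Llama 3 template explicitly so apply_chat_template no longer
# falls back to the class-level default_chat_template (the ChatML-style one).
tokenizer.chat_template = (
    "{% set loop_messages = messages %}"
    "{% for message in loop_messages %}"
    "{% set content = '<|start_header_id|>' + message['role'] "
    "+ '<|end_header_id|>\n\n' + message['content'] | trim + '<|eot_id|>' %}"
    "{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}"
    "{{ content }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}"
    "{% endif %}"
)
```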
It seems like it is because the Llama 3 `tokenizer_config.json` that they have distributed is configured with `"tokenizer_class": "PreTrainedTokenizerFast"`, which only uses the `default_chat_template`.
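A quick sketch to confirm this behavior on a local copy (attribute names are from the public `transformers` API; the model id is again a placeholder):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

print(type(tok).__name__)  # -> PreTrainedTokenizerFast
print(tok.chat_template)   # None if tokenizer_config.json ships no chat_template,
                           # in which case apply_chat_template falls back to
                           # default_chat_template (the ChatML-style one above).
```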