Tokenizer Chat Template
Why does the model have the default Hugging Face chat template and not the Llama 3 special template?
In the configs it is given as:

```jinja
{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}
```

Instead of:

```jinja
{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}
```
Also, I'm not sure about `<|eot_id|>`. It seems everything got mixed up.
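For what it's worth, a quick way to see which template is actually applied is to render a conversation without tokenizing (a minimal sketch using the standard `transformers` API; the model id is just an example, substitute whichever Llama 3 checkpoint you have access to):

```python
from transformers import AutoTokenizer

# Assumption: you have access to a Llama 3 instruct checkpoint.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {"role": "user", "content": "Hello!"},
]

# Render the prompt as text to inspect which special tokens the template emits.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# The Llama 3 template should produce <|start_header_id|>...<|eot_id|> markers;
# if you see <|im_start|>/<|im_end|> instead, the ChatML-style default is in use.
```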
I don't really know whether this issue has been fixed. (I still do not have access to the original repo.)
Correct. We don't know yet whether this issue is fixed or not. We need communication from Meta.
Use `chat_template` instead of `default_chat_template`.
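As a stopgap until the distributed config is fixed, one can assign the Llama 3 template (the one quoted above) to the tokenizer's `chat_template` attribute directly. A sketch, again with a placeholder model id:

```python
from transformers import AutoTokenizer

# Assumption: substitute the checkpoint you actually use.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Assign the Llama 3 template explicitly so apply_chat_template no longer
# falls back to the class-level default_chat_template (the ChatML-style one).
tokenizer.chat_template = (
    "{% set loop_messages = messages %}"
    "{% for message in loop_messages %}"
    "{% set content = '<|start_header_id|>' + message['role'] "
    "+ '<|end_header_id|>\n\n' + message['content'] | trim + '<|eot_id|>' %}"
    "{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}"
    "{{ content }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}"
    "{% endif %}"
)
```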
It seems like it is because the Llama 3 `tokenizer_config.json` that they have distributed is configured with `"tokenizer_class": "PreTrainedTokenizerFast"`, which only uses the `default_chat_template`.
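A quick sketch to confirm this behavior on a local copy (attribute names are from the public `transformers` API; the model id is again a placeholder):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

print(type(tok).__name__)  # -> PreTrainedTokenizerFast
print(tok.chat_template)   # None if tokenizer_config.json ships no chat_template,
                           # in which case apply_chat_template falls back to
                           # default_chat_template (the ChatML-style one above).
```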