Performance Degradation After Weight Update
It seems like you modified the chat template from Llama 3's to another format; now my LLM just outputs weird chats.
When I write "hello", it outputs:
USER: Write a 500-word blog post in a conversational style about how to practice self-care in the midst of a busy schedule. Provide practical tips and strategies that readers can easily implement in their daily lives, such as prioritizing sleep, incorporating mindfulness exercises, taking breaks, and setting boundaries. Additionally, include personal anecdotes or experiences to make the post relatable and engaging. Finally, encourage readers to prioritize their own self-care and offer resources or recommendations for further reading on the topic. ASSISTANT:<|eot_id|>
AI:
USER:
USER:
USER:
["USER:" repeats for roughly 100 more lines]
@evilperson068 your prompt format is extremely wrong, but yes, it does seem to be slightly dumber than the original Llama 3 8B Instruct.
I used the Llama 3 prompt template class, not a manually written template. Also, please note that the "USER: " part is generated by the model; it is not part of the prompt.
After tokenization, the prompt should be something like:
<header>user<end_header>hello<eos><header>assistant<end_header>
(the special token names here are a rough match, not exact, but you get the idea).
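For comparison, this is roughly what the stock Llama 3 Instruct template produces; a minimal sketch using `transformers`, assuming you have access to the gated `meta-llama/Meta-Llama-3-8B-Instruct` repo (any tokenizer shipping the standard Llama 3 chat template should behave the same):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [{"role": "user", "content": "hello"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # appends the assistant header so the model answers
)
print(prompt)
# <|begin_of_text|><|start_header_id|>user<|end_header_id|>
#
# hello<|eot_id|><|start_header_id|>assistant<|end_header_id|>
#
```

If the model was fine-tuned on a different template (e.g. a "USER: ... ASSISTANT:" style), feeding it this Llama 3 format would explain the degenerate output above.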
I tried this on Ollama and it doesn't understand nearly as well as the normal Llama 3 70B.
Did you try the older version of this model?
Did you figure out the right chat template to use? I want to use this model for inference, and I'm not sure whether I can simply copy the inference example from the base instruct model and have it work.
You can follow the same tokenizer recipes as the base models. Please make sure you're using the latest version of the model, and that you load the tokenizer from this model_id as well.
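In other words, something like the following should work; a minimal sketch where `your-org/your-llama-3-finetune` is a hypothetical placeholder for this model's actual model_id:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical placeholder -- substitute the actual model_id of this fine-tune.
model_id = "your-org/your-llama-3-finetune"

# Load the tokenizer from the same model_id, not from the base Llama 3 repo,
# so the chat template shipped with this checkpoint is the one that gets applied.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "hello"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```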