what template are we using?
FROM ../meta-llama-3.1-8b-instruct-abliterated.Q8_0.gguf
TEMPLATE """
{{if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}{{ end }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>
"""
PARAMETER num_ctx 20000
PARAMETER num_predict 2000
PARAMETER num_gpu -1
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
with this Modelfile I'm getting repetitive, run-on output...
anyone using a different template than this?
https://ollama.com/mannix/llama3.1-8b-abliterated
I'm using two different templates, one with tools and one without
Didn't test this specific scenario, though
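For comparison, the run-on with the original Modelfile is probably the template itself: it never emits <|eot_id|> after the system and user turns, so the model never sees a turn boundary and just keeps going. A minimal sketch of a non-tools TEMPLATE, modeled on Ollama's stock llama3/llama3.1 template (untested against this particular quant, so treat it as a starting point):

TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

The tools variant layers tool-definition and tool-call handling on top of this; the turn structure stays the same.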
great! that's much better than mine... unfortunately I'm still not getting good results from this model. I'm having much better success with Llama 3.1 8B Instruct itself...
I shouldn't be overfilling the context at 20000, should I? Very strange to me; I'll tinker and report back
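If the fixed template alone doesn't cure it, the sampling parameters are another thing to tinker with; these values are guesses to start from, not known-good settings for this model:

# nudge the sampler away from loops; tune from here
PARAMETER repeat_penalty 1.1
PARAMETER repeat_last_n 256
PARAMETER temperature 0.7

And num_ctx 20000 shouldn't be overfilling anything: Llama 3.1 was trained with up to 128K context, so the practical limit there is RAM/VRAM, not the model.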
Yeah, I'm having the same issues with openchat. I'm attempting to design my own interface, but I haven't implemented a template yet; I only heard of them recently. I really do need to, because as it stands, almost all my interface outputs are massive max-token info-dumps, with a big lack of prompt coherence on top. And it takes 3-5 minutes to get a single response, even for simple queries.
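A missing template would explain the max-token info-dumps with openchat too: if the model's end-of-turn token never appears in the prompt (or as a stop sequence), nothing ever tells it to stop. A rough Modelfile sketch using OpenChat 3.5's published prompt format (the "GPT4 Correct" role prefixes and <|end_of_turn|> come from the model card; how the system message is spliced in is my assumption, and num_predict 512 is just a guess to rein in the dumps):

# hypothetical base; point FROM at whatever openchat weights you're running
FROM openchat
TEMPLATE """{{ if .System }}{{ .System }}<|end_of_turn|>{{ end }}GPT4 Correct User: {{ .Prompt }}<|end_of_turn|>GPT4 Correct Assistant: {{ .Response }}"""
# stop at end-of-turn so replies don't run to the token limit
PARAMETER stop "<|end_of_turn|>"
PARAMETER num_predict 512

Capping num_predict should also help with the 3-5 minute responses, since most of that time is likely spent generating the dump itself.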