what template are we using?

#8 opened by cognitivetech
FROM ../meta-llama-3.1-8b-instruct-abliterated.Q8_0.gguf
TEMPLATE """
{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}{{ end }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>
"""
PARAMETER num_ctx 20000
PARAMETER num_predict 2000
PARAMETER num_gpu -1
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"

With this Modelfile I'm getting repetitive, run-on output.

anyone using a different template than this?

https://ollama.com/mannix/llama3.1-8b-abliterated
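
One likely culprit, assuming the run-on output comes from the prompt format rather than the quant itself: the template above never closes the system and user turns with <|eot_id|>, so the model never sees the end-of-turn token it was trained on and tends to keep generating. A minimal corrected sketch, based on the stock Llama 3.1 chat format (same FROM path and parameters as above, only the template changes):

FROM ../meta-llama-3.1-8b-instruct-abliterated.Q8_0.gguf
# Close every turn with <|eot_id|> so the model sees where each message ends
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
PARAMETER stop "<|eot_id|>"

With <|eot_id|> emitted and used as the stop token, the extra stops on <|start_header_id|> and <|end_header_id|> shouldn't be necessary.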

I'm using two different templates, one with tools and one without. I didn't specifically test this scenario, though.

Great, that's much better than mine! Unfortunately I'm still not getting good results from this model; I'm having much better success with plain llama3.1 8b instruct itself...

I shouldn't be overfilling the context, either? Very strange to me. I'll tinker and report back.
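
If a template tweak doesn't change anything, one quick sanity check (assuming a reasonably recent Ollama CLI) is to print the Modelfile the model was actually created with, to confirm which template is in effect:

# 'mymodel' is a placeholder for whatever name you used with `ollama create`
ollama show mymodel --modelfile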

Yeah, I'm having the same issues with openchat. I'm attempting to design my own interface, but I haven't implemented a template yet; I only heard of them recently. I really need to, though, because as it stands, almost everything my interface outputs is a massive max-token info-dump, with a big lack of prompt coherence on top. And it takes 3-5 minutes for a single response, even for simple queries.
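
For what it's worth, a missing template plus no stop token would explain both symptoms: the model free-runs until it hits the max-token cap. If the interface is driving Ollama (an assumption, since the post doesn't say), a minimal Modelfile sketch for OpenChat's "GPT4 Correct" format might look like this; num_predict is just an illustrative cap, not a value from this thread:

FROM openchat
TEMPLATE """{{ if .System }}{{ .System }}<|end_of_turn|>{{ end }}GPT4 Correct User: {{ .Prompt }}<|end_of_turn|>GPT4 Correct Assistant: {{ .Response }}"""
# The stop token is what prevents max-token info-dumps
PARAMETER stop "<|end_of_turn|>"
PARAMETER num_predict 512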
