GGUF quants for the LoRA adapter: ibm-granite/granite-uncertainty-3.2-8b-lora
Link to the original repo: https://huggingface.co/ibm-granite/granite-uncertainty-3.2-8b-lora
You need an instruct GGUF to apply this LoRA to (e.g. granite-3.2-8B-instruct-Q4_K_M.gguf).
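If you don't already have one, here is a sketch of producing it yourself with llama.cpp's stock tooling (assumes a llama.cpp checkout and the huggingface_hub CLI; paths and the quant type are up to you):

```sh
# Fetch the base instruct weights from Hugging Face
huggingface-cli download ibm-granite/granite-3.2-8b-instruct --local-dir granite-3.2-8b-instruct

# Convert to GGUF at f16, then quantize to Q4_K_M
python convert_hf_to_gguf.py granite-3.2-8b-instruct --outfile granite-3.2-8B-instruct-f16.gguf --outtype f16
llama-quantize granite-3.2-8B-instruct-f16.gguf granite-3.2-8B-instruct-Q4_K_M.gguf Q4_K_M
```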
Then run it like this:

```sh
llama-cli -m granite-3.2-8B-instruct-Q4_K_M.gguf --lora granite-uncertainty-3.2-8b-lora-f16.gguf --conversation --jinja
```
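For reference, the f16 LoRA GGUF in this repo should be reproducible from the original adapter with llama.cpp's converter (a sketch; the local directory names are arbitrary):

```sh
huggingface-cli download ibm-granite/granite-uncertainty-3.2-8b-lora --local-dir granite-uncertainty-lora
python convert_lora_to_gguf.py granite-uncertainty-lora --base granite-3.2-8b-instruct --outtype f16
```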
To get the certainty score, paste this line into the chat after the model's first reply:

```
<|end_of_role|>\n<|start_of_role|>certainty<|end_of_role|>
```
It's a bit hacky, but it works for now.
Example of what it should look like:
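Something like this (an illustrative session, not captured output; the question and score are made up):

```
> What is the capital of Australia?
The capital of Australia is Canberra.

> <|end_of_role|>\n<|start_of_role|>certainty<|end_of_role|>
8
```

Per the upstream model card, the certainty turn yields a single digit from 0 to 9, which maps to a calibrated confidence percentage.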
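If you'd rather script this than paste tokens into an interactive chat, the same trick can be written as one non-interactive call with the Granite chat template spelled out by hand (a sketch: the question, answer, and token budget are placeholders, and `-no-cnv` assumes a reasonably recent llama.cpp build):

```sh
llama-cli -m granite-3.2-8B-instruct-Q4_K_M.gguf \
  --lora granite-uncertainty-3.2-8b-lora-f16.gguf \
  -no-cnv -n 4 \
  -p '<|start_of_role|>user<|end_of_role|>What is the boiling point of water at sea level?<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>100 degrees Celsius.<|end_of_text|>
<|start_of_role|>certainty<|end_of_role|>'
```

The `certainty` turn is deliberately left open, so the next tokens the model generates are the score.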