GGUF quants for the LoRA adapter ibm-granite/granite-uncertainty-3.2-8b-lora

Link to the original repo: https://huggingface.co/ibm-granite/granite-uncertainty-3.2-8b-lora

You need the base instruct GGUF to apply this LoRA to (e.g. granite-3.2-8B-instruct-Q4_K_M.gguf).

Then run it like this:

llama-cli -m granite-3.2-8B-instruct-Q4_K_M.gguf --lora granite-uncertainty-3.2-8b-lora-f16.gguf --conversation --jinja
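If you'd rather drive the same setup from Python instead of llama-cli, here is a minimal sketch using llama-cpp-python. The file paths, context size, and example question are placeholders, and GGUF LoRA adapter support via lora_path depends on your llama-cpp-python / llama.cpp version, so treat this as a starting point rather than a tested recipe:

```python
# Sketch: load the base instruct GGUF with the uncertainty LoRA applied,
# then run one chat turn. Paths are assumptions; adjust to your files.
from llama_cpp import Llama

llm = Llama(
    model_path="granite-3.2-8B-instruct-Q4_K_M.gguf",      # base instruct GGUF
    lora_path="granite-uncertainty-3.2-8b-lora-f16.gguf",   # this LoRA adapter
    n_ctx=4096,
)

reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the boiling point of water at sea level?"}]
)
print(reply["choices"][0]["message"]["content"])
```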

To get the certainty score, paste this line into the chat after the model's first reply:

<|end_of_role|>\n<|start_of_role|>certainty<|end_of_role|>

It's a bit hacky, but it works for now.
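If you want the score without pasting anything by hand, one option is to run llama-server with the same -m/--lora arguments and send a raw prompt that ends with the certainty role, so the model's next generation is the score. The sketch below assumes the default llama-server address and my reading of the Granite 3.x role markup; verify both against the chat template that --jinja applies before trusting the numbers:

```python
# Sketch: ask llama-server to score a given answer by appending the certainty role.
# Server address, question, and answer are placeholders/assumptions.
import requests

SERVER = "http://127.0.0.1:8080"  # default llama-server address (assumption)

question = "What is the boiling point of water at sea level?"
answer = "About 100 degrees Celsius."  # the assistant reply you want scored

prompt = (
    f"<|start_of_role|>user<|end_of_role|>{question}<|end_of_text|>\n"
    f"<|start_of_role|>assistant<|end_of_role|>{answer}<|end_of_text|>\n"
    f"<|start_of_role|>certainty<|end_of_role|>"
)

resp = requests.post(
    f"{SERVER}/completion",
    json={"prompt": prompt, "n_predict": 8, "temperature": 0},
    timeout=60,
)
print("certainty:", resp.json()["content"].strip())
```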

Example of what it should look like:

[screenshot of an example chat session ending with a certainty score reply]
