GGUF quants for the LoRA adapter: ibm-granite/granite-uncertainty-3.2-8b-lora
Link to the original repo: https://huggingface.co/ibm-granite/granite-uncertainty-3.2-8b-lora
You need an instruct GGUF to apply this LoRA to (e.g. granite-3.2-8B-instruct-Q4_K_M.gguf).
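If you don't already have one, here is a sketch of producing it yourself with llama.cpp's stock tooling (assumes a llama.cpp checkout and the huggingface_hub CLI; paths and the quant type are up to you):

```sh
# Fetch the base instruct weights from Hugging Face
huggingface-cli download ibm-granite/granite-3.2-8b-instruct --local-dir granite-3.2-8b-instruct

# Convert to GGUF at f16, then quantize to Q4_K_M
python convert_hf_to_gguf.py granite-3.2-8b-instruct --outfile granite-3.2-8B-instruct-f16.gguf --outtype f16
llama-quantize granite-3.2-8B-instruct-f16.gguf granite-3.2-8B-instruct-Q4_K_M.gguf Q4_K_M
```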
Then run it like this:

```sh
llama-cli -m granite-3.2-8B-instruct-Q4_K_M.gguf --lora granite-uncertainty-3.2-8b-lora-f16.gguf --conversation --jinja
```
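For reference, the f16 LoRA GGUF in this repo should be reproducible from the original adapter with llama.cpp's converter (a sketch; the local directory names are arbitrary):

```sh
huggingface-cli download ibm-granite/granite-uncertainty-3.2-8b-lora --local-dir granite-uncertainty-lora
python convert_lora_to_gguf.py granite-uncertainty-lora --base granite-3.2-8b-instruct --outtype f16
```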
To get the certainty score, paste this line into the chat after the model's first reply:

```
<|end_of_role|>\n<|start_of_role|>certainty<|end_of_role|>
```
It's a bit hacky, but it works for now.
Example of what it should look like:
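Something like this (an illustrative session, not captured output; the question and score are made up):

```
> What is the capital of Australia?
The capital of Australia is Canberra.

> <|end_of_role|>\n<|start_of_role|>certainty<|end_of_role|>
8
```

Per the upstream model card, the certainty turn yields a single digit from 0 to 9, which maps to a calibrated confidence percentage.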
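If you'd rather script this than paste tokens into an interactive chat, the same trick can be written as one non-interactive call with the Granite chat template spelled out by hand (a sketch: the question, answer, and token budget are placeholders, and `-no-cnv` assumes a reasonably recent llama.cpp build):

```sh
llama-cli -m granite-3.2-8B-instruct-Q4_K_M.gguf \
  --lora granite-uncertainty-3.2-8b-lora-f16.gguf \
  -no-cnv -n 4 \
  -p '<|start_of_role|>user<|end_of_role|>What is the boiling point of water at sea level?<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>100 degrees Celsius.<|end_of_text|>
<|start_of_role|>certainty<|end_of_role|>'
```

The `certainty` turn is deliberately left open, so the next tokens the model generates are the score.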