GGML/GGUF (v2) quantizations of the model: https://huggingface.co/winglian/basilisk-4b. This is winglian/llama-2-4b, a 4B-parameter Llama-2 model fine-tuned on Open Orca CoT data.
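Since these files are GGUF v2, a quick sanity check after downloading is to read the file header. Per the GGUF specification, a file starts with the 4-byte magic `GGUF` followed by a little-endian uint32 version. A minimal sketch (the function name is illustrative, not part of llama.cpp):

```python
import struct

def read_gguf_version(path):
    """Return the GGUF version stored in a file's header.

    The GGUF format begins with the 4-byte magic b"GGUF",
    followed by a little-endian uint32 version (2 for GGUF v2).
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError("not a GGUF file")
        (version,) = struct.unpack("<I", f.read(4))
    return version
```

A mismatched or missing magic usually means a truncated download or an older GGML-format file.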

I tried to run it on the latest llama.cpp commit but got an error (GGML_ASSERT: llama.cpp:8136: false). After re-converting the model to GGUF with this llama.cpp commit, https://github.com/ggerganov/llama.cpp/tree/019ba1dcd0c7775a5ac0f7442634a330eb0173cc, it seems to work.
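For reference, the re-conversion at the pinned commit would look roughly like the following. This is a sketch, not the exact commands used: the model path, output filenames, and chosen quant type are assumptions, and the `convert.py`/`quantize` tool names match llama.cpp as of that commit.

```shell
# Check out the specific llama.cpp commit the GGUF files were produced with
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout 019ba1dcd0c7775a5ac0f7442634a330eb0173cc
pip install -r requirements.txt

# Convert the HF checkpoint to an f16 GGUF (paths are placeholders)
python convert.py /path/to/basilisk-4b --outtype f16 --outfile basilisk-4b-f16.gguf

# Build the quantize tool and produce a quantized file (example: Q4_K_M)
make quantize
./quantize basilisk-4b-f16.gguf basilisk-4b-Q4_K_M.gguf Q4_K_M
```

Pinning the commit matters here because the GGUF format and conversion scripts were changing rapidly at the time, and files converted with one revision could trip asserts in another.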

GGUF details:
- Model size: 3.5B params
- Architecture: llama
- Quantizations available: 2-bit, 4-bit, 5-bit
