# Mamba GGUF

These are the Mamba base models, converted to GGUF for use with llama.cpp, in a variety of precisions (2, 3, 4, 5, 6, 8, 16, and 32-bit).

Please click "Files and versions" at the top of the page to choose your desired model size, then click the "📦 LFS ↓" button next to your desired quantization.
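If you prefer to download from the command line or a script, here is a minimal sketch using the `huggingface_hub` Python library. The filename below is a hypothetical placeholder, not a confirmed file in this repo — check "Files and versions" for the exact `.gguf` filenames:

```python
# Minimal sketch: download one quantized file from this repo.
# pip install huggingface_hub
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="devingulliver/mamba-gguf",
    filename="mamba-1.4b-q4_k_m.gguf",  # hypothetical name; substitute a real file
)
print(path)  # local cache path of the downloaded model
```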

Here is a table adapted from TheBloke explaining the various precisions:

| Quant method | Use case |
| ------------ | -------- |
| Q2_K | significant quality loss - not recommended for most purposes |
| Q3_K_S | very small, high quality loss |
| Q3_K_M | very small, high quality loss |
| Q3_K_L | small, substantial quality loss |
| Q4_0 | legacy; small, very high quality loss - prefer using Q3_K_M |
| Q4_K_S | small, greater quality loss |
| Q4_K_M | medium, balanced quality - recommended |
| Q5_0 | legacy; medium, balanced quality - prefer using Q4_K_M |
| Q5_K_S | large, low quality loss - recommended |
| Q5_K_M | large, very low quality loss - recommended |
| Q6_K | very large, extremely low quality loss |
| Q8_0 | very large, extremely low quality loss - not recommended |
| F16 | half precision - almost identical to the original |
| F32 | original precision - recommended by the Mamba authors |
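
Once downloaded, the model can be run with llama.cpp or its Python bindings. Below is a minimal sketch using `llama-cpp-python`; it assumes your installed build of llama.cpp is recent enough to include Mamba support, and the model path reuses the hypothetical filename from the download sketch above:

```python
# Minimal sketch: run a Mamba GGUF file with llama-cpp-python.
# pip install llama-cpp-python  (needs a llama.cpp build with Mamba support)
from llama_cpp import Llama

# Path is the hypothetical file from the download example; use your actual file.
llm = Llama(model_path="mamba-1.4b-q4_k_m.gguf")

# These are base models (plain text completers), so prompt with text to continue.
output = llm("The Mamba architecture is", max_tokens=64)
print(output["choices"][0]["text"])
```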
