Upload README.md with huggingface_hub
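As the commit title notes, this file was pushed with the `huggingface_hub` Python library. For reference, a minimal sketch of what such an upload could look like, assuming a token saved via `huggingface-cli login` and a local README.md; the repo id is taken from the file links in the diff below:

```python
from huggingface_hub import HfApi

api = HfApi()  # picks up the token saved by `huggingface-cli login`

# Upload a single file; the commit message matches the title of this commit.
api.upload_file(
    path_or_fileobj="README.md",  # local path (placeholder)
    path_in_repo="README.md",
    repo_id="bartowski/Qwen2.5-Coder-32B-Instruct-GGUF",
    repo_type="model",
    commit_message="Upload README.md with huggingface_hub",
)
```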
README.md CHANGED
```diff
@@ -1,7 +1,6 @@
 ---
 quantized_by: bartowski
 pipeline_tag: text-generation
-base_model: Qwen/Qwen2.5-Coder-32B-Instruct
 ---
 
 ## Llamacpp imatrix Quantizations of Qwen2.5-Coder-32B-Instruct
@@ -35,7 +34,7 @@ Run them in [LM Studio](https://lmstudio.ai/)
 | [Qwen2.5-Coder-32B-Instruct-Q5_K_M.gguf](https://huggingface.co/bartowski/Qwen2.5-Coder-32B-Instruct-GGUF/blob/main/Qwen2.5-Coder-32B-Instruct-Q5_K_M.gguf) | Q5_K_M | 23.26GB | false | High quality, *recommended*. |
 | [Qwen2.5-Coder-32B-Instruct-Q5_K_S.gguf](https://huggingface.co/bartowski/Qwen2.5-Coder-32B-Instruct-GGUF/blob/main/Qwen2.5-Coder-32B-Instruct-Q5_K_S.gguf) | Q5_K_S | 22.64GB | false | High quality, *recommended*. |
 | [Qwen2.5-Coder-32B-Instruct-Q4_K_L.gguf](https://huggingface.co/bartowski/Qwen2.5-Coder-32B-Instruct-GGUF/blob/main/Qwen2.5-Coder-32B-Instruct-Q4_K_L.gguf) | Q4_K_L | 20.43GB | false | Uses Q8_0 for embed and output weights. Good quality, *recommended*. |
-| [Qwen2.5-Coder-32B-Instruct-Q4_K_M.gguf](https://huggingface.co/bartowski/Qwen2.5-Coder-32B-Instruct-GGUF/blob/main/Qwen2.5-Coder-32B-Instruct-Q4_K_M.gguf) | Q4_K_M | 19.85GB | false | Good quality, default size for
+| [Qwen2.5-Coder-32B-Instruct-Q4_K_M.gguf](https://huggingface.co/bartowski/Qwen2.5-Coder-32B-Instruct-GGUF/blob/main/Qwen2.5-Coder-32B-Instruct-Q4_K_M.gguf) | Q4_K_M | 19.85GB | false | Good quality, default size for most use cases, *recommended*. |
 | [Qwen2.5-Coder-32B-Instruct-Q4_K_S.gguf](https://huggingface.co/bartowski/Qwen2.5-Coder-32B-Instruct-GGUF/blob/main/Qwen2.5-Coder-32B-Instruct-Q4_K_S.gguf) | Q4_K_S | 18.78GB | false | Slightly lower quality with more space savings, *recommended*. |
 | [Qwen2.5-Coder-32B-Instruct-Q4_0.gguf](https://huggingface.co/bartowski/Qwen2.5-Coder-32B-Instruct-GGUF/blob/main/Qwen2.5-Coder-32B-Instruct-Q4_0.gguf) | Q4_0 | 18.71GB | false | Legacy format, generally not worth using over similarly sized formats |
 | [Qwen2.5-Coder-32B-Instruct-IQ4_NL.gguf](https://huggingface.co/bartowski/Qwen2.5-Coder-32B-Instruct-GGUF/blob/main/Qwen2.5-Coder-32B-Instruct-IQ4_NL.gguf) | IQ4_NL | 18.68GB | false | Similar to IQ4_XS, but slightly larger. |
@@ -50,7 +49,6 @@ Run them in [LM Studio](https://lmstudio.ai/)
 | [Qwen2.5-Coder-32B-Instruct-Q3_K_S.gguf](https://huggingface.co/bartowski/Qwen2.5-Coder-32B-Instruct-GGUF/blob/main/Qwen2.5-Coder-32B-Instruct-Q3_K_S.gguf) | Q3_K_S | 14.39GB | false | Low quality, not recommended. |
 | [Qwen2.5-Coder-32B-Instruct-IQ3_XS.gguf](https://huggingface.co/bartowski/Qwen2.5-Coder-32B-Instruct-GGUF/blob/main/Qwen2.5-Coder-32B-Instruct-IQ3_XS.gguf) | IQ3_XS | 13.71GB | false | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
 | [Qwen2.5-Coder-32B-Instruct-Q2_K_L.gguf](https://huggingface.co/bartowski/Qwen2.5-Coder-32B-Instruct-GGUF/blob/main/Qwen2.5-Coder-32B-Instruct-Q2_K_L.gguf) | Q2_K_L | 13.07GB | false | Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable. |
-| [Qwen2.5-Coder-32B-Instruct-IQ3_XXS.gguf](https://huggingface.co/bartowski/Qwen2.5-Coder-32B-Instruct-GGUF/blob/main/Qwen2.5-Coder-32B-Instruct-IQ3_XXS.gguf) | IQ3_XXS | 12.84GB | false | Lower quality, new method with decent performance, comparable to Q3 quants. |
 | [Qwen2.5-Coder-32B-Instruct-Q2_K.gguf](https://huggingface.co/bartowski/Qwen2.5-Coder-32B-Instruct-GGUF/blob/main/Qwen2.5-Coder-32B-Instruct-Q2_K.gguf) | Q2_K | 12.31GB | false | Very low quality but surprisingly usable. |
 | [Qwen2.5-Coder-32B-Instruct-IQ2_M.gguf](https://huggingface.co/bartowski/Qwen2.5-Coder-32B-Instruct-GGUF/blob/main/Qwen2.5-Coder-32B-Instruct-IQ2_M.gguf) | IQ2_M | 11.26GB | false | Relatively low quality, uses SOTA techniques to be surprisingly usable. |
 | [Qwen2.5-Coder-32B-Instruct-IQ2_S.gguf](https://huggingface.co/bartowski/Qwen2.5-Coder-32B-Instruct-GGUF/blob/main/Qwen2.5-Coder-32B-Instruct-IQ2_S.gguf) | IQ2_S | 10.39GB | false | Low quality, uses SOTA techniques to be usable. |
@@ -121,8 +119,8 @@ The I-quants are *not* compatible with Vulcan, which is also AMD, so if you have
 
 ## Credits
 
-Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset
+Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset.
 
-Thank you ZeroWw for the inspiration to experiment with embed/output
+Thank you ZeroWw for the inspiration to experiment with embed/output.
 
 Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski
```
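Any of the quants listed in the table above can be fetched with the same library. A sketch using `hf_hub_download`, with the filename taken from the table; the local directory is an arbitrary choice:

```python
from huggingface_hub import hf_hub_download

# Download one of the listed quants (Q4_K_M, ~19.85GB).
path = hf_hub_download(
    repo_id="bartowski/Qwen2.5-Coder-32B-Instruct-GGUF",
    filename="Qwen2.5-Coder-32B-Instruct-Q4_K_M.gguf",
    local_dir="models",  # assumed target directory
)
print(path)
```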
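The credits point at the two techniques behind these files: an imatrix calibration pass, and Q8_0 embed/output tensors for the `_L` variants (Q4_K_L, Q2_K_L). A rough sketch of how those steps are typically driven with llama.cpp's CLI tools; the file names are placeholders and the flag names are assumed from recent llama.cpp builds, so verify against `llama-quantize --help` on your version:

```python
import subprocess

# 1) Build an importance matrix from a calibration text file (placeholder names).
subprocess.run([
    "llama-imatrix",
    "-m", "Qwen2.5-Coder-32B-Instruct-f16.gguf",  # full-precision GGUF (placeholder)
    "-f", "calibration.txt",                      # calibration dataset (placeholder)
    "-o", "imatrix.dat",
], check=True)

# 2) Quantize with the imatrix; for an _L variant, keep embed/output tensors at Q8_0
#    (flag names assumed; check your llama.cpp build).
subprocess.run([
    "llama-quantize",
    "--imatrix", "imatrix.dat",
    "--token-embedding-type", "q8_0",
    "--output-tensor-type", "q8_0",
    "Qwen2.5-Coder-32B-Instruct-f16.gguf",
    "Qwen2.5-Coder-32B-Instruct-Q4_K_L.gguf",
    "Q4_K_M",  # base quant type underlying the Q4_K_L variant
], check=True)
```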