Update README.md
README.md CHANGED
```diff
@@ -20,10 +20,13 @@ Made by: [huihui-ai](https://huggingface.co/huihui-ai)
 [4.5bpw h6](https://huggingface.co/cgus/Qwen2.5-14B-Instruct-abliterated-exl2/tree/4.5bpw-h6)
 [5bpw h6](https://huggingface.co/cgus/Qwen2.5-14B-Instruct-abliterated-exl2/tree/5bpw-h6)
 [6bpw h6](https://huggingface.co/cgus/Qwen2.5-14B-Instruct-abliterated-exl2/tree/6bpw-h6)
+Didn't make 8bpw.
 
 ## Quantization notes
-
-
+I made these quants by accident and stopped before finishing 8bpw after noticing the [v2 version](https://huggingface.co/cgus/Qwen2.5-14B-Instruct-abliterated-exl2), which is why the 8bpw quant is missing.
+
+Made with Exllamav2 0.2.3 using the default calibration dataset. These quants require a modern RTX card on Windows/Linux or an AMD card on Linux.
+The model has to fit entirely in VRAM to work properly. For example, an RTX 3060 12GB should be able to load the 4.5-5bpw quants with Q6 cache and 16k context.
 It requires an app with Exllamav2 loader, such as Text-Generation-WebUI, TabbyAPI and some others.
 
 # Original model card
```
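For reference, exl2 quants like these are produced with the convert.py script from the exllamav2 repo. The notes only say "Exllamav2 0.2.3 with the default dataset", so the exact command below is a sketch under that assumption, with placeholder paths; omitting the `-c` calibration option is what selects the default dataset, and `-b`/`-hb` correspond to the bpw and h6 numbers in the branch names:

```sh
# Sketch of quantizing with exllamav2's convert.py (run from the exllamav2 repo).
# All paths are placeholders; -b is bits per weight, -hb is head bits (the "h6").
# Omitting -c makes convert.py use its built-in default calibration dataset.
python convert.py -i /models/Qwen2.5-14B-Instruct-abliterated \
    -o /tmp/exl2-work \
    -cf /models/Qwen2.5-14B-Instruct-abliterated-exl2-4.5bpw-h6 \
    -b 4.5 -hb 6
```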
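Each quant lives on its own branch of the repo (the `tree/4.5bpw-h6` part of the links above), so a single variant can be fetched by passing that branch as the revision; a minimal example with a placeholder destination directory:

```sh
# Download only the 4.5bpw-h6 branch of the quant repo.
huggingface-cli download cgus/Qwen2.5-14B-Instruct-abliterated-exl2 \
    --revision 4.5bpw-h6 \
    --local-dir ./Qwen2.5-14B-Instruct-abliterated-exl2-4.5bpw-h6
```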
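Text-Generation-WebUI and TabbyAPI wrap the same Exllamav2 loader, so for completeness here is a minimal sketch of loading the quant directly with the exllamav2 Python API, modeled on the library's own inference example; the model path is a placeholder, and the Q6 cache and 16k context mirror the RTX 3060 example from the notes:

```python
# Minimal sketch: load an exl2 quant with the exllamav2 Python API.
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Cache_Q6,
    ExLlamaV2Config,
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Placeholder path to a downloaded quant branch.
config = ExLlamaV2Config("./Qwen2.5-14B-Instruct-abliterated-exl2-4.5bpw-h6")
config.max_seq_len = 16384  # the 16k context from the notes

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q6(model, lazy=True)  # Q6 KV cache, as in the notes
model.load_autosplit(cache)  # load weights, filling available VRAM

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Hello, how are you?", max_new_tokens=64))
```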