quantized_by: DeusImperator
This is a 4.5bpw EXL2 quant of [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B).

This quant was made using exllamav2-0.2.7 with the default dataset and an extended quantization sample length (4k instead of the default 2k). It also uses -head_bits=8 and maximum-accuracy quantization for the first and last layers (8bpw); all other layers use the normally chosen methods. The method and name (4.5bpw_L) are inspired by quants like Q4_K_L and Q6_K_L made by [bartowski](https://huggingface.co/bartowski).

I tested it briefly and it seems to work. It fits nicely in 24GB of VRAM on Windows with 16k fp16 context (it should fit roughly twice that with Q8 cache in exl2).
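As a rough sanity check on the context claim above, here is a back-of-envelope KV-cache estimate. It assumes the base Qwen2.5-32B shape (64 transformer layers, 8 KV heads, head dim 128) taken from the upstream model's config, not from this repo:

```python
# Back-of-envelope KV-cache size, assuming Qwen2.5-32B's published shape:
# 64 transformer layers, 8 KV heads, head dim 128 (from the base model config).
def kv_cache_bytes(seq_len: int, bytes_per_elem: int,
                   layers: int = 64, kv_heads: int = 8, head_dim: int = 128) -> int:
    # Factor of 2 accounts for storing both K and V per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

fp16_16k = kv_cache_bytes(16 * 1024, 2)  # fp16 cache: 2 bytes per element
q8_32k = kv_cache_bytes(32 * 1024, 1)    # q8 cache: ~1 byte per element
print(fp16_16k / 2**30, q8_32k / 2**30)  # both come out to 4.0 GiB
```

Both configurations land on the same cache footprint, which is consistent with the "twice the context with Q8 cache" estimate.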
## Prompt Templates

Uses the format below:

```
<|begin▁of▁sentence|>{system_prompt}<|User|>{prompt}<|Assistant|>{AI_message}<|end▁of▁sentence|><|Assistant|>
```
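For the single-turn case, the template above can be assembled in plain Python. This is a minimal sketch with hypothetical names (`build_prompt` is not part of any library here); the special tokens are copied verbatim from the template:

```python
# Assemble a single-turn prompt from the chat template above.
# build_prompt is an illustrative helper, not a library function.
BOS = "<|begin▁of▁sentence|>"

def build_prompt(system_prompt: str, user_prompt: str) -> str:
    # BOS, system prompt, user message, then the assistant tag that
    # cues the model to start generating.
    return f"{BOS}{system_prompt}<|User|>{user_prompt}<|Assistant|>"

prompt = build_prompt("You are a helpful assistant.", "What is 2 + 2?")
print(prompt)  # the model continues after the trailing <|Assistant|> tag
```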
The prompt below might be useful:

```
Think step by step about the reasoning process and then the answer. The reasoning process and answer should be enclosed in <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.
```
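If the model follows that convention, the reasoning and final answer can be separated afterwards. A minimal sketch (the helper name is hypothetical):

```python
import re

# Hypothetical helper: pull the reasoning and the final answer out of a
# response that follows the <think>/<answer> convention suggested above.
def split_response(text: str) -> tuple[str, str]:
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else "",
        answer.group(1).strip() if answer else "",
    )

reasoning, answer = split_response(
    "<think> 2 + 2 is 4 </think> <answer> 4 </answer>"
)
print(answer)  # -> 4
```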
### Original readme below

---