quantized_by: DeusImperator
This is a 4.5bpw EXL2 quant of [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B).

This quant was made using exllamav2-0.2.7 with the default dataset and an extended quantization sample length (4k instead of the default 2k). It also uses -head_bits=8 and maximum-accuracy quantization for the first and last layers (8bpw); all other layers use the normally chosen methods. The method and name (4.5bpw_L) are inspired by quants like Q4_K_L and Q6_K_L made by [bartowski](https://huggingface.co/bartowski).

I tested it briefly and it seems to work. It fits nicely in 24GB of VRAM on Windows with 16k fp16 context (it should fit roughly twice that with Q8 cache in exl2).
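As a rough sanity check on the context claim above, here is a back-of-envelope KV-cache estimate. It assumes the base Qwen2.5-32B shape (64 transformer layers, 8 KV heads, head dim 128) taken from the upstream model's config, not from this repo:

```python
# Back-of-envelope KV-cache size, assuming Qwen2.5-32B's published shape:
# 64 transformer layers, 8 KV heads, head dim 128 (from the base model config).
def kv_cache_bytes(seq_len: int, bytes_per_elem: int,
                   layers: int = 64, kv_heads: int = 8, head_dim: int = 128) -> int:
    # Factor of 2 accounts for storing both K and V per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

fp16_16k = kv_cache_bytes(16 * 1024, 2)  # fp16 cache: 2 bytes per element
q8_32k = kv_cache_bytes(32 * 1024, 1)    # q8 cache: ~1 byte per element
print(fp16_16k / 2**30, q8_32k / 2**30)  # both come out to 4.0 GiB
```

Both configurations land on the same cache footprint, which is consistent with the "twice the context with Q8 cache" estimate.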
## Prompt Templates

Uses the format below:

```
<|begin▁of▁sentence|>{system_prompt}<|User|>{prompt}<|Assistant|>{AI_message}<|end▁of▁sentence|><|Assistant|>
```
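For the single-turn case, the template above can be assembled in plain Python. This is a minimal sketch with hypothetical names (`build_prompt` is not part of any library here); the special tokens are copied verbatim from the template:

```python
# Assemble a single-turn prompt from the chat template above.
# build_prompt is an illustrative helper, not a library function.
BOS = "<|begin▁of▁sentence|>"

def build_prompt(system_prompt: str, user_prompt: str) -> str:
    # BOS, system prompt, user message, then the assistant tag that
    # cues the model to start generating.
    return f"{BOS}{system_prompt}<|User|>{user_prompt}<|Assistant|>"

prompt = build_prompt("You are a helpful assistant.", "What is 2 + 2?")
print(prompt)  # the model continues after the trailing <|Assistant|> tag
```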
The prompt below might be useful:

```
Think step by step about the reasoning process and then the answer. The reasoning process and answer should be enclosed in <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.
```
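If the model follows that convention, the reasoning and final answer can be separated afterwards. A minimal sketch (the helper name is hypothetical):

```python
import re

# Hypothetical helper: pull the reasoning and the final answer out of a
# response that follows the <think>/<answer> convention suggested above.
def split_response(text: str) -> tuple[str, str]:
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else "",
        answer.group(1).strip() if answer else "",
    )

reasoning, answer = split_response(
    "<think> 2 + 2 is 4 </think> <answer> 4 </answer>"
)
print(answer)  # -> 4
```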
### Original readme below

---