DeusImperator committed · Commit dfee10f · verified · 1 parent: d7c7160

Upload README.md

Files changed (1): README.md (+7 −2)
README.md CHANGED

@@ -9,9 +9,9 @@ quantized_by: DeusImperator
 
 This is a 4.5bpw EXL2 quant of [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B)
 
- This quant was made using exllamav2-0.2.7 with default dataset and extended quantization sample length (4k instead of default 2k). It also uses -head_bits=8 and max accuracy quant method for first and last layer (8bpw), all other layers of the model use normally chosen methods.
 
- I tested it briefly and it seems to work.
 
 ## Prompt Templates
 
@@ -20,6 +20,11 @@ Uses below format:
 <|begin▁of▁sentence|>{system_prompt}<|User|>{prompt}<|Assistant|>{AI_message}<|end▁of▁sentence|><|Assistant|>
 ```
 
 ### Original readme below
 
 ---
 
 
 This is a 4.5bpw EXL2 quant of [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B)
 
+ This quant was made using exllamav2-0.2.7 with the default dataset and an extended quantization sample length (4k instead of the default 2k). It also uses -head_bits=8 and the max-accuracy quant method for the first and last layers (8bpw); all other layers use the normally chosen methods. The method and name (4.5bpw_L) are inspired by quants like Q4_K_L and Q6_K_L made by [bartowski](https://huggingface.co/bartowski).
 
+ I tested it briefly and it seems to work. It fits nicely in 24GB VRAM on Windows with 16k fp16 context (it should fit about twice that with q8 cache in exl2).
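
For reference, settings like these map onto exllamav2's `convert.py` roughly as follows. This is a sketch only: the exact invocation is not part of this commit, and the paths are placeholders.

```shell
# Hypothetical invocation of exllamav2 0.2.7's convert.py (paths are placeholders):
#   -b 4.5  -> target bits per weight
#   -hb 8   -> head_bits=8, as described above
#   -l 4096 -> calibration sample length (default is 2048)
python convert.py \
    -i ./DeepSeek-R1-Distill-Qwen-32B \
    -o ./work \
    -cf ./DeepSeek-R1-Distill-Qwen-32B-4.5bpw_L-exl2 \
    -b 4.5 \
    -hb 8 \
    -l 4096
```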
 
 ## Prompt Templates
 
 <|begin▁of▁sentence|>{system_prompt}<|User|>{prompt}<|Assistant|>{AI_message}<|end▁of▁sentence|><|Assistant|>
 ```
 
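
For illustration, the template can be assembled in code. This is a minimal sketch; the helper name and the single-turn shape are my own, not from this repo.

```python
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"

def build_prompt(system_prompt: str, prompt: str, ai_message: str) -> str:
    """Mirror the template above: one completed turn, then a trailing
    <|Assistant|> tag to cue the model's next reply. Helper name is
    illustrative, not part of this repo."""
    return (BOS + system_prompt
            + "<|User|>" + prompt
            + "<|Assistant|>" + ai_message + EOS
            + "<|Assistant|>")
```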
+ The prompt below might be useful:
+ ```
+ Think step by step about the reasoning process and then the answer. The reasoning process and answer should be enclosed in <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.
+ ```
+ 
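
If you use a prompt like that, the tagged sections can be pulled out of the completion afterwards. A minimal sketch (the function name is my own, not from this repo):

```python
import re

def split_reasoning(completion: str):
    """Return (reasoning, answer) extracted from <think>...</think> and
    <answer>...</answer> tags in a model completion; either element is
    None if the model omitted that tag."""
    think = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return (think.group(1).strip() if think else None,
            answer.group(1).strip() if answer else None)
```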
 ### Original readme below
 
 ---