rozek committed on
Commit
cbad4af
1 Parent(s): 4d7e25e

Update README.md

Files changed (1)
1. README.md +18 -3
README.md CHANGED
@@ -16,10 +16,25 @@ but fine-tuned for context lengths up to 32K using "Position interpolation" and
 
 While the current version of [llama.cpp](https://github.com/ggerganov/llama.cpp) already supports such large
 context lengths, it requires quantized files in the new GGUF format - and that's where this repo comes in:
-it contains a few quantizations of the original weights from Together's fine-tuned model (as indicated by
-the file names)
+it contains the following quantizations of the original weights from Together's fine-tuned model
 
-## How the Quantization was done ##
+* [Q2_K](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q2_K.gguf)
+* [Q3_K_S](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q3_K_S.gguf)
+* [Q3_K_M](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q3_K_M.gguf) (aka Q3_K)
+* [Q3_K_L](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q3_K_L.gguf)
+* [Q4_0](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q4_0.gguf)
+* [Q4_1](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q4_1.gguf)
+* [Q4_K_S](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q4_K_S.gguf)
+* [Q4_K_M](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q4_K_M.gguf) (aka Q4_K)
+* [Q5_0](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q5_0.gguf)
+* [Q5_1](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q5_1.gguf)
+* [Q5_K_S](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q5_K_S.gguf)
+* [Q5_K_M](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q5_K_M.gguf) (aka Q5_K)
+* [Q6_K](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q6_K.gguf)
+* [Q8_0](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q8_0.gguf)
+* [F16](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-f16.gguf)
+
+## How Quantization was done ##
 
 Since the author does not want arbitrary Python stuff to loiter on his computer, the quantization was done
 using [Docker](https://www.docker.com/).
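For reference, one of the files added by this commit could be run roughly as follows. This is a minimal sketch, not taken from the README: the build steps, the binary name `main` (newer llama.cpp builds call it `llama-cli`), the flags, and the choice of the Q4_K_M file are assumptions based on 2023-era llama.cpp releases.

```bash
# Minimal sketch (assumptions: llama.cpp built from source, binary named "main",
# Q4_K_M quantization picked arbitrarily from the list above).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# Download one of the quantizations listed in the README
# (the /resolve/ URL is the direct-download form of the /blob/ links above).
wget https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/resolve/main/LLaMA-2-7B-32K-Q4_K_M.gguf

# Run with the full 32K context the model was fine-tuned for.
./main -m LLaMA-2-7B-32K-Q4_K_M.gguf -c 32768 -n 256 -p "Summarize the following text: ..."
```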