Update README.md
README.md CHANGED
@@ -16,10 +16,25 @@ but fine-tuned for context lengths up to 32K using "Position interpolation" and

 While the current version of [llama.cpp](https://github.com/ggerganov/llama.cpp) already supports such large
 context lengths, it requires quantized files in the new GGUF format - and that's where this repo comes in:
-it contains
-the file names)
-
+it contains the following quantizations of the original weights from Together's fine-tuned model
+
+* [Q2_K](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q2_K.gguf)
+* [Q3_K_S](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q3_K_S.gguf)
+* [Q3_K_M](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q3_K_M.gguf) (aka Q3_K)
+* [Q3_K_L](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q3_K_L.gguf)
+* [Q4_0](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q4_0.gguf)
+* [Q4_1](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q4_1.gguf)
+* [Q4_K_S](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q4_K_S.gguf)
+* [Q4_K_M](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q4_K_M.gguf) (aka Q4_K)
+* [Q5_0](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q5_0.gguf)
+* [Q5_1](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q5_1.gguf)
+* [Q5_K_S](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q5_K_S.gguf)
+* [Q5_K_M](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q5_K_M.gguf) (aka Q5_K)
+* [Q6_K](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q6_K.gguf)
+* [Q8_0](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q8_0.gguf)
+* [F16](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-f16.gguf)
+
+## How Quantization was done ##

 Since the author does not want arbitrary Python stuff to loiter on his computer, the quantization was done
 using [Docker](https://www.docker.com/).
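
The concrete commands are not part of this excerpt, but a Docker-based conversion and quantization run could look roughly like the sketch below. It assumes a local checkout of [llama.cpp](https://github.com/ggerganov/llama.cpp) in `llama.cpp/` and the original Together weights in `LLaMA-2-7B-32K/`; the image tag, paths and the chosen quantization type are illustrative, not necessarily the author's exact setup.

```bash
# 1) convert the original HF weights to a 16-bit GGUF file inside a throw-away
#    Python container, so no Python packages end up on the host
docker run --rm -v "$PWD":/workspace -w /workspace python:3.10 bash -c '
  pip install -r llama.cpp/requirements.txt &&
  python3 llama.cpp/convert.py LLaMA-2-7B-32K \
    --outtype f16 --outfile LLaMA-2-7B-32K-f16.gguf
'

# 2) quantize the f16 file with llama.cpp's quantize tool
#    (built beforehand, e.g. with `make quantize` in the llama.cpp checkout)
./llama.cpp/quantize LLaMA-2-7B-32K-f16.gguf LLaMA-2-7B-32K-Q4_K_M.gguf Q4_K_M
```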
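Once downloaded, any of the files listed above can be run with llama.cpp like any other GGUF model. A minimal sketch of a 32K-context run (the chosen file, prompt and generation length are just examples) might look like this:

```bash
# assumes llama.cpp has already been built (e.g. with `make`) and that the
# Q4_K_M file from the list above was downloaded into the current directory
./main -m ./LLaMA-2-7B-32K-Q4_K_M.gguf -c 32768 -n 256 \
  -p "Q: What is position interpolation good for? A:"

# depending on the llama.cpp version, the RoPE scaling used for the 32K
# fine-tune may have to be passed explicitly, e.g. --rope-freq-scale 0.125
# (the 4096-token base context divided by the 32768-token target)
```

Here `-c 32768` requests the full 32K context window and `-n 256` limits the number of generated tokens.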