Update README.md
README.md (changed)
@@ -47,6 +47,12 @@ it contains the following quantizations of the original weights from Together's
|
 > when doing so. And since "32K" does not mean that you always have to use a context size of 32768 (only that
 > the model was fine-tuned for that size), it is recommended that you keep your context as small as possible

+> If you need quantizations for Together Computer's
+> [Llama-2-7B-32K-Instruct](https://huggingface.co/togethercomputer/Llama-2-7B-32K-Instruct/tree/main)
+> model, then look for
+> [LLaMA-2-7B-32K-Instruct_GGUF](https://huggingface.co/rozek/LLaMA-2-7B-32K-Instruct_GGUF/upload/main)
+> which is currently being uploaded
+
 ## How Quantization was done ##

 Since the author does not want arbitrary Python stuff to loiter on his computer, the quantization was done