Update README.md
README.md CHANGED
@@ -43,6 +43,8 @@ it contains the following quantizations of the original weights from Together's
* ~~[Q8_0](https://huggingface.co/rozek/LLaMA-2-7B-32K-Instruct_GGUF/blob/main/LLaMA-2-7B-32K-Instruct-Q8_0.gguf)~~ and
* ~~[F16](https://huggingface.co/rozek/LLaMA-2-7B-32K-Instruct_GGUF/blob/main/LLaMA-2-7B-32K-Instruct-f16.gguf)~~ (unquantized)

+ (strikethrough links are currently being uploaded)
+
> Nota bene: while RoPE makes inferences with large contexts possible, you still need an awful lot of RAM
> when doing so. And since "32K" does not mean that you always have to use a context size of 32768 (only that
> the model was fine-tuned for that size), it is recommended that you keep your context as small as possible
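
The note above makes a practical point: the context window is a load-time parameter, not a fixed property of the GGUF file, so a smaller window means a much smaller KV cache and far less RAM. A minimal sketch of that idea, assuming the llama-cpp-python bindings (not mentioned in the README) and the Q8_0 file linked above:

```python
# Sketch only: assumes the llama-cpp-python package and the Q8_0 GGUF file
# linked above are available locally.
from llama_cpp import Llama

# n_ctx is chosen here at load time; 4096 keeps the KV cache far smaller
# than a full 32768-token window would, even though the model was
# fine-tuned for 32K.
llm = Llama(
    model_path="LLaMA-2-7B-32K-Instruct-Q8_0.gguf",
    n_ctx=4096,
)

result = llm("Summarize what RoPE scaling does in one sentence.", max_tokens=64)
print(result["choices"][0]["text"])
```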