Update README.md
README.md CHANGED
@@ -43,6 +43,8 @@ it contains the following quantizations of the original weights from Together's
* ~~[Q8_0](https://huggingface.co/rozek/LLaMA-2-7B-32K-Instruct_GGUF/blob/main/LLaMA-2-7B-32K-Instruct-Q8_0.gguf)~~ and
* ~~[F16](https://huggingface.co/rozek/LLaMA-2-7B-32K-Instruct_GGUF/blob/main/LLaMA-2-7B-32K-Instruct-f16.gguf)~~ (unquantized)

+ (strikethrough links are currently being uploaded)
+
> Nota bene: while RoPE makes inferences with large contexts possible, you still need an awful lot of RAM
> when doing so. And since "32K" does not mean that you always have to use a context size of 32768 (only that
> the model was fine-tuned for that size), it is recommended that you keep your context as small as possible
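
The note above makes a practical point: the context window is a load-time parameter, not a fixed property of the GGUF file, so a smaller window means a much smaller KV cache and far less RAM. A minimal sketch of that idea, assuming the llama-cpp-python bindings (not mentioned in the README) and the Q8_0 file linked above:

```python
# Sketch only: assumes the llama-cpp-python package and the Q8_0 GGUF file
# linked above are available locally.
from llama_cpp import Llama

# n_ctx is chosen here at load time; 4096 keeps the KV cache far smaller
# than a full 32768-token window would, even though the model was
# fine-tuned for 32K.
llm = Llama(
    model_path="LLaMA-2-7B-32K-Instruct-Q8_0.gguf",
    n_ctx=4096,
)

result = llm("Summarize what RoPE scaling does in one sentence.", max_tokens=64)
print(result["choices"][0]["text"])
```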