Update README.md
README.md CHANGED

@@ -14,14 +14,8 @@ tags:
 but fine-tuned for context lengths up to 32K using "Position interpolation" and "Rotary Position Embeddings"
 (RoPE).
 
-
-
-parameter.
-
-> Nota bene: for the model described here the `--rope-scale` is `8` (original context size was 4k, the
-> fine-tuned one is 32k)
-
-However, llama.cpp requires quantized files in the new GGUF format - that's where this repo comes in:
+While the current version of [llama.cpp](https://github.com/ggerganov/llama.cpp) already supports such large
+context lengths, it requires quantized files in the new GGUF format - and that's where this repo comes in:
 it contains a few quantizations of the original weights from Together's fine-tuned model (as indicated by
 the file names)
 
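The position interpolation mentioned in the README (and the `--rope-scale` of `8` from the removed note) can be sketched in a few lines of Python. This is illustrative only — the function name and dimension are made up, not part of llama.cpp or the model: linearly dividing position indices by the 32K/4K = 8 ratio keeps the RoPE angles within the range the model saw during its original 4K pre-training.

```python
def rope_angles(pos, dim=8, base=10000.0, scale=1.0):
    # Rotary-embedding angles for one position index; with linear position
    # interpolation the index is divided by `scale` (8 for 4K -> 32K) before
    # the usual per-dimension frequency is applied.
    return [(pos / scale) / base ** (2 * i / dim) for i in range(dim // 2)]

# A position at 20000 (well beyond the original 4K window), interpolated
# with scale 8, yields the same angles as position 2500 without scaling.
assert rope_angles(20000, scale=8.0) == rope_angles(2500)
```

In other words, the fine-tuned 32K context is "compressed" back into the position range the base model was trained on, which is why the scale factor equals the ratio of the two context sizes.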