---
license: llama2
tags:
- llama2
- quantized
- gguf
- 32k-context
---

# LLaMA-2-7B-32K #

[Together Computer, Inc.](https://together.ai/) has released [LLaMA-2-7B-32K](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K), a model based on Meta AI's LLaMA-2-7B, but fine-tuned for context lengths of up to 32K using "position interpolation" and "Rotary Position Embeddings" (RoPE).

The current version of [llama.cpp](https://github.com/ggerganov/llama.cpp) supports such large context lengths by means of the new [`--rope-scale`](https://github.com/ggerganov/llama.cpp/tree/master/examples/main#extended-context-size) parameter.

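As a concrete illustration, an invocation for the full 32K context might look as follows. This is only a sketch: the `.gguf` file name is hypothetical (use whichever quantization from this repo you actually downloaded), and it assumes a `main` binary built from a llama.cpp checkout recent enough to support `--rope-scale`:

```shell
# Hypothetical invocation - adjust the model file name to match the
# quantization you downloaded from this repo:
./main -m ./llama-2-7b-32k.Q4_K_M.gguf \
  --rope-scale 8 \
  -c 32768 \
  -p "Your prompt here"
```

Note that `-c 32768` requests the full fine-tuned context window, which requires considerably more memory than the default.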
> Nota bene: for the model described here, the `--rope-scale` is `8` (the original context size was 4K, the fine-tuned one is 32K)

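The arithmetic behind that value can be sketched in a few lines (the constants are taken from the text above; the division illustrates the linear "position interpolation" idea, namely mapping every position back into the original training range):

```python
# Sketch of where --rope-scale 8 comes from, using the context sizes
# stated above (4K original, 32K after fine-tuning).
original_ctx = 4096
fine_tuned_ctx = 32768

# The scale factor passed to llama.cpp:
rope_scale = fine_tuned_ctx // original_ctx  # 32768 / 4096 = 8

# With linear position interpolation, position p is divided by rope_scale
# before the rotary embedding is applied, so even the last position of a
# 32K sequence lands back inside the original [0, 4096) range:
scaled_last_position = (fine_tuned_ctx - 1) / rope_scale

print(rope_scale)            # 8
print(scaled_last_position)  # 4095.875 - i.e. < 4096
```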
However, llama.cpp requires quantized files in the new GGUF format - that's where this repo comes in: it contains a few quantizations of the original weights from Together's fine-tuned model (as indicated by the file names).

Concerning the license(s):

* the [original model](https://ai.meta.com/llama/) (from Meta AI) was released under a rather [permissive license](https://ai.meta.com/llama/license/)
* the fine-tuned model from Together Computer uses the [same license](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K/blob/main/README.md)
* as a consequence, this repo does so as well