---
license: llama2
tags:
- llama2
- quantized
- gguf
- 32k-context
---

# LLaMA-2-7B-32K #

[Together Computer, Inc.](https://together.ai/) has released [LLaMA-2-7B-32K](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K), a model based on Meta AI's LLaMA-2-7B, but fine-tuned for context lengths of up to 32K using "position interpolation" and "Rotary Position Embeddings" (RoPE).

The current version of [llama.cpp](https://github.com/ggerganov/llama.cpp) supports such large context lengths by means of the new [`--rope-scale`](https://github.com/ggerganov/llama.cpp/tree/master/examples/main#extended-context-size) parameter.

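As a concrete illustration, an invocation for the full 32K context might look as follows. This is only a sketch: the `.gguf` file name is hypothetical (use whichever quantization from this repo you actually downloaded), and it assumes a `main` binary built from a llama.cpp checkout recent enough to support `--rope-scale`:

```shell
# Hypothetical invocation - adjust the model file name to match the
# quantization you downloaded from this repo:
./main -m ./llama-2-7b-32k.Q4_K_M.gguf \
  --rope-scale 8 \
  -c 32768 \
  -p "Your prompt here"
```

Note that `-c 32768` requests the full fine-tuned context window, which requires considerably more memory than the default.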
> Nota bene: for the model described here, the `--rope-scale` is `8` (the original context size was 4K, the fine-tuned one is 32K)

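The arithmetic behind that value can be sketched in a few lines (the constants are taken from the text above; the division illustrates the linear "position interpolation" idea, namely mapping every position back into the original training range):

```python
# Sketch of where --rope-scale 8 comes from, using the context sizes
# stated above (4K original, 32K after fine-tuning).
original_ctx = 4096
fine_tuned_ctx = 32768

# The scale factor passed to llama.cpp:
rope_scale = fine_tuned_ctx // original_ctx  # 32768 / 4096 = 8

# With linear position interpolation, position p is divided by rope_scale
# before the rotary embedding is applied, so even the last position of a
# 32K sequence lands back inside the original [0, 4096) range:
scaled_last_position = (fine_tuned_ctx - 1) / rope_scale

print(rope_scale)            # 8
print(scaled_last_position)  # 4095.875 - i.e. < 4096
```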
However, llama.cpp requires quantized files in the new GGUF format - that's where this repo comes in: it contains a few quantizations of the original weights from Together's fine-tuned model (as indicated by the file names).

Concerning the license(s):

* the [original model](https://ai.meta.com/llama/) (from Meta AI) was released under a rather [permissive license](https://ai.meta.com/llama/license/)
* the fine-tuned model from Together Computer uses the [same license](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K/blob/main/README.md)
* as a consequence, this repo does so as well