Update README.md
README.md CHANGED
@@ -16,10 +16,25 @@ but fine-tuned for context lengths up to 32K using "Position interpolation" and

 While the current version of [llama.cpp](https://github.com/ggerganov/llama.cpp) already supports such large
 context lengths, it requires quantized files in the new GGUF format - and that's where this repo comes in:
-it contains
-the file names)
-
+it contains the following quantizations of the original weights from Together's fine-tuned model
+
+* [Q2_K](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q2_K.gguf)
+* [Q3_K_S](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q3_K_S.gguf)
+* [Q3_K_M](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q3_K_M.gguf) (aka Q3_K)
+* [Q3_K_L](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q3_K_L.gguf)
+* [Q4_0](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q4_0.gguf)
+* [Q4_1](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q4_1.gguf)
+* [Q4_K_S](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q4_K_S.gguf)
+* [Q4_K_M](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q4_K_M.gguf) (aka Q4_K)
+* [Q5_0](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q5_0.gguf)
+* [Q5_1](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q5_1.gguf)
+* [Q5_K_S](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q5_K_S.gguf)
+* [Q5_K_M](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q5_K_M.gguf) (aka Q5_K)
+* [Q6_K](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q6_K.gguf)
+* [Q8_0](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-Q8_0.gguf)
+* [F16](https://huggingface.co/rozek/LLaMA-2-7B-32K_GGUF/blob/main/LLaMA-2-7B-32K-f16.gguf)
+
+## How Quantization was done ##

 Since the author does not want arbitrary Python stuff to loiter on his computer, the quantization was done
 using [Docker](https://www.docker.com/).
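
The concrete commands are not part of this excerpt, but a Docker-based conversion and quantization run could look roughly like the sketch below. It assumes a local checkout of [llama.cpp](https://github.com/ggerganov/llama.cpp) in `llama.cpp/` and the original Together weights in `LLaMA-2-7B-32K/`; the image tag, paths and the chosen quantization type are illustrative, not necessarily the author's exact setup.

```bash
# 1) convert the original HF weights to a 16-bit GGUF file inside a throw-away
#    Python container, so no Python packages end up on the host
docker run --rm -v "$PWD":/workspace -w /workspace python:3.10 bash -c '
  pip install -r llama.cpp/requirements.txt &&
  python3 llama.cpp/convert.py LLaMA-2-7B-32K \
    --outtype f16 --outfile LLaMA-2-7B-32K-f16.gguf
'

# 2) quantize the f16 file with llama.cpp's quantize tool
#    (built beforehand, e.g. with `make quantize` in the llama.cpp checkout)
./llama.cpp/quantize LLaMA-2-7B-32K-f16.gguf LLaMA-2-7B-32K-Q4_K_M.gguf Q4_K_M
```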
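Once downloaded, any of the files listed above can be run with llama.cpp like any other GGUF model. A minimal sketch of a 32K-context run (the chosen file, prompt and generation length are just examples) might look like this:

```bash
# assumes llama.cpp has already been built (e.g. with `make`) and that the
# Q4_K_M file from the list above was downloaded into the current directory
./main -m ./LLaMA-2-7B-32K-Q4_K_M.gguf -c 32768 -n 256 \
  -p "Q: What is position interpolation good for? A:"

# depending on the llama.cpp version, the RoPE scaling used for the 32K
# fine-tune may have to be passed explicitly, e.g. --rope-freq-scale 0.125
# (the 4096-token base context divided by the 32768-token target)
```

Here `-c 32768` requests the full 32K context window and `-n 256` limits the number of generated tokens.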