Update README.md
README.md CHANGED
@@ -75,8 +75,8 @@ python3 convert.py ../LLaMA-2-7B-32K
 ./quantize ../LLaMA-2-7B-32K/ggml-model-f16.gguf \
   ../LLaMA-2-7B-32K/LLaMA-2-7B-32K-Q4_0.gguf Q4_0
 ```
-11. run any quantizations you need and stop the container
-    will remain available on your host computer
+11. run any quantizations you need and stop the container when finished (you may even delete it as the generated files
+    will remain available on your host computer)
 
 You are now free to move the quantization results to where you need them and run inferences with context
 lengths up to 32K (depending on the amount of memory you will have available - long contexts need an awful
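The quantize-then-infer flow the updated steps describe can be sketched as below. This is a dry run that only echoes the commands, since the `quantize` and `main` binaries exist only inside a llama.cpp build; the `run` wrapper, the 32768 context value, and the prompt are illustrative, while the paths follow the README above.

```shell
# Dry-run sketch (assumed llama.cpp build). `run` only echoes each command;
# change it to `run() { "$@"; }` inside the container to execute for real.
run() { echo "$@"; }

MODEL_DIR=../LLaMA-2-7B-32K

# Step from the diff above: quantize the f16 GGUF down to Q4_0.
run ./quantize "$MODEL_DIR/ggml-model-f16.gguf" \
    "$MODEL_DIR/LLaMA-2-7B-32K-Q4_0.gguf" Q4_0

# Then run an inference with a long context window (-c); a full 32K
# context needs a large amount of memory, so size it to your machine.
run ./main -m "$MODEL_DIR/LLaMA-2-7B-32K-Q4_0.gguf" -c 32768 -n 128 -p "Hello"
```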