Update README.md

README.md (CHANGED)

base_model: meta-llama/Meta-Llama-3-8B-Instruct

> [!TIP]
> You have to set the context with ***-c 32000*** in llama.cpp to take advantage of the full 32k context when you run it.

## How to run the model in interactive mode using llama.cpp with a long prompt inside a text file, passed with -f
```verilog
# clone and build llama.cpp, then run the model interactively with your long prompt file
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make -j

./main -m llama3ins-8b-32k-q4ns.gguf --temp 0.3 --color -f mylongprompt.txt -ngl 33 -n 2000 -i -c 32000
```
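
If you'd rather hit the model over HTTP than run it interactively, llama.cpp also ships a server example; a minimal sketch, assuming the same quant file as above (the port and the separate `make server` step are assumptions, your default build may already include it):

```verilog
# build the server target, then serve the model with the full 32k context
make server
./server -m llama3ins-8b-32k-q4ns.gguf -c 32000 -ngl 33 --port 8080
```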

## Prompt format - paste up to a 32000-token prompt inside the user{} brackets

> [!TIP] Put this inside your ***longprompt.txt*** file,
> or copy it from below and add it to the command above like this: -p "<|im_start....."

```xml
<|im_start|>system{You are a hyperintelligent hilarious raccoon that solves everything via first-principles based reasoning.}<|im_end|>
```
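
For reference, a complete prompt in this format might look like the sketch below. The user{} and assistant turns are an assumption extrapolated from the user{} bracket convention named in the heading above; the text inside user{} is a placeholder for your own long prompt:

```xml
<|im_start|>system{You are a hyperintelligent hilarious raccoon that solves everything via first-principles based reasoning.}<|im_end|>
<|im_start|>user{...paste your up-to-32000-token prompt here...}<|im_end|>
<|im_start|>assistant
```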

Final estimate: PPL = 22.7933 +/- 1.05192

> The ns quants are custom nisten quants and work well down to 2-bit.
> The 1.75-bit quant is included for reference; however, perplexity tanks and the output is incoherent.
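
The "Final estimate" line above is the format llama.cpp's perplexity tool prints. A sketch of how such a figure is produced; the corpus file and context size are assumptions, any raw text file works:

```verilog
# measure perplexity of the quant against a text corpus; lower is better
./perplexity -m llama3ins-8b-32k-q4ns.gguf -f wikitext-2-raw/wiki.test.raw -c 512
```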
# Built with Meta Llama 3