---
license: llama2
tags:
- llama2
- text-generation-inference
base_model: meta-llama/Llama-2-7b-chat-hf
---
# llama-2-7b-chat_q4_quantized_cpp
- This repository contains the 4-bit quantized (GGML `q4_0`) version of the [Llama-2-7B-chat](https://github.com/facebookresearch/llama) model for use with llama.cpp.
- It can be run locally on a CPU-only system *(instructions are given below)*.
- The model has been tested on `Linux (Ubuntu)` with `12 GB RAM` and a `Core i5` processor.
- Performance on that setup is roughly **~3 tokens per second**.
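For reference, a `q4_0` file like **ggml-model-q4_0.bin** can be produced from the original `meta-llama/Llama-2-7b-chat-hf` weights with llama.cpp's own conversion and quantization tools. The sketch below assumes a GGML-era checkout of llama.cpp that has already been built with `make`, with the original weights placed under `models/7B`; the script and tool names have changed in newer llama.cpp versions, and these are not necessarily the exact commands used to produce this file:

```bash
# install the Python dependencies needed by the conversion script
python3 -m pip install -r requirements.txt

# convert the original weights to a ggml FP16 file (models/7B/ggml-model-f16.bin)
python3 convert.py models/7B/

# quantize the FP16 file down to 4 bits (q4_0), producing models/7B/ggml-model-q4_0.bin
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0
```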
# Usage:
1. Clone the llama.cpp repository from GitHub:<br>
`git clone https://github.com/ggerganov/llama.cpp.git`
2. Enter the **llama.cpp** directory (cloned in step 1) and build it by running the **make** command:<br>
`cd llama.cpp` <br>
`make`
3. Create a directory named **7B** under **llama.cpp/models** and put the model file **ggml-model-q4_0.bin** in this newly created **7B** directory:<br>
`cd models` <br>
`mkdir 7B`
4. Navigate back to the **llama.cpp** directory and run the command below:<br>
`./main -m ./models/7B/ggml-model-q4_0.bin -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f ./prompts/alpaca.txt` <br>
> The initial prompt file can be changed from `prompts/alpaca.txt` to any prompt file of your choice.
5. That's it. Enter your prompts and let the results surprise you. *(A consolidated setup script is sketched below.)*
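For convenience, the steps above can be collected into a single shell script. This is only a sketch; it assumes the quantized model file has already been downloaded next to the script as `ggml-model-q4_0.bin` (adjust the `cp` path if yours lives elsewhere):

```bash
#!/usr/bin/env bash
set -e  # stop on the first error

# 1-2. clone llama.cpp and build it
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make

# 3. place the downloaded model file under models/7B
mkdir -p models/7B
cp ../ggml-model-q4_0.bin models/7B/   # assumes the file sits next to this script

# 4. start an interactive chat session
./main -m ./models/7B/ggml-model-q4_0.bin -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f ./prompts/alpaca.txt
```

To use a custom prompt instead of `prompts/alpaca.txt`, write it to a text file and pass that file's path to `-f`.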
# Credits:
1. https://github.com/facebookresearch/llama
2. https://github.com/ggerganov/llama.cpp
3. https://medium.com/@karankakwani/build-and-run-llama2-llm-locally-a3b393c1570e