metadata
license: llama2
tags:
  - llama2
  - text-generation-inference
base_model: meta-llama/Llama-2-7b-chat-hf

llama-2-7b-chat_q4_quantized_cpp

  • This repository contains the 4-bit quantized version of the Llama-2-7B-chat model in GGML format (ggml-model-q4_0.bin), for use with llama.cpp.
  • It can be run on a local CPU-only system through llama.cpp (instructions are given below).
  • The model has been tested on Linux (Ubuntu) with 12 GB RAM and a Core i5 processor.
  • Performance on that setup is roughly 3 tokens per second.
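One way to obtain the quantized model file is to clone this model repository with Git LFS; this is a sketch, and the repository URL below is assumed from this model card's name, so adjust it if needed:

    git lfs install
    git clone https://huggingface.co/amanpreetsingh459/llama-2-7b-chat_q4_quantized_cpp
    # the ggml-model-q4_0.bin file inside the cloned directory is the one used in the steps below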

Usage:

  1. Clone the llama C++ repository from github:
    git clone https://github.com/ggerganov/llama.cpp.git
  2. Enter the llama.cpp repository (downloaded in step 1) and build it by running the make command
    cd llama.cpp
    make
  3. Create a directory named 7B under the directory llama.cpp/models and put the model file ggml-model-q4_0.bin in this newly created 7B directory
    cd models
    mkdir 7B
  4. Navigate back to the llama.cpp directory and run the command below:
    ./main -m ./models/7B/ggml-model-q4_0.bin -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f ./prompts/alpaca.txt

    The initial prompt file can be changed from prompts/alpaca.txt to any prompt file of your choice.

  5. That's it. Enter the desired prompts and let the results surprise you...
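
For convenience, the steps above can be combined into a single shell script. This is a minimal sketch that assumes the quantized model file ggml-model-q4_0.bin has already been downloaded into the current working directory:

    #!/usr/bin/env bash
    set -e
    # steps 1-2: clone and build llama.cpp
    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    make
    # step 3: place the quantized model under models/7B
    mkdir -p models/7B
    cp ../ggml-model-q4_0.bin models/7B/
    # step 4: start an interactive chat session with the alpaca prompt template
    ./main -m ./models/7B/ggml-model-q4_0.bin -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f ./prompts/alpaca.txt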

Credits:

  1. https://github.com/facebookresearch/llama
  2. https://github.com/ggerganov/llama.cpp
  3. https://medium.com/@karankakwani/build-and-run-llama2-llm-locally-a3b393c1570e