amanpreetsingh459/llama-2-7b-chat_q4_quantized_cpp

text-generation-inference

amanpreetsingh459 committed on Dec 3, 2023

Commit 2ee211e · 1 parent: 33c304c

Update README.md

Files changed (1)

README.md +2 -2

@@ -4,9 +4,9 @@ license: mit
 # llama-2-7b-chat_q4_quantized_cpp
 - This model contains the 4-bit quantized version of [llama2](https://github.com/facebookresearch/llama) model in cpp.
-- This can be run on a local cpu system as a cpp module *(instructions for the same are given below)*
+- This can be run on a local cpu system as a cpp module *(instructions for the same are given below)*.
 - As for the testing, the model has been tested on `Linux(Ubuntu)` os with `12 GB RAM` and `core i5 processor`.
+- The performance is `roughly` **907.46 ms per token**, **1.10 tokens per second**
 # Usage:
 1. Clone the llama C++ repository from github:<br>
    `git clone https://github.com/ggerganov/llama.cpp.git`
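
The two performance figures added in this commit are reciprocals of each other, which can be checked with a few lines of Python. This is only a sanity-check sketch: the helper name `tokens_per_second` is hypothetical and is not part of this repository or of llama.cpp, and the only input is the 907.46 ms per token value quoted in the README.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    if ms_per_token <= 0:
        raise ValueError("ms_per_token must be positive")
    # 1 second = 1000 ms, so throughput is the reciprocal of the latency.
    return 1000.0 / ms_per_token


if __name__ == "__main__":
    # 907.46 ms per token is the figure reported in the README.
    print(f"{tokens_per_second(907.46):.2f} tokens per second")
```

Rounded to two decimals, this gives the same 1.10 tokens per second that the README reports.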