amanpreetsingh459 committed on
Commit ba1963f
1 Parent(s): d36c2a3

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -9,7 +9,7 @@ tags:
  - This model contains the 4-bit quantized version of [llama2-7B-chat](https://github.com/facebookresearch/llama) model in cpp.
  - This can be run on a local cpu system as a cpp module *(instructions for the same are given below)*.
  - As for the testing, the model has been tested on `Linux(Ubuntu)` os with `12 GB RAM` and `core i5 processor`.
- - The performance is `roughly` **907.46 ms per token**, **1.10 tokens per second**
+ - The performance is `roughly` **~3 tokens per second**
  # Usage:
  1. Clone the llama C++ repository from github:<br>
  `git clone https://github.com/ggerganov/llama.cpp.git`
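
The two figures in the removed line describe the same measurement in different units: tokens per second is 1000 divided by milliseconds per token. A minimal Python sketch of that conversion follows; the function name `tokens_per_second` is hypothetical and is not part of llama.cpp or this repository.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert per-token latency in milliseconds to throughput in tokens/s."""
    if ms_per_token <= 0:
        raise ValueError("ms_per_token must be positive")
    return 1000.0 / ms_per_token


# Figure taken from the removed README line (907.46 ms per token).
old_rate = tokens_per_second(907.46)
print(f"{old_rate:.2f} tokens per second")  # prints "1.10 tokens per second"
```

By the same relation, the updated figure of roughly 3 tokens per second corresponds to about 333 ms per token.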