- This repository contains a 4-bit quantized version of the [llama2-7B-chat](https://github.com/facebookresearch/llama) model, runnable with [llama.cpp](https://github.com/ggerganov/llama.cpp).
- It can be run locally on a CPU-only system as a C++ module *(instructions are given below)*.
- The model has been tested on `Linux (Ubuntu)` with `12 GB RAM` and a `Core i5` processor *(see the rough memory estimate below)*.
- Performance on that hardware is roughly **3 tokens per second**.
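As a back-of-the-envelope check on why 12 GB of RAM is enough (the exact model file size is not stated here, so this is only an estimate): a 7B-parameter model at 4 bits per weight needs about `7e9 params × 0.5 bytes/param ≈ 3.5 GB` for the weights, plus some additional memory for the context/KV cache, leaving plenty of headroom on a 12 GB machine.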
# Usage:
1. Clone the llama.cpp repository from GitHub:<br>
`git clone https://github.com/ggerganov/llama.cpp.git`
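After cloning, a minimal CPU build-and-run sequence looks like the sketch below. This assumes the plain `Makefile` build and the `main` example binary that llama.cpp shipped at the time (newer revisions use CMake and a `llama-cli` binary); the model filename is a placeholder, so substitute the actual quantized file downloaded from this repository.

```bash
# Build llama.cpp for CPU (older revisions build with a plain Makefile).
cd llama.cpp
make

# Put the downloaded 4-bit model where llama.cpp expects it.
# NOTE: "llama-2-7b-chat.q4_0.bin" is an illustrative name, not
# necessarily the actual filename shipped in this repository.
mkdir -p models
mv ~/Downloads/llama-2-7b-chat.q4_0.bin models/

# Generate text on the CPU.
# -m: model path, -p: prompt, -n: tokens to generate, -t: CPU threads
./main -m models/llama-2-7b-chat.q4_0.bin \
       -p "Hello, how are you?" \
       -n 128 -t 4
```

On a machine like the one described above (Core i5, 12 GB RAM), generation should proceed at roughly the quoted ~3 tokens per second; increasing `-t` toward the number of physical cores usually helps.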