Update README.md
README.md CHANGED
@@ -15,8 +15,8 @@ I also have 4bit GPTQ files for GPU inference available here: [TheBloke/alpaca-l
 | `alpaca-lora-65B.ggml.q2_0.bin` | q2_0 | 2bit | 24.5GB | 27GB | Lowest RAM requirements, minimum quality |
 | `alpaca-lora-65B.ggml.q4_0.bin` | q4_0 | 4bit | 40.8GB | 43GB | Maximum compatibility |
 | `alpaca-lora-65B.ggml.q4_2.bin` | q4_2 | 4bit | 40.8GB | 43GB | Best compromise between resources, speed and quality |
-| `alpaca-lora-65B.ggml.q5_0.bin` | q5_0 |
-| `alpaca-lora-65B.ggml.q5_1.bin` | q5_1 |
+| `alpaca-lora-65B.ggml.q5_0.bin` | q5_0 | 5bit | 44.9GB | 47GB | Brand new 5bit method. Potentially higher quality than 4bit, at cost of slightly higher resources. |
+| `alpaca-lora-65B.ggml.q5_1.bin` | q5_1 | 5bit | 49GB | 51GB | Brand new 5bit method. Slightly higher resource usage than q5_0. |
 
 * The q2_0 file requires the least resources, but does not have great quality compared to the others.
 * It's likely to be better to use a 30B model at 4bit vs a 65B model at 2bit.
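For context on how these files are consumed: GGML models of this generation were loaded with llama.cpp's `main` example on CPU. Below is a minimal sketch of running the new q5_0 file; it assumes a llama.cpp build recent enough to support the q5_0/q5_1 formats, and the thread count, token count, and prompt are placeholder values, not part of this commit.

```
# Hypothetical invocation using era-typical llama.cpp flags:
#   -t  number of CPU threads (set to your physical core count)
#   -m  path to the GGML model file
#   -n  number of tokens to generate
#   -p  the prompt
./main -t 8 -m alpaca-lora-65B.ggml.q5_0.bin -n 128 \
  -p "### Instruction: Tell me about alpacas.\n### Response:"
```

Per the table above, the q5_0 file needs roughly 47GB of free RAM (file size plus a few GB of inference overhead), so the machine must have at least that much available before the model will load.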