Commit 6562363
TheBloke committed · 1 Parent(s): 49a66e3

Update README.md

Files changed (1):
  1. README.md (+2 -2)

README.md CHANGED
@@ -15,8 +15,8 @@ I also have 4bit GPTQ files for GPU inference available here: [TheBloke/alpaca-l
 `alpaca-lora-65B.ggml.q2_0.bin` | q2_0 | 2bit | 24.5GB | 27GB | Lowest RAM requirements, minimum quality |
 `alpaca-lora-65B.ggml.q4_0.bin` | q4_0 | 4bit | 40.8GB | 43GB | Maximum compatibility |
 `alpaca-lora-65B.ggml.q4_2.bin` | q4_2 | 4bit | 40.8GB | 43GB | Best compromise between resources, speed and quality |
-`alpaca-lora-65B.ggml.q5_0.bin` | q5_0 | 4bit | 44.9GB | 47GB | Best compromise between resources, speed and quality |
-`alpaca-lora-65B.ggml.q5_1.bin` | q5_1 | 4bit | 49GB | 51GB | Best compromise between resources, speed and quality |
+`alpaca-lora-65B.ggml.q5_0.bin` | q5_0 | 5bit | 44.9GB | 47GB | Brand new 5bit method. Potentially higher quality than 4bit, at cost of slightly higher resources. |
+`alpaca-lora-65B.ggml.q5_1.bin` | q5_1 | 5bit | 49GB | 51GB | Brand new 5bit method. Slightly higher resource usage than q5_0. |
 
 * The q2_0 file requires the least resources, but does not have great quality compared to the others.
 * It's likely to be better to use a 30B model at 4bit vs a 65B model at 2bit.
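The RAM-required column in the table above lends itself to a simple selection rule: pick the largest (and therefore generally highest-quality) file that fits in available memory. A minimal sketch, using the figures from the table; the `best_fit` helper is hypothetical, not part of the repository:

```python
# RAM required (GB) per quantised file, taken from the table above.
ram_required = {
    "alpaca-lora-65B.ggml.q2_0.bin": 27,
    "alpaca-lora-65B.ggml.q4_0.bin": 43,
    "alpaca-lora-65B.ggml.q4_2.bin": 43,
    "alpaca-lora-65B.ggml.q5_0.bin": 47,
    "alpaca-lora-65B.ggml.q5_1.bin": 51,
}

def best_fit(available_ram_gb):
    """Return the largest file that fits in the given RAM, or None if none fit."""
    candidates = [f for f, ram in ram_required.items() if ram <= available_ram_gb]
    # Heavier quantisations generally preserve more quality, so prefer
    # the file with the highest RAM requirement that still fits.
    return max(candidates, key=lambda f: ram_required[f], default=None)

print(best_fit(48))  # q5_0 fits in 48GB; q5_1 (51GB) does not
```

With 48GB of RAM this picks the q5_0 file; below 27GB nothing in the table fits, matching the note that a smaller model at 4bit may then be the better option.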