Update README.md
README.md
@@ -7,11 +7,7 @@ It was created with GPTQ-for-LLaMA with group size 32 and act order true as para
 
 I HIGHLY suggest to use exllama, to evade some VRAM issues.
 
-Use
-
-If max_seq_len = 4096, compress_pos_emb = 2
-
-If max_seq_len = 8192, compress_pos_emb = 4
+Use compress_pos_emb = 4 for any context up to 8192 context.
 
 If you have 2x24 GB VRAM GPUs cards, to not get Out of Memory errors at 8192 context, use:
 
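The commit replaces a per-context rule (4096 → 2, 8192 → 4, i.e. max_seq_len / 2048) with a single value of 4 for every context up to 8192. The sketch below illustrates the arithmetic, assuming compress_pos_emb acts as linear position interpolation for rotary embeddings: each position is divided by the factor before the rotary angles are computed. The rotary base (10000) and head dimension (128) are assumed LLaMA-style values, not taken from this README, and the helper names (`rotary_angles`, `old_rule`, `new_rule`, `effective_max_position`) are hypothetical and for illustration only.

```python
# Assumed LLaMA-style rotary settings; these values are NOT stated in the README.
BASE = 10000.0
HEAD_DIM = 128


def rotary_angles(position: int, compress_pos_emb: float) -> list[float]:
    """Rotary angles for one position under linear position interpolation.

    Assumption: the position is divided by compress_pos_emb before being
    multiplied by the per-dimension inverse frequencies.
    """
    scaled = position / compress_pos_emb
    inv_freq = [BASE ** (-2 * i / HEAD_DIM) for i in range(HEAD_DIM // 2)]
    return [scaled * f for f in inv_freq]


def old_rule(max_seq_len: int) -> float:
    # Previous README guidance: factor depends on the chosen context length.
    return {4096: 2.0, 8192: 4.0}[max_seq_len]


def new_rule(max_seq_len: int) -> float:
    # Updated README guidance: a fixed factor of 4 for any context up to 8192.
    if max_seq_len > 8192:
        raise ValueError("the README only covers contexts up to 8192")
    return 4.0


def effective_max_position(max_seq_len: int, compress_pos_emb: float) -> float:
    # Largest (scaled) position index the rotary embedding actually receives.
    return (max_seq_len - 1) / compress_pos_emb


if __name__ == "__main__":
    for ctx in (4096, 8192):
        for name, rule in (("old", old_rule), ("new", new_rule)):
            factor = rule(ctx)
            eff = effective_max_position(ctx, factor)
            print(f"{name}: max_seq_len={ctx} compress_pos_emb={factor} "
                  f"effective max position={eff:.2f}")
    # Under linear interpolation, position 8000 with factor 4 gets exactly
    # the same angles as position 2000 with no compression.
    print(rotary_angles(8000, 4.0) == rotary_angles(2000, 1.0))
```

Under this assumption, the old rule stretched the scaled positions to just under 2048 for both 4096 and 8192, while the new rule keeps the same positional scale regardless of max_seq_len, so a 4096 context only uses the lower part of that range.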