Text Generation
Transformers
Safetensors
English
llama
text-generation-inference
4-bit precision
gptq
TheBloke committed on
Commit 31b6c53
1 Parent(s): 0af79da

Upload new GPTQs with varied parameters

Files changed (1)
  1. README.md +1 -4
README.md CHANGED
@@ -44,11 +44,8 @@ These models were quantised using hardware kindly provided by [Latitude.sh](http
  <|user|>
  {prompt}
  <|assistant|>
-
  ```
 
- Note: it is important to add a line break (`\n`) after the `<|assistant|>` token in the prompt template.
-
  ## Provided files
 
  Multiple quantisation parameters are provided, to allow you to choose the best one for your hardware and requirements.
@@ -59,7 +56,7 @@ Each separate quant is in a different branch. See below for instructions on fet
  | ------ | ---- | ---------- | -------------------- | --------- | ------------------- | --------- | ----------- |
  | main | 4 | 128 | False | 3.90 GB | True | AutoGPTQ | Most compatible option. Good inference speed in AutoGPTQ and GPTQ-for-LLaMa. Lower inference quality than other options. |
  | gptq-4bit-32g-actorder_True | 4 | 32 | True | 4.28 GB | True | AutoGPTQ | 4-bit, with Act Order and group size. 32g gives highest possible inference quality, with maximum VRAM usage. Poor AutoGPTQ CUDA speed. |
- | gptq-4bit-64g-actorder_True | 4 | 64 | True | 4.02 GB | True | AutoGPTQ | 4-bit, with Act Order and group size. 64g uses less VRAM, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
+ | gptq-4bit-64g-actorder_True | 4 | 64 | True | 4.02 GB | True | AutoGPTQ | 4-bit, with Act Order and group size. 64g uses less VRAM than 32g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
  | gptq-4bit-128g-actorder_True | 4 | 128 | True | 3.90 GB | True | AutoGPTQ | 4-bit, with Act Order and group size. 128g uses even less VRAM, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
  | gptq-8bit--1g-actorder_True | 8 | None | True | 7.01 GB | False | AutoGPTQ | 8-bit, with Act Order. No group size, to lower VRAM requirements and to improve AutoGPTQ speed. |
  | gptq-8bit-128g-actorder_False | 8 | 128 | False | 7.16 GB | False | AutoGPTQ | 8-bit, with group size 128g for higher inference quality and without Act Order to improve AutoGPTQ speed. |
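For context, a minimal Python sketch of applying the `<|user|>`/`<|assistant|>` prompt template shown in the first hunk. The helper name is hypothetical, and since this commit removes the note requiring a line break after `<|assistant|>`, whether a trailing `\n` is still needed is an assumption here.

```python
# Hypothetical helper, not part of the repo: wraps a user prompt in the
# template from the README excerpt above.
def format_prompt(prompt: str) -> str:
    # The removed README note said a "\n" after <|assistant|> was important;
    # one is kept here as a conservative assumption.
    return f"<|user|>\n{prompt}\n<|assistant|>\n"

print(format_prompt("Tell me about AI"))
```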
 
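Since each quant sits in its own branch, fetching a specific variant means passing the branch name from the table as a `revision`. A sketch using `huggingface_hub`; the repo id is a placeholder, since this commit page does not name the repository:

```python
# Sketch: download the files for one quant branch from the table above.
from huggingface_hub import snapshot_download

REPO_ID = "TheBloke/Example-GPTQ"  # placeholder; the real repo name is not shown on this page
path = snapshot_download(
    repo_id=REPO_ID,
    revision="gptq-4bit-32g-actorder_True",  # branch name taken from the table
)
print(path)
```

The same branch name can also be passed as the `revision` argument to `from_pretrained` in `transformers`.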