TheBloke committed on
Commit 377f10e · 1 Parent(s): 13f6b45

Update README.md

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -110,8 +110,10 @@ All recent GPTQ files are made with AutoGPTQ, and all files in non-main branches
110
 
111
  | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
112
  | ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
113
- | [main](https://huggingface.co/TheBloke/Falcon-180B-Chat-GPTQ/tree/main) | 4 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 2048 | 92.74 GB | No | Most compatible option. Good inference speed in AutoGPTQ and GPTQ-for-LLaMa. Lower inference quality than other options. |
114
- | [gptq-3bit--1g-actorder_True](https://huggingface.co/TheBloke/Falcon-180B-Chat-GPTQ/tree/gptq-3bit--1g-actorder_True) | 3 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 2048 | 70.54 GB | No | 3-bit, with Act Order and no group size. Lowest possible VRAM requirements. May be lower quality than 3-bit 128g. |
 
 
115
 
116
  <!-- README_GPTQ.md-provided-files end -->
117
 
@@ -136,7 +138,7 @@ It is strongly recommended to use the text-generation-webui one-click-installers
136
 
137
  1. Click the **Model tab**.
138
  2. Under **Download custom model or LoRA**, enter `TheBloke/Falcon-180B-Chat-GPTQ`.
139
- - To download from a specific branch, enter for example `TheBloke/Falcon-180B-Chat-GPTQ:gptq-3bit--1g-actorder_True`
140
  - see Provided Files above for the list of branches for each option.
141
  3. Click **Download**.
142
  4. The model will start downloading. Once it's finished it will say "Done".
@@ -169,7 +171,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
169
  model_name_or_path = "TheBloke/Falcon-180B-Chat-GPTQ"
170
 
171
  # To use a different branch, change revision
172
- # For example: revision="gptq-3bit--1g-actorder_True"
173
  model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
174
  device_map="auto",
175
  revision="main")
 
110
 
111
  | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
112
  | ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
113
+ | main | 4 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 2048 | 94.25 GB | No | 4-bit, with Act Order and group size 128g. Higher quality than group_size=None, but also higher VRAM usage. |
114
+ | gptq-4bit--1g-actorder_True | 4 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 2048 | 92.74 GB | No | 4-bit, with Act Order. No group size, to lower VRAM requirements. |
115
+ | gptq-3bit-128g-actorder_True | 3 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 2048 | 73.81 GB | No | 3-bit, so much lower VRAM requirements but worse quality than 4-bit. With group size 128g and act-order. Higher quality than 3bit-128g-False. |
116
+ | gptq-3bit--1g-actorder_True | 3 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 2048 | 70.54 GB | No | 3-bit, so much lower VRAM requirements but worse quality than 4-bit. With no group size for lowest possible VRAM requirements. Lower quality than 3-bit 128g. |
117
 
118
  <!-- README_GPTQ.md-provided-files end -->
119
 
 
138
 
139
  1. Click the **Model tab**.
140
  2. Under **Download custom model or LoRA**, enter `TheBloke/Falcon-180B-Chat-GPTQ`.
141
+ - To download from a specific branch, enter for example `TheBloke/Falcon-180B-Chat-GPTQ:gptq-3bit-128g-actorder_True`
142
  - see Provided Files above for the list of branches for each option.
143
  3. Click **Download**.
144
  4. The model will start downloading. Once it's finished it will say "Done".
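
The `repo:branch` form entered in step 2 can be split into a repository ID and a branch name. The sketch below is illustrative only and is not text-generation-webui's own parsing code: `split_model_spec` is a hypothetical helper, and falling back to `main` when no branch is given is an assumption.

```python
from typing import Tuple


def split_model_spec(spec: str) -> Tuple[str, str]:
    """Split 'owner/repo:branch' into (repo_id, branch).

    Hypothetical helper: assumes the branch defaults to 'main'
    when the spec contains no ':' separator.
    """
    repo_id, sep, branch = spec.partition(":")
    # partition() returns an empty separator when ':' is absent
    return repo_id, (branch if sep else "main")


repo_id, branch = split_model_spec(
    "TheBloke/Falcon-180B-Chat-GPTQ:gptq-3bit-128g-actorder_True"
)
print(repo_id, branch)
```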
 
171
  model_name_or_path = "TheBloke/Falcon-180B-Chat-GPTQ"
172
 
173
  # To use a different branch, change revision
174
+ # For example: revision="gptq-3bit-128g-actorder_True"
175
  model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
176
  device_map="auto",
177
  revision="main")
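
The branch names in this diff appear to follow a `gptq-{bits}bit-{group size}g-actorder_{True|False}` pattern, with a group size of `-1` corresponding to `None` in the GS column. A minimal sketch that decodes such a name before it is used as `revision`, assuming that pattern holds (`parse_branch` and `GPTQBranch` are hypothetical names, not part of transformers or AutoGPTQ):

```python
import re
from typing import NamedTuple, Optional


class GPTQBranch(NamedTuple):
    bits: int
    group_size: Optional[int]  # None when the name encodes -1 (no grouping)
    act_order: bool


# Assumed naming pattern, inferred from the branch names in the table
_BRANCH_RE = re.compile(r"^gptq-(\d+)bit-(-?\d+)g-actorder_(True|False)$")


def parse_branch(name: str) -> GPTQBranch:
    """Decode quantisation settings from a branch name such as
    'gptq-3bit-128g-actorder_True'."""
    match = _BRANCH_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised branch name: {name!r}")
    bits, group_size, act_order = match.groups()
    gs = int(group_size)
    return GPTQBranch(int(bits), None if gs == -1 else gs, act_order == "True")


for name in ("gptq-4bit--1g-actorder_True", "gptq-3bit-128g-actorder_True"):
    print(name, parse_branch(name))
```

A name accepted by this helper can be passed unchanged as `revision=` in the `from_pretrained` call above. The `main` branch does not follow the pattern, so the helper rejects it with a `ValueError`.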