Update README.md
README.md CHANGED
@@ -110,8 +110,10 @@ All recent GPTQ files are made with AutoGPTQ, and all files in non-main branches

 | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
 | ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
-
-
+| main | 4 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 2048 | 94.25 GB | No | 4-bit, with Act Order and group size 128g. Higher quality than group_size=None, but also higher VRAM usage. |
+| gptq-4bit--1g-actorder_True | 4 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 2048 | 92.74 GB | No | 4-bit, with Act Order. No group size, to lower VRAM requirements. |
+| gptq-3bit-128g-actorder_True | 3 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 2048 | 73.81 GB | No | 3-bit, so much lower VRAM requirements but worse quality than 4-bit. With group size 128g and act-order. Higher quality than 3bit-128g-False. |
+| gptq-3bit--1g-actorder_True | 3 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 2048 | 70.54 GB | No | 3-bit, so much lower VRAM requirements but worse quality than 4-bit. With no group size for lowest possible VRAM requirements. Lower quality than 3-bit 128g. |

 <!-- README_GPTQ.md-provided-files end -->

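For anyone fetching one of the branches above outside of text-generation-webui, a minimal Python sketch using huggingface_hub's `snapshot_download` is shown below. The branch name and `local_dir` value are illustrative; substitute whichever branch from the table fits your hardware.

```python
# Minimal sketch: download one quantisation branch with huggingface_hub.
# The branch name and local_dir are illustrative; any branch from the
# Provided Files table above can be substituted.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/Falcon-180B-Chat-GPTQ",
    revision="gptq-3bit-128g-actorder_True",  # branch to fetch
    local_dir="Falcon-180B-Chat-GPTQ",        # hypothetical local target directory
)
```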
@@ -136,7 +138,7 @@ It is strongly recommended to use the text-generation-webui one-click-installers

 1. Click the **Model tab**.
 2. Under **Download custom model or LoRA**, enter `TheBloke/Falcon-180B-Chat-GPTQ`.
-  - To download from a specific branch, enter for example `TheBloke/Falcon-180B-Chat-GPTQ:gptq-3bit
+  - To download from a specific branch, enter for example `TheBloke/Falcon-180B-Chat-GPTQ:gptq-3bit-128g-actorder_True`
   - see Provided Files above for the list of branches for each option.
 3. Click **Download**.
 4. The model will start downloading. Once it's finished it will say "Done".
@@ -169,7 +171,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
 model_name_or_path = "TheBloke/Falcon-180B-Chat-GPTQ"

 # To use a different branch, change revision
-# For example: revision="gptq-3bit
+# For example: revision="gptq-3bit-128g-actorder_True"
 model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                              device_map="auto",
                                              revision="main")
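The hunk above touches only a fragment of the README's Python example. As a rough end-to-end sketch of that usage, assuming a transformers install with GPTQ support (optimum plus auto-gptq) as the README describes, and with an illustrative prompt and sampling settings:

```python
# Sketch only: load a specific quantisation branch and generate a reply.
# Assumes transformers with GPTQ support (optimum + auto-gptq) is installed;
# the prompt format and sampling settings are illustrative, not prescriptive.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "TheBloke/Falcon-180B-Chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             revision="gptq-3bit-128g-actorder_True")

prompt = "User: Tell me about AI\nAssistant:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```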