Update README.md
README.md CHANGED
@@ -110,8 +110,10 @@ All recent GPTQ files are made with AutoGPTQ, and all files in non-main branches

 | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
 | ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
-
-
+| main | 4 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 2048 | 94.25 GB | No | 4-bit, with Act Order and group size 128g. Higher quality than group_size=None, but also higher VRAM usage. |
+| gptq-4bit--1g-actorder_True | 4 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 2048 | 92.74 GB | No | 4-bit, with Act Order. No group size, to lower VRAM requirements. |
+| gptq-3bit-128g-actorder_True | 3 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 2048 | 73.81 GB | No | 3-bit, so much lower VRAM requirements but worse quality than 4-bit. With group size 128g and act-order. Higher quality than 3bit-128g-False. |
+| gptq-3bit--1g-actorder_True | 3 | None | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 2048 | 70.54 GB | No | 3-bit, so much lower VRAM requirements but worse quality than 4-bit. With no group size for lowest possible VRAM requirements. Lower quality than 3-bit 128g. |

 <!-- README_GPTQ.md-provided-files end -->

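For anyone fetching one of the branches above outside of text-generation-webui, a minimal Python sketch using huggingface_hub's `snapshot_download` is shown below. The branch name and `local_dir` value are illustrative; substitute whichever branch from the table fits your hardware.

```python
# Minimal sketch: download one quantisation branch with huggingface_hub.
# The branch name and local_dir are illustrative; any branch from the
# Provided Files table above can be substituted.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/Falcon-180B-Chat-GPTQ",
    revision="gptq-3bit-128g-actorder_True",  # branch to fetch
    local_dir="Falcon-180B-Chat-GPTQ",        # hypothetical local target directory
)
```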
@@ -136,7 +138,7 @@ It is strongly recommended to use the text-generation-webui one-click-installers

 1. Click the **Model tab**.
 2. Under **Download custom model or LoRA**, enter `TheBloke/Falcon-180B-Chat-GPTQ`.
-  - To download from a specific branch, enter for example `TheBloke/Falcon-180B-Chat-GPTQ:gptq-3bit
+  - To download from a specific branch, enter for example `TheBloke/Falcon-180B-Chat-GPTQ:gptq-3bit-128g-actorder_True`
   - see Provided Files above for the list of branches for each option.
 3. Click **Download**.
 4. The model will start downloading. Once it's finished it will say "Done".
@@ -169,7 +171,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
 model_name_or_path = "TheBloke/Falcon-180B-Chat-GPTQ"

 # To use a different branch, change revision
-# For example: revision="gptq-3bit
+# For example: revision="gptq-3bit-128g-actorder_True"
 model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                              device_map="auto",
                                              revision="main")
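The hunk above touches only a fragment of the README's Python example. As a rough end-to-end sketch of that usage, assuming a transformers install with GPTQ support (optimum plus auto-gptq) as the README describes, and with an illustrative prompt and sampling settings:

```python
# Sketch only: load a specific quantisation branch and generate a reply.
# Assumes transformers with GPTQ support (optimum + auto-gptq) is installed;
# the prompt format and sampling settings are illustrative, not prescriptive.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "TheBloke/Falcon-180B-Chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             revision="gptq-3bit-128g-actorder_True")

prompt = "User: Tell me about AI\nAssistant:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```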