TheBloke committed
Commit 2b78e44 (1 parent: 992cd84)

Update README.md

Files changed (1): README.md (+14 -49)
README.md CHANGED
@@ -43,36 +43,18 @@ This repo contains GPTQ model files for [Technology Innovation Institute's Falco
 
  Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.
 
- 
- ## EXPERIMENTAL
- 
- These are experimental first GPTQs for Falcon 180B. They have not yet been tested.
+ ## Requirements
 
  Transformers version 4.33.0 is required.
 
- In order to make them, a small change was needed to AutoGPTQ to add support for the new model_type name `falcon`. You will need to merge this PR before you can attempt to load them in AutoGPTQ: https://github.com/PanQiWei/AutoGPTQ/pull/326
- 
- Once this change has been made, they should be usable just like any other GPTQ model. You can try the example Transformers Python code later in this README, or try loading them directly from AutoGPTQ.
- 
- I believe you will need 2 x 80GB GPUs (or 4 x 48GB) to load the 4-bit models, and probably the 3-bit ones as well.
- 
- Assuming the quants finish OK (and if you're reading this message, they did!) I will test them during the day on 7th September and update this notice with my findings.
+ Due to the huge size of the model, the GPTQ has been sharded. This breaks compatibility with AutoGPTQ, and therefore with any clients/libraries that use AutoGPTQ directly.
 
- ## SPLIT FILES
+ But they work great directly from Transformers!
 
- Due to the HF 50GB file limit, and the fact that GPTQ does not currently support sharding, I have had to split the `model.safetensors` file.
+ Currently these GPTQs are tested to work with:
+ - Transformers 4.33.0
+ - Text Generation Inference (TGI) 1.0.2
 
- To join it:
- 
- Linux and macOS:
- ```
- cat model.safetensors-split-* > model.safetensors && rm model.safetensors-split-*
- ```
- Windows command line:
- ```
- COPY /B model.safetensors-split-a + model.safetensors-split-b model.safetensors
- del model.safetensors-split-a model.safetensors-split-b
- ```
  <!-- description end -->
 
  <!-- repositories-available start -->
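
Editor's note: the removed SPLIT FILES section relied on shell-specific join commands. For reference, a minimal cross-platform sketch of the same join step in Python, assuming the `model.safetensors-split-*` part naming used in the old README:

```python
# Hypothetical cross-platform equivalent of the cat/COPY join commands removed above.
# Assumes split parts named model.safetensors-split-a, model.safetensors-split-b, ...
import glob
import os

parts = sorted(glob.glob("model.safetensors-split-*"))

with open("model.safetensors", "wb") as out:
    for part in parts:
        with open(part, "rb") as src:
            # Stream in 64 MB chunks so a ~45 GB part is never held in RAM at once
            while chunk := src.read(64 * 1024 * 1024):
                out.write(chunk)

# Remove the parts only after the joined file is complete
for part in parts:
    os.remove(part)
```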
@@ -159,40 +141,22 @@ It is strongly recommended to use the text-generation-webui one-click-installers
 
  ### Install the necessary packages
 
- Requires: Transformers 4.33.0 or later, Optimum 1.12.0 or later, and AutoGPTQ compiled from source with a patch.
+ Requires: Transformers 4.33.0 or later, Optimum 1.12.0 or later, and AutoGPTQ.
 
  ```shell
  pip3 install transformers>=4.33.0 optimum>=1.12.0
- pip3 uninstall -y auto-gptq
- git clone -b TB_Latest_Falcon https://github.com/TheBloke/AutoGPTQ
- cd AutoGPTQ
- pip3 install .
+ pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # Use cu117 if on CUDA 11.7
  ```
 
- ### You then need to manually download the repo so the split files can be joined
- 
- I recommend using my fast download script:
- 
- ```shell
- git clone https://github.com/TheBlokeAI/AIScripts
- python3 AIScripts/hub_download.py TheBloke/Falcon-180B-Chat-GPTQ Falcon-180B-Chat-GPTQ --branch main  # change branch if you want to use the 3-bit model instead
- ```
- 
- ### Now join the files
- 
- ```shell
- cd Falcon-180B-Chat-GPTQ
- # Windows users: see the command to use in the Description at the top of this README
- cat model.safetensors-split-* > model.safetensors && rm model.safetensors-split-*
- ```
- 
- ### And then finally you can run the following code
+ ### Transformers sample code
 
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
 
- model_name_or_path = "/path/to/Falcon-180B-Chat-GPTQ"  # change this to the path you downloaded the model to
+ model_name_or_path = "TheBloke/Falcon-180B-Chat-GPTQ"
 
+ # To use a different branch, change revision
+ # For example: revision="gptq-3bit--1g-actorder_True"
  model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                               device_map="auto",
                                               revision="main")
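
Editor's note: the commit drops the AIScripts download step because the sharded files now load straight from the Hub. For anyone who still wants a local copy first, a sketch using `huggingface_hub` (this is not the README's method, and the `local_dir` value is just an example):

```python
# Library-based alternative to the removed AIScripts downloader; a sketch only.
# Requires: pip3 install huggingface_hub
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/Falcon-180B-Chat-GPTQ",
    revision="main",                    # change branch here, e.g. for the 3-bit model
    local_dir="Falcon-180B-Chat-GPTQ",  # example destination directory
)
```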
@@ -206,7 +170,7 @@ Assistant: '''
  print("\n\n*** Generate:")
 
  input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
- output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
+ output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)
  print(tokenizer.decode(output[0]))
 
  # Inference can also be done using transformers' pipeline
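
Editor's note on this change: in Transformers, sampling parameters such as `temperature` only take effect when `do_sample=True`; without it, `generate()` falls back to greedy decoding and ignores them, which is presumably why the flag was added here.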
@@ -218,6 +182,7 @@ pipe = pipeline(
      tokenizer=tokenizer,
      max_new_tokens=512,
      temperature=0.7,
+     do_sample=True,
      top_p=0.95,
      repetition_penalty=1.15
  )
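
Editor's note: the diff stops before the pipeline is actually called. As a sketch of how the pipeline defined in this hunk would be used, with `pipe` and `prompt_template` taken from the surrounding README sample:

```python
# Run the chat prompt through the pipeline and print the generated text.
# pipe and prompt_template are defined earlier in the README's sample code.
result = pipe(prompt_template)
print(result[0]["generated_text"])
```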
 
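Editor's note: the new description lists Text Generation Inference (TGI) 1.0.2 as tested, but the commit shows no client code for it. A minimal sketch using the `text-generation` Python client, assuming a TGI server is already running with this model (the URL and parameters are illustrative):

```python
# Query an already-running TGI server; a sketch, not part of the commit.
# Requires: pip3 install text-generation
from text_generation import Client

client = Client("http://127.0.0.1:8080")  # example server address

response = client.generate(
    "User: Tell me about AI\nAssistant:",
    do_sample=True,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
)
print(response.generated_text)
```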