TheBloke committed on
Commit d831a2c · 1 Parent(s): cef32fd

Update for Transformers GPTQ support

README.md CHANGED
@@ -11,17 +11,20 @@ pipeline_tag: text-generation
11
  ---
12
 
13
  <!-- header start -->
14
- <div style="width: 100%;">
15
- <img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
 
16
  </div>
17
  <div style="display: flex; justify-content: space-between; width: 100%;">
18
  <div style="display: flex; flex-direction: column; align-items: flex-start;">
19
- <p><a href="https://discord.gg/theblokeai">Chat & support: my new Discord server</a></p>
20
  </div>
21
  <div style="display: flex; flex-direction: column; align-items: flex-end;">
22
- <p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
23
  </div>
24
  </div>
 
 
25
  <!-- header end -->
26
 
27
  # Pankaj Mathur's Orca Mini v2 13B GPTQ
@@ -180,6 +183,7 @@ The files provided will work with AutoGPTQ (CUDA and Triton modes), GPTQ-for-LLa
180
  ExLlama works with Llama models in 4-bit. Please see the Provided Files table above for per-file compatibility.
181
 
182
  <!-- footer start -->
 
183
  ## Discord
184
 
185
  For further support, and discussions on these models and AI in general, join us at:
@@ -199,12 +203,15 @@ Donaters will get priority support on any and all AI/LLM/model questions and req
199
  * Patreon: https://patreon.com/TheBlokeAI
200
  * Ko-Fi: https://ko-fi.com/TheBlokeAI
201
 
202
- **Special thanks to**: Luke from CarbonQuill, Aemon Algiz.
 
 
203
 
204
- **Patreon special mentions**: Space Cruiser, Nikolai Manek, Sam, Chris McCloskey, Rishabh Srivastava, Kalila, Spiking Neurons AB, Khalefa Al-Ahmad, WelcomeToTheClub, Chadd, Lone Striker, Viktor Bowallius, Edmond Seymore, Ai Maven, Chris Smitley, Dave, Alexandros Triantafyllidis, Luke @flexchar, Elle, ya boyyy, Talal Aujan, Alex , Jonathan Leane, Deep Realms, Randy H, subjectnull, Preetika Verma, Joseph William Delisle, Michael Levine, chris gileta, K, Oscar Rangel, LangChain4j, Trenton Dambrowitz, Eugene Pentland, Johann-Peter Hartmann, Femi Adebogun, Illia Dulskyi, senxiiz, Daniel P. Andersen, Sean Connelly, Artur Olbinski, RoA, Mano Prime, Derek Yates, Raven Klaugh, David Flickinger, Willem Michiel, Pieter, Willian Hasse, vamX, Luke Pendergrass, webtim, Ghost , Rainer Wilmers, Nathan LeClaire, Will Dee, Cory Kujawski, John Detwiler, Fred von Graf, biorpg, Iucharbius , Imad Khwaja, Pierre Kircher, terasurfer , Asp the Wyvern, John Villwock, theTransient, zynix , Gabriel Tamborski, Fen Risland, Gabriel Puliatti, Matthew Berman, Pyrater, SuperWojo, Stephen Murray, Karl Bernard, Ajan Kanaga, Greatston Gnanesh, Junyu Yang.
205
 
206
  Thank you to all my generous patrons and donaters!
207
 
 
 
208
  <!-- footer end -->
209
 
210
  # Original model card: Pankaj Mathur's Orca Mini v2 13B
@@ -220,7 +227,7 @@ Please note this model has *better code generation capabilities* compare to our
220
 
221
  # Evaluation
222
 
223
- I evaluated orca_mini_v2_13b on a wide range of tasks using [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) from EleutherAI.
224
 
225
  Here are the results on metrics used by [HuggingFaceH4 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
226
 
@@ -325,12 +332,12 @@ model = LlamaForCausalLM.from_pretrained(
325
 
326
  #generate text function
327
  def generate_text(system, instruction, input=None):
328
-
329
  if input:
330
  prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
331
  else:
332
  prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"
333
-
334
  tokens = tokenizer.encode(prompt)
335
  tokens = torch.LongTensor(tokens).unsqueeze(0)
336
  tokens = tokens.to('cuda')
@@ -340,14 +347,14 @@ def generate_text(system, instruction, input=None):
340
  length = len(tokens[0])
341
  with torch.no_grad():
342
  rest = model.generate(
343
- input_ids=tokens,
344
- max_length=length+instance['generate_len'],
345
- use_cache=True,
346
- do_sample=True,
347
  top_p=instance['top_p'],
348
  temperature=instance['temperature'],
349
  top_k=instance['top_k']
350
- )
351
  output = rest[0][length:]
352
  string = tokenizer.decode(output, skip_special_tokens=True)
353
  return f'[!] Response: {string}'
@@ -406,7 +413,7 @@ If you found wizardlm_alpaca_dolly_orca_open_llama_7b useful in your research or
406
  ```
407
  ```
408
  @misc{mukherjee2023orca,
409
- title={Orca: Progressive Learning from Complex Explanation Traces of GPT-4},
410
  author={Subhabrata Mukherjee and Arindam Mitra and Ganesh Jawahar and Sahaj Agarwal and Hamid Palangi and Ahmed Awadallah},
411
  year={2023},
412
  eprint={2306.02707},
@@ -453,7 +460,7 @@ If you found wizardlm_alpaca_dolly_orca_open_llama_7b useful in your research or
453
  ```
454
  ```
455
  @misc{xu2023wizardlm,
456
- title={WizardLM: Empowering Large Language Models to Follow Complex Instructions},
457
  author={Can Xu and Qingfeng Sun and Kai Zheng and Xiubo Geng and Pu Zhao and Jiazhan Feng and Chongyang Tao and Daxin Jiang},
458
  year={2023},
459
  eprint={2304.12244},
 
11
  ---
12
 
13
  <!-- header start -->
14
+ <!-- 200823 -->
15
+ <div style="width: auto; margin-left: auto; margin-right: auto">
16
+ <img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
17
  </div>
18
  <div style="display: flex; justify-content: space-between; width: 100%;">
19
  <div style="display: flex; flex-direction: column; align-items: flex-start;">
20
+ <p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://discord.gg/theblokeai">Chat & support: TheBloke's Discord server</a></p>
21
  </div>
22
  <div style="display: flex; flex-direction: column; align-items: flex-end;">
23
+ <p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
24
  </div>
25
  </div>
26
+ <div style="text-align:center; margin-top: 0em; margin-bottom: 0em"><p style="margin-top: 0.25em; margin-bottom: 0em;">TheBloke's LLM work is generously supported by a grant from <a href="https://a16z.com">andreessen horowitz (a16z)</a></p></div>
27
+ <hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
28
  <!-- header end -->
29
 
30
  # Pankaj Mathur's Orca Mini v2 13B GPTQ
 
183
  ExLlama works with Llama models in 4-bit. Please see the Provided Files table above for per-file compatibility.
184
 
185
  <!-- footer start -->
186
+ <!-- 200823 -->
187
  ## Discord
188
 
189
  For further support, and discussions on these models and AI in general, join us at:
 
203
  * Patreon: https://patreon.com/TheBlokeAI
204
  * Ko-Fi: https://ko-fi.com/TheBlokeAI
205
 
206
+ **Special thanks to**: Aemon Algiz.
207
+
208
+ **Patreon special mentions**: Sam, theTransient, Jonathan Leane, Steven Wood, webtim, Johann-Peter Hartmann, Geoffrey Montalvo, Gabriel Tamborski, Willem Michiel, John Villwock, Derek Yates, Mesiah Bishop, Eugene Pentland, Pieter, Chadd, Stephen Murray, Daniel P. Andersen, terasurfer, Brandon Frisco, Thomas Belote, Sid, Nathan LeClaire, Magnesian, Alps Aficionado, Stanislav Ovsiannikov, Alex, Joseph William Delisle, Nikolai Manek, Michael Davis, Junyu Yang, K, J, Spencer Kim, Stefan Sabev, Olusegun Samson, transmissions 11, Michael Levine, Cory Kujawski, Rainer Wilmers, zynix, Kalila, Luke @flexchar, Ajan Kanaga, Mandus, vamX, Ai Maven, Mano Prime, Matthew Berman, subjectnull, Vitor Caleffi, Clay Pascal, biorpg, alfie_i, 阿明, Jeffrey Morgan, ya boyyy, Raymond Fosdick, knownsqashed, Olakabola, Leonard Tan, ReadyPlayerEmma, Enrico Ros, Dave, Talal Aujan, Illia Dulskyi, Sean Connelly, senxiiz, Artur Olbinski, Elle, Raven Klaugh, Fen Risland, Deep Realms, Imad Khwaja, Fred von Graf, Will Dee, usrbinkat, SuperWojo, Alexandros Triantafyllidis, Swaroop Kallakuri, Dan Guido, John Detwiler, Pedro Madruga, Iucharbius, Viktor Bowallius, Asp the Wyvern, Edmond Seymore, Trenton Dambrowitz, Space Cruiser, Spiking Neurons AB, Pyrater, LangChain4j, Tony Hughes, Kacper Wikieł, Rishabh Srivastava, David Ziegler, Luke Pendergrass, Andrey, Gabriel Puliatti, Lone Striker, Sebastain Graf, Pierre Kircher, Randy H, NimbleBox.ai, Vadim, danny, Deo Leter
209
 
 
210
 
211
  Thank you to all my generous patrons and donaters!
212
 
213
+ And thank you again to a16z for their generous grant.
214
+
215
  <!-- footer end -->
216
 
217
  # Original model card: Pankaj Mathur's Orca Mini v2 13B
 
227
 
228
  # Evaluation
229
 
230
+ I evaluated orca_mini_v2_13b on a wide range of tasks using [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) from EleutherAI.
231
 
232
  Here are the results on metrics used by [HuggingFaceH4 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
233
 
 
332
 
333
  #generate text function
334
  def generate_text(system, instruction, input=None):
335
+
336
  if input:
337
  prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
338
  else:
339
  prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"
340
+
341
  tokens = tokenizer.encode(prompt)
342
  tokens = torch.LongTensor(tokens).unsqueeze(0)
343
  tokens = tokens.to('cuda')
 
347
  length = len(tokens[0])
348
  with torch.no_grad():
349
  rest = model.generate(
350
+ input_ids=tokens,
351
+ max_length=length+instance['generate_len'],
352
+ use_cache=True,
353
+ do_sample=True,
354
  top_p=instance['top_p'],
355
  temperature=instance['temperature'],
356
  top_k=instance['top_k']
357
+ )
358
  output = rest[0][length:]
359
  string = tokenizer.decode(output, skip_special_tokens=True)
360
  return f'[!] Response: {string}'
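
Reviewer's sketch: the prompt layout and the output slicing in `generate_text` can be exercised without loading the model. The helper names `build_prompt` and `strip_prompt_tokens` below are hypothetical and do not appear in the README; only the template strings and the `rest[0][length:]` slice are taken from the hunk above.

```python
from typing import List, Optional


def build_prompt(system: str, instruction: str, user_input: Optional[str] = None) -> str:
    """Assemble a prompt in the System/User/Input/Response layout used by generate_text."""
    if user_input:
        return (
            f"### System:\n{system}\n\n"
            f"### User:\n{instruction}\n\n"
            f"### Input:\n{user_input}\n\n"
            "### Response:\n"
        )
    return f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"


def strip_prompt_tokens(generated: List[int], prompt_length: int) -> List[int]:
    """Drop the echoed prompt tokens, mirroring `rest[0][length:]` in the README code."""
    return generated[prompt_length:]


if __name__ == "__main__":
    # Placeholder strings, chosen only for illustration.
    print(build_prompt("You are a helpful assistant.", "Summarise this text.", "GPTQ is a quantization method."))
```

Keeping prompt assembly separate from the `model.generate` call makes the template easy to check on a machine without a GPU.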
 
413
  ```
414
  ```
415
  @misc{mukherjee2023orca,
416
+ title={Orca: Progressive Learning from Complex Explanation Traces of GPT-4},
417
  author={Subhabrata Mukherjee and Arindam Mitra and Ganesh Jawahar and Sahaj Agarwal and Hamid Palangi and Ahmed Awadallah},
418
  year={2023},
419
  eprint={2306.02707},
 
460
  ```
461
  ```
462
  @misc{xu2023wizardlm,
463
+ title={WizardLM: Empowering Large Language Models to Follow Complex Instructions},
464
  author={Can Xu and Qingfeng Sun and Kai Zheng and Xiubo Geng and Pu Zhao and Jiazhan Feng and Chongyang Tao and Daxin Jiang},
465
  year={2023},
466
  eprint={2304.12244},
config.json CHANGED
@@ -1,24 +1,34 @@
1
  {
2
- "_name_or_path": "/workspace/models/llama-13b",
3
- "architectures": [
4
- "LlamaForCausalLM"
5
- ],
6
- "bos_token_id": 1,
7
- "eos_token_id": 2,
8
- "hidden_act": "silu",
9
- "hidden_size": 5120,
10
- "initializer_range": 0.02,
11
- "intermediate_size": 13824,
12
- "max_position_embeddings": 2048,
13
- "max_sequence_length": 2048,
14
- "model_type": "llama",
15
- "num_attention_heads": 40,
16
- "num_hidden_layers": 40,
17
- "pad_token_id": 0,
18
- "rms_norm_eps": 1e-06,
19
- "tie_word_embeddings": false,
20
- "torch_dtype": "bfloat16",
21
- "transformers_version": "4.28.1",
22
- "use_cache": true,
23
- "vocab_size": 32000
 
 
 
 
 
 
 
 
 
 
24
  }
 
1
  {
2
+ "_name_or_path": "/workspace/models/llama-13b",
3
+ "architectures": [
4
+ "LlamaForCausalLM"
5
+ ],
6
+ "bos_token_id": 1,
7
+ "eos_token_id": 2,
8
+ "hidden_act": "silu",
9
+ "hidden_size": 5120,
10
+ "initializer_range": 0.02,
11
+ "intermediate_size": 13824,
12
+ "max_position_embeddings": 2048,
13
+ "max_sequence_length": 2048,
14
+ "model_type": "llama",
15
+ "num_attention_heads": 40,
16
+ "num_hidden_layers": 40,
17
+ "pad_token_id": 0,
18
+ "rms_norm_eps": 1e-06,
19
+ "tie_word_embeddings": false,
20
+ "torch_dtype": "bfloat16",
21
+ "transformers_version": "4.28.1",
22
+ "use_cache": true,
23
+ "vocab_size": 32000,
24
+ "quantization_config": {
25
+ "bits": 4,
26
+ "group_size": 128,
27
+ "damp_percent": 0.01,
28
+ "desc_act": false,
29
+ "sym": true,
30
+ "true_sequential": true,
31
+ "model_file_base_name": "model",
32
+ "quant_method": "gptq"
33
+ }
34
  }
orca_mini_v2_13b-GPTQ-4bit-128g.no-act.order.safetensors → model.safetensors RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:9d2aa7b2dad267d3dd911716302daee1056bf46087d134490f255287c8bc3be3
3
- size 7454797240
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c19e053bf4ebb4f9c4f2c8530d01cd8113fc7fdf4e03806ccd21294550e71cc3
3
+ size 7454797304
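
Reviewer's sketch: the two Git LFS pointers above can be compared programmatically to confirm that the rename also changed the file contents. The function name `parse_lfs_pointer` is hypothetical; the pointer text is copied from this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:9d2aa7b2dad267d3dd911716302daee1056bf46087d134490f255287c8bc3be3
size 7454797240
"""

NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:c19e053bf4ebb4f9c4f2c8530d01cd8113fc7fdf4e03806ccd21294550e71cc3
size 7454797304
"""

old = parse_lfs_pointer(OLD_POINTER)
new = parse_lfs_pointer(NEW_POINTER)

# A different digest means the stored bytes differ, not just the filename.
contents_changed = old["digest"] != new["digest"]
size_delta = new["size"] - old["size"]
print(contents_changed, size_delta)  # True 64
```

The diff itself does not say what the extra 64 bytes are, so the sketch only reports the difference.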
quantize_config.json CHANGED
@@ -1,8 +1,9 @@
1
  {
2
- "bits": 4,
3
- "group_size": 128,
4
- "damp_percent": 0.01,
5
- "desc_act": false,
6
- "sym": true,
7
- "true_sequential": true
 
8
  }
 
1
  {
2
+ "bits": 4,
3
+ "group_size": 128,
4
+ "damp_percent": 0.01,
5
+ "desc_act": false,
6
+ "sym": true,
7
+ "true_sequential": true,
8
+ "model_file_base_name": "model"
9
  }
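
Reviewer's sketch: after this commit the quantization settings live in two places, `quantization_config` inside config.json and the standalone quantize_config.json. A small standard-library check can confirm that the two copies agree on every key they share. The function name `find_mismatches` is hypothetical; the JSON values are copied from the new side of this diff.

```python
import json

# "quantization_config" block from the new config.json.
CONFIG_QUANT = json.loads("""
{
  "bits": 4,
  "group_size": 128,
  "damp_percent": 0.01,
  "desc_act": false,
  "sym": true,
  "true_sequential": true,
  "model_file_base_name": "model",
  "quant_method": "gptq"
}
""")

# New quantize_config.json.
QUANTIZE_CONFIG = json.loads("""
{
  "bits": 4,
  "group_size": 128,
  "damp_percent": 0.01,
  "desc_act": false,
  "sym": true,
  "true_sequential": true,
  "model_file_base_name": "model"
}
""")


def find_mismatches(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for keys present in both dicts with different values."""
    shared = a.keys() & b.keys()
    return {k: (a[k], b[k]) for k in sorted(shared) if a[k] != b[k]}


mismatches = find_mismatches(CONFIG_QUANT, QUANTIZE_CONFIG)
only_in_config = sorted(CONFIG_QUANT.keys() - QUANTIZE_CONFIG.keys())
print(mismatches)       # {}
print(only_in_config)   # ['quant_method']
```

In both files `model_file_base_name` is `"model"`, which matches the weights file being renamed to `model.safetensors` in the same commit.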