TheBloke committed on
Commit bf39544
1 Parent(s): 88f2299

Initial GPTQ model commit

Files changed (1)
  1. README.md +7 -15
README.md CHANGED
@@ -29,14 +29,6 @@ It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com
 * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/robin-65B-v2-GGML)
 * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/TheBloke/robin-65b-v2-fp16)
 
-## Prompt template
-
-```
-A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions
-###Human: prompt
-###Assistant:
-```
-
 ## How to easily download and use this model in text-generation-webui
 
 Please make sure you're using the latest version of text-generation-webui
@@ -66,7 +58,7 @@ from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
 import argparse
 
 model_name_or_path = "TheBloke/robin-65B-v2-GPTQ"
-model_basename = "robin-65b-GPTQ-4bit--1g.no-act.order"
+model_basename = "robin-65b-GPTQ-4bit--1g.act.order"
 
 use_triton = False
 
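For context, the `model_basename` changed above is what AutoGPTQ uses to locate the `.safetensors` file on the Hub. Here is a minimal sketch of the loading flow this README documents; the `from_quantized` arguments other than `model_basename` are reconstructed from the surrounding snippet rather than shown in this hunk.

```python
# Minimal sketch of the loading flow; arguments other than model_basename
# are reconstructed from the surrounding README snippet, not this hunk.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/robin-65B-v2-GPTQ"
model_basename = "robin-65b-GPTQ-4bit--1g.act.order"  # basename set in this commit

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# from_quantized() fetches "<model_basename>.safetensors" from the repo and
# loads the 4-bit GPTQ weights onto the GPU.
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    model_basename=model_basename,
    use_safetensors=True,
    device="cuda:0",
    use_triton=use_triton,
    quantize_config=None,
)
```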
@@ -82,8 +74,8 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
 
 # Note: check the prompt template is correct for this model.
 prompt = "Tell me about AI"
-prompt_template=f'''###Human: {prompt}
-###Assistant:'''
+prompt_template=f'''### Human: {prompt}
+### Assistant:'''
 
 print("\n\n*** Generate:")
 
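The correction above only adds a space after `###` in the role tags. As a short sketch, the corrected template would drive generation like this, assuming the `model` and `tokenizer` from the loading sketch; the sampling parameters are illustrative assumptions, not values taken from this commit.

```python
# Sketch: generating with the corrected "### Human:" / "### Assistant:" tags.
# Assumes model/tokenizer from the loading sketch above; sampling values are
# illustrative assumptions.
prompt = "Tell me about AI"
prompt_template = f'''### Human: {prompt}
### Assistant:'''

input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.to("cuda:0")
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```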
@@ -112,17 +104,17 @@ print(pipe(prompt_template)[0]['generated_text'])
 
 ## Provided files
 
-**robin-65b-GPTQ-4bit--1g.no-act.order.safetensors**
+**robin-65b-GPTQ-4bit--1g.act.order.safetensors**
 
 This will work with AutoGPTQ and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.
 
+It was created without group_size to lower VRAM requirements, and with --act-order (desc_act) to boost inference accuracy as much as possible.
 
-
-* `robin-65b-GPTQ-4bit--1g.no-act.order.safetensors`
+* `robin-65b-GPTQ-4bit--1g.act.order.safetensors`
 * Works with AutoGPTQ in CUDA or Triton modes.
 * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
 * Works with text-generation-webui, including one-click-installers.
-* Parameters: Groupsize = -1. Act Order / desc_act = False.
+* Parameters: Groupsize = -1. Act Order / desc_act = True.
 
 <!-- footer start -->
 ## Discord
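The renamed file encodes the quantisation parameters listed in the updated bullet. As a rough illustration, they map onto an AutoGPTQ `BaseQuantizeConfig` as below; the actual command used to produce this model is not recorded in this commit.

```python
# Illustration only: the quantisation settings implied by the new filename
# and bullet, expressed as an AutoGPTQ BaseQuantizeConfig. The real command
# used to quantise this model is not recorded in this commit.
from auto_gptq import BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,         # "4bit" in the filename
    group_size=-1,  # "--1g": no grouping, lowers VRAM requirements
    desc_act=True,  # "act.order": activation order, improves accuracy
)
```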
 