Initial GPTQ model commit
README.md CHANGED
@@ -29,14 +29,6 @@ It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com
 * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/robin-65B-v2-GGML)
 * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/TheBloke/robin-65b-v2-fp16)
 
-## Prompt template
-
-```
-A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions
-###Human: prompt
-###Assistant:
-```
-
 ## How to easily download and use this model in text-generation-webui
 
 Please make sure you're using the latest version of text-generation-webui
@@ -66,7 +58,7 @@ from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
 import argparse
 
 model_name_or_path = "TheBloke/robin-65B-v2-GPTQ"
-model_basename = "robin-65b-GPTQ-4bit--1g.no-act.order"
+model_basename = "robin-65b-GPTQ-4bit--1g.act.order"
 
 use_triton = False
 
@@ -82,8 +74,8 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
 
 # Note: check the prompt template is correct for this model.
 prompt = "Tell me about AI"
-prompt_template=f'''###Human: {prompt}
-###Assistant:'''
+prompt_template=f'''### Human: {prompt}
+### Assistant:'''
 
 print("\n\n*** Generate:")
 
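For reference, the hunks above are fragments of the README's Python example. A minimal runnable sketch assembling them as they stand after this commit; the tokenizer handling and the `generate()` sampling parameters are assumptions, not shown in these hunks:

```python
# A minimal sketch assembling the README's AutoGPTQ example from the hunks above.
# Assumption: the sampling parameters and tokenizer handling are illustrative,
# not taken from this commit.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/robin-65B-v2-GPTQ"
model_basename = "robin-65b-GPTQ-4bit--1g.act.order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Load the 4-bit GPTQ weights from the safetensors file named by model_basename.
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

# Prompt template as updated in this commit: "### Human:" / "### Assistant:".
prompt = "Tell me about AI"
prompt_template = f'''### Human: {prompt}
### Assistant:'''

input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```

`use_triton = False` matches the README's own advice below, which reports issues with Triton mode of recent GPTQ-for-LLaMa.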
@@ -112,17 +104,17 @@ print(pipe(prompt_template)[0]['generated_text'])
 
 ## Provided files
 
-**robin-65b-GPTQ-4bit--1g.no-act.order.safetensors**
+**robin-65b-GPTQ-4bit--1g.act.order.safetensors**
 
 This will work with AutoGPTQ and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.
 
-
+It was created without group_size to lower VRAM requirements, and with --act-order (desc_act) to boost inference accuracy as much as possible.
 
-* `robin-65b-GPTQ-4bit--1g.no-act.order.safetensors`
+* `robin-65b-GPTQ-4bit--1g.act.order.safetensors`
 * Works with AutoGPTQ in CUDA or Triton modes.
 * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
 * Works with text-generation-webui, including one-click-installers.
-* Parameters: Groupsize = -1. Act Order / desc_act = False.
+* Parameters: Groupsize = -1. Act Order / desc_act = True.
 
 <!-- footer start -->
 ## Discord
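As a footnote to the parameters listed above (Groupsize = -1, Act Order / desc_act = True), a hedged sketch of the equivalent AutoGPTQ `BaseQuantizeConfig`; this is illustrative only, as the quantisation command actually used is not part of this commit:

```python
# Illustrative sketch: the AutoGPTQ quantize config implied by the parameters
# above. The command actually used to create the file is not in this commit.
from auto_gptq import BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,         # quantised to 4-bit
    group_size=-1,  # no group_size, lowering VRAM requirements
    desc_act=True,  # --act-order, boosting inference accuracy
)
```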