Initial GPTQ model commit
README.md CHANGED
@@ -29,14 +29,6 @@ It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com
 * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/robin-65B-v2-GGML)
 * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/TheBloke/robin-65b-v2-fp16)
 
-## Prompt template
-
-```
-A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions
-###Human: prompt
-###Assistant:
-```
-
 ## How to easily download and use this model in text-generation-webui
 
 Please make sure you're using the latest version of text-generation-webui
@@ -66,7 +58,7 @@ from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
 import argparse
 
 model_name_or_path = "TheBloke/robin-65B-v2-GPTQ"
-model_basename = "robin-65b-GPTQ-4bit--1g.no-act.order"
+model_basename = "robin-65b-GPTQ-4bit--1g.act.order"
 
 use_triton = False
 
@@ -82,8 +74,8 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
 
 # Note: check the prompt template is correct for this model.
 prompt = "Tell me about AI"
-prompt_template=f'''###Human: {prompt}
-###Assistant:'''
+prompt_template=f'''### Human: {prompt}
+### Assistant:'''
 
 print("\n\n*** Generate:")
 
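For reference, the hunks above are fragments of the README's Python example. A minimal runnable sketch assembling them as they stand after this commit; the tokenizer handling and the `generate()` sampling parameters are assumptions, not shown in these hunks:

```python
# A minimal sketch assembling the README's AutoGPTQ example from the hunks above.
# Assumption: the sampling parameters and tokenizer handling are illustrative,
# not taken from this commit.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/robin-65B-v2-GPTQ"
model_basename = "robin-65b-GPTQ-4bit--1g.act.order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Load the 4-bit GPTQ weights from the safetensors file named by model_basename.
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

# Prompt template as updated in this commit: "### Human:" / "### Assistant:".
prompt = "Tell me about AI"
prompt_template = f'''### Human: {prompt}
### Assistant:'''

input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```

`use_triton = False` matches the README's own advice below, which reports issues with Triton mode of recent GPTQ-for-LLaMa.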
@@ -112,17 +104,17 @@ print(pipe(prompt_template)[0]['generated_text'])
 
 ## Provided files
 
-**robin-65b-GPTQ-4bit--1g.no-act.order.safetensors**
+**robin-65b-GPTQ-4bit--1g.act.order.safetensors**
 
 This will work with AutoGPTQ and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.
 
-
+It was created without group_size to lower VRAM requirements, and with --act-order (desc_act) to boost inference accuracy as much as possible.
 
-* `robin-65b-GPTQ-4bit--1g.no-act.order.safetensors`
+* `robin-65b-GPTQ-4bit--1g.act.order.safetensors`
 * Works with AutoGPTQ in CUDA or Triton modes.
 * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
 * Works with text-generation-webui, including one-click-installers.
-* Parameters: Groupsize = -1. Act Order / desc_act = False.
+* Parameters: Groupsize = -1. Act Order / desc_act = True.
 
 <!-- footer start -->
 ## Discord
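As a footnote to the parameters listed above (Groupsize = -1, Act Order / desc_act = True), a hedged sketch of the equivalent AutoGPTQ `BaseQuantizeConfig`; this is illustrative only, as the quantisation command actually used is not part of this commit:

```python
# Illustrative sketch: the AutoGPTQ quantize config implied by the parameters
# above. The command actually used to create the file is not in this commit.
from auto_gptq import BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,         # quantised to 4-bit
    group_size=-1,  # no group_size, lowering VRAM requirements
    desc_act=True,  # --act-order, boosting inference accuracy
)
```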