Why does TheBloke/guanaco-65B-GPTQ run slowly even on an 80GB GPU?

#17

by balajivantari - opened Jun 10, 2023

Discussion

balajivantari

Jun 10, 2023

The code I tried is given below.

I even tried it with LangChain, but it is still too slow.

Can you tell me how to make this model run faster?

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import argparse

model_name_or_path = "TheBloke/guanaco-65B-GPTQ"
model_basename = "Guanaco-65B-GPTQ-4bit.act-order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

prompt = "who is the first president of US"
prompt_template=f'''### Instruction: {prompt}

Response:'''

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])

Thank you.

balajivantari

Jun 12, 2023

Hello, can you please tell me how to make it faster?

balu548411

Jun 12, 2023

Hey Bloke, can you please clarify?

TheBloke

Owner Jun 12, 2023

edited Jun 12, 2023

It may be because the AutoGPTQ CUDA extension hasn't built. Show the full output you see when running the script.

balajivantari

Jun 12, 2023

Here is the code I tried and the output I got. It took more than 10 minutes.

Can you please tell me how to fix this?

TheBloke

Owner Jun 12, 2023

Please show me the output of:

!python -c 'import torch ; print(torch.__version__) ; print(torch.cuda.is_available())'
!nvidia-smi

balajivantari

Jun 12, 2023

Sure, here it is.

TheBloke

Owner Jun 12, 2023

I can't see any obvious problems there. However, you're using the latest development version of AutoGPTQ, which has not been as well tested.

Please use version 0.2.2:

!pip uninstall -y auto-gptq
!pip install auto-gptq==0.2.2

Run that, then run the following test script:

import torch
import autogptq_cuda

print(torch.__version__)
print(torch.cuda.is_available())
print(autogptq_cuda)

And show me the output.
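If the plain import in the test script fails with a traceback, a gentler check may be easier to read. The following is only a minimal sketch: it assumes the compiled extension is exposed under the module name `autogptq_cuda`, as in the test script above (other AutoGPTQ versions may use a different name), and the helper `extension_built` is a hypothetical name introduced here for illustration. It only checks whether the module can be located, not whether the kernels actually run.

```python
import importlib.util


def extension_built(module_name: str = "autogptq_cuda") -> bool:
    """Return True if a top-level module with this name can be located.

    The default name is taken from the test script in this thread and is
    an assumption; it may differ between AutoGPTQ versions.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if extension_built():
        print("autogptq_cuda was found on the import path.")
    else:
        # Without the compiled kernels, inference may fall back to a much
        # slower code path, which would match the symptoms described here.
        print("autogptq_cuda was not found; the CUDA extension may not have built.")
```

If this reports the module as missing, reinstalling or building from source is probably the next thing to try.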

balu548411

Jun 12, 2023

Here it is.

balu548411

Jun 12, 2023

I think the problem is with the installation, am I right? Can you tell me the correct procedure for installing AutoGPTQ?

TheBloke

Owner Jun 12, 2023

Try building from source:

!git clone https://github.com/PanQiWei/AutoGPTQ
!cd AutoGPTQ && git checkout v0.2.1 && pip install .

Then test again.

binyue

Jun 15, 2023

Is it solved? TheBloke, roughly what speed do you get when running it locally? How many tokens per second?
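For anyone who wants to compare numbers, tokens per second can be measured roughly as below. This is a minimal sketch, not a proper benchmark: `tokens_per_second` is a hypothetical helper, and the stand-in generator at the bottom only simulates a model so the snippet runs without a GPU. The commented lines show how it might be wired to the `model` and `tokenizer` from the script at the top of the thread, assuming `model.generate` returns the prompt tokens followed by the new tokens.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(generate: Callable[[], Sequence[int]], prompt_len: int) -> float:
    """Time one call to `generate` and return newly generated tokens per second.

    `generate` must return the full output sequence (prompt + new tokens);
    `prompt_len` is the number of prompt tokens to subtract.
    """
    start = time.perf_counter()
    output_ids = generate()
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise ValueError("elapsed time was not positive; cannot compute a rate")
    new_tokens = len(output_ids) - prompt_len
    return new_tokens / elapsed


# Possible usage with the real model (assumption, needs a GPU and the loaded model):
#   input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.to("cuda:0")
#   rate = tokens_per_second(
#       lambda: model.generate(inputs=input_ids, max_new_tokens=128)[0],
#       input_ids.shape[1],
#   )

if __name__ == "__main__":
    def fake_generate() -> list:
        # Stand-in for a model: pretends to take 0.1 s to produce
        # 10 prompt tokens plus 20 new tokens.
        time.sleep(0.1)
        return list(range(30))

    print(f"{tokens_per_second(fake_generate, prompt_len=10):.1f} tokens/s")
```

A single timed call includes warm-up costs, so running it a couple of times and ignoring the first result should give a more representative figure.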

jackshan

Aug 12, 2023

It is solved.
