Gradio chatbot with AutoModelForCausalLM, AutoTokenizer and @spaces.GPU() (ZeroGPU space)

by schuler - opened

Hello.

Congrats on your work. I'm using it.

It would be fantastic to have a template that uses the HF Autoloaders + ZeroGPU Spaces. My own custom-coded LLM doesn't work with InferenceClient. This is how I'm loading the model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline

REPO_NAME = 'schuler/experimental-JP47D21-KPhi-3-micro-4k-instruct'

def load_model(repo_name):
    tokenizer = AutoTokenizer.from_pretrained(repo_name, trust_remote_code=True)
    generator_conf = GenerationConfig.from_pretrained(repo_name)
    model = AutoModelForCausalLM.from_pretrained(
        repo_name, trust_remote_code=True,
        torch_dtype=torch.bfloat16, attn_implementation="eager")
    # model.to('cuda')
    return tokenizer, generator_conf, model

tokenizer, generator_conf, model = load_model(REPO_NAME)
global_error = ''
try:
    generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
except Exception as e:
    global_error = f"Failed to create the text-generation pipeline: {str(e)}"

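For reference, this is roughly the pattern I'm trying to reproduce with @spaces.GPU() (just a sketch on my part, adapted from the ZeroGPU examples; the chat-history handling, generation parameters, and the assumption that the repo ships a chat template are my own guesses):

import torch
import gradio as gr
import spaces
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

REPO_NAME = 'schuler/experimental-JP47D21-KPhi-3-micro-4k-instruct'

tokenizer = AutoTokenizer.from_pretrained(REPO_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO_NAME, trust_remote_code=True,
    torch_dtype=torch.bfloat16, attn_implementation="eager")
model.to('cuda')  # on ZeroGPU this can sit at module level; the GPU is attached later
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

@spaces.GPU  # ZeroGPU only holds a GPU while this function is running
def respond(message, history):
    # history comes from gr.ChatInterface as (user, assistant) pairs
    messages = []
    for user_msg, assistant_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    messages.append({"role": "user", "content": message})
    # assumes the model repo provides a chat template for the tokenizer
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    result = generator(prompt, max_new_tokens=256, do_sample=True, return_full_text=False)
    return result[0]["generated_text"]

gr.ChatInterface(respond).launch()
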
For example, I haven't been able to get this Space to work with ZeroGPU:
https://huggingface.co/spaces/schuler/experimental-kphi-3-nano-4k-instruct-gradio-autoloader
