Gradio chatbot with AutoModelForCausalLM, AutoTokenizer and @spaces.GPU() (ZeroGPU space)
#3 by schuler - opened
Hello.
Congrats on your work. I'm using it.
It would be fantastic to have a template that uses HF Autoloaders + ZeroGPU Spaces. My own custom-coded LLM doesn't work with InferenceClient. This is the model I'm using:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline

REPO_NAME = 'schuler/experimental-JP47D21-KPhi-3-micro-4k-instruct'

def load_model(repo_name):
    tokenizer = AutoTokenizer.from_pretrained(repo_name, trust_remote_code=True)
    generator_conf = GenerationConfig.from_pretrained(repo_name)
    model = AutoModelForCausalLM.from_pretrained(repo_name, trust_remote_code=True, torch_dtype=torch.bfloat16, attn_implementation="eager")
    # model.to('cuda')  # commented out: no GPU is available at startup on ZeroGPU
    return tokenizer, generator_conf, model

tokenizer, generator_conf, model = load_model(REPO_NAME)

global_error = ''
try:
    generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
except Exception as e:
    global_error = f"Failed to load model: {str(e)}"
As an example, I'm not able to make this Space work with ZeroGPU:
https://huggingface.co/spaces/schuler/experimental-kphi-3-nano-4k-instruct-gradio-autoloader
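For reference, here is a minimal sketch of the kind of template I have in mind. It assumes the `spaces` package that ships with ZeroGPU Spaces and its `@spaces.GPU()` decorator; the `respond` function, the prompt handling, and the generation parameters are just placeholders, not the actual app code.

import gradio as gr
import spaces  # must be imported before torch/CUDA is touched on ZeroGPU
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

REPO_NAME = 'schuler/experimental-JP47D21-KPhi-3-micro-4k-instruct'

# Load at import time; ZeroGPU only attaches a real GPU while a
# @spaces.GPU()-decorated function is running.
tokenizer = AutoTokenizer.from_pretrained(REPO_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO_NAME, trust_remote_code=True,
    torch_dtype=torch.bfloat16, attn_implementation="eager",
)
model.to('cuda')  # with `spaces` imported, the move is deferred until a GPU is attached
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

@spaces.GPU()  # a GPU is allocated only for the duration of this call
def respond(message, history):
    # simplified: a real template would build the prompt from `history`
    # using the model's chat template
    output = generator(message, max_new_tokens=256, do_sample=True)
    return output[0]['generated_text']

gr.ChatInterface(respond).launch()

The key point is that the model is loaded on the CPU at import time and the GPU is only requested inside the decorated generation function.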