I am using this model with the transformers library and Gradio, but it takes forever to respond, even for short inputs. Am I missing anything? Below is my app.py:

```python
import gradio as gr

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AGBonnet/medinote-7b", max_new_tokens=1024)

# Load model directly
# (this loads a second copy of the weights; `pipe` above does not use these objects)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AGBonnet/medinote-7b")
model = AutoModelForCausalLM.from_pretrained("AGBonnet/medinote-7b")


def launch(text):
    # Generate a completion for the textbox contents and return it as plain text
    out = pipe(text)
    return out[0]["generated_text"]


iface = gr.Interface(launch, inputs=gr.Textbox(), outputs="text")

iface.launch()
```
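For scale, a rough estimate of the weight memory alone suggests why this setup can crawl. The script above loads the checkpoint twice (once inside `pipeline()`, once via `AutoModelForCausalLM`), and without a `torch_dtype` or `device`/`device_map` argument, many transformers versions load the weights in float32 and run the pipeline on the CPU. The sketch below assumes about 7 billion parameters (inferred from the model name, not checked against the checkpoint) and counts weights only, ignoring activations and the KV cache; `weight_memory_gb` and `N_PARAMS` are hypothetical names for illustration.

```python
# Rough estimate of memory for model weights only (no activations, no KV cache).
# Assumption: ~7e9 parameters, inferred from the name "medinote-7b".

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(n_params: float, dtype: str, copies: int = 1) -> float:
    """Return approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] * copies / 1e9


N_PARAMS = 7e9  # hypothetical round figure, not read from the checkpoint

# Two float32 copies: one inside pipeline(), one from AutoModelForCausalLM.
print(weight_memory_gb(N_PARAMS, "float32", copies=2))  # 56.0
# A single float16 copy.
print(weight_memory_gb(N_PARAMS, "float16"))  # 14.0
```

If the machine has less RAM than that, the OS may start swapping, which can make generation slower still; producing up to 1024 new tokens from a 7B model on a CPU is slow even without swapping.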

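One possible restructuring, sketched under assumptions: a CUDA GPU with roughly 14 GB or more of free memory, the `accelerate` package installed (needed for `device_map="auto"`), and a 256-token cap that is an arbitrary choice, not a recommendation from the model card. It loads the model once in float16 and hands the already-loaded model and tokenizer to `pipeline()`, so no second copy is created. `build_pipeline`, `generate_text`, `make_app` and `MAX_NEW_TOKENS` are hypothetical names; the torch, transformers and gradio imports sit inside functions only so that `generate_text` can be exercised with a stand-in pipeline without downloading anything.

```python
# app.py (sketch): load medinote-7b once and reuse it for every request.
# Assumptions: a CUDA GPU with enough memory for float16 weights,
# and `accelerate` installed (required for device_map="auto").

MODEL_ID = "AGBonnet/medinote-7b"
MAX_NEW_TOKENS = 256  # assumption: shorter cap than 1024 to bound latency


def build_pipeline():
    """Load tokenizer and model once, then wrap them in a text-generation pipeline."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # half the memory of float32 weights
        device_map="auto",          # use available GPU(s), falling back to CPU
    )
    # Reuse the loaded objects instead of letting pipeline() load another copy.
    return pipeline("text-generation", model=model, tokenizer=tokenizer)


def generate_text(pipe, prompt: str) -> str:
    """Run one generation and return only the newly generated text."""
    out = pipe(prompt, max_new_tokens=MAX_NEW_TOKENS, return_full_text=False)
    return out[0]["generated_text"]


def make_app(pipe):
    """Build the Gradio UI around an already-loaded pipeline."""
    import gradio as gr

    return gr.Interface(
        fn=lambda prompt: generate_text(pipe, prompt),
        inputs=gr.Textbox(),
        outputs="text",
    )


# In app.py, start the UI with: make_app(build_pipeline()).launch()
```

Note that `return_full_text=False` returns only the completion, whereas the original `launch` returned the prompt followed by the completion.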