Jais-7b-chat (a double-quantized version)

This model is a double-quantized version of jais-13b-chat by Core42. The aim is to make the model runnable on GPU-poor machines. For high-quality tasks, it is better to use the original, non-quantized 13B model.

Model creator: Core42

Original model: jais-13b-chat
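
"Double quantized" most likely refers to the bitsandbytes scheme in which weights are quantized to 4 bits and the quantization constants themselves are quantized again. The exact recipe used to produce this checkpoint is not documented here, so the sketch below is only an assumption about a typical setup; the repo id core42/jais-13b-chat and the NF4 settings are illustrative, and you do not need to run this yourself since this repo already ships the quantized weights:

# Hypothetical sketch: a typical double-quantized load of the original 13B model
# with bitsandbytes. These settings are assumptions, not the documented recipe.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4 bits
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_quant_type="nf4",             # NormalFloat4 data type (assumption)
    bnb_4bit_compute_dtype=torch.float16,  # run the matmuls in fp16
)

model = AutoModelForCausalLM.from_pretrained(
    "core42/jais-13b-chat",                # original model repo id (assumption)
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)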

How To Run

Just run it as a text-generation pipeline task.

System Requirements:

It has been successfully tested on a Google Colab Pro T4 instance.
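
If you want to check which GPU your runtime actually has before loading the model, a quick plain-PyTorch check (nothing here is specific to this repo) is:

# Sanity-check the available GPU before loading the model.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA GPU detected; this quantized model is meant for GPU inference.")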

Steps:

  1. Install the required libraries:
pip install -Uq huggingface_hub transformers bitsandbytes xformers accelerate
  2. Create the pipeline:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, TextStreamer

# Load the tokenizer and the pre-quantized model
# (the Jais architecture ships custom code, hence trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("erfanvaredi/jais-7b-chat")
model = AutoModelForCausalLM.from_pretrained(
    "erfanvaredi/jais-7b-chat",
    trust_remote_code=True,
    device_map='auto',
)

# Wrap them in a text-generation pipeline
pipe = pipeline(model=model, tokenizer=tokenizer, task='text-generation')
  3. Create the prompt:
chat = [
    {"role": "user", "content": 'Tell me a funny joke about Large Language Models.'},
]
prompt = pipe.tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
  4. Create a streamer (optional; use it if you want the generated text streamed as it is produced, otherwise skip this step):
streamer = TextStreamer(
    tokenizer,
    skip_prompt=True,          # don't echo the prompt back
    skip_special_tokens=True,  # hide special tokens such as the EOS token
)
  5. Ask the model:
pipe(
  prompt,
  streamer=streamer,
  max_new_tokens=256,
  do_sample=False,  # greedy (deterministic) decoding, i.e. temperature 0
)
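
If you skipped the streamer, you can read the completion straight from the pipeline's return value instead (standard transformers text-generation pipeline output; return_full_text just drops the prompt from the result):

# Non-streaming variant: capture the generated text from the pipeline output.
outputs = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=False,
    return_full_text=False,  # return only the newly generated text
)
print(outputs[0]["generated_text"])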

:)
