Jais-7b-chat (a double-quantized version)

This model is a double-quantized version of jais-13b-chat by Core42. The aim is to make the model runnable on GPU-poor machines. For high-quality tasks, it is better to use the original, non-quantized 13B model.

Model creator: Core42

Original model: jais-13b-chat
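
"Double quantized" most likely refers to the bitsandbytes scheme in which weights are quantized to 4 bits and the quantization constants themselves are quantized again. The exact recipe used to produce this checkpoint is not documented here, so the sketch below is only an assumption about a typical setup; the repo id core42/jais-13b-chat and the NF4 settings are illustrative, and you do not need to run this yourself since this repo already ships the quantized weights:

# Hypothetical sketch: a typical double-quantized load of the original 13B model
# with bitsandbytes. These settings are assumptions, not the documented recipe.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4 bits
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_quant_type="nf4",             # NormalFloat4 data type (assumption)
    bnb_4bit_compute_dtype=torch.float16,  # run the matmuls in fp16
)

model = AutoModelForCausalLM.from_pretrained(
    "core42/jais-13b-chat",                # original model repo id (assumption)
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)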

How To Run

Just run it as a text-generation pipeline task.

System Requirements:

It has been successfully tested on a Google Colab Pro T4 instance.
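
If you want to check which GPU your runtime actually has before loading the model, a quick plain-PyTorch check (nothing here is specific to this repo) is:

# Sanity-check the available GPU before loading the model.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA GPU detected; this quantized model is meant for GPU inference.")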

Steps:

  1. Install the required libraries:
pip install -Uq huggingface_hub transformers bitsandbytes xformers accelerate
  2. Create the pipeline:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, TextStreamer

# Load the tokenizer and the pre-quantized model
# (the Jais architecture ships custom code, hence trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("erfanvaredi/jais-7b-chat")
model = AutoModelForCausalLM.from_pretrained(
    "erfanvaredi/jais-7b-chat",
    trust_remote_code=True,
    device_map='auto',
)

# Wrap them in a text-generation pipeline
pipe = pipeline(model=model, tokenizer=tokenizer, task='text-generation')
  3. Create the prompt:
chat = [
    {"role": "user", "content": 'Tell me a funny joke about Large Language Models.'},
]
prompt = pipe.tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
  4. Create a streamer (optional; use it if you want the generated text streamed as it is produced, otherwise skip this step):
streamer = TextStreamer(
    tokenizer,
    skip_prompt=True,          # don't echo the prompt back
    skip_special_tokens=True,  # hide special tokens such as the EOS token
)
  5. Ask the model:
pipe(
  prompt,
  streamer=streamer,
  max_new_tokens=256,
  do_sample=False,  # greedy (deterministic) decoding, i.e. temperature 0
)
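
If you skipped the streamer, you can read the completion straight from the pipeline's return value instead (standard transformers text-generation pipeline output; return_full_text just drops the prompt from the result):

# Non-streaming variant: capture the generated text from the pipeline output.
outputs = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=False,
    return_full_text=False,  # return only the newly generated text
)
print(outputs[0]["generated_text"])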

:)
