---
base_model: stabilityai/stablelm-zephyr-3b
datasets:
- HuggingFaceH4/ultrachat_200k
- HuggingFaceH4/ultrafeedback_binarized
- meta-math/MetaMathQA
- WizardLM/WizardLM_evol_instruct_V2_196k
- Intel/orca_dpo_pairs
language:
- en
license: other
tags:
- causal-lm
- openvino
- nncf
- 4-bit
extra_gated_fields:
  Name: text
  Email: text
  Country: text
  Organization or Affiliation: text
  I ALLOW Stability AI to email me about new model releases: checkbox
model-index:
- name: stablelm-zephyr-3b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 46.08
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=stabilityai/stablelm-zephyr-3b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 74.16
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=stabilityai/stablelm-zephyr-3b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 46.17
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=stabilityai/stablelm-zephyr-3b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 46.49
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=stabilityai/stablelm-zephyr-3b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 65.51
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=stabilityai/stablelm-zephyr-3b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 42.15
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=stabilityai/stablelm-zephyr-3b
      name: Open LLM Leaderboard
---

This model is a quantized version of [`stabilityai/stablelm-zephyr-3b`](https://huggingface.co/stabilityai/stablelm-zephyr-3b) converted to the OpenVINO format. It was obtained via the [nncf-quantization](https://huggingface.co/spaces/echarlaix/nncf-quantization) space with [optimum-intel](https://github.com/huggingface/optimum-intel).

Please note: for commercial use, refer to https://stability.ai/license.
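For reference, a comparable 4-bit weight-compressed OpenVINO export can be reproduced locally with optimum-intel's Python API. The sketch below is illustrative only: the output directory name and the compression parameters (`group_size`, `ratio`) are assumptions, not necessarily the exact settings used to produce this checkpoint.

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
from transformers import AutoTokenizer

# Illustrative NNCF 4-bit weight-compression settings (not necessarily the
# exact parameters used for this repository)
q_config = OVWeightQuantizationConfig(bits=4, sym=False, group_size=128, ratio=0.8)

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly,
# compressing the weights with the config above
model = OVModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-zephyr-3b",
    export=True,
    quantization_config=q_config,
)

save_dir = "stablelm-zephyr-3b-openvino-4bit"  # hypothetical output directory
model.save_pretrained(save_dir)
AutoTokenizer.from_pretrained("stabilityai/stablelm-zephyr-3b").save_pretrained(save_dir)
```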
### Model Description

StableLM Zephyr 3B is a 3 billion parameter instruction-tuned model inspired by the [HuggingFaceH4 Zephyr 7B](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) training pipeline. It was trained on a mix of publicly available and synthetic datasets using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290), and was evaluated on [MT Bench](https://arxiv.org/abs/2306.05685) and the [Alpaca Benchmark](https://tatsu-lab.github.io/alpaca_eval/).

### Model Parameters

```
context window = 4096
model type     = 3B
model params   = 2.80 B
BOS token      = 0 '<|endoftext|>'
EOS token      = 0 '<|endoftext|>'
UNK token      = 0 '<|endoftext|>'
PAD token      = 0 '<|endoftext|>'
```

The tokenizer of this model supports `chat_templates`, so prompts in the format below can be built with `tokenizer.apply_chat_template` (as shown in the usage example further down).

### Usage

StableLM Zephyr 3B uses the following instruction format:

```
<|user|>
List 3 synonyms for the word "tiny"<|endoftext|>
<|assistant|>
1. Dwarf
2. Little
3. Petite<|endoftext|>
```

### Model Details

- Developed by: Stability AI
- Model type: StableLM Zephyr 3B is an auto-regressive language model based on the transformer decoder architecture.
- Language(s): English
- Library: [Alignment Handbook](https://github.com/huggingface/alignment-handbook.git)
- Finetuned from model: [stabilityai/stablelm-3b-4e1t](https://huggingface.co/stabilityai/stablelm-3b-4e1t)
- License: [StabilityAI Community License](https://huggingface.co/stabilityai/stablelm-zephyr-3b/raw/main/LICENSE.md)
- Commercial license: to use this model commercially, refer to https://stability.ai/license
- Contact: for questions and comments about the model, email lm@stability.ai

### Usage with OpenVINO

First make sure you have `optimum-intel` (with its OpenVINO extra) and `openvino-genai` installed:

```bash
pip install openvino-genai==2024.4.0
pip install optimum-intel[openvino]
```

To load the model and stream a generation:

```python
from threading import Thread

from optimum.intel import OVModelForCausalLM
from transformers import AutoConfig, AutoTokenizer, TextIteratorStreamer

model_id = "FM-1976/stablelm-zephyr-3b-openvino-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
ov_model = OVModelForCausalLM.from_pretrained(
    model_id,
    device="CPU",
    ov_config={"PERFORMANCE_HINT": "LATENCY", "NUM_STREAMS": "1", "CACHE_DIR": ""},
    config=AutoConfig.from_pretrained(model_id),
)

# Generation with a prompt message
question = "Explain the plot of Cinderella in a sentence."
messages = [{"role": "user", "content": question}]
print("Question:", question)

# Credit to https://github.com/openvino-dev-samples/chatglm3.openvino/blob/main/chat.py
streamer = TextIteratorStreamer(
    tokenizer, timeout=60.0, skip_prompt=True, skip_special_tokens=True
)

# Build the prompt with the model's chat template
model_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_tensors="pt"
)
generate_kwargs = dict(
    input_ids=model_inputs,
    max_new_tokens=450,
    temperature=0.1,
    do_sample=True,
    top_p=0.5,
    repetition_penalty=1.178,
    pad_token_id=tokenizer.eos_token_id,
    streamer=streamer,
)

# Run generation on a background thread so tokens can be printed as they arrive
t1 = Thread(target=ov_model.generate, kwargs=generate_kwargs)
t1.start()

for new_text in streamer:
    print(new_text, end="", flush=True)
t1.join()
```
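Alternatively, since the install step above also pins `openvino-genai`, the quantized model can be driven through its `LLMPipeline` API. This is a minimal sketch, assuming the repository ships the OpenVINO tokenizer files that `openvino-genai` expects alongside the model IR:

```python
import openvino_genai
from huggingface_hub import snapshot_download

# openvino-genai loads from a local directory, so fetch the repository first
model_dir = snapshot_download("FM-1976/stablelm-zephyr-3b-openvino-4bit")

# Build a text-generation pipeline on CPU
pipe = openvino_genai.LLMPipeline(model_dir, "CPU")

# start_chat() switches the pipeline into chat mode, which applies the
# model's chat template to each prompt
pipe.start_chat()
print(pipe.generate("Explain the plot of Cinderella in a sentence.", max_new_tokens=200))
pipe.finish_chat()
```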