yam-peleg/Hebrew-Gemma-11B-Instruct

@@ -12,69 +12,45 @@ library_name: transformers
 - **Base Model:** [Hebrew-Gemma-11B](https://huggingface.co/yam-peleg/Hebrew-Gemma-11B)
 - **Instruct Model:** [Hebrew-Gemma-11B-Instruct](https://huggingface.co/yam-peleg/Hebrew-Gemma-11B-Instruct)
-Hebrew-Gemma-11B is an open-source Large Language Model (LLM) is a hebrew/english pretrained generative text model with 11 billion parameters, based on the Gemma-7B architecture from Google.
 It is continued pretrain of gemma-7b, extended to a larger scale and trained on 3B additional tokens of both English and Hebrew text data.
-The resulting model Gemma-11B is a powerful general-purpose language model suitable for a wide range of natural language processing tasks, with a focus on Hebrew language understanding and generation.
-### Terms of Use
-As an extention of Gemma-7B, this model is subject to the original license and terms of use by Google.
-**Gemma-7B original Terms of Use**: [Terms](https://www.kaggle.com/models/google/gemma/license/consent)
-### Usage
-Below are some code snippets on how to get quickly started with running the model.
-First make sure to `pip install -U transformers`, then copy the snippet from the section that is relevant for your usecase.
-### Running on CPU
-```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("yam-peleg/Hebrew-Gemma-11B-Instruct")
-model = AutoModelForCausalLM.from_pretrained("yam-peleg/Hebrew-Gemma-11B-Instruct")
-input_text = "שלום! מה שלומך היום?"
-input_ids = tokenizer(input_text, return_tensors="pt")
-outputs = model.generate(**input_ids)
-print(tokenizer.decode(outputs[0]))
 ```
-### Running on GPU
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("yam-peleg/Hebrew-Gemma-11B-Instruct")
-model = AutoModelForCausalLM.from_pretrained("yam-peleg/Hebrew-Gemma-11B-Instruct", device_map="auto")
-input_text = "שלום! מה שלומך היום?"
-input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
-outputs = model.generate(**input_ids)
-print(tokenizer.decode(outputs[0]))
 ```
-### Running with 4-Bit precision
-```python
-from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
-tokenizer = AutoTokenizer.from_pretrained("yam-peleg/Hebrew-Gemma-11B-Instruct")
-model = AutoModelForCausalLM.from_pretrained("yam-peleg/Hebrew-Gemma-11B-Instruct", quantization_config = BitsAndBytesConfig(load_in_4bit=True))
-input_text = "שלום! מה שלומך היום?"
-input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
-outputs = model.generate(**input_ids)
-print(tokenizer.decode(outputs[0])
-```
 ### Benchmark Results

 - **Base Model:** [Hebrew-Gemma-11B](https://huggingface.co/yam-peleg/Hebrew-Gemma-11B)
 - **Instruct Model:** [Hebrew-Gemma-11B-Instruct](https://huggingface.co/yam-peleg/Hebrew-Gemma-11B-Instruct)
+The Hebrew-Gemma-11B-Instruct Large Language Model (LLM) is an instruction fine-tuned version of the [Hebrew-Gemma-11B](https://huggingface.co/yam-peleg/Hebrew-Gemma-11B) generative text model, trained on a variety of conversation datasets.
 It is continued pretrain of gemma-7b, extended to a larger scale and trained on 3B additional tokens of both English and Hebrew text data.
+### Instruction format
+This format must be followed strictly; otherwise, the model will generate suboptimal outputs.
+```
+<bos><start_of_turn>user
+Write a hello world program<end_of_turn>
+<start_of_turn>model
+Here is a simple hello world program<end_of_turn>
+<eos>
 ```
+Each turn is preceded by a `<start_of_turn>` delimiter, followed by the role of the entity (either `user`, for content supplied by the user, or `model`, for LLM responses). Turns finish with the `<end_of_turn>` token.
+You can follow this format to build the prompt manually, if you need to do it without the tokenizer's chat template.
+A simple example:
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
+model_id = "yam-peleg/Hebrew-Gemma-11B-Instruct"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda")
+chat = [
+    { "role": "user", "content": "כתוב קוד פשוט בפייתון שמדפיס למסך את התאריך של היום" },
+]
+prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
 ```
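The turn format described above can also be assembled by hand when the tokenizer's chat template is not available. The following is a minimal sketch, not the model author's code: `build_prompt` is a hypothetical helper, and it assumes the only roles are `user` and `model` and that each turn is terminated with `<end_of_turn>` followed by a newline, as in the format example.

```python
def build_prompt(messages, add_generation_prompt=True):
    """Assemble a prompt string in the <start_of_turn>/<end_of_turn> format.

    `messages` is a list of {"role": ..., "content": ...} dicts whose role is
    "user" or "model". Hypothetical helper, not part of transformers.
    """
    parts = ["<bos>"]
    for message in messages:
        role = message["role"]
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {role!r}")
        content = message["content"].strip()
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so generation continues as the model's reply.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = build_prompt([{"role": "user", "content": "Write a hello world program"}])
print(prompt)
```

If a string built this way is tokenized directly, passing `add_special_tokens=False` to the tokenizer is probably needed so that `<bos>` is not prepended a second time. That depends on the tokenizer configuration and is worth verifying against the output of `apply_chat_template`.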
+### Terms of Use
+As an extension of Gemma-7B, this model is subject to the original license and terms of use by Google.
 ### Benchmark Results