Hemanth-thunder committed
Commit
6acf238
1 Parent(s): 8d97901

Update README.md

Files changed (1)
  1. README.md +16 -20
README.md CHANGED
@@ -38,16 +38,30 @@ Tamil LLM: A Breakthrough in Tamil Language Understanding In the realm of langua
 
 ## Instruction format
 
-In order to leverage instruction fine-tuning, your prompt should be surrounded by `[INST]` and `[/INST]` tokens. The very first instruction should begin with a begin of sentence id. The next instructions should not. The assistant generation will be ended by the end-of-sentence token id.
-
+To harness the power of instruction fine-tuning, your prompt must be encapsulated within <s> and </s> tokens. This instructional format revolves around three key elements: Instruction, Input, and Response. The Tamil Mistral instruct model is adept at engaging in conversations based on this structured template.
 E.g.
+
+
 ```
+# without Input
+prompt_template =<s>"""சரியான பதிலுடன் வேலையை வெற்றிகரமாக முடிக்க, தேவையான தகவலை உள்ளிடவும்.
+
+### Instruction:
+{}
+
+### Response:"""
+
+# with Input
 prompt_template =<s>"""சரியான பதிலுடன் வேலையை வெற்றிகரமாக முடிக்க, வழங்கப்பட்ட வழிகாட்டுதல்களைப் பின்பற்றி, தேவையான தகவலை உள்ளிடவும்.
 
 ### Instruction:
 {}
 
+### Input:
+{}
+
 ### Response:"""
+
 ```
 
 This format is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating) via the `apply_chat_template()` method:
@@ -55,25 +69,7 @@ This format is available as a [chat template](https://huggingface.co/docs/transf
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-device = "cuda" # the device to load the model onto
-
-model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
-tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
-
-messages = [
-    {"role": "user", "content": "What is your favourite condiment?"},
-    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
-    {"role": "user", "content": "Do you have mayonnaise recipes?"}
-]
-
-encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
-
-model_inputs = encodeds.to(device)
-model.to(device)
 
-generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
-decoded = tokenizer.batch_decode(generated_ids)
-print(decoded[0])
 ```
 ## Python function to format query
 ```python
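
The two added templates can be filled with Python's positional `str.format`, which is presumably what the "Python function to format query" section that follows this diff does. A minimal sketch — the template strings are taken verbatim from the diff (with the `<s>` token moved inside the string literal so the snippet runs), while the function name `format_query` and the example instruction are illustrative assumptions, not part of the commit:

```python
# Templates from the README diff; <s> is placed inside the string so it parses.
PROMPT_TEMPLATE = """<s>சரியான பதிலுடன் வேலையை வெற்றிகரமாக முடிக்க, தேவையான தகவலை உள்ளிடவும்.

### Instruction:
{}

### Response:"""

PROMPT_TEMPLATE_WITH_INPUT = """<s>சரியான பதிலுடன் வேலையை வெற்றிகரமாக முடிக்க, வழங்கப்பட்ட வழிகாட்டுதல்களைப் பின்பற்றி, தேவையான தகவலை உள்ளிடவும்.

### Instruction:
{}

### Input:
{}

### Response:"""


def format_query(instruction, input_text=None):
    """Fill the instruct template, picking the with- or without-Input variant."""
    if input_text is None:
        return PROMPT_TEMPLATE.format(instruction)
    return PROMPT_TEMPLATE_WITH_INPUT.format(instruction, input_text)


# Example instruction ("What is the capital of Tamil Nadu?") is illustrative.
print(format_query("தமிழ்நாட்டின் தலைநகரம் எது?"))
```

The resulting string ends at `### Response:`, so the model's generation continues directly from there.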