README.md · Hemanth-thunder/Tamil-Mistral-7B-Instruct-v0.1 at b4b068ad71da70aade28a2ca4accf89b7eaf5e7c

Tamil-Mistral-7B-Instruct-v0.1 / README.md

Hemanth-thunder

Update README.md

504c496 verified 7 months ago

preview code

raw

history blame

6.46 kB

	---
	library_name: transformers
	license: apache-2.0
	base_model: Hemanth-thunder/Tamil-Mistral-7B-v0.1
	Pretrain_Model: mistralai/Mistral-7B-v0.1
	tags:
	- Mistral
	- instruct
	- finetune
	- chatml
	- DPO
	- RLHF
	- gpt4
	- synthetic data
	- distillation
	- function calling
	- json mode
	datasets:
	- Hemanth-thunder/tamil-instruction
	language:
	- ta
	widget:
	- example_title: Tamil Chat with LLM
	messages:
	- role: system
	content: >-
	சரியான பதிலுடன் வேலையை வெற்றிகரமாக முடிக்க, வழங்கப்பட்ட வழிகாட்டுதல்களைப் பின்பற்றி, தேவையான தகவலை உள்ளிடவும்.
	- role: user
	content: >-
	மூன்று இயற்கை கூறுகளை குறிப்பிடவும்.
	---

	# Model Card for Tamil-Mistral-7B-Instruct-v0.1

	The Tamil-Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of [Tamil-Mistral-7B-Instruct-v0.1](https://huggingface.co/Hemanth-thunder/Tamil-Mistral-7B-Instruct-v0.1).
	Tamil LLM: A Breakthrough in Tamil Language Understanding In the realm of language models, the fine-tuned Tamil Mistral model represents a significant advancement. Unlike its English counterpart, the Tamil Mistral model is specifically tailored to comprehend and generate text in the Tamil language. This innovation addresses a critical gap, as the English Mistral model fails to effectively engage with Tamil, a language rich in culture and heritage. Through extensive fine-tuning with a base Tamil Mistral model, this iteration has been meticulously enhanced to grasp the nuances and intricacies of the Tamil language. As a result, we are delighted to present a revolutionary model that enables seamless interaction through text. Welcome to the future of conversational Tamil language processing with our instructive model.

	# Dataset
	alpaca dataset (400k) instruction google translated

	# Training time
	18 hrs to train on NVIDIA RTX A6000 48GB with batch size of 30

	## Kaggle demo link
	https://www.kaggle.com/code/hemanthkumar21/tamil-mistral-instruct-v0-1-demo/

	```python
	from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,TextStreamer,pipeline)
	import torch
	model_name = "Hemanth-thunder/Tamil-Mistral-7B-Instruct-v0.1"
	nf4_config = BitsAndBytesConfig(load_in_4bit=True,bnb_4bit_quant_type="nf4",bnb_4bit_use_double_quant=True,
	bnb_4bit_compute_dtype=torch.bfloat16
	)
	model = AutoModelForCausalLM.from_pretrained(model_name,device_map='auto',quantization_config=nf4_config,use_cache=False,low_cpu_mem_usage=True )
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	tokenizer.pad_token = tokenizer.eos_token
	tokenizer.padding_side = "right"
	streamer = TextStreamer(tokenizer)
	pipe = pipeline("text-generation" ,model=model, tokenizer=tokenizer ,do_sample=True, repetition_penalty=1.15,top_p=0.95,streamer=streamer)
	prompt = create_prompt("வாழ்க்கையில் ஆரோக்கியமாக இருப்பது எப்படி?")
	result=pipe(prompt,max_length=512,pad_token_id=tokenizer.eos_token_id)

	```
	```
	result:
	- உடற்பயிற்சி - ஆரோக்கியமான உணவை உண்ணுங்கள் -2 புகைபிடிக்காதே - தவறாமல் உடற்பயிற்சி செய்</s>
	```
	## Instruction format

	To harness the power of instruction fine-tuning, your prompt must be encapsulated within <s> and </s> tokens. This instructional format revolves around three key elements: Instruction, Input, and Response. The Tamil Mistral instruct model is adept at engaging in conversations based on this structured template.
	E.g.


	```
	# without Input
	prompt_template =<s>"""சரியான பதிலுடன் வேலையை வெற்றிகரமாக முடிக்க, தேவையான தகவலை உள்ளிடவும்.

	### Instruction:
	{}

	### Response:"""

	# with Input
	prompt_template =<s>"""சரியான பதிலுடன் வேலையை வெற்றிகரமாக முடிக்க, வழங்கப்பட்ட வழிகாட்டுதல்களைப் பின்பற்றி, தேவையான தகவலை உள்ளிடவும்.

	### Instruction:
	{}

	### Input:
	{}

	### Response:"""

	```

	## Python function to format query
	```python
	def create_prompt(query,prompt_template=prompt_template):
	bos_token = "<s>"
	eos_token = "</s>"
	if query:
	prompt_template = prompt_template.format(query)
	else:
	raise "Please input with query"
	prompt = bos_token+prompt_template #eos_token
	return prompt
	```

	## Model Architecture
	This instruction model is based on Mistral-7B-v0.1, a transformer model with the following architecture choices:
	- Grouped-Query Attention
	- Sliding-Window Attention
	- Byte-fallback BPE tokenizer

	## Troubleshooting
	- If you see the following error:
	```
	Traceback (most recent call last):
	File "", line 1, in
	File "/transformers/models/auto/auto_factory.py", line 482, in from_pretrained
	config, kwargs = AutoConfig.from_pretrained(
	File "/transformers/models/auto/configuration_auto.py", line 1022, in from_pretrained
	config_class = CONFIG_MAPPING[config_dict["model_type"]]
	File "/transformers/models/auto/configuration_auto.py", line 723, in getitem
	raise KeyError(key)
	KeyError: 'mistral'
	```

	Installing transformers from source should solve the issue
	pip install git+https://github.com/huggingface/transformers

	This should not be required after transformers-v4.33.4.

	## Limitations

	The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance.
	It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to
	make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.


	## Quantized Versions:
	coming s00n

	# How to Cite

	```bibtext
	@misc{Tamil-Mistral-7B-Instruct-v0.1,
	url={[https://huggingface.co/Hemanth-thunder/Tamil-Mistral-7B-Instruct-v0.1]https://huggingface.co/Hemanth-thunder/Tamil-Mistral-7B-Instruct-v0.1)},
	title={Tamil-Mistral-7B-Instruct-v0.1},
	author={"hemanth kumar"}
	}
	```