---
datasets:
- taesiri/TinyStories-Farsi
library_name: transformers
model_name: LLaMA-3.1-8B-Persian-Instruct
pipeline_tag: text-generation
tags:
- language-model
- fine-tuned
- instruction-following
- PEFT
- LoRA
- BitsAndBytes
- Persian
- Farsi
- text-generation
---

# LLaMA-3.1-8B-Persian-Instruct

This model is a fine-tuned version of `meta-llama/Meta-Llama-3.1-8B-Instruct`, tailored for generating and understanding Persian text. Fine-tuning was performed on the [TinyStories-Farsi](https://huggingface.co/datasets/taesiri/TinyStories-Farsi) dataset, a diverse collection of short stories in Persian. The primary goal was to improve the model's instruction-following performance in Persian.

## Model Details

### Model Description

This model is a fine-tuned version of Meta's Llama-3.1-8B-Instruct. Training on Persian short stories helps the model capture the relationship between English and Persian in a more meaningful way.

- **Developed by:** Meta AI (base model)
- **Model type:** Causal language model
- **License:** Llama 3.1 Community License
- **Base Model:** `meta-llama/Meta-Llama-3.1-8B-Instruct`

### Model Sources

- **Repository:** [Llama-3.1-8B-Instruct on Hugging Face](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)

## Training Details

### Training Data

The model was fine-tuned on the [TinyStories-Farsi](https://huggingface.co/datasets/taesiri/TinyStories-Farsi) dataset. This dataset provided a rich and diverse linguistic context, helping the model better understand and generate text in Persian.
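
For reference, here is a minimal sketch of loading the dataset with the `datasets` library. The `train` split name and column layout are assumptions; check the dataset card for the exact schema.

```python
# Minimal sketch: load the fine-tuning corpus from the Hugging Face Hub.
# The "train" split name is an assumption; check the dataset card.
from datasets import load_dataset

dataset = load_dataset("taesiri/TinyStories-Farsi", split="train")
print(dataset[0])  # inspect one Persian short story record
```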

### Training Procedure

The fine-tuning was conducted with the following setup (a configuration sketch follows the list):

- **Epochs:** 4
- **Batch Size:** 8
- **Gradient Accumulation Steps:** 2
- **Hardware:** NVIDIA A100 GPU
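
Expressed as `transformers` `TrainingArguments`, this setup might look as follows; anything not listed above (output path, precision, logging cadence) is an assumption:

```python
# Sketch of the training configuration listed above.
# Values not stated in the card (output_dir, bf16, logging_steps) are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.1-8b-persian-sft",  # assumed output path
    num_train_epochs=4,                     # from the list above
    per_device_train_batch_size=8,          # from the list above
    gradient_accumulation_steps=2,          # effective batch size of 16
    bf16=True,                              # assumed; A100 supports bfloat16
    logging_steps=10,                       # assumed logging cadence
)
```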

### Fine-Tuning Strategy

To make fine-tuning efficient and effective, PEFT (Parameter-Efficient Fine-Tuning) techniques were employed. Specifically, the model was loaded with `BitsAndBytesConfig(load_in_4bit=True)`, allowing it to be fine-tuned in 4-bit precision. This significantly reduced the computational resources required, and with them the environmental impact, while maintaining high performance, bringing the training time to approximately 2 hours.
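
A minimal sketch of such a 4-bit PEFT setup using `peft` and `bitsandbytes` follows. The card confirms only `load_in_4bit=True`; the LoRA rank, alpha, and target modules below are illustrative assumptions.

```python
# Minimal sketch of the 4-bit PEFT setup described above.
# Only load_in_4bit=True is confirmed by the card; LoRA
# hyperparameters and target modules are assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # confirmed by the card
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                 # assumed LoRA rank
    lora_alpha=32,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```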

## Uses

### Direct Use

This model is well suited to generating Persian text, particularly for instruction-following tasks. It can be used in applications such as chatbots, customer support systems, and educational tools, wherever accurate, context-aware Persian language generation is needed.

### Out-of-Scope Use

The model is not intended for tasks requiring deep reasoning, complex multi-turn conversations, or contexts beyond the immediate prompt. It is also not designed for generating text in languages other than Persian.

## How to Get Started with the Model

Here is how you can use this model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Specify the fine-tuned model
model_name = "AmirMohseni/Llama-3.1-8B-Instruct-Persian-finetuned-sft"

# Load the model and tokenizer; device_map="auto" places the model
# on the available GPU(s), falling back to CPU
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Ensure a pad token is set (Llama tokenizers define none by default)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Example usage ("How can I find information about the stocks of American companies?")
input_text = "چطوری میتونم به اطلاعات درباره ی سهام شرکت های آمریکایی دست پیدا کنم؟"

# Tokenize the input and move it to the model's device
inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True).to(model.device)

# Generate text
outputs = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=512,
    pad_token_id=tokenizer.pad_token_id,
)

# Decode and print the output
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
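
Because the base model is instruction-tuned, wrapping the prompt in the tokenizer's chat template may improve responses. A minimal sketch, assuming the fine-tune kept the base model's chat template:

```python
# Hypothetical variant: format the prompt with the chat template,
# assuming the fine-tune preserved the base model's template.
messages = [{"role": "user", "content": input_text}]
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header for generation
    return_tensors="pt",
).to(model.device)
outputs = model.generate(chat_inputs, max_new_tokens=512, pad_token_id=tokenizer.pad_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```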