|
---
library_name: transformers
license: llama3.1
language:
- en
- fa
tags:
- LLM
- llama3.1
- PartAI
- conversational
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---
|
|
|
# Model Details |
|
|
|
The Dorna models are a family of decoder-only models developed by [Part AI](https://partdp.ai/) and specifically trained/fine-tuned on Persian data. This release adds an 8B instruct model to the family.
|
Dorna2-Llama3.1-8B-Instruct is built using the [Meta Llama 3.1 Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model. |
|
|
|
|
|
## How to use |
|
|
|
To test and use the model freely on Hugging Face Spaces, click [here](https://huggingface.co/spaces/PartAI/Dorna2-Llama3.1-8B-Instruct)!
|
|
|
You can also run conversational inference locally using the Transformers Auto classes and the `generate()` function, as in the example below.
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "PartAI/Dorna2-Llama3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system",
     "content": "You are a helpful Persian assistant. Please answer questions in the asked language."},
    {"role": "user", "content": "کاغذ A4 بزرگ تر است یا A5؟"},
]

# Build the prompt with the Llama 3.1 chat template.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop generation at either the EOS token or the end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.3,
    top_p=0.85,
)

# Decode only the newly generated tokens.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
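
If you prefer the higher-level API, the same chat can be run through `transformers.pipeline`, which accepts chat-style message lists in recent Transformers versions. The sketch below assumes the repository id `PartAI/Dorna2-Llama3.1-8B-Instruct` and reuses the sampling settings from the example above; adjust them to your needs.

```python
import torch
from transformers import pipeline

# Text-generation pipeline with the same dtype/device settings as above.
pipe = pipeline(
    "text-generation",
    model="PartAI/Dorna2-Llama3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system",
     "content": "You are a helpful Persian assistant. Please answer questions in the asked language."},
    {"role": "user", "content": "کاغذ A4 بزرگ تر است یا A5؟"},
]

outputs = pipe(messages, max_new_tokens=256, do_sample=True, temperature=0.3, top_p=0.85)
# The pipeline appends the assistant turn to the conversation; print its content.
print(outputs[0]["generated_text"][-1]["content"])
```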
|
|
|
You can also use the notebook below to test the model in Google Colab. |
|
|
|
<a href="https://colab.research.google.com/drive/1hcpDFKUabfQBIE4F1eJFoEkoHrF46-yT?usp=sharing"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Colab Code" width="87" height="15"/></a> |
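
In bfloat16, the 8B model needs roughly 16 GB of GPU memory. If you are on a smaller GPU (for example the free Colab tier), one option is 4-bit quantization with `bitsandbytes`. The sketch below is illustrative only, not an official recommendation, and assumes the `bitsandbytes` package is installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "PartAI/Dorna2-Llama3.1-8B-Instruct"

# 4-bit NF4 quantization so the weights fit in a few GB of GPU memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",
)
```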
|
|
|
|
|
## Evaluation |
|
|
|
|
|
### Comparative Evaluation
|
This evaluation compares **Dorna2-Llama3.1-8B-Instruct**, **Llama3.1-8B-Instruct**, and other fine-tuned **Llama3.1-8B** models. For broader comparisons among various large language models (LLMs), please refer to the [Open Persian LLM Leaderboard](https://huggingface.co/spaces/PartAI/open-persian-llm-leaderboard), which provides a comprehensive evaluation across multiple LLMs. |
|
|
|
|
|
### Tasks and Evaluation Framework
|
Five specialized tasks have been carefully curated to evaluate and benchmark the models. Each task has been designed to challenge different aspects of the models' capabilities. These tasks include: |
|
|
|
- **Part Multiple Choice**: Focuses on common knowledge and reasoning in a multiple-choice format. |
|
- **ARC Easy**: Tests easy-level general knowledge.
|
- **ARC Challenge**: Assesses models on harder questions requiring advanced reasoning. |
|
- **MMLU Pro**: Covers professional-level exams. |
|
- **AUT Multiple Choice Persian**: Specialized Persian-language examination. |
|
|
|
Each dataset is entirely in Persian, offering a unique and robust testing ground for LLMs in non-English settings. Collectively, the datasets contain over **40k samples**, spanning diverse linguistic and technical challenges such as Common Knowledge, Reasoning, Summarization, and Specialized Examinations. |
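
As a rough illustration of how multiple-choice tasks like these are commonly scored, the hypothetical sketch below picks the option to which the model assigns the highest average log-likelihood. The question and options are illustrative, and the leaderboard's actual evaluation pipeline may differ.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "PartAI/Dorna2-Llama3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map="auto")

question = "کاغذ A4 بزرگ تر است یا A5؟"  # illustrative item, not from the benchmark
options = ["A4", "A5"]

def option_loglikelihood(question: str, option: str) -> float:
    # Average log-probability of the option tokens given the question as context.
    prompt_ids = tokenizer(question + "\n", return_tensors="pt").input_ids.to(model.device)
    option_ids = tokenizer(option, add_special_tokens=False, return_tensors="pt").input_ids.to(model.device)
    input_ids = torch.cat([prompt_ids, option_ids], dim=-1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Position i predicts token i+1, so drop the last position and keep the option span.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    option_len = option_ids.shape[-1]
    target = input_ids[:, -option_len:]
    option_log_probs = log_probs[:, -option_len:].gather(-1, target.unsqueeze(-1)).squeeze(-1)
    return option_log_probs.mean().item()

scores = {opt: option_loglikelihood(question, opt) for opt in options}
print(max(scores, key=scores.get))  # predicted answer
```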
|
|
|
### Evaluation Results |
|
|
|
|
|
| **Model** | **Average Accuracy** | **Part Multiple Choice** | **ARC Easy** | **ARC Challenge** | **MMLU Pro** | **AUT Multiple Choice Persian** |
|:---------------------------------------:|:-----------------------:|:-------------------------:|:------------:|:------------------:|:------------:|:-------------------------------:|
| **PartAI/Dorna2-Llama3.1-8B-Instruct** | **50.72** | 34.48 | **79.59** | **64.42** | **21.47** | 53.64 |
| **O1-OPEN/OpenO1-LLama-8B-v0.1** | 50.22 | 34.66 | 77.87 | 63.08 | 21.24 | **54.24** |
| **meta-llama/Llama-3.1-8B-Instruct** | 50.14 | **36.68** | 78.40 | 60.40 | 21.00 | **54.24** |
| **NousResearch/Hermes-3-Llama-3.1-8B** | 48.77 | 35.01 | 77.01 | 58.39 | 21.00 | 52.46 |
| **Skywork/Skywork-o1-Open-Llama-3.1-8B**| 34.15 | 27.02 | 47.12 | 41.61 | 14.55 | 40.43 |
|
|
|
|
|
|
|
## Contact us |
|
|
|
If you have any questions regarding this model, you can reach us via the [community](https://huggingface.co/PartAI/Dorna-Llama3-8B-Instruct/discussions) on Hugging Face. |