---
license: cc
language:
- fa
- en
library_name: transformers
tags:
- text-generation-inference
inference: false
metrics:
- bleu
- comet
- accuracy
- perplexity
- spearmanr
pipeline_tag: text-generation
co2_eq_emissions:
  emissions: 232380
---

<img src="PersianMind.jpg" alt="PersianMind logo" width=200/>

# <span style="font-variant:small-caps;">PersianMind</span>

<span style="font-variant:small-caps;">PersianMind</span> is a cross-lingual Persian-English large language model.
The model achieves state-of-the-art results on the Persian subset of the [<span style="font-variant:small-caps;">Belebele</span>](https://github.com/facebookresearch/belebele) benchmark
and the [ParsiNLU multiple-choice QA](https://github.com/persiannlp/parsinlu) task.
It also attains performance comparable to GPT-3.5-turbo on a Persian reading comprehension task.

### Model Description

- **Developed by:** Pedram Rostami, Ali Salemi, and Mohammad Javad Dousti
- **Model type:** Language model
- **Languages:** English and Persian
- **License:** [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) (non-commercial use only)

## How to Get Started with the Model

Use the code below to get started with the model.
Note that you need to install the <code><b>sentencepiece</b></code> and <code><b>accelerate</b></code> libraries, along with <code><b>PyTorch</b></code> and <code><b>🤗Transformers</b></code>, to run this code.

```python
from transformers import LlamaTokenizer, LlamaForCausalLM
import torch

# Use a GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
# Generate an access token for your account as explained in
# https://huggingface.co/docs/transformers.js/guides/private
access_token = "hf_..."
model = LlamaForCausalLM.from_pretrained(
    "universitytehran/PersianMind-v1.0",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    device_map={"": device},
    token=access_token,
)
tokenizer = LlamaTokenizer.from_pretrained(
    "universitytehran/PersianMind-v1.0",
    token=access_token,
)

# Prompt format: a context paragraph, the user's turn, then the model's turn.
TEMPLATE = "{context}\nYou: {prompt}\nPersianMind: "
CONTEXT = (
    "This is a conversation with PersianMind. It is an artificial intelligence model designed by a team of "
    "NLP experts at the University of Tehran to help you with various tasks such as answering questions, "
    "providing recommendations, and helping with decision making. You can ask it anything you want and "
    "it will do its best to give you accurate and relevant information."
)
PROMPT = "در مورد هوش مصنوعی توضیح بده."  # "Explain artificial intelligence."

model_input = TEMPLATE.format(context=CONTEXT, prompt=PROMPT)
input_tokens = tokenizer(model_input, return_tensors="pt")
input_tokens = input_tokens.to(device)
# Greedy decoding with a mild repetition penalty.
generate_ids = model.generate(**input_tokens, max_new_tokens=512, do_sample=False, repetition_penalty=1.1)
model_output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

# Print only the newly generated text, without the prompt.
print(model_output[len(model_input):])
```
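
If you query the model repeatedly, it may help to wrap the prompt construction and the prompt-stripping step from the snippet above in two small helpers. The following is a minimal sketch: `build_prompt` and `extract_reply` are hypothetical names, not part of the model's API, and the sketch assumes the decoded output begins with the prompt text, as in the example above.

```python
TEMPLATE = "{context}\nYou: {prompt}\nPersianMind: "


def build_prompt(prompt: str, context: str) -> str:
    """Fill the conversation template with a context and a user prompt."""
    return TEMPLATE.format(context=context, prompt=prompt)


def extract_reply(model_output: str, model_input: str) -> str:
    """Return only the generated continuation.

    Falls back to the full output if it does not start with the prompt
    (for example, if decoding altered the whitespace).
    """
    if model_output.startswith(model_input):
        return model_output[len(model_input):].strip()
    return model_output.strip()


model_input = build_prompt("Explain artificial intelligence.", "This is a conversation with PersianMind.")
print(repr(model_input))
```

Unlike the plain slice in the original snippet, `extract_reply` also trims surrounding whitespace; drop the `.strip()` calls if you need the raw continuation.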

### How to Quantize the Model

Quantized models can be run on resource-constrained devices.
To quantize the model, install the <code><b>bitsandbytes</b></code> library.
To load the model with 8-bit (`INT8`) quantization, use the code below.

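```python
from transformers import BitsAndBytesConfig, LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "universitytehran/PersianMind-v1.0",
    device_map="auto",
    low_cpu_mem_usage=True,
    # Passing `load_in_8bit=True` directly is deprecated in recent Transformers releases.
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
```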

Alternatively, you can quantize the model in 4-bit (`NormalFloat4`) with the following code.

```python
from transformers import BitsAndBytesConfig, LlamaForCausalLM

# 4-bit NormalFloat (NF4) weights with double quantization of the quantization constants.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
model = LlamaForCausalLM.from_pretrained(
    "universitytehran/PersianMind-v1.0",
    quantization_config=quantization_config,
    device_map="auto",
)
```
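
After loading the model with any of the configurations above, you may want a quick check of its memory use and generation speed. The sketch below is only a rough measurement under stated assumptions: `footprint_gb` and `tokens_per_second` are hypothetical helper names, sizes are reported in units of 10^9 bytes, and a single timed `generate` call is a noisy estimate. `get_memory_footprint()` and `generate()` are the standard 🤗Transformers model methods.

```python
import time


def footprint_gb(model) -> float:
    """Memory used by the model's parameters and buffers, in GB (10**9 bytes)."""
    return model.get_memory_footprint() / 1e9


def tokens_per_second(model, inputs, **generate_kwargs) -> float:
    """Time one `generate` call and return newly generated tokens per second.

    `inputs` is the tokenizer output, already moved to the model's device.
    """
    prompt_length = len(inputs["input_ids"][0])
    start = time.perf_counter()
    output_ids = model.generate(**inputs, **generate_kwargs)
    elapsed = time.perf_counter() - start
    # For decoder-only models, the output contains the prompt followed by new tokens.
    new_tokens = len(output_ids[0]) - prompt_length
    return new_tokens / elapsed


# Example usage with `model` and `input_tokens` from the getting-started snippet:
# print(f"{footprint_gb(model):.1f} GB")
# print(tokens_per_second(model, input_tokens, max_new_tokens=128, do_sample=False))
```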

### Evaluating Quantized Models

| Model | <span style="font-variant:small-caps;">Belebele</span> (Persian) | Fa→En Translation<br>(<span style="font-variant:small-caps;">Comet</span>) | En→Fa Translation<br>(<span style="font-variant:small-caps;">Comet</span>) | Model Size | Tokens/sec |
| :----------------------------------------------------------------: | :--------------------------------------------------------------: | :------------------------------------------------------------------------: | :------------------------------------------------------------------------: | :--------: | :--------: |
| <span style="font-variant:small-caps;">PersianMind</span> (`BF16`) | 73.9 | 83.61 | 79.44 | 13.7G | 25.35 |
| <span style="font-variant:small-caps;">PersianMind</span> (`INT8`) | 73.7 | 82.32 | 78.61 | 7.2G | 11.36 |
| <span style="font-variant:small-caps;">PersianMind</span> (`NF4`) | 70.2 | 82.07 | 80.36 | 3.9G | 24.36 |

We evaluated the quantized models against the original model on several tasks.
Specifically, we evaluated all models on the Persian subset of the [<span style="font-variant:small-caps;">Belebele</span>](https://github.com/facebookresearch/belebele) multiple-choice reading comprehension benchmark and report the accuracy of each model.
We also evaluated the models on Persian-to-English and English-to-Persian translation,
using the Persian-English subset of the [<span style="font-variant:small-caps;">Flores</span>-200](https://github.com/facebookresearch/flores/tree/main/flores200) dataset and reporting results with the <span style="font-variant:small-caps;">Comet</span> metric.
In addition, we measured the average number of tokens each model generated per second while running the translation tasks.
To assess memory usage, we measured each model's footprint with the `get_memory_footprint()` function.
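
As a rough sanity check on the Model Size column, the size of the weights can be estimated as parameters × bits per parameter / 8. The sketch below infers an approximate parameter count from the reported `BF16` size. It assumes that "G" means 10^9 bytes and that nearly all of the `BF16` footprint is 2-byte weights; both are assumptions, not statements from the evaluation.

```python
BF16_SIZE_GB = 13.7  # reported BF16 footprint from the table above

# Assumption: the BF16 footprint is dominated by 2-byte weights.
approx_params = BF16_SIZE_GB * 1e9 / 2


def naive_size_gb(num_params: float, bits_per_param: float) -> float:
    """Size in GB if every parameter were stored at the given bit width."""
    return num_params * bits_per_param / 8 / 1e9


# (bits per parameter, reported size in GB) for each row of the table.
reported = {"BF16": (16, 13.7), "INT8": (8, 7.2), "NF4": (4, 3.9)}
for name, (bits, measured_gb) in reported.items():
    estimate = naive_size_gb(approx_params, bits)
    print(f"{name}: naive estimate {estimate:.2f} GB, reported {measured_gb} GB")
```

The reported `INT8` and `NF4` sizes are somewhat larger than the naive estimates. This is consistent with some tensors staying at higher precision and with the extra constants that quantization stores, although that breakdown was not measured here.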

## License

<span style="font-variant:small-caps;">PersianMind</span> is subject to Meta's [LLaMa2 Community License](https://raw.githubusercontent.com/facebookresearch/llama/main/LICENSE).
It is further licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/), which allows non-commercial use of the model.
Commercial use of this model requires a written agreement, which must be obtained from the copyright holders, who are listed as developers on this page.
If you suspect any violations, please reach out to us.

## Citation

If you find this model helpful, please cite the following paper.

**BibTeX:**

```bibtex
@misc{persianmind,
  title={{PersianMind: A Cross-Lingual Persian-English Large Language Model}},
  author={Rostami, Pedram and Salemi, Ali and Dousti, Mohammad Javad},
  year={2024},
  eprint={2401.06466},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```