---
license: cc
language:
- fa
- en
library_name: transformers
tags:
- text-generation-inference
inference: false
# widget:
# - text:
#   output:
#     url: PersianMind.jpg
metrics:
- bleu
- comet
- accuracy
- perplexity
- spearmanr
pipeline_tag: text-generation
co2_eq_emissions: 
  emissions: 232380
---


<img src="PersianMind.jpg" alt="PersianMind logo" width=200/> 


# <span style="font-variant:small-caps;">PersianMind</span>

<span style="font-variant:small-caps;">PersianMind</span> is a cross-lingual Persian-English large language model.
The model achieves state-of-the-art results on the Persian subset of the [<span style="font-variant:small-caps;">Belebele</span>](https://github.com/facebookresearch/belebele) benchmark 
and the [ParsiNLU multiple-choice QA](https://github.com/persiannlp/parsinlu) task.
It also attains performance comparable to GPT-3.5-turbo on a Persian reading comprehension task.

### Model Description

- **Developed by:** [Pedram Rostami](mailto:[email protected]), [Ali Salemi](mailto:[email protected]), and [Mohammad Javad Dousti](mailto:[email protected])
- **Model type:** Language model
- **Languages:** English and Persian
- **License:** [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) (non-commercial use only)

## How to Get Started with the Model

Use the code below to get started with the model.
Note that you need to install the <code><b>sentencepiece</b></code> and <code><b>accelerate</b></code> libraries along with <code><b>PyTorch</b></code> and <code><b>🤗Transformers</b></code> to run this code.

```python
from transformers import LlamaTokenizer, LlamaForCausalLM
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = LlamaForCausalLM.from_pretrained(
    "universitytehran/PersianMind-v1.0",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    device_map={"": device},
)
tokenizer = LlamaTokenizer.from_pretrained(
    "universitytehran/PersianMind-v1.0",
)

TEMPLATE = "{context}\nYou: {prompt}\nPersianMind: "
CONTEXT = "This is a conversation with PersianMind. It is an artificial intelligence model designed by a team of " \
    "NLP experts at the University of Tehran to help you with various tasks such as answering questions, " \
    "providing recommendations, and helping with decision making. You can ask it anything you want and " \
    "it will do its best to give you accurate and relevant information."
PROMPT = "در مورد هوش مصنوعی توضیح بده."

model_input = TEMPLATE.format(context=CONTEXT, prompt=PROMPT)
input_tokens = tokenizer(model_input, return_tensors="pt")
input_tokens = input_tokens.to(device)
generate_ids = model.generate(**input_tokens, max_new_tokens=512, do_sample=False, repetition_penalty=1.1)
model_output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
model_output = model_output.replace(model_input, "")

print(model_output)
```
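The `replace` call above assumes that decoding reproduces the prompt text exactly. An alternative sketch, assuming the usual behavior that `generate` on a decoder-only model returns the prompt tokens followed by the new tokens, is to slice off the prompt by token count before decoding. The helper name `strip_prompt` is hypothetical, and it is shown on plain Python lists so that it runs without loading the model.

```python
def strip_prompt(generated_ids, prompt_length):
    """Drop the first `prompt_length` token ids from each generated sequence.

    Assumes each sequence starts with the prompt tokens, as is typical for
    decoder-only models. `strip_prompt` is an illustrative helper, not part
    of the Transformers API.
    """
    return [list(sequence[prompt_length:]) for sequence in generated_ids]


# Toy example: a 3-token prompt followed by 2 newly generated tokens.
prompt_ids = [1, 15, 27]
generated = [prompt_ids + [42, 2]]
new_tokens = strip_prompt(generated, len(prompt_ids))
print(new_tokens)  # [[42, 2]]
```

With the tensors from the example above, the equivalent slice would be `generate_ids[:, input_tokens["input_ids"].shape[1]:]`, which can then be passed to `tokenizer.batch_decode`.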

### How to Quantize the Model

Quantized models can run on resource-constrained devices. 
To quantize the model, you need to install the <code><b>bitsandbytes</b></code> library.
To quantize the model to 8-bit (`INT8`), use the code below. 

```python
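from transformers import LlamaForCausalLM

# Load the weights in 8-bit precision (requires bitsandbytes).
model = LlamaForCausalLM.from_pretrained(
    "universitytehran/PersianMind-v1.0",
    device_map="auto",
    low_cpu_mem_usage=True,
    load_in_8bit=True,
)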
```

Alternatively, you can quantize the model to 4-bit (`INT4`) with the following code.

```python
from transformers import BitsAndBytesConfig, LlamaForCausalLM

# 4-bit NF4 quantization with double quantization (requires bitsandbytes).
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
model = LlamaForCausalLM.from_pretrained(
    "universitytehran/PersianMind-v1.0",
    quantization_config=quantization_config,
    device_map="auto",
)
```

### Evaluating Quantized Models

| Model                                                              | <span style="font-variant:small-caps;">Belebele</span> (Persian) | Fa→En Translation | En→Fa Translation | Model Size | Tokens/sec |
| :----------------------------------------------------------------- | :--------------------------------------------------------------: | :---------------: | :---------------: | :--------: | :--------: |
| <span style="font-variant:small-caps;">PersianMind</span> (`BF16`) |        73.9                                                      |       83.61       |       79.44       |   13.7G    |   25.35    |
| <span style="font-variant:small-caps;">PersianMind</span> (`INT8`) |        73.7                                                      |       82.32       |       78.61       |    7.2G    |   11.36    |
| <span style="font-variant:small-caps;">PersianMind</span> (`INT4`) |        70.2                                                      |       82.07       |       80.36       |    3.9G    |   24.36    |

We evaluated quantized models in various tasks against the original model. 
Specifically, we evaluated all models using the reading comprehension multiple-choice 
question-answering benchmark of [<span style="font-variant:small-caps;">Belebele</span>](https://github.com/facebookresearch/belebele) (Persian subset) and reported the accuracy of each model. 
Additionally, we evaluated our models for Persian-to-English and English-to-Persian translation tasks. 
For this, we utilized the Persian-English subset of the [<span style="font-variant:small-caps;">Flores</span>-200](https://github.com/facebookresearch/flores/tree/main/flores200) dataset and
reported our results using the <span style="font-variant:small-caps;">Comet</span> metric. 
Furthermore, we calculated the average number of tokens each model generated per second while running the translation tasks. 
To understand resource efficiency, we measured the memory usage of each model using the `get_memory_footprint()` function.
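
This card does not include the script used to compute tokens per second. One plausible way to measure it, sketched below under that caveat, is to time a generation call and divide the number of newly generated tokens by the elapsed time. `tokens_per_second` and `fake_generate` are hypothetical names; `fake_generate` stands in for a call such as `model.generate` that returns the prompt tokens followed by the new ones.

```python
import time


def tokens_per_second(generate_fn, prompt_ids):
    """Time one call to `generate_fn` and return (new_token_count, rate).

    `generate_fn` takes a list of prompt token ids and returns the prompt
    followed by newly generated ids. Illustrative helper only.
    """
    start = time.perf_counter()
    output_ids = generate_fn(prompt_ids)
    elapsed = time.perf_counter() - start
    new_tokens = len(output_ids) - len(prompt_ids)
    return new_tokens, new_tokens / elapsed


def fake_generate(prompt_ids):
    """Stand-in for a real model: sleeps briefly, then appends 5 token ids."""
    time.sleep(0.01)
    return prompt_ids + [7, 8, 9, 10, 11]


count, rate = tokens_per_second(fake_generate, [1, 2, 3])
print(count, f"{rate:.1f} tokens/sec")
```

To obtain an average like the one in the table, such a measurement would be repeated over the translation examples and the rates averaged.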

## License
<span style="font-variant:small-caps;">PersianMind</span> is subject to Meta's [Llama 2 Community License](https://raw.githubusercontent.com/facebookresearch/llama/main/LICENSE). 
It is further licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/), which allows non-commercial use of the model.
Commercial use of this model requires a written agreement, which must be obtained from the copyright holders listed as developers on this page.
If you suspect any violations, please reach out to us.


## Citation

If you find this model helpful, please cite the following paper.

**BibTeX:**
```bibtex
@article{persianmind,
  title={{PersianMind: A Cross-Lingual Persian-English Large Language Model}},
  author={Rostami, Pedram and Salemi, Ali and Dousti, Mohammad Javad},
  journal={arXiv preprint},
  year={2024}
}
```