Update README.md
README.md
CHANGED
@@ -29,7 +29,7 @@ co2_eq_emissions:
# <span style="font-variant:small-caps;">PersianMind</span>

<span style="font-variant:small-caps;">PersianMind</span> is a cross-lingual Persian-English large language model.
-The model achieves state-of-the-art results on Persian subset of the [Belebele](https://github.com/facebookresearch/belebele) benchmark
+The model achieves state-of-the-art results on the Persian subset of the [<span style="font-variant:small-caps;">Belebele</span>](https://github.com/facebookresearch/belebele) benchmark
and the [ParsiNLU multiple-choice QA](https://github.com/persiannlp/parsinlu) task.
It also attains performance comparable to GPT-3.5-turbo in a Persian reading comprehension task.
@@ -111,15 +111,15 @@ model = LlamaForCausalLM.from_pretrained(
### Evaluating Quantized Models

-| Model                                                               | Belebele (Persian) | Fa→En Translation | En→Fa Translation | Model Size | Tokens/sec |
-| :------------------------------------------------------------------ | :----------------: | :---------------: | :---------------: | :--------: | :--------: |
-| <span style="font-variant:small-caps;">PersianMind</span> (`bf16`) | 73.9 | 83.61 | 79.44 | 13.7G | 25.35 |
-| <span style="font-variant:small-caps;">PersianMind</span> (`INT8`) | 73.7 | 82.32 | 78.61 | 7.2G | 11.36 |
-| <span style="font-variant:small-caps;">PersianMind</span> (`INT4`) | 70.2 | 82.07 | 80.36 | 3.9G | 24.36 |
+| Model                                                               | <span style="font-variant:small-caps;">Belebele</span> (Persian) | Fa→En Translation | En→Fa Translation | Model Size | Tokens/sec |
+| :------------------------------------------------------------------ | :---------------------------------------------------------------: | :---------------: | :---------------: | :--------: | :--------: |
+| <span style="font-variant:small-caps;">PersianMind</span> (`bf16`) | 73.9 | 83.61 | 79.44 | 13.7G | 25.35 |
+| <span style="font-variant:small-caps;">PersianMind</span> (`INT8`) | 73.7 | 82.32 | 78.61 | 7.2G | 11.36 |
+| <span style="font-variant:small-caps;">PersianMind</span> (`INT4`) | 70.2 | 82.07 | 80.36 | 3.9G | 24.36 |

We evaluated the quantized models against the original model on various tasks.
Specifically, we evaluated all models using the reading comprehension multiple-choice
-question-answering benchmark of [Belebele](https://github.com/facebookresearch/belebele) (Persian subset) and reported the accuracy of each model.
+question-answering benchmark of [<span style="font-variant:small-caps;">Belebele</span>](https://github.com/facebookresearch/belebele) (Persian subset) and reported the accuracy of each model.
Additionally, we evaluated our models on Persian-to-English and English-to-Persian translation tasks.
For this, we used the Persian-English subset of the [Flores-200](https://github.com/facebookresearch/flores/tree/main/flores200) dataset and
reported our results using the <span style="font-variant:small-caps;">Comet</span> metric.
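The bf16/INT8/INT4 variants compared in the table can be loaded with 🤗 Transformers and bitsandbytes. The following is a minimal sketch, not part of this diff: the repo ID `universitytehran/PersianMind-v1.0` and the NF4 settings for the `INT4` row are assumptions, since the commit does not state how the quantized models were built.

```python
# Minimal sketch of loading the bf16 / INT8 / INT4 variants compared above.
# Assumed: the Hugging Face repo ID and the NF4 config; the authors' exact
# quantization settings are not stated in this diff.
import torch
from transformers import BitsAndBytesConfig, LlamaForCausalLM, LlamaTokenizer

MODEL_ID = "universitytehran/PersianMind-v1.0"  # assumed repo ID

tokenizer = LlamaTokenizer.from_pretrained(MODEL_ID)

# bf16 baseline (~13.7G of weights is consistent with a ~7B LLaMA-family model)
model_bf16 = LlamaForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# INT8: 8-bit weight quantization via bitsandbytes (LLM.int8())
model_int8 = LlamaForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# INT4: 4-bit NF4 quantization, computing in bf16
model_int4 = LlamaForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)

# Rough sanity check against the "Model Size" column of the table.
print(model_int8.get_memory_footprint())
```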
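The translation columns use the <span style="font-variant:small-caps;">Comet</span> metric, which can be computed with the `unbabel-comet` package. A minimal sketch follows, assuming the `Unbabel/wmt22-comet-da` checkpoint; the diff does not name the exact Comet model the authors used.

```python
# Minimal sketch of scoring translations with Comet (requires unbabel-comet>=2.0).
# The checkpoint name is an assumption; the README does not specify it.
from comet import download_model, load_from_checkpoint

comet = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))

# One hypothetical Flores-200-style triple: source, model translation, reference.
# The Persian source reads "Artificial intelligence is transforming our world."
data = [
    {
        "src": "هوش مصنوعی دنیای ما را دگرگون می‌کند.",
        "mt": "Artificial intelligence is changing our world.",
        "ref": "Artificial intelligence is transforming our world.",
    },
]
result = comet.predict(data, batch_size=8, gpus=1)
print(result.system_score)  # corpus-level score, comparable to the table above
```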