---
license: gpl-2.0
language: ar
---
A model jointly trained and fine-tuned on the Quran, Saheefa, and Nahj al-Balagha. All datasets are available [here](https://github.com/language-ml/course-nlp-ir-1-text-exploring/tree/main/exploring-datasets/religious_text). Code will be available soon ...

Some examples of filling the mask:

- ```
  ذَلِكَ [MASK] لَا رَيْبَ فِيهِ هُدًى لِلْمُتَّقِينَ
  ```
- ```
  يَا أَيُّهَا النَّاسُ اعْبُدُوا رَبَّكُمُ الَّذِي خَلَقَكُمْ وَالَّذِينَ مِنْ قَبْلِكُمْ لَعَلَّكُمْ [MASK]
  ```
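Examples like the above can be run with the `transformers` fill-mask pipeline. The Hub id of this fine-tuned checkpoint is not stated in the card, so the sketch below uses the base model, [asafaya/bert-base-arabic](https://huggingface.co/asafaya/bert-base-arabic), as a stand-in; swap in the fine-tuned model id once it is published.

```python
from transformers import pipeline

# Stand-in model id: the fine-tuned checkpoint's Hub id is not given in this card.
fill = pipeline("fill-mask", model="asafaya/bert-base-arabic")

# One [MASK] token; the pipeline returns the top-5 candidates by default,
# each a dict with "token_str" (the predicted word) and "score".
preds = fill("ذَلِكَ [MASK] لَا رَيْبَ فِيهِ هُدًى لِلْمُتَّقِينَ")
for p in preds:
    print(p["token_str"], round(p["score"], 4))
```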

This model was fine-tuned from [Bert Base Arabic](https://huggingface.co/asafaya/bert-base-arabic) for 30 epochs using masked language modeling (MLM). In addition, every 5 epochs we completely re-masked the corpus, re-sampling which words are hidden, so that the model learns robust embeddings instead of overfitting a single fixed masking of the data.
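The re-masking step can be sketched as follows. This is a minimal illustration, not the authors' training code: the 15% masking rate is the standard MLM default and an assumption here, since the card does not state the rate used.

```python
import random

MASK_TOKEN = "[MASK]"
MASK_PROB = 0.15  # standard MLM rate (assumption; the card does not state it)

def remask(tokens, seed=None):
    """Return a copy of `tokens` with ~15% of positions replaced by [MASK].

    Calling this with a new seed (e.g. every 5 epochs) re-samples the masked
    positions, so the model trains against fresh targets instead of
    memorizing one fixed masking of the corpus.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    for i in range(len(masked)):
        if rng.random() < MASK_PROB:
            masked[i] = MASK_TOKEN
    return masked

tokens = "ذلك الكتاب لا ريب فيه هدى للمتقين".split()
epoch_1_view = remask(tokens, seed=0)
epoch_6_view = remask(tokens, seed=5)  # different mask pattern after re-masking
```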