---
language: wo
tags:
- bert
- language-model
- wo
- wolof
---

# Soraberta: Unsupervised Language Model Pre-training for Wolof

**bert-base-wolof** is a BERT-base model pretrained on the Wolof language.

## Soraberta models

| Model name | Number of layers | Attention heads | Embedding dimension | Total parameters |
| :------: | :---: | :---: | :---: | :---: |
| `bert-base` | 6 | 12 | 514 | 56,931,622 |
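
The figures in the table can be checked against the published checkpoint. The following is a minimal sketch, assuming the `transformers` library with a PyTorch backend is installed; `load_model`, `count_parameters`, and `describe` are illustrative helper names introduced here, not library functions. The total it reports may differ from the table depending on whether the masked-language-modeling head is counted.

```python
def load_model(model_name: str = "abdouaziiz/bert-base-wolof"):
    """Download (or load from cache) the checkpoint with its masked-LM head."""
    # Imported lazily so the helpers below can be used without transformers.
    from transformers import AutoModelForMaskedLM

    return AutoModelForMaskedLM.from_pretrained(model_name)


def count_parameters(model) -> int:
    """Sum the element counts of every parameter tensor in the model."""
    return sum(p.numel() for p in model.parameters())


def describe(model) -> dict:
    """Collect the architecture figures listed in the table."""
    cfg = model.config
    return {
        "layers": cfg.num_hidden_layers,
        "attention_heads": cfg.num_attention_heads,
        "hidden_size": cfg.hidden_size,
        "parameters": count_parameters(model),
    }
```

Calling `describe(load_model())` requires network access the first time, since the weights are fetched from the Hugging Face Hub.
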

## Using Soraberta with Hugging Face's Transformers

```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='abdouaziiz/bert-base-wolof')
>>> unmasker("kuy yoot du [MASK].")
[{'sequence': '[CLS] kuy yoot du seqet. [SEP]',
  'score': 0.09505125880241394,
  'token': 13578},
 {'sequence': '[CLS] kuy yoot du daw. [SEP]',
  'score': 0.08882280439138412,
  'token': 679},
 {'sequence': '[CLS] kuy yoot du yoot. [SEP]',
  'score': 0.057790059596300125,
  'token': 5117},
 {'sequence': '[CLS] kuy yoot du seqat. [SEP]',
  'score': 0.05671025067567825,
  'token': 4992},
 {'sequence': '[CLS] kuy yoot du yaqu. [SEP]',
  'score': 0.0469999685883522,
  'token': 1735}]
```
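
The raw pipeline output above can be tidied for display. The sketch below is one possible way to do it, not part of the `transformers` API: `load_unmasker` and `summarize_predictions` are hypothetical helper names, and the `sample` list simply copies the first two entries of the output shown above so the helper can be tried without downloading the model.

```python
from typing import Dict, List, Tuple


def load_unmasker(model_name: str = "abdouaziiz/bert-base-wolof"):
    """Build the fill-mask pipeline (downloads the model on first use)."""
    # Imported lazily so summarize_predictions works without transformers.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_name)


def summarize_predictions(
    results: List[Dict], digits: int = 3
) -> List[Tuple[str, float]]:
    """Strip the [CLS]/[SEP] markers and round each score."""
    summary = []
    for item in results:
        text = item["sequence"].replace("[CLS]", "").replace("[SEP]", "").strip()
        summary.append((text, round(item["score"], digits)))
    return summary


# First two entries copied from the sample output above.
sample = [
    {"sequence": "[CLS] kuy yoot du seqet. [SEP]",
     "score": 0.09505125880241394, "token": 13578},
    {"sequence": "[CLS] kuy yoot du daw. [SEP]",
     "score": 0.08882280439138412, "token": 679},
]

for text, score in summarize_predictions(sample):
    print(f"{score:.3f}  {text}")
```

With a live pipeline, the same helper would be applied as `summarize_predictions(load_unmasker()("kuy yoot du [MASK]."))`. If the installed `transformers` version already omits the special tokens from `sequence`, the helper leaves the text unchanged.
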

## Training data

The data sources are [Bible OT](http://biblewolof.com/), [WOLOF-ONLINE](http://www.wolof-online.com/), and [ALFFA_PUBLIC](https://github.com/getalp/ALFFA_PUBLIC/tree/master/ASR/WOLOF).

## Contact

Please contact [email protected] for any questions, feedback, or requests.