PuoBERTa-POS: A Setswana Langage Model Finetuned on MasakhaPOS for Parts of Speech Tagging.

Zenodo doi badge arXiv 🤗 https://huggingface.co/dsfsi/PuoBERTa

A Roberta-based language model finetuned on MasakhanePOS for Parts of Speech Tagging.

Based on https://huggingface.co/dsfsi/PuoBERTa

Model Details

Model Description

This is a POS model trained on Setswana based on PuoBERTa and fineruned on MasakhaPOS Setswana.

  • Developed by: Vukosi Marivate (@vukosi), Moseli Mots'Oehli (@MoseliMotsoehli) , Valencia Wagner, Richard Lastrucci and Isheanesu Dzingirai
  • Model type: RoBERTa Model
  • Language(s) (NLP): Setswana
  • License: CC BY 4.0

Model Performance

Performance of models on the MasakhaPOS downstream task.

Model Test Performance
Multilingual Models
AfroLM 83.8
AfriBERTa 82.5
AfroXLMR-base 82.7
AfroXLMR-large 83.0
Monolingual Models
NCHLT TSN RoBERTa 82.3
PuoBERTa 83.4
PuoBERTa+JW300 84.1

Usage

Use this model for Part of Speech Tagging for Setswana.


Citation Information

Bibtex Refrence

@inproceedings{marivate2023puoberta,
  title   = {PuoBERTa: Training and evaluation of a curated language model for Setswana},
  author  = {Vukosi Marivate and Moseli Mots'Oehli and Valencia Wagner and Richard Lastrucci and Isheanesu Dzingirai},
  year    = {2023},
  booktitle= {Artificial Intelligence Research. SACAIR 2023. Communications in Computer and Information Science},
  url= {https://link.springer.com/chapter/10.1007/978-3-031-49002-6_17},
  keywords = {NLP},
  preprint_url = {https://arxiv.org/abs/2310.09141},
  dataset_url = {https://github.com/dsfsi/PuoBERTa},
  software_url = {https://huggingface.co/dsfsi/PuoBERTa}
}

Contributing

Your contributions are welcome! Feel free to improve the model.

Model Card Authors

Vukosi Marivate

Model Card Contact

For more details, reach out or check our website.

Email: [email protected]

Enjoy exploring Setswana through AI!

Downloads last month
21
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train dsfsi/PuoBERTa-POS