AfroLID is a neural language identification (LID) toolkit for 517 African languages and varieties. It is trained on a manually curated, multi-domain web dataset covering 14 language families and five orthographic systems. AfroLID is described in the paper AfroLID: A Neural Language Identification Tool for African Languages.
What's New in AfroLID v1.5?
- Fine-tuned on SERENGETI, a massively multilingual language model covering 517 African languages and language varieties.
- Enhanced model performance, improving macro-F1 from 95.95 to 97.41.
- Built on Hugging Face Transformers for seamless integration.
- Optimized for easy use with the Hugging Face pipeline.
- Better efficiency and accuracy, making it more robust for African language identification.
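Macro-F1 computes an F1 score for each language separately and then takes the unweighted mean, so a low-resource language counts as much as a widely spoken one. The sketch below shows that computation using only the standard library. It is an illustration of the metric, not the authors' evaluation script; the label set is taken as the union of gold and predicted labels, and the codes in the example are toy data. The card reports the score on a 0 to 100 scale, while this function returns a value between 0 and 1.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count true positives, false positives and false negatives per label
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[g] += 1  # missed the true label g

    scores = []
    for label in set(gold) | set(pred):
        p_denom = tp[label] + fp[label]
        r_denom = tp[label] + fn[label]
        precision = tp[label] / p_denom if p_denom else 0.0
        recall = tp[label] / r_denom if r_denom else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)

    # Every label contributes equally, regardless of how many examples it has
    return sum(scores) / len(scores) if scores else 0.0


# Toy example: "dip" has F1 2/3, "yor" has F1 2/3, "hau" has F1 1.0
gold = ["dip", "dip", "yor", "hau"]
pred = ["dip", "yor", "yor", "hau"]
print(round(macro_f1(gold, pred), 4))  # 0.7778
```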
How to use AfroLID v1.5?
```python
from transformers import pipeline

# Load AfroLID v1.5 as a text-classification pipeline from the Hugging Face Hub
afrolid = pipeline("text-classification", model="UBC-NLP/afrolid_1.5")

input_text = "6Acï looi aya në wuöt dït kɔ̈k yiic ku lɔ wuöt tɔ̈u tëmec piny de Manatha ku Eparaim ku Thimion , ku ɣään mec tɔ̈u të lɔ rut cï Naptali"
result = afrolid(input_text)

# For a single input, the pipeline returns a list holding one {"label", "score"} dict
language = result[0]["label"]
score = result[0]["score"]
print(f"detected language: {language}\tscore: {round(score * 100, 2)}")
```
Output:
```
detected language: dip	score: 99.99
```
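To identify many sentences at once, the pipeline also accepts a list of strings and, with default settings, returns one label/score dict per input. Because short or code-mixed inputs may receive low-confidence predictions, it can help to abstain below a cutoff. The sketch below assumes a hypothetical helper `pick_languages`, an arbitrary threshold of 0.5 (not a value recommended by the AfroLID authors), and the ISO 639 code `und` ("undetermined") as the fallback label. The example scores are invented stand-ins with the same shape as the pipeline output.

```python
from typing import Dict, List, Sequence


def pick_languages(
    results: Sequence[Dict[str, object]],
    threshold: float = 0.5,  # hypothetical cutoff; tune it on your own data
    fallback: str = "und",   # ISO 639 code for "undetermined"
) -> List[str]:
    """Return each predicted label, or `fallback` when its score is below `threshold`.

    `results` is assumed to have the shape the text-classification pipeline
    returns for a list of inputs: one {"label": ..., "score": ...} dict per input.
    """
    labels = []
    for item in results:
        label = str(item["label"])
        score = float(item["score"])
        labels.append(label if score >= threshold else fallback)
    return labels


# Invented stand-in results, shaped like the pipeline output
example = [
    {"label": "dip", "score": 0.9999},
    {"label": "yor", "score": 0.41},
]
print(pick_languages(example))  # ['dip', 'und']
```

With the model loaded as above, `pick_languages(afrolid([text_a, text_b]))` would return one code per sentence. Keeping the thresholding outside the pipeline call makes it easy to adjust the cutoff without re-running the model.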
Supported languages
Please refer to the supported-languages list.
Citation
If you use the AfroLID v1.5 model in a scientific publication, or if you find the resources in this repository useful, please cite our papers as follows:
AfroLID's Paper
@inproceedings{adebara2022afrolid,
title = "{AfroLID}: A Neural Language Identification Tool for {A}frican Languages",
author = "Adebara, Ife and Elmadany, AbdelRahim and Abdul-Mageed, Muhammad and Inciarte, Alcides Alcoba",
booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
month = dec,
year = "2022",
}
SERENGETI's Paper
@inproceedings{adebara-etal-2023-serengeti,
title = "{SERENGETI}: Massively Multilingual Language Models for {A}frica",
author = "Adebara, Ife and
Elmadany, AbdelRahim and
Abdul-Mageed, Muhammad and
Alcoba Inciarte, Alcides",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-acl.97",
doi = "10.18653/v1/2023.findings-acl.97",
pages = "1498--1537",
}