AfroLID

AfroLID is a neural language identification (LID) toolkit for 517 African languages and varieties. It exploits a multi-domain web dataset manually curated from across 14 language families and five orthographic systems. AfroLID is described in the paper "AfroLID: A Neural Language Identification Tool for African Languages".

What's New in AfroLID v1.5?

Fine-tuned on SERENGETI, a massively multilingual language model covering 517 African languages and language varieties.
Enhanced model performance, improving macro-F1 from 95.95 to 97.41.
Built on Hugging Face Transformers for seamless integration.
Optimized for easy use with the Hugging Face pipeline.
Better efficiency and accuracy, making it more robust for African language identification.
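
For readers unfamiliar with the macro-F1 metric quoted above, the following is a minimal sketch of how it is computed: F1 is calculated per language and then averaged with equal weight, so rare languages count as much as frequent ones. The function name `macro_f1` and the toy labels are illustrative assumptions; this is not the authors' evaluation code or data.

```python
from collections import Counter

def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # p was predicted but wrong
            fn[g] += 1          # g was missed
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy example with made-up predictions (labels are placeholders)
gold = ["dip", "dip", "yor", "hau"]
pred = ["dip", "yor", "yor", "hau"]
print(f"macro-F1: {macro_f1(gold, pred) * 100:.2f}")
```

Because every label contributes equally, a single misclassified low-resource language lowers the score as much as an error on a high-resource one.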

How to use AfroLID v1.5?

from transformers import pipeline


afrolid = pipeline("text-classification", model='UBC-NLP/afrolid_1.5')

input_text="6Acï looi aya në wuöt dït kɔ̈k yiic ku lɔ wuöt tɔ̈u tëmec piny de Manatha ku Eparaim ku Thimion , ku ɣään mec tɔ̈u të lɔ rut cï Naptali"

result = afrolid(input_text)

# Extract the label and score from the first result
language = result[0]['label']
score = result[0]['score']

print(f"detected language: {language}\tscore: {round(score*100, 2)}")

Output:

detected language: dip	score: 99.99
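
When classifying several texts at once, it can be useful to reject low-confidence predictions. The sketch below assumes the classifier is called with a list of strings and returns one `{'label': ..., 'score': ...}` dictionary per input, as the text-classification pipeline does by default. The helper `identify_languages` and the `min_score` threshold are hypothetical and not part of AfroLID or Transformers.

```python
def identify_languages(classifier, texts, min_score=0.9):
    """Return one (label, score) pair per text.

    `classifier` is any callable that maps a list of strings to a list of
    predictions, e.g. a Hugging Face text-classification pipeline.
    The label is None when the top score falls below `min_score`
    (an arbitrary, illustrative threshold).
    """
    predictions = classifier(list(texts))
    results = []
    for prediction in predictions:
        # Some pipeline options (e.g. top_k) return a list per input;
        # in that case take its first (highest-scoring) entry.
        top = prediction[0] if isinstance(prediction, list) else prediction
        label = top["label"] if top["score"] >= min_score else None
        results.append((label, top["score"]))
    return results
```

With the pipeline object from the example above, this could be called as `identify_languages(afrolid, [input_text])`. A suitable threshold depends on the data and should be tuned rather than taken from this sketch.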

Supported languages

Please refer to supported-languages.

Citation

If you use the AfroLID v1.5 model for your scientific publication, or if you find the resources in this repository useful, please cite our papers as follows:

AfroLID's Paper

@inproceedings{adebara2022afrolid,
  title = "AfroLID: A Neural Language Identification Tool for African Languages",
  author = "Adebara, Ife and Elmadany, AbdelRahim and Abdul-Mageed, Muhammad and Alcoba Inciarte, Alcides",
  booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
  month = dec,
  year = "2022",
}

Serengeti's Paper

@inproceedings{adebara-etal-2023-serengeti,
    title = "{SERENGETI}: Massively Multilingual Language Models for {A}frica",
    author = "Adebara, Ife  and
      Elmadany, AbdelRahim  and
      Abdul-Mageed, Muhammad  and
      Alcoba Inciarte, Alcides",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.97",
    doi = "10.18653/v1/2023.findings-acl.97",
    pages = "1498--1537",
}