---
language:
- ar
pipeline_tag: token-classification
tags:
- NER
- Darija
widget:
- text: "دونالد طرامب هو الرئيس لفايت د ميريكان"
- text: "لمقار ديال OPEC كاين ف فيينا العاصمة ديال لوتريش"
- text: "عوينة يغومان جماعة ترابية قروية كاينة ف إقليم آسا الزاݣ"
---
# darija-ner
This is the first Named Entity Recognition (NER) model for the Moroccan dialect (Darija). It was trained on DarNERcorp, the first NER dataset in Darija, which is available on Mendeley Data: https://data.mendeley.com/datasets/286sss4k9v/4.
### Model Description
- **Developed by:** Hanane Nour Moussa
- **Model type:** Token classification
- **Language(s) (NLP):** Arabic, Darija
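As a token-classification model, it assigns one tag per token, and entity mentions are recovered by grouping consecutive tags. The minimal sketch below shows that grouping step, assuming word-level IOB2 tags (`B-`, `I-`, `O`) with entity types such as `PER` and `LOC`. The exact label set of this model is an assumption here, and `bio_to_spans` is a hypothetical helper, not part of this repository.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group word-level IOB2 tags into (entity_type, text) spans.

    Assumes tags look like "B-PER", "I-PER" or "O" (hypothetical label set).
    A stray I- tag that does not continue the open entity starts a new span.
    """
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the currently open entity, if any, and reset the buffer.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, ent_type = tag.partition("-")
        if prefix == "B" or ent_type != current_type:
            flush()
            current_type, current_tokens = ent_type, [token]
        else:  # I- tag continuing the open entity
            current_tokens.append(token)
    flush()
    return spans


# Hypothetical tags for the first widget sentence (not actual model output).
tokens = "دونالد طرامب هو الرئيس لفايت د ميريكان".split()
tags = ["B-PER", "I-PER", "O", "O", "O", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))
```

This works on whole words only and does not merge subword pieces. In practice, the Hugging Face `transformers` token-classification pipeline can perform this grouping itself through its `aggregation_strategy` parameter.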
### Model Sources
- **Repository:** https://github.com/HananeNourMoussa/darija-ner
- **Paper (dataset):** Hanane Nour Moussa, Asmaa Mourhir, DarNERcorp: An annotated named entity recognition dataset in the Moroccan dialect, *Data in Brief*, Volume 48, 2023 (https://doi.org/10.1016/j.dib.2023.109234)
### Metrics
The model is evaluated using the F1 score on the test sets listed under Results.
### Results
| Test set | F1 |
|---|---|
| DarNERcorp_test | 66.06% |
| MixedNERcorp_test | 70.06% |
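The card does not state how the F1 score is computed. A common choice for NER is micro-averaged, entity-level F1, where a predicted entity counts as correct only if its type and boundaries exactly match a gold entity. The sketch below implements that variant under the assumption of IOB2 tags; it is an illustration, not necessarily the evaluation used to produce the numbers above, and `extract_spans` and `entity_f1` are hypothetical helpers.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (entity_type, start_index, end_index_exclusive)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Convert one sentence of IOB2 tags into a set of typed spans."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    ent_type: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        prefix, _, typ = tag.partition("-")
        # Close the open entity unless this tag continues it with the same type.
        if start is not None and (prefix != "I" or typ != ent_type):
            spans.add((ent_type, start, i))
            start, ent_type = None, None
        # Open a new entity on B-, or leniently on an I- with nothing open.
        if prefix in ("B", "I") and start is None:
            start, ent_type = i, typ
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exact-match entity spans across all sentences."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = extract_spans(g_tags), extract_spans(p_tags)
        tp += len(g & p)  # same type and same boundaries
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the predicted PER entity is truncated, so it is a miss.
gold = [["B-PER", "I-PER", "O", "O", "O", "O", "B-LOC"]]
pred = [["B-PER", "O", "O", "O", "O", "O", "B-LOC"]]
print(entity_f1(gold, pred))  # 0.5
```

Under this strict matching, a partially correct entity counts as both a false positive and a false negative, which is why entity-level F1 is usually lower than token-level accuracy.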
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** NVIDIA T4
- **Hours used:** 0.7
- **Cloud Provider:** Google Cloud
- **Compute Region:** europe-west1
- **Carbon Emitted:** 0.01 kg CO2eq
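A rough version of this estimate multiplies the hardware's power draw by the training time and by the carbon intensity of the local grid. The sketch below uses the NVIDIA T4's rated board power of 70 W and the 0.7 hours listed above; the grid intensity of 0.27 kg CO2eq/kWh is an assumed placeholder, not the calculator's figure for europe-west1, and the calculator's exact method may differ from this simplified formula.

```python
def estimate_emissions_kg(power_watts: float, hours: float, grid_kg_per_kwh: float) -> float:
    """Simplified estimate: energy (kWh) times grid carbon intensity (kg CO2eq/kWh).

    Ignores data-center overhead and assumes the GPU runs at full rated power.
    """
    energy_kwh = power_watts / 1000 * hours
    return energy_kwh * grid_kg_per_kwh


# 70 W: NVIDIA T4 rated board power; 0.27 kg/kWh: assumed placeholder intensity.
emissions = estimate_emissions_kg(power_watts=70, hours=0.7, grid_kg_per_kwh=0.27)
print(f"{emissions:.3f} kg CO2eq")  # 0.013 kg CO2eq
```

With these inputs, 0.049 kWh of energy yields about 0.013 kg CO2eq, which is in line with the rounded 0.01 kg reported above.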
## Citation
If you use the DarNERcorp dataset to train your models, please cite the following paper:
Hanane Nour Moussa, Asmaa Mourhir, DarNERcorp: An annotated named entity recognition dataset in the Moroccan dialect, *Data in Brief*, Volume 48, 2023, 109234, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2023.109234 (https://www.sciencedirect.com/science/article/pii/S2352340923003530)
## GitHub Repository
Our data curation and model training code is openly available on GitHub: https://github.com/HananeNourMoussa/darija-ner