---
language:
- ar
pipeline_tag: token-classification
tags:
- NER
- Darija
widget:
- text: "دونالد طرامب هو الرئيس لفايت د ميريكان"
- text: "لمقار ديال OPEC كاين ف فيينا العاصمة ديال لوتريش"
- text: "عوينة يغومان جماعة ترابية قروية كاينة ف إقليم آسا الزاݣ"
---
# darija-ner
<!-- Provide a quick summary of what the model is/does. -->
This is the first Named Entity Recognition (NER) model for the Moroccan Arabic dialect (Darija). It was trained on DarNERcorp, the first NER dataset in Darija, which is available on Mendeley Data: https://data.mendeley.com/datasets/286sss4k9v/4.
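A minimal usage sketch, assuming the checkpoint loads with the standard Hugging Face `transformers` token-classification pipeline. `MODEL_ID` is a hypothetical placeholder (this card does not state the Hub ID), and `load_tagger`, `to_spans`, and `tag` are illustrative helper names, not part of the released code.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical placeholder: replace with the actual Hub ID or a local checkpoint path.
MODEL_ID = "path/to/darija-ner"


def load_tagger(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (needs `transformers` and a backend such as PyTorch)."""
    from transformers import pipeline  # imported lazily so the helpers below work without it

    # aggregation_strategy="simple" merges sub-word pieces into whole entity spans.
    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def to_spans(predictions: List[Dict]) -> List[Tuple[str, str]]:
    """Reduce aggregated pipeline output to (text, label) pairs."""
    return [(p["word"], p["entity_group"]) for p in predictions]


def tag(text: str, tagger: Callable[[str], List[Dict]]) -> List[Tuple[str, str]]:
    """Run the tagger on one sentence and return its entity spans."""
    return to_spans(tagger(text))
```

Calling `tag(text, load_tagger())` on one of the widget sentences returns a list of (text, label) pairs. The label names depend on the checkpoint's configuration.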
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** Hanane Nour Moussa
- **Model type:** Token classification
- **Language(s) (NLP):** Arabic, Darija
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** https://github.com/HananeNourMoussa/darija-ner
- **Paper (dataset):** Hanane Nour Moussa, Asmaa Mourhir, DarNERcorp: An annotated named entity recognition dataset in the Moroccan dialect, Data in Brief
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
F1 score.
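As an illustration of the metric, the sketch below computes a micro-averaged, entity-level F1 from BIO tag sequences. It assumes exact span-and-type matching, which is a common NER convention; this card does not state which F1 variant was used, and the tag names in the usage note are illustrative only.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel "O" closes a trailing span
        if tag.startswith("I-") and tag[2:] == label:
            continue  # still inside the current span
        if label is not None:
            spans.add((start, i, label))
        if tag == "O":
            start, label = None, None
        else:
            start, label = i, tag[2:]  # B-X, or a stray I-X treated as a new span
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exactly matching (span, type) pairs."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)
```

For example, with gold tags `["B-PER", "I-PER", "O", "B-LOC"]` and predicted tags `["B-PER", "I-PER", "O", "O"]`, one of two gold entities is found with no false positives, so precision is 1, recall is 0.5, and F1 is 2/3.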
### Results
- DarNERcorp_test: F1 = 66.06%
- MixedNERcorp_test: F1 = 70.06%
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** NVIDIA T4
- **Hours used:** 0.7
- **Cloud Provider:** Google Cloud
- **Compute Region:** europe-west1
- **Carbon Emitted:** 0.01 kg
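The figure above can be sanity-checked with a back-of-the-envelope estimate: energy drawn (power times hours) multiplied by the grid's carbon intensity. In the sketch below, 70 W is the NVIDIA T4's rated board power; the carbon intensity of 0.2 kg CO2eq/kWh is an assumed placeholder for illustration, not the value the calculator uses for europe-west1.

```python
def estimate_emissions_kg(power_watts: float, hours: float, kg_co2_per_kwh: float) -> float:
    """Estimate emissions as energy (kWh) times grid carbon intensity (kg CO2eq per kWh)."""
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * kg_co2_per_kwh


T4_POWER_WATTS = 70.0          # rated board power of an NVIDIA T4
HOURS_USED = 0.7               # from the list above
ASSUMED_KG_CO2_PER_KWH = 0.2   # assumption for illustration only

estimate = estimate_emissions_kg(T4_POWER_WATTS, HOURS_USED, ASSUMED_KG_CO2_PER_KWH)
print(f"{estimate:.4f} kg CO2eq")
```

Under these assumptions the estimate is roughly 0.01 kg, the same order of magnitude as the reported value.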
## Citation
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
If you use the DarNERcorp dataset to train your models, please cite the following paper:
Hanane Nour Moussa, Asmaa Mourhir,
DarNERcorp: An annotated named entity recognition dataset in the Moroccan dialect,
Data in Brief,
Volume 48,
2023,
109234,
ISSN 2352-3409,
https://doi.org/10.1016/j.dib.2023.109234.
(https://www.sciencedirect.com/science/article/pii/S2352340923003530)
## GitHub Repo
Our data curation and model training code is openly available on GitHub: https://github.com/HananeNourMoussa/darija-ner