---
language:
- ar
pipeline_tag: token-classification
tags:
- NER
- Darija
widget:
- text: "دونالد طرامب هو الرئيس لفايت د ميريكان"
- text: "لمقار ديال OPEC كاين ف فيينا العاصمة ديال لوتريش"
- text: "عوينة يغومان جماعة ترابية قروية كاينة ف إقليم آسا الزاݣ"
---
# darija-ner

<!-- Provide a quick summary of what the model is/does. -->

This is the first model for Named Entity Recognition (NER) in Darija, the Moroccan Arabic dialect. It was trained on DarNERcorp, the first NER dataset in Darija, which is available on Mendeley Data: https://data.mendeley.com/datasets/286sss4k9v/4.

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Hanane Nour Moussa
- **Model type:** Token classification
- **Language(s) (NLP):** Arabic, Darija
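
Since this is a token-classification model, it can presumably be loaded with the `transformers` `pipeline` API. The sketch below is a minimal illustration, not the author's documented usage: `MODEL_ID` is a hypothetical placeholder to be replaced with this model's actual Hub identifier, and the helpers `load_ner` and `format_entities` are names introduced here for illustration only.

```python
from typing import Any, Dict, List

# Hypothetical placeholder: replace with this model's actual Hub identifier.
MODEL_ID = "your-namespace/darija-ner"


def load_ner(model_id: str = MODEL_ID):
    """Build a token-classification pipeline that merges word pieces into entities."""
    # Imported lazily so the formatting helper below works without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # group sub-word tokens into whole entities
    )


def format_entities(entities: List[Dict[str, Any]]) -> List[str]:
    """Render aggregated pipeline output as 'text [LABEL] (score)' strings."""
    return [
        f"{e['word']} [{e['entity_group']}] ({e['score']:.2f})"
        for e in entities
    ]
```

With the real model identifier in place, `format_entities(load_ner()("لمقار ديال OPEC كاين ف فيينا العاصمة ديال لوتريش"))` would list the entities detected in one of the widget sentences. The label names depend on the tag set the model was trained with.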

### Model Sources 

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/HananeNourMoussa/darija-ner
- **Paper (dataset):** Hanane Nour Moussa, Asmaa Mourhir, DarNERcorp: An annotated named entity recognition dataset in the Moroccan dialect, Data in Brief

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->
The model is evaluated with the F1 score, the harmonic mean of precision and recall.
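
The card does not say how F1 is computed. A common convention for NER is entity-level micro F1 over BIO-tagged sequences, where a predicted entity counts only if both its span and its type match the gold annotation exactly. The sketch below illustrates that convention under this assumption. It is not the author's evaluation script, and the tag names in the example (`PER`, `LOC`) are illustrative.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Extract entity spans from one BIO-tagged sentence."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((label, start, i))
        if tag.startswith(("B-", "I-")):
            # An "I-" tag with no matching open entity is treated as a new entity.
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged entity-level F1 with exact span and type matching."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        g_spans, p_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(g_spans & p_spans)
        n_gold += len(g_spans)
        n_pred += len(p_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# One gold sentence with two entities; the prediction finds only the first.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
score = entity_f1(gold, pred)  # precision 1.0, recall 0.5
```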

### Results

DarNERcorp_test: F1 = 66.06%

MixedNERcorp_test: F1 = 70.06%

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** NVIDIA T4
- **Hours used:** 0.7
- **Cloud Provider:** Google Cloud
- **Compute Region:** europe-west1
- **Carbon Emitted:** 0.01 kg
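
As a rough sanity check, an estimate of this kind is typically energy used multiplied by the carbon intensity of the grid. The sketch below assumes the GPU draws its rated 70 W board power for the whole run and uses an illustrative carbon intensity of 0.2 kg CO2eq/kWh. That intensity is an assumption made here for the arithmetic, not a value taken from the calculator or from the cloud provider.

```python
def estimate_emissions_kg(power_watts: float, hours: float,
                          intensity_kg_per_kwh: float) -> float:
    """Estimate emissions as energy (kWh) times grid carbon intensity (kg CO2eq/kWh)."""
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * intensity_kg_per_kwh


T4_POWER_WATTS = 70.0          # rated board power of an NVIDIA T4
HOURS_USED = 0.7               # from the list above
ASSUMED_INTENSITY = 0.2        # kg CO2eq/kWh, illustrative assumption only

emissions_kg = estimate_emissions_kg(T4_POWER_WATTS, HOURS_USED, ASSUMED_INTENSITY)
print(f"{emissions_kg:.2f} kg CO2eq")
```

Under these assumptions the run uses about 0.049 kWh, which is consistent in order of magnitude with the 0.01 kg reported above.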

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
If you use the DarNERcorp dataset to train your models, please cite the following paper:

Hanane Nour Moussa, Asmaa Mourhir,
DarNERcorp: An annotated named entity recognition dataset in the Moroccan dialect,
Data in Brief,
Volume 48,
2023,
109234,
ISSN 2352-3409,
https://doi.org/10.1016/j.dib.2023.109234.
(https://www.sciencedirect.com/science/article/pii/S2352340923003530)

## GitHub Repo
Our data curation and model training code is openly available on GitHub: https://github.com/HananeNourMoussa/darija-ner