hananour/darija-ner

Token Classification

hananour committed on May 29, 2023

Commit d150b14 · 1 parent: 2848670

Update README.md

Files changed (1)

README.md +10 -15

README.md CHANGED

@@ -46,6 +46,7 @@ F1 score.
 ### Results
 DarNERcorp_test: F1 = 66.06
 MixedNERcorp_test: F1 = 70.06
 ## Environmental Impact
@@ -65,21 +66,15 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
 If you use DarNERcorp dataset to train your models, cite the following paper:
-**BibTeX:**
-@article{MOUSSA2023109234,
-title = {DarNERcorp: An annotated named entity recognition dataset in the Moroccan dialect},
-journal = {Data in Brief},
-volume = {48},
-pages = {109234},
-year = {2023},
-issn = {2352-3409},
-doi = {https://doi.org/10.1016/j.dib.2023.109234},
-url = {https://www.sciencedirect.com/science/article/pii/S2352340923003530},
-author = {Hanane Nour Moussa and Asmaa Mourhir},
-keywords = {Natural language processing, Text mining, Named entity recognition, Dialectal Arabic, Corpus, BIO},
-abstract = {DarNERcorp is a manually annotated named entity recognition (NER) dataset in the Moroccan dialect, also called Darija. The dataset consists of 65,905 tokens and their corresponding tags according to BIO scheme. 13.8% of the tokens are named entities spanning four categories: person, location, organization, and miscellaneous. The data were scraped from the Moroccan Dialect section of Wikipedia and processed and annotated using open-source libraries and tools. The data are useful for the Arabic natural language processing (NLP) community as they address the lack in dialectal Arabic annotated corpora. This dataset can be used to train and evaluate named entity recognition systems in dialectal and mixed Arabic.}
-}
+Hanane Nour Moussa, Asmaa Mourhir,
+DarNERcorp: An annotated named entity recognition dataset in the Moroccan dialect,
+Data in Brief,
+Volume 48,
+2023,
+109234,
+ISSN 2352-3409,
+https://doi.org/10.1016/j.dib.2023.109234.
+(https://www.sciencedirect.com/science/article/pii/S2352340923003530)
 ## Model Card Authors [optional]