mohannad-tazi/NER_Darija_MAR_FSBM

	---
	language:
	- ar
	metrics:
	- precision
	- accuracy
	- recall
	- f1
	base_model:
	- aubmindlab/bert-base-arabertv02
	pipeline_tag: token-classification
	library_name: transformers
	datasets:
	- DarNERcorp
	tags:
	- ner
	- named-entity-recognition
	- arabic
	- darija


	---

	# NER Model for Moroccan Dialect (Darija)

	## Model Description
	This is a Named Entity Recognition (NER) model fine-tuned from AraBERT (aubmindlab/bert-base-arabertv02) on the DarNERcorp dataset. It recognizes person names, locations, organizations, and miscellaneous entities in Moroccan Arabic (Darija) text, and is intended for tasks such as information extraction from social media posts or news articles.

	### Model Architecture
	- Architecture: BERT-based model for token classification
	- Pre-trained Model: aubmindlab/bert-base-arabertv02
	- Fine-tuning Dataset: DarNERcorp
	- Languages: Moroccan Arabic (Darija)

	## Intended Use
	This model is designed for Named Entity Recognition tasks in Moroccan Arabic. It can identify and classify entities such as:
	- PER: Person names (e.g., "محمد", "فاطمة")
	- LOC: Locations (e.g., "الرباط", "طنجة")
	- ORG: Organizations (e.g., "البنك المغربي", "جامعة الحسن الثاني")
	- MISC: Miscellaneous entities
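The card lists the entity types but not the tagging scheme. As a rough sketch, assuming the model emits BIO-style tags (`B-PER`, `I-PER`, `O`, and so on), which the card does not confirm, token-level tags could be merged into entity spans as follows. The function name `bio_to_spans` and the example tags are hypothetical and written by hand, not produced by the model.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge BIO-tagged tokens into (entity_text, entity_type) pairs.

    Assumes tags look like "B-PER", "I-PER", or "O" (an assumption,
    not something stated in the model card).
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            # An I- tag that does not continue the open entity is
            # treated as the start of a new one.
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O" or anything unrecognized closes the open entity
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hand-written example tags for the sentence "محمد كان في الرباط"
tokens = ["محمد", "كان", "في", "الرباط"]
tags = ["B-PER", "O", "O", "B-LOC"]
spans = bio_to_spans(tokens, tags)
print(spans)
```

Multi-token entities such as "جامعة الحسن الثاني" would come out as a single `ORG` span if tagged `B-ORG I-ORG I-ORG`.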

	### Use Cases
	- Social media analysis: Extracting entities from Moroccan Arabic posts and tweets.
	- News summarization: Identifying important entities in news articles.
	- Information extraction: Extracting named entities from informal or formal texts.

	## Evaluation Results

	The model achieves the following results on the evaluation dataset:
	- Precision: 74.04%
	- Recall: 85.16%
	- F1 Score: 78.61%
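For reference, F1 is conventionally the harmonic mean of precision and recall. The sketch below applies that formula to the two reported figures. The card does not say how the scores were averaged (per entity type, per token, or over the whole evaluation set), so the helper `f1_score` is only an illustration of the standard formula, not the author's evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the card, converted from percentages.
reported_precision = 0.7404
reported_recall = 0.8516

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 from reported precision and recall: {f1:.2%}")  # 79.21%
```

The harmonic mean of the reported precision and recall comes to about 79.2%, slightly above the reported 78.61%. One possible explanation, which is an assumption here, is that the reported scores are averages over entity types or evaluation batches rather than a single pooled computation.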

	## How to Use
	Load the model with the Hugging Face Transformers `pipeline` API:

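	```python
	from transformers import pipeline

	# Load the fine-tuned model as a token-classification (NER) pipeline
	nlp = pipeline("ner", model="mohannad-tazi/ner-darija-darner")

	# Run the model on a Darija sentence ("Mohammed was in Rabat.")
	text = "محمد كان في الرباط."
	result = nlp(text)
	print(result)  # a list of dicts, one per tagged token
	```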
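Without further options, the pipeline returns one entry per word-piece token. The Transformers token-classification pipeline also accepts `aggregation_strategy="simple"`, which merges pieces into entities and returns dicts with `entity_group`, `score`, `word`, `start`, and `end` keys. The sketch below post-processes output of that shape; the helper `group_entities`, the `min_score` threshold, and the sample data are hypothetical (the sample is written by hand and is not real model output, and the actual label names depend on the model's configuration).

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(entities: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Group aggregated pipeline entities by type, dropping low-confidence ones."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hand-written sample in the shape returned by
# pipeline("ner", model=..., aggregation_strategy="simple").
# This is NOT real output from the model.
sample = [
    {"entity_group": "PER", "score": 0.98, "word": "محمد", "start": 0, "end": 4},
    {"entity_group": "ORG", "score": 0.30, "word": "كان", "start": 5, "end": 8},
    {"entity_group": "LOC", "score": 0.95, "word": "الرباط", "start": 12, "end": 18},
]

grouped = group_entities(sample)
print(grouped)
```

With a real pipeline, the call would look like `group_entities(nlp(text))` after constructing `nlp` with `aggregation_strategy="simple"`.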

	## Dataset
	The model is trained on the DarNERcorp dataset, a corpus designed specifically for Named Entity Recognition in the Moroccan Arabic dialect. The dataset includes sentences labeled with named entity tags such as PER, LOC, ORG, and MISC.

	### Preprocessing Steps
	- Tokenization using the BERT tokenizer.
	- Alignment of labels with tokenized inputs (considering word-piece tokens).
	- Padding and truncating sentences to a fixed length for uniformity.
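The label-alignment and padding steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's training code: it assumes the common convention of labeling only the first word piece of each word and masking the rest with `-100` (the default `ignore_index` of PyTorch's cross-entropy loss). The `word_ids` list mimics what a fast tokenizer's `BatchEncoding.word_ids()` returns; the helper names and the example values are hypothetical.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Expand word-level labels to word-piece level.

    word_ids maps each token to the index of its source word, with None
    for special tokens such as [CLS] and [SEP].
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)          # special token
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first piece of a word
        else:
            aligned.append(IGNORE_INDEX)          # continuation piece
        previous = word_id
    return aligned


def pad_or_truncate(seq: List[int], max_length: int, pad_value: int) -> List[int]:
    """Cut seq to max_length, or right-pad it with pad_value."""
    return seq[:max_length] + [pad_value] * max(0, max_length - len(seq))


# Example: three words split into 2, 1, and 3 word pieces.
word_ids = [None, 0, 0, 1, 2, 2, 2, None]
word_labels = [1, 0, 3]  # hypothetical label ids, one per word

aligned = align_labels(word_ids, word_labels)
padded = pad_or_truncate(aligned, 10, IGNORE_INDEX)
print(aligned)
print(padded)
```

In practice the tokenizer itself can pad and truncate the input ids (`padding="max_length"`, `truncation=True`, `max_length=...`), so a helper like `pad_or_truncate` would only be needed for the label sequence.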

	## Limitations
	- The model is trained on a single corpus (DarNERcorp) and may not generalize well to all Moroccan Arabic texts.
	- Performance may vary with the quality of the input text and with the consistency of the annotations in the training data.