|
--- |
|
license: apache-2.0 |
|
language: |
|
- ar |
|
metrics: |
|
- precision |
|
- recall |
|
library_name: fasttext |
|
pipeline_tag: text-classification |
|
tags: |
|
- arabic |
|
- text-classification |
|
--- |
|
|
|
# Model Card for Arabic Text Classification Model |
|
|
|
This model classifies Arabic text into one of seven categories using FastText’s supervised learning method. It is particularly suitable for tasks requiring rapid text categorization in Arabic. |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
|
|
|
|
|
|
|
- **Developed by:** Tevfik İstanbullu |
|
- **Model type:** Supervised classification model using FastText embeddings |
|
- **Language(s) (NLP):** Arabic |
|
- **License:** Apache License 2.0 |
|
|
|
### Model Sources |
|
|
|
|
|
- **Repository:** https://huggingface.co/Tevfik34/arabic-text-classifier-fasttext |
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
This model is intended for direct use in single-label classification of Arabic text. It can be deployed in applications such as organizing news articles or routing customer support requests, provided the content maps onto the model's seven categories. |
|
|
|
|
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
|
|
The model is not designed for text in languages other than Arabic, or for multi-label classification, where multiple labels are assigned to a single text instance. |
|
|
|
## Bias, Risks, and Limitations |
|
|
|
This model is trained on publicly available Arabic text data with specific categories (Finance, Sports, Politics, Medical, Tech, Culture, Religion). |
|
It may contain biases present in the original dataset and may not perform equally well on all Arabic dialects. |
|
Users should test the model in their specific applications to assess accuracy and suitability. |
|
|
|
### Recommendations |
|
|
|
|
|
Users should be made aware of the risks, biases, and limitations of the model. |
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
|
|
```python
import fasttext

# Load the trained model from a local .bin file
model = fasttext.load_model("path_to_your_model.bin")

# predict() returns a tuple of labels and a matching array of probabilities
labels, probs = model.predict("Sample Arabic text")

# Labels carry fastText's "__label__" prefix
print(f"Predicted Label: {labels[0]}, Probability: {probs[0]}")
```
|
|
|
|
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
Data Size: 194,317 Arabic text samples |
|
Categories: 7 categories - Finance, Sports, Politics, Medical, Tech, Culture, Religion |
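|
fastText's supervised mode reads one example per line, with the label marked by a prefix (`__label__` by default). This card does not state the exact label strings used in training, so the sketch below assumes the category names listed above; `to_fasttext_line` is a hypothetical helper, not part of the fastText API. |
|
```python
# Category names taken from this card; the exact label strings used in
# training are an assumption.
CATEGORIES = ["Finance", "Sports", "Politics", "Medical", "Tech", "Culture", "Religion"]


def to_fasttext_line(category: str, text: str) -> str:
    """Format one sample as '__label__<category> <text>' on a single line."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    # fastText treats each line as one example, so collapse newlines
    # and repeated whitespace into single spaces.
    single_line = " ".join(text.split())
    return f"__label__{category} {single_line}"


line = to_fasttext_line("Sports", "فاز الفريق\nبالمباراة")
print(line)
```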
|
|
|
### Training Procedure |
|
|
|
* Embedding Dimension: 300 |
|
* Epochs: 25 |
|
* Learning Rate: 0.1 |
|
* Word N-grams: 3 |
|
* Min Count: 1 |
|
These parameters were selected to enhance the model’s ability to capture context in Arabic text and perform well across a diverse range of categories. |
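|
As a rough sketch, the settings above map onto the keyword arguments of `fasttext.train_supervised` as shown below. The `train` wrapper and its file paths are hypothetical, and any argument not listed in this card is left at the library default, which may differ from the original training run. |
|
```python
# Hyperparameters from this card, keyed by fastText's argument names.
HYPERPARAMS = {
    "dim": 300,       # embedding dimension
    "epoch": 25,      # passes over the training data
    "lr": 0.1,        # learning rate
    "wordNgrams": 3,  # use word n-grams up to length 3
    "minCount": 1,    # keep every word that appears at least once
}


def train(train_path: str, model_path: str):
    """Train and save a supervised model; requires the fasttext package."""
    import fasttext  # imported lazily so the config can be inspected without it

    model = fasttext.train_supervised(input=train_path, **HYPERPARAMS)
    model.save_model(model_path)
    return model
```
|
A call such as `train("train.txt", "model.bin")` would expect `train.txt` to hold one labeled example per line. |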
|
|
|
|
|
## Evaluation |
|
|
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
#### Testing Data |
|
|
|
The model was evaluated on a held-out test set of 6,428 Arabic text samples covering all seven categories. |
|
|
|
#### Metrics |
|
|
|
* Accuracy: The share of test samples assigned the correct category. |
|
* Precision: Of the samples predicted as a category, the share that truly belong to it. |
|
* Recall: Of the samples that truly belong to a category, the share the model identifies. |
|
* F1-score: The harmonic mean of precision and recall, balancing these two metrics. |
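|
To make these definitions concrete, the following minimal sketch computes them per category from lists of true and predicted labels. The card does not say how the reported figures were averaged across categories, so this is an illustration of the definitions rather than the evaluation script; the toy labels are invented for the example. |
|
```python
from collections import Counter


def per_class_metrics(gold, pred):
    """Return {label: (precision, recall, f1)} for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted, but the sample was not p
            fn[g] += 1  # the true label g was missed
    metrics = {}
    for label in set(gold) | set(pred):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Toy data, invented for illustration only.
gold = ["Sports", "Sports", "Tech", "Finance"]
pred = ["Sports", "Tech", "Tech", "Finance"]
result = per_class_metrics(gold, pred)
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
```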
|
|
|
### Results |
|
* Precision: 96.20% |
|
* Recall: 95.40% |
|
* F1: 95.79% |
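|
As a quick consistency check, the F1 score can be recomputed from the reported precision and recall. Because those inputs are already rounded to two decimals, the result may differ from the reported F1 in the last digit. |
|
```python
precision = 0.9620  # reported precision
recall = 0.9540     # reported recall

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 from rounded inputs: {f1:.4f}")
```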