Model Card for Arabic Text Classification Model
This model classifies Arabic text into one of seven categories using FastText’s supervised learning method. It is particularly suitable for tasks requiring rapid text categorization in Arabic.
Model Details
Model Description
- Developed by: Tevfik İstanbullu
- Model type: Supervised classification model using FastText embeddings
- Language(s) (NLP): Arabic
- License: Apache License 2.0
Model Sources
- Repository: https://huggingface.co/Tevfik34/arabic-text-classifier-fasttext
Uses
Direct Use
This model is intended for direct use in Arabic text classification tasks. It can be deployed in applications such as organizing news articles or automating customer support categorization, provided the target categories match the seven the model was trained on.
Out-of-Scope Use
The model is not designed for languages other than Arabic or for multi-label classification, where more than one label is assigned to a single text.
Bias, Risks, and Limitations
This model is trained on publicly available Arabic text data with specific categories (Finance, Sports, Politics, Medical, Tech, Culture, Religion). It may contain biases present in the original dataset and may not perform equally well on all Arabic dialects. Users should test the model in their specific applications to assess accuracy and suitability.
Recommendations
Users should be made aware of the risks, biases, and limitations of the model.
How to Get Started with the Model
Use the code below to get started with the model.
```python
import fasttext

# Load the model
model = fasttext.load_model("path_to_your_model.bin")

# Make predictions
label, prob = model.predict("Sample Arabic text")
print(f"Predicted Label: {label[0]}, Probability: {prob[0]}")
```
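By default, fastText returns labels with a `__label__` prefix, and `predict` rejects input that contains newline characters. The sketch below wraps prediction to handle both and to return the top few categories. It assumes the model was trained with the default label prefix; the helper names `clean_text`, `strip_label`, and `predict_top_k` are illustrative and not part of the published model.

```python
LABEL_PREFIX = "__label__"  # fastText's default label prefix (assumed unchanged here)


def clean_text(text: str) -> str:
    """Collapse all whitespace, including newlines, into single spaces."""
    return " ".join(text.split())


def strip_label(label: str) -> str:
    """Remove the label prefix, e.g. '__label__Sports' -> 'Sports'."""
    return label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label


def predict_top_k(model, text: str, k: int = 3):
    """Return up to k (category, probability) pairs, most likely first."""
    labels, probs = model.predict(clean_text(text), k=k)
    return [(strip_label(label), float(prob)) for label, prob in zip(labels, probs)]


# Example usage (requires fastText and a downloaded model file):
# import fasttext
# model = fasttext.load_model("path_to_your_model.bin")
# for category, prob in predict_top_k(model, "Sample Arabic text"):
#     print(f"{category}: {prob:.4f}")
```

Returning several candidates with their probabilities makes it easier to spot low-confidence predictions before acting on them.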
Training Details
Training Data
- Data Size: 194,317 Arabic text samples
- Categories: 7 (Finance, Sports, Politics, Medical, Tech, Culture, Religion)
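fastText's supervised mode reads one sample per line, with the label marked by a prefix (`__label__` by default). The sketch below shows how labeled samples could be serialized into that format. The helper `to_fasttext_line` and the two example rows are hypothetical; the card does not describe how the original data was prepared.

```python
def to_fasttext_line(category: str, text: str, prefix: str = "__label__") -> str:
    """Format one labeled sample as a single fastText training line."""
    # fastText expects exactly one sample per line, so fold newlines into spaces.
    single_line = " ".join(text.split())
    return f"{prefix}{category} {single_line}"


# Hypothetical rows for illustration only, not taken from the real dataset.
samples = [
    ("Sports", "نص تجريبي\nعن الرياضة"),
    ("Finance", "نص تجريبي عن الاقتصاد"),
]

lines = [to_fasttext_line(category, text) for category, text in samples]
print("\n".join(lines))
```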
Training Procedure
- Embedding Dimension: 300
- Epochs: 25
- Learning Rate: 0.1
- Word N-grams: 3
- Min Count: 1

These parameters were selected to enhance the model’s ability to capture context in Arabic text and perform well across a diverse range of categories.
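The hyperparameters listed above map directly onto keyword arguments of `fasttext.train_supervised`. The following is a sketch of what such a training run might look like, not the author's actual script; the file names `train.txt` and `model.bin` are placeholders.

```python
# Hyperparameters as listed in this card, keyed by fastText's argument names.
TRAIN_PARAMS = {
    "dim": 300,       # embedding dimension
    "epoch": 25,      # passes over the training data
    "lr": 0.1,        # learning rate
    "wordNgrams": 3,  # use word n-grams up to length 3
    "minCount": 1,    # keep every word that appears at least once
}


def train(train_path: str, model_path: str):
    """Train a supervised fastText classifier and save it to model_path."""
    import fasttext  # imported lazily so this module loads without fastText

    model = fasttext.train_supervised(input=train_path, **TRAIN_PARAMS)
    model.save_model(model_path)
    return model


# Example call (placeholder paths; the input must be in fastText's labeled format):
# train("train.txt", "model.bin")
```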
Evaluation
Testing Data, Factors & Metrics
Testing Data
Evaluated on a hold-out test set of 6,428 Arabic text samples, representing the seven categories.
Metrics
- Accuracy: The share of all test samples that are classified correctly.
- Precision: The share of predictions for a category that are correct.
- Recall: The share of samples belonging to a category that the model identifies correctly.
- F1-score: The harmonic mean of precision and recall, balancing these two metrics.
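As a worked illustration of the definitions above, the sketch below computes accuracy and per-class precision, recall, and F1 from true and predicted labels, then averages across classes. The function names and the toy labels are illustrative. The card does not say which averaging scheme (micro, macro, or weighted) produced the reported results, so the macro average used here is an assumption.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    metrics = {}
    for label in set(y_true) | set(y_pred):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_average(metrics):
    """Unweighted mean of (precision, recall, f1) across classes."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics.values()) / n for i in range(3))


# Toy labels for illustration only.
y_true = ["Sports", "Sports", "Finance", "Tech"]
y_pred = ["Sports", "Finance", "Finance", "Tech"]
scores = per_class_metrics(y_true, y_pred)
print("accuracy:", accuracy(y_true, y_pred))
print("macro (precision, recall, f1):", macro_average(scores))
```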
Results
- Precision: 96.20%
- Recall: 95.40%
- F1: 95.79%