Arabic Real/Fake News Classification Models Repository

Welcome to the repository containing multiple classification models trained and evaluated on an Arabic news classification task: labeling articles as fake or real. Below, you'll find details about each model, including its characteristics, reported accuracy, and potential use cases.


Models Overview and Performance

This repository includes the following models with their corresponding accuracies:

Model Name                      Accuracy
------------------------------  --------
Logistic Regression Model       0.97
Decision Tree Model             0.94
Gradient Boosting Classifier    0.91
Random Forest Classifier        0.99
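
For reference, accuracy is the fraction of articles whose predicted label matches the true label. The sketch below shows the calculation with scikit-learn on made-up labels; the label encoding (1 = real, 0 = fake) and the values are assumptions for illustration, not the evaluation data behind the figures above.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels for four held-out articles.
# Assumed encoding: 1 = real, 0 = fake.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]

# Accuracy = number of correct predictions / number of predictions.
accuracy = accuracy_score(y_true, y_pred)
print(f"Accuracy: {accuracy:.2f}")  # 3 of 4 correct -> 0.75
```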

Model Descriptions

1. Logistic Regression Model

Overview:

Logistic Regression is a statistical model used for binary or multiclass classification. For a binary task such as real versus fake, it passes a weighted sum of the input features through a sigmoid function to estimate the probability that an instance belongs to a class.

Features:

  • Fast and computationally efficient.
  • Performs well on linearly separable data.
  • Provides probabilistic predictions.

Use Cases:

  • Classifying Arabic news articles as real or fake.
  • Settings that need an interpretable model, since the coefficients indicate how strongly each feature pushes a prediction toward a class.

2. Decision Tree Model

Overview:

The Decision Tree model builds a tree-like structure where each node represents a decision rule and each leaf represents a class label. It is simple yet powerful for many classification tasks.

Features:

  • Easy to interpret and visualize.
  • Handles both numerical and categorical data.
  • Prone to overfitting on noisy data.

Use Cases:

  • Classifying Arabic news articles as real or fake.
  • Tasks where interpretability is crucial.

3. Gradient Boosting Classifier

Overview:

Gradient Boosting is an ensemble learning method that builds multiple weak learners (typically decision trees) and combines them to improve overall performance.

Features:

  • Excellent for handling non-linear relationships.
  • Robust to overfitting with proper hyperparameter tuning.
  • Can handle imbalanced datasets when combined with sample weighting or resampling.

Use Cases:

  • Classifying complex Arabic news articles with nuanced patterns.
  • Scenarios requiring high predictive performance.
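
The sequential nature of boosting can be seen by inspecting predictions after each added tree. The following is a minimal sketch assuming TF-IDF features and scikit-learn's GradientBoostingClassifier; the toy texts, label encoding (1 = real, 0 = fake), and hyperparameter values are hypothetical and not the settings used for the saved model.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data, NOT the repository's dataset.
# Assumed encoding: 1 = real, 0 = fake.
texts = [
    "أعلنت الوزارة عن نتائج الامتحانات الرسمية",
    "افتتح الوزير مستشفى جديدا في المدينة",
    "عاجل كائنات فضائية تهبط في العاصمة",
    "عاجل علاج سحري يشفي جميع الأمراض في يوم",
]
labels = [1, 1, 0, 0]

# Dense array for simplicity; real text data would normally stay sparse.
X = TfidfVectorizer().fit_transform(texts).toarray()

# Illustrative hyperparameters: 20 shallow trees, each correcting the
# errors of the ensemble built so far.
gb = GradientBoostingClassifier(
    n_estimators=20, learning_rate=0.1, max_depth=2, random_state=0
)
gb.fit(X, labels)

# P(real) for the first (real) article after each boosting stage.
staged = [p[0, 1] for p in gb.staged_predict_proba(X[:1])]
print(f"after 1 tree: {staged[0]:.3f}, after 20 trees: {staged[-1]:.3f}")
```

Lowering `learning_rate` or `max_depth` slows how quickly the ensemble fits the training data, which is one of the usual levers against overfitting.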

4. Random Forest Classifier

Overview:

Random Forest is a powerful ensemble method that builds multiple decision trees and averages their predictions to improve accuracy and reduce overfitting.

Features:

  • High accuracy and robustness to noise.
  • Handles large datasets with higher dimensionality.
  • Reduces overfitting compared to individual decision trees.

Use Cases:

  • Predicting whether an Arabic news article is real or fake.
  • Applications requiring feature importance insights.

How to Use the Models

All models are saved as .joblib files and can be easily loaded into your machine learning pipeline. Below is an example of how to use the Random Forest Classifier with Arabic news data:

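import joblib

# Load the trained model (the .joblib file must be in the working directory)
model = joblib.load("RandomForestClassifier_model.joblib")

# Example input: a list of raw Arabic news texts
input_data = [
    "أعلن المنتخب الوطني المغربي عن التشكيلة الرسمية التي ستشارك في المباراة القادمة ضمن تصفيات كأس العالم."
]

# Get one predicted label per input text.
# Passing raw strings assumes the saved object is a pipeline that
# includes its own text vectorizer.
prediction = model.predict(input_data)
print(f"Predicted class: {prediction[0]}")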
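
Calling predict on raw strings only works when the saved object bundles the text vectorizer together with the classifier. The sketch below illustrates that idea with a save and load round trip on a toy pipeline; the file name toy_pipeline.joblib, the toy texts, and the label encoding (1 = real, 0 = fake) are hypothetical, and whether the repository's files are stored this way is an assumption.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy data, NOT the repository's dataset.
# Assumed encoding: 1 = real, 0 = fake.
texts = [
    "أعلنت الوزارة عن نتائج الامتحانات الرسمية",
    "افتتح الوزير مستشفى جديدا في المدينة",
    "عاجل كائنات فضائية تهبط في العاصمة",
    "عاجل علاج سحري يشفي جميع الأمراض في يوم",
]
labels = [1, 1, 0, 0]

# Vectorizer + classifier in one object, so it accepts raw text.
pipeline = make_pipeline(
    TfidfVectorizer(), RandomForestClassifier(n_estimators=10, random_state=0)
)
pipeline.fit(texts, labels)
original_pred = pipeline.predict(texts)

# Save to a temporary file and load it back (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_pipeline.joblib")
    joblib.dump(pipeline, path)
    restored = joblib.load(path)

# The restored pipeline reproduces the original predictions.
restored_pred = restored.predict(texts)
print(list(restored_pred))
```

If a loaded model raises an error on raw strings, it was most likely saved without its vectorizer, and the same feature extraction used at training time has to be applied before calling predict.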

This work was a collaboration between Sanaa ABRIL and Sihame Mouanid.

