Arabic Real/Fake News Classification Models Repository

Welcome to the repository of classification models trained and evaluated on the task of labeling Arabic news as fake or real. Below, you'll find details about each model, including how it works, its accuracy, and potential use cases.

Models Overview and Performance

This repository includes the following models with their corresponding accuracies:

Model Name	Accuracy
Logistic Regression Model	0.97
Decision Tree Model	0.94
Gradient Boosting Classifier	0.91
Random Forest Classifier	0.99
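
This page does not say how the accuracies were measured (test split, class balance). Assuming the usual definition, accuracy is the share of articles whose predicted label matches the true one. A minimal sketch with made-up labels; the `accuracy` helper is illustrative and not part of the repository:

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration only; not taken from the repository's data.
y_true = ["fake", "real", "real", "fake"]
y_pred = ["fake", "real", "fake", "fake"]
score = accuracy(y_true, y_pred)
print(score)  # 3 of 4 correct -> 0.75
```

Accuracy alone can hide weak performance on a minority class, so per-class precision and recall are a useful complement when the classes are imbalanced.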

Model Descriptions

1. Logistic Regression Model

Overview:

Logistic Regression is a statistical model used for binary or multiclass classification. It predicts the probability that an instance belongs to a class by passing a weighted sum of its features through a sigmoid function (softmax in the multiclass case).

Features:

Fast and computationally efficient.
Performs well on linearly separable data.
Provides probabilistic predictions.

Use Cases:

Classifying Arabic news articles as real or fake.
Settings that need an interpretable model, since the coefficients indicate how each feature influences the prediction.
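
To make the sigmoid step concrete, here is a minimal sketch of how a binary logistic regression turns a weighted sum of features into a probability. The weights, bias, and feature values are hypothetical numbers chosen for illustration, not the coefficients of the saved model.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability, avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_proba(weights, bias, features):
    """Probability of the positive class for one feature vector."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


# Hypothetical coefficients and features, chosen only for illustration.
weights = [1.5, -2.0]
bias = 0.5
features = [1.0, 1.0]  # score = 0.5 + 1.5 - 2.0 = 0.0
probability = predict_proba(weights, bias, features)
print(probability)  # a score of 0 maps to 0.5
```

A positive weight pushes the probability toward the positive class as its feature grows, and a negative weight pushes it away, which is why the coefficients are easy to read.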

2. Decision Tree Model

Overview:

The Decision Tree model builds a tree-like structure where each node represents a decision rule and each leaf represents a class label. It is simple yet powerful for many classification tasks.

Features:

Easy to interpret and visualize.
Handles both numerical and categorical data.
Prone to overfitting on noisy data.

Use Cases:

Classifying Arabic news articles as real or fake.
Tasks where interpretability is crucial.

3. Gradient Boosting Classifier

Overview:

Gradient Boosting is an ensemble learning method that builds weak learners (typically shallow decision trees) one after another, each fitted to the errors of the ensemble so far, and combines them to improve overall performance.

Features:

Excellent for handling non-linear relationships.
Can resist overfitting when hyperparameters such as learning rate and tree depth are tuned properly.
Can cope with imbalanced datasets, particularly when combined with sample weighting.

Use Cases:

Classifying complex Arabic news articles with nuanced patterns.
Scenarios requiring high predictive performance.
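
The way weak learners add up can be observed by scoring the ensemble after each boosting stage. This is a sketch assuming scikit-learn's `GradientBoostingClassifier`; the data is synthetic and stands in for vectorized news text, and the hyperparameters are illustrative rather than the ones used for the saved model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Synthetic numeric features standing in for vectorized news text.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Depth-1 trees ("stumps") are deliberately weak learners.
model = GradientBoostingClassifier(
    n_estimators=20, learning_rate=0.1, max_depth=1, random_state=0
)
model.fit(X, y)

# staged_predict yields the ensemble's prediction after each added tree.
stage_accuracy = [accuracy_score(y, pred) for pred in model.staged_predict(X)]
print(f"after 1 tree: {stage_accuracy[0]:.2f}")
print(f"after 20 trees: {stage_accuracy[-1]:.2f}")
```

These are training-set accuracies, so they only show how the ensemble fits the data it has seen; a held-out set is needed to judge generalization.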

4. Random Forest Classifier

Overview:

Random Forest is a powerful ensemble method that builds multiple decision trees and averages their predictions to improve accuracy and reduce overfitting.

Features:

High accuracy and robustness to noise.
Handles large, high-dimensional datasets.
Reduces overfitting compared to individual decision trees.

Use Cases:

Predicting whether an Arabic news article is real or fake.
Applications requiring feature importance insights.
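
Both the averaging and the feature importance insights can be checked directly. The sketch below assumes scikit-learn's `RandomForestClassifier` and synthetic data in place of vectorized news text. It confirms that the forest's class probabilities equal the mean of its trees' probabilities, then ranks features by impurity-based importance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic numeric features standing in for vectorized news text.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, random_state=0
)

forest = RandomForestClassifier(n_estimators=25, random_state=0)
forest.fit(X, y)

# The forest's class probabilities are the mean of its trees' probabilities.
tree_probas = np.mean([t.predict_proba(X) for t in forest.estimators_], axis=0)
forest_probas = forest.predict_proba(X)

# Impurity-based importances: one value per feature, normalized to sum to 1.
importances = forest.feature_importances_
ranking = np.argsort(importances)[::-1]
print("features ranked by importance:", ranking.tolist())
```

Impurity-based importances can be biased toward features with many distinct values, so they are best read as a rough ranking rather than an exact measure.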

How to Use the Models

All models are saved as .joblib files and can be loaded with joblib. The example below loads the Random Forest Classifier and applies it to an Arabic news text:

import joblib

# Load the model
model = joblib.load("RandomForestClassifier_model.joblib")

# Example input: Arabic news text
input_data = [
    "أعلن المنتخب الوطني المغربي عن التشكيلة الرسمية التي ستشارك في المباراة القادمة ضمن تصفيات كأس العالم."
]

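# Get prediction. Passing raw text assumes the saved object includes the
# text vectorization step; if it is a bare classifier, vectorize the text first.
prediction = model.predict(input_data)
print(f"Predicted class: {prediction}")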
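
The snippet above passes raw strings to `predict`, which only works if the saved object bundles a text vectorizer with the classifier. This page does not document whether that is the case. As a sketch of one way such a file could be produced and reloaded, assuming scikit-learn and a TF-IDF step (both assumptions), with toy headlines, made-up labels, and a hypothetical file name:

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy headlines with made-up labels, for illustration only.
texts = [
    "عاجل: علاج سحري يشفي جميع الأمراض في يوم واحد",
    "وزارة الصحة تعلن بدء حملة التلقيح الموسمية",
    "صدمة: العلماء يخفون اكتشاف مدينة كاملة تحت البحر",
    "البنك المركزي يبقي سعر الفائدة دون تغيير",
]
labels = ["fake", "real", "fake", "real"]

# Bundle vectorizer and classifier so predict() accepts raw strings.
pipeline = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=10, random_state=0),
)
pipeline.fit(texts, labels)

# Round-trip through a .joblib file (hypothetical name, temporary directory).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_pipeline.joblib")
    joblib.dump(pipeline, path)
    restored = joblib.load(path)

new_text = ["وزارة التعليم تعلن موعد بدء العام الدراسي الجديد"]
prediction = restored.predict(new_text)
print(f"Predicted class: {prediction[0]}")
```

Files saved with joblib should only be loaded from trusted sources, and ideally with the same library versions used to save them, because loading can execute arbitrary code and the format is not guaranteed to be stable across versions.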

This work was a collaboration between Sanaa ABRIL and Sihame Mouanid.