Model Details

This project fine-tunes DistilBERT on the IMDB dataset for binary sentiment classification using the Hugging Face Transformers library.

Model Architecture

Model: DistilBERT-base-uncased
Optimizer: AdamW
Loss Function: Cross-entropy loss
Epochs: 4
Learning Rate: 2e-5
Batch Size: 16
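
For a rough sense of scale, the settings above determine how many optimizer steps a run takes. The sketch below assumes training on the full 25,000-review IMDB train split, on a single device, with no gradient accumulation. None of these details is stated in this README, so treat the numbers as illustrative.

```python
import math

# Hyperparameters listed above.
EPOCHS = 4
BATCH_SIZE = 16

# Assumption: the full IMDB train split (25,000 reviews) is used.
TRAIN_EXAMPLES = 25_000


def training_steps(num_examples, batch_size, epochs):
    """Return (steps_per_epoch, total_steps) for single-device training."""
    # The last batch of an epoch may be smaller, hence the ceiling.
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


steps_per_epoch, total_steps = training_steps(TRAIN_EXAMPLES, BATCH_SIZE, EPOCHS)
print(f"{steps_per_epoch} steps per epoch, {total_steps} steps in total")
```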

Dataset

The IMDB dataset is a collection of movie reviews, each labeled with one of two classes:

POSITIVE
NEGATIVE

You can access the dataset via the Hugging Face datasets library.
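
A minimal sketch of loading the data, assuming the dataset is published on the Hub under the id "imdb" and that integer label 0 means negative and 1 means positive. The helper name load_imdb and the two mapping constants are illustrative and not part of this project.

```python
def load_imdb():
    """Download IMDB through the datasets library (needs network access)."""
    # Imported lazily so the label mappings below can be used without it.
    from datasets import load_dataset

    return load_dataset("imdb")


# Assumed label convention: 0 = negative, 1 = positive.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}
```

Calling load_imdb() returns a DatasetDict. Each record in its "train" and "test" splits has a "text" field and an integer "label" field, so ID2LABEL[record["label"]] gives the class name.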

Training Configuration

The training arguments are set as follows:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilbert-base-uncased-finetuned-sentiment-analysis",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=4,
    weight_decay=0.01,
    evaluation_strategy="epoch",  # renamed to eval_strategy in newer transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=True,
)

You can adjust these parameters to suit your requirements.
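
The Trainer reports only the loss unless it is given a compute_metrics function. The project's own metric code is not shown in this README, so the following is only a sketch of how an accuracy metric could be computed from the logits and labels the Trainer passes in. It is written in plain Python to stay dependency-free; np.argmax is the usual shortcut.

```python
def compute_metrics(eval_pred):
    """Return accuracy from a (logits, labels) pair.

    Works with nested lists as well as the NumPy arrays the Trainer supplies.
    """
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row.
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Tiny hand-made example: rows 1 and 2 are right, row 3 is wrong.
toy_logits = [[0.1, 0.9], [2.0, -1.0], [0.3, 0.4]]
toy_labels = [1, 0, 0]
print(compute_metrics((toy_logits, toy_labels)))
```

The function would be passed to the Trainer as compute_metrics=compute_metrics, alongside training_args.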

Model Evaluation Results

Epoch	Eval Loss	Eval Accuracy
1	0.1881	92.90%
2	0.2331	93.39%
3	0.2919	93.39%
4	0.3253	93.67%
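
Eval loss rises after epoch 1 while accuracy keeps improving slightly, so the "best" checkpoint depends on which metric is used. When load_best_model_at_end=True is set and no metric_for_best_model is given, the Trainer compares checkpoints by eval loss. The sketch below applies both criteria to the numbers in the table; the variable names are illustrative.

```python
# Values copied from the evaluation table above.
results = [
    {"epoch": 1, "eval_loss": 0.1881, "eval_accuracy": 0.9290},
    {"epoch": 2, "eval_loss": 0.2331, "eval_accuracy": 0.9339},
    {"epoch": 3, "eval_loss": 0.2919, "eval_accuracy": 0.9339},
    {"epoch": 4, "eval_loss": 0.3253, "eval_accuracy": 0.9367},
]

# Lower loss is better; higher accuracy is better.
best_by_loss = min(results, key=lambda r: r["eval_loss"])
best_by_accuracy = max(results, key=lambda r: r["eval_accuracy"])

print(f"Best by eval loss: epoch {best_by_loss['epoch']}")
print(f"Best by eval accuracy: epoch {best_by_accuracy['epoch']}")
```

To select on accuracy instead, TrainingArguments accepts metric_for_best_model="accuracy" together with greater_is_better=True. This assumes a compute_metrics function that returns an "accuracy" key.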

Dependencies

The required dependencies for this project are:

transformers
datasets
torch
scikit-learn
numpy
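
A quick way to confirm the environment is ready is to look up each package's installed version. This sketch uses only the standard library. It assumes the names above are pip distribution names, which is why it uses scikit-learn (imported in code as sklearn). The helper installed_versions is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED = ["transformers", "datasets", "torch", "scikit-learn", "numpy"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```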

How to Use the Model

You can use the fine-tuned model for sentiment analysis using the Hugging Face pipeline as follows:

from transformers import pipeline

# Load the model from Hugging Face Hub
sentiment_analysis = pipeline("sentiment-analysis", model="Sathyam03/distilbert-base-uncased-finetuned-sentiment-analysis")

# Example usage
reviews = [
    "I absolutely loved this movie! It was fantastic.",
    "The film was okay, but it dragged on in some parts.",
    "I didn't like this movie at all. It was boring."
]

results = sentiment_analysis(reviews)

# Print the results
for review, result in zip(reviews, results):
    print(f"Review: {review}")
    print(f"Sentiment: {result['label']}, Confidence: {result['score']:.4f}\n")
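
The pipeline returns one dictionary per review, with a "label" and a "score" key. Mixed reviews such as the second example may get a lower score, so it can help to flag low-confidence predictions for manual review. The sketch below is one way to do that. The function name flag_uncertain, the 0.75 threshold, and the sample scores are all made up for illustration and are not output from this model.

```python
def flag_uncertain(results, threshold=0.75):
    """Return copies of pipeline results with an added 'uncertain' flag."""
    return [{**r, "uncertain": r["score"] < threshold} for r in results]


# Hypothetical pipeline-style output, NOT real predictions from the model.
sample_results = [
    {"label": "POSITIVE", "score": 0.9987},
    {"label": "NEGATIVE", "score": 0.6120},
]

for item in flag_uncertain(sample_results):
    note = " (low confidence)" if item["uncertain"] else ""
    print(f"{item['label']}: {item['score']:.4f}{note}")
```

With real output, the call would be flag_uncertain(results) on the list returned by sentiment_analysis(reviews). The label strings depend on the model's configuration and may differ from those shown here.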