Model Details
This project fine-tunes the DistilBERT model on the IMDB movie-review dataset for binary sentiment classification, using the Hugging Face Transformers library.
Model and Training Hyperparameters
- Model: distilbert-base-uncased
- Optimizer: AdamW
- Loss Function: Cross-entropy loss
- Epochs: 4
- Learning Rate: 2e-5
- Batch Size: 16 (per device)
- Weight Decay: 0.01
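As a rough sanity check on training length, the hyperparameters above determine how many optimizer steps a run takes. The sketch below is a hypothetical helper that assumes training on the full IMDB `train` split (25,000 reviews) on a single device. This card does not state either of those, so adjust `num_examples` and `num_devices` to match your setup.

```python
import math

def training_steps(num_examples: int, batch_size: int, epochs: int,
                   num_devices: int = 1) -> tuple[int, int]:
    """Return (steps_per_epoch, total_steps) for a data-parallel run.

    Hypothetical helper. Assumes the last partial batch is kept
    (dataloader_drop_last=False, the Trainer default) and no gradient accumulation.
    """
    effective_batch = batch_size * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch, steps_per_epoch * epochs

# Assumption: full IMDB train split (25,000 reviews) on one device.
steps_per_epoch, total_steps = training_steps(25_000, batch_size=16, epochs=4)
print(steps_per_epoch, total_steps)  # 1563 6252
```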
Dataset
The IMDB dataset is a collection of movie reviews, each labeled with one of two classes:
- POSITIVE
- NEGATIVE
You can access the dataset via the Hugging Face `datasets` library.
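If you reproduce the training, an explicit label mapping keeps the model's outputs readable. This is a minimal sketch that assumes the Hub's `imdb` dataset encodes 0 as negative and 1 as positive in its `label` column. The dictionaries and the `decode` helper are illustrative and are not taken from this project's code.

```python
# Hypothetical label mapping for the two IMDB classes.
# Assumption: the "label" column uses 0 = negative, 1 = positive.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {name: idx for idx, name in id2label.items()}

def decode(label_ids):
    """Map integer class ids (e.g. argmax predictions) to label names."""
    return [id2label[int(i)] for i in label_ids]

print(decode([1, 0, 1]))  # ['POSITIVE', 'NEGATIVE', 'POSITIVE']
```

These dictionaries can be passed to `AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2, id2label=id2label, label2id=label2id)`. Without them, the pipeline reports generic names such as `LABEL_0` and `LABEL_1`.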
Training Configuration
The training arguments are set as follows:
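```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilbert-base-uncased-finetuned-sentiment-analysis",  # also the default Hub repo name
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=4,
    weight_decay=0.01,
    evaluation_strategy="epoch",  # renamed to eval_strategy in newer Transformers releases
    save_strategy="epoch",        # must match the evaluation strategy when load_best_model_at_end=True
    load_best_model_at_end=True,  # reload the best checkpoint (by eval loss, unless configured otherwise)
    push_to_hub=True,             # upload checkpoints to the Hugging Face Hub
)
```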
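Adjust these parameters to fit your hardware and requirements. Note that `push_to_hub=True` requires logging in to the Hugging Face Hub first (for example with `huggingface-cli login`).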
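The evaluation results below report accuracy, which the `Trainer` only computes when it is given a `compute_metrics` function. The card does not show one, so the following is a minimal sketch using NumPy (listed among the dependencies). It assumes the `Trainer` passes an `EvalPrediction` that unpacks into `(logits, labels)`. The toy logits are made up purely to exercise the function.

```python
import numpy as np

def compute_metrics(eval_pred):
    """Accuracy from raw logits, in the form the Trainer passes them.

    eval_pred unpacks into (logits, labels): logits has shape
    (num_examples, num_classes), labels has shape (num_examples,).
    """
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}

# Toy check with made-up logits: 2 of 3 predictions match the labels.
toy_logits = np.array([[2.0, -1.0], [-0.5, 1.5], [0.3, 0.1]])
toy_labels = np.array([0, 1, 1])
print(compute_metrics((toy_logits, toy_labels)))
```

Pass it as `Trainer(..., compute_metrics=compute_metrics)`; the returned key then appears in the evaluation logs as `eval_accuracy`. scikit-learn's `accuracy_score(labels, predictions)` would give the same value.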
Model Evaluation Results
Epoch | Eval Loss | Eval Accuracy |
---|---|---|
1 | 0.1881 | 92.90% |
2 | 0.2331 | 93.39% |
3 | 0.2919 | 93.39% |
4 | 0.3253 | 93.67% |
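Eval loss rises after epoch 1 while accuracy improves only slightly. This is often a sign of overfitting: predictions grow more confident, so the remaining errors cost more loss. It matters for `load_best_model_at_end=True`. Unless `metric_for_best_model` is set, the `Trainer` ranks checkpoints by eval loss, which here favors epoch 1 rather than the most accurate epoch 4. The sketch below applies both criteria to the numbers in the table. This card does not state which checkpoint was actually published.

```python
# Evaluation results copied from the table above: epoch -> (eval_loss, eval_accuracy).
results = {
    1: (0.1881, 0.9290),
    2: (0.2331, 0.9339),
    3: (0.2919, 0.9339),
    4: (0.3253, 0.9367),
}

# Trainer default when metric_for_best_model is unset: the lowest eval loss wins.
best_by_loss = min(results, key=lambda epoch: results[epoch][0])
# With metric_for_best_model="accuracy": the highest accuracy wins.
best_by_accuracy = max(results, key=lambda epoch: results[epoch][1])

print(best_by_loss, best_by_accuracy)  # 1 4
```

To select checkpoints by accuracy instead, add `metric_for_best_model="accuracy"` to `TrainingArguments`. This requires a `compute_metrics` function that returns an `"accuracy"` key, which the `Trainer` looks up as `eval_accuracy`.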
Dependencies
The required dependencies for this project are:
- transformers
- datasets
- torch
- scikit-learn (imported as `sklearn`)
- numpy
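A quick way to confirm the environment is ready is to check that each dependency can be found before running anything heavy. This sketch uses only the standard library. It checks import names rather than pip package names, because scikit-learn is installed as `scikit-learn` but imported as `sklearn`. The helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Import names of the project's dependencies (the pip name differs for scikit-learn).
REQUIRED_MODULES = ["transformers", "datasets", "torch", "sklearn", "numpy"]

def missing_modules(modules):
    """Return the top-level modules that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All dependencies are installed.")
```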
How to Use the Model
You can run the fine-tuned model for sentiment analysis with the Hugging Face `pipeline` API as follows:
```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model="Sathyam03/distilbert-base-uncased-finetuned-sentiment-analysis",
)

# Example usage
reviews = [
    "I absolutely loved this movie! It was fantastic.",
    "The film was okay, but it dragged on in some parts.",
    "I didn't like this movie at all. It was boring.",
]
results = sentiment_analysis(reviews)

# Print each review with its predicted label and confidence score
for review, result in zip(reviews, results):
    print(f"Review: {review}")
    print(f"Sentiment: {result['label']}, Confidence: {result['score']:.4f}\n")
```
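The pipeline returns one dictionary per input, each with a `label` and a confidence `score` for that label. Because the classifier knows only two classes, a mixed review such as the second example still receives POSITIVE or NEGATIVE. One option is to flag low-confidence predictions for manual review. The sketch below works on that output format. The 0.8 threshold is a hypothetical value, not one tuned for this model, and the sample scores are made up.

```python
# Hypothetical threshold: predictions scoring below it are treated as uncertain.
CONFIDENCE_THRESHOLD = 0.8

def split_by_confidence(reviews, results, threshold=CONFIDENCE_THRESHOLD):
    """Pair each review with its prediction and separate confident from uncertain ones."""
    confident, uncertain = [], []
    for review, result in zip(reviews, results):
        entry = (review, result["label"], result["score"])
        if result["score"] >= threshold:
            confident.append(entry)
        else:
            uncertain.append(entry)
    return confident, uncertain

# Made-up pipeline-style output, used only to illustrate the format.
example_reviews = ["Great film!", "It was fine, I guess."]
example_results = [
    {"label": "POSITIVE", "score": 0.9987},
    {"label": "POSITIVE", "score": 0.6120},
]
confident, uncertain = split_by_confidence(example_reviews, example_results)
print(len(confident), len(uncertain))  # 1 1
```

With the real model, call `split_by_confidence(reviews, results)` on the `reviews` and `results` from the usage example.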