DistilBERT Fine-Tuned on IMDB for Masked Language Modeling (Accelerate)

Model Description

This model is a fine-tuned version of distilbert-base-uncased (~67M parameters) for the masked language modeling (MLM) task. It was trained on the IMDb dataset using a custom training loop built with the Hugging Face 🤗 Accelerate library.


Model Training Details

Training Dataset

  • Dataset: IMDb (imdb) from the Hugging Face Hub
  • Dataset Splits:
    • Train: 25,000 samples
    • Test: 25,000 samples
    • Unsupervised: 50,000 samples
  • Training Strategy:
    • Combined the train and unsupervised splits, yielding 75,000 training examples (see the sketch after this list).
    • Masked the evaluation set once with fixed random masking, so perplexity is computed on the same masked tokens at every evaluation and scores are comparable across epochs.
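
A minimal sketch of this preprocessing, assuming tokenized_test is the tokenized and chunked test split (tokenization/chunking is omitted here for brevity) and that the collator's default 15% masking probability was used; both are assumptions, not confirmed details:

from datasets import concatenate_datasets, load_dataset
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

imdb = load_dataset("imdb")

# 25,000 train + 50,000 unsupervised examples -> 75,000 training examples
train_dataset = concatenate_datasets([imdb["train"], imdb["unsupervised"]])

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

def insert_random_mask(batch):
    # Run the MLM collator once and store its output, so every evaluation
    # pass scores the model on exactly the same masked tokens
    features = [dict(zip(batch, values)) for values in zip(*batch.values())]
    masked_inputs = data_collator(features)
    return {f"masked_{k}": v.numpy() for k, v in masked_inputs.items()}

# tokenized_test is assumed to be the tokenized/chunked "test" split
eval_dataset = tokenized_test.map(
    insert_random_mask, batched=True, remove_columns=tokenized_test.column_names
).rename_columns({
    "masked_input_ids": "input_ids",
    "masked_attention_mask": "attention_mask",
    "masked_labels": "labels",
})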

Training Configuration

The model was trained with the following parameters; a minimal Accelerate setup sketch follows the list:

  • Number of Training Epochs: 10
  • Batch Size: 64 (per device)
  • Learning Rate: 5e-5
  • Weight Decay: 0.01
  • Evaluation Strategy: after each epoch
  • Early Stopping: enabled (patience = 3)
  • Metric for Best Model: eval_loss
    • Direction: lower eval_loss is better (greater_is_better = False)
  • Learning Rate Scheduler: linear decay with no warmup steps
  • Mixed Precision Training: enabled (FP16)
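
A sketch of how this configuration maps onto an Accelerate setup, reusing train_dataset, eval_dataset, and data_collator from the preprocessing sketch above (train_dataset is assumed to be tokenized by this point):

from accelerate import Accelerator
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import AutoModelForMaskedLM, default_data_collator, get_scheduler

model = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")

# FP16 mixed precision is handled by Accelerate
accelerator = Accelerator(mixed_precision="fp16")

# AdamW with the learning rate and weight decay listed above
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

# The eval set was masked once up front, so the plain default_data_collator
# is sufficient there; the train set is masked on the fly by data_collator
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=64, collate_fn=data_collator)
eval_dataloader = DataLoader(eval_dataset, batch_size=64, collate_fn=default_data_collator)

model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader, eval_dataloader
)

# Linear decay to zero with no warmup steps
num_epochs = 10
num_training_steps = num_epochs * len(train_dataloader)
lr_scheduler = get_scheduler(
    "linear", optimizer=optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
)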

Model Results

Best Epoch Performance

  • Best Epoch: 9
  • Loss: 2.0173
  • Perplexity: 7.5178 (the exponential of the loss; see the check below)
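
Perplexity here is the exponential of the mean evaluation loss, so the two numbers above are consistent:

import math

eval_loss = 2.0173
print(f"Perplexity: {math.exp(eval_loss):.4f}")  # ~7.518, matching the reported value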

Early Stopping

  • Early stopping was not triggered: the evaluation loss improved often enough that training ran the full 10 epochs, with the best evaluation loss observed at epoch 9. A sketch of the patience logic is shown below.
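
In a custom Accelerate loop the patience logic has to be implemented by hand rather than delegated to the Trainer. A sketch of how it might look, where evaluate() is a hypothetical helper that returns the mean evaluation loss for the epoch:

best_eval_loss = float("inf")
patience, epochs_without_improvement = 3, 0

for epoch in range(num_epochs):
    # ... one epoch of training over train_dataloader ...

    eval_loss = evaluate(model, eval_dataloader)  # hypothetical helper, not a real API

    if eval_loss < best_eval_loss:
        best_eval_loss = eval_loss
        epochs_without_improvement = 0
        # Keep the best checkpoint (metric: eval_loss, lower is better)
        accelerator.wait_for_everyone()
        unwrapped_model = accelerator.unwrap_model(model)
        unwrapped_model.save_pretrained("best_model", save_function=accelerator.save)
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break  # stop after 3 consecutive epochs without improvement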

Model Usage

This fine-tuned model can be used for masked language modeling via the fill-mask pipeline from 🤗 Transformers. Below is an example:

from transformers import pipeline

# Load the fine-tuned checkpoint into a fill-mask pipeline
mask_filler = pipeline(
    "fill-mask",
    model="Prikshit7766/distilbert-finetuned-imdb-mlm-accelerate",
)

text = "This is a great [MASK]."
predictions = mask_filler(text)  # returns the top 5 candidates by default

for pred in predictions:
    print(f">>> {pred['sequence']}")

Example Output:

>>> This is a great movie.
>>> This is a great film.
>>> This is a great show.
>>> This is a great story.
>>> This is a great documentary.
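
The checkpoint can also be loaded directly when you want the raw logits rather than the pipeline's formatted output; a sketch using the standard AutoModelForMaskedLM API:

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "Prikshit7766/distilbert-finetuned-imdb-mlm-accelerate"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

inputs = tokenizer("This is a great [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Find the [MASK] position and take its top-5 predicted tokens
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_tokens = logits[0, mask_index].topk(5).indices[0]
for token_id in top_tokens:
    print(tokenizer.decode(token_id))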