Model Card: BERT Fine-tuned on SWAG

Model Overview

This model is a fine-tuned version of Google's BERT-Base, Uncased. It was fine-tuned for a single epoch on SWAG, a large-scale dataset for grounded commonsense inference; the dataset configuration and preprocessing steps were not documented. The model targets natural language understanding tasks that require commonsense reasoning about everyday situations.

Model Architecture

  • Base Model: BERT-Base, Uncased
  • Layers: 12 Transformer layers
  • Parameters: 110M
  • Pre-training: The base model was pre-trained on the English Wikipedia and BookCorpus datasets.
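The 110M figure can be checked with a little arithmetic. The sketch below counts the weights of a BERT-Base encoder from its layer shapes. The dimensions (vocabulary of 30522, hidden size 768, feed-forward size 3072, 512 positions) are the standard BERT-Base, Uncased values and are assumed here, since this card does not list them. The single-output scoring head is likewise an assumption about how a multiple-choice head is attached.

```python
# Standard BERT-Base, Uncased dimensions (assumed; not listed in this card).
VOCAB_SIZE = 30522
MAX_POSITIONS = 512
TYPE_VOCAB_SIZE = 2
HIDDEN = 768
INTERMEDIATE = 3072
NUM_LAYERS = 12


def linear(n_in: int, n_out: int) -> int:
    """Parameters in a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


def layer_norm(width: int) -> int:
    """Parameters in a LayerNorm: scale plus shift."""
    return 2 * width


def bert_encoder_params() -> int:
    """Count parameters in the embeddings, 12 encoder layers, and pooler."""
    embeddings = (
        VOCAB_SIZE * HIDDEN          # token embeddings
        + MAX_POSITIONS * HIDDEN     # position embeddings
        + TYPE_VOCAB_SIZE * HIDDEN   # segment embeddings
        + layer_norm(HIDDEN)
    )
    # Query, key, value, and output projections, then LayerNorm.
    attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)
    feed_forward = (
        linear(HIDDEN, INTERMEDIATE)
        + linear(INTERMEDIATE, HIDDEN)
        + layer_norm(HIDDEN)
    )
    pooler = linear(HIDDEN, HIDDEN)
    return embeddings + NUM_LAYERS * (attention + feed_forward) + pooler


encoder = bert_encoder_params()
head = linear(HIDDEN, 1)  # assumed multiple-choice head: one score per choice
print(f"encoder: {encoder:,}  with head: {encoder + head:,}")
```

Under these assumptions the total is about 109.5 million, which is why the same model is quoted as either 109M or 110M depending on rounding.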

Performance

The model achieved the following results on the evaluation set:

  • Validation Loss: 0.5240
  • Accuracy: 79.70%
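To put these numbers in context, they can be compared with a uniform random guess. The sketch below assumes four candidate endings per example (the standard SWAG format) and assumes the reported loss is a mean cross-entropy in nats; neither detail is stated in this card.

```python
import math

NUM_CHOICES = 4        # assumption: four candidate endings per example
VAL_LOSS = 0.5240      # reported validation loss
VAL_ACCURACY = 0.7970  # reported validation accuracy

# Baselines for a model that assigns equal probability to every ending.
chance_accuracy = 1 / NUM_CHOICES
chance_loss = math.log(NUM_CHOICES)

# exp(-loss) is the geometric mean probability given to the correct ending.
geo_mean_prob = math.exp(-VAL_LOSS)

print(f"chance accuracy: {chance_accuracy:.2%}, reported: {VAL_ACCURACY:.2%}")
print(f"chance loss: {chance_loss:.4f}, reported: {VAL_LOSS:.4f}")
print(f"geometric mean probability of the correct ending: {geo_mean_prob:.3f}")
```

Under these assumptions, the reported accuracy is a little over three times the chance rate, and the loss is well below the uniform-guess loss of ln 4.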

Intended Use

Use Cases

This model is intended for tasks requiring natural language understanding, especially those involving commonsense reasoning. Potential use cases include:

  • Multiple-choice question answering
  • Contextual word embedding generation
  • Commonsense inference tasks
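For the multiple-choice use case, a minimal inference sketch with the Transformers library might look like the following. It assumes the checkpoint loads through `AutoModelForMultipleChoice` and that the repository ships its tokenizer files; neither is confirmed by this card. The helper names `build_choice_pairs` and `predict_ending` are hypothetical and exist only for illustration.

```python
from typing import List, Tuple

MODEL_ID = "ashaduzzaman/bert-finetuned-swag"  # repository name from this card


def build_choice_pairs(context: str, endings: List[str]) -> Tuple[List[str], List[str]]:
    """Pair the same context with every candidate ending."""
    return [context] * len(endings), list(endings)


def predict_ending(context: str, endings: List[str], model_id: str = MODEL_ID) -> int:
    """Return the index of the highest-scoring ending (downloads the model)."""
    # Imported here so the pairing helper stays usable without these packages.
    import torch
    from transformers import AutoModelForMultipleChoice, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMultipleChoice.from_pretrained(model_id)
    model.eval()

    first, second = build_choice_pairs(context, endings)
    encoded = tokenizer(first, second, padding=True, truncation=True, return_tensors="pt")
    # The model expects (batch, num_choices, seq_len); add the batch dimension.
    inputs = {name: tensor.unsqueeze(0) for name, tensor in encoded.items()}

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_choices)
    return int(logits.argmax(dim=-1).item())
```

Calling `predict_ending("She picks up the guitar.", ["She starts to play.", "She eats the guitar."])` would return the index of the ending the model scores highest. If the repository turns out not to include tokenizer files, loading the tokenizer from the base BERT-Base, Uncased checkpoint is a reasonable fallback.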

Limitations

  • Data Bias: As the dataset specifics are unknown, there might be biases in the training data that could affect the model’s predictions.
  • Generalization: Performance may degrade on inputs that differ from the fine-tuning data, such as specialized or domain-specific text.
  • Ethical Considerations: Users should be aware of potential ethical concerns when applying this model to sensitive or critical tasks. Misinterpretation of commonsense reasoning could lead to flawed or biased outcomes.

Training and Evaluation Data

Dataset

The model was fine-tuned on SWAG, a dataset for grounded commonsense inference. The dataset size, split distribution, and preprocessing methods were not documented.

Training Procedure

Hyperparameters

The model was trained using the following hyperparameters:

  • Learning Rate: 5e-05
  • Train Batch Size: 16
  • Eval Batch Size: 16
  • Optimizer: Adam (betas: (0.9, 0.999), epsilon: 1e-08)
  • Learning Rate Scheduler: Linear
  • Number of Epochs: 1
  • Seed: 42
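The linear scheduler listed above decays the learning rate from its peak to zero over the course of training. The sketch below reproduces that shape in plain Python. The step count of 4597 is taken from the training results in this card; zero warmup is an assumption, since no warmup setting is listed.

```python
PEAK_LR = 5e-05     # reported learning rate
TOTAL_STEPS = 4597  # optimizer steps reported in the training results
WARMUP_STEPS = 0    # assumption: no warmup is listed in this card


def linear_lr(step: int) -> float:
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


# Print the rate at the start, midpoint, and end of training.
for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_lr(step))
```

With a single epoch, the model sees each example once while the learning rate falls steadily, so examples late in the epoch contribute smaller updates than early ones.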

Training Results

The training and evaluation results are summarized below:

Training Loss   Epoch   Step   Validation Loss   Accuracy
0.6971          1.0     4597   0.5240            0.7970
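Although the dataset size is not documented, the step count gives a rough bound on it. The sketch below assumes one optimizer step per batch of 16 on a single device, no gradient accumulation, and a final partial batch that is kept rather than dropped; none of these settings are confirmed by this card.

```python
import math

STEPS_PER_EPOCH = 4597  # reported step count after one epoch
BATCH_SIZE = 16         # reported train batch size

# ceil(n / BATCH_SIZE) == STEPS_PER_EPOCH holds for every n in this inclusive range.
min_examples = (STEPS_PER_EPOCH - 1) * BATCH_SIZE + 1
max_examples = STEPS_PER_EPOCH * BATCH_SIZE

assert math.ceil(min_examples / BATCH_SIZE) == STEPS_PER_EPOCH
assert math.ceil(max_examples / BATCH_SIZE) == STEPS_PER_EPOCH
print(f"implied training examples: {min_examples:,} to {max_examples:,}")
```

Under these assumptions the training set holds roughly 73.5k examples, which is in line with the size of the regular SWAG training split.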

Framework Versions

The following software versions were used during training:

  • Transformers: 4.42.4
  • PyTorch: 2.4.0+cu121
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Ethical Considerations

When deploying this model, users should be cautious of potential biases and limitations inherent in the dataset and the model’s training process. Ensuring that the model is used in a manner that is fair, unbiased, and ethical is crucial, particularly in sensitive applications.

Contact Information

For further information or questions, please contact the maintainers of this model or refer to the associated documentation and code repository.

Model Repository

  • Model ID: ashaduzzaman/bert-finetuned-swag
  • Checkpoint Format: Safetensors
  • Model Size: 109M parameters
  • Tensor Type: F32