Switch Transformer (base-8) fine-tuned on the SAMSum dataset for conversation summarization

This model is a fine-tuned version of google/switch-base-8 on the samsum dataset. It achieves the following results on the evaluation set:

  • Loss: 1.4614
  • Rouge1: 46.1297
  • Rouge2: 22.9128
  • RougeL: 39.153
  • RougeLsum: 42.8502
  • Gen Len: 16.9719
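
As a quick usage reference (not part of the original card), below is a minimal inference sketch. It assumes the checkpoint loads through the generic transformers Auto classes (SwitchTransformers is a T5-style seq2seq model); the dialogue is a made-up placeholder, and depending on how the model was fine-tuned a T5-style task prefix such as "summarize: " may or may not be needed.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Minimal sketch, assuming the checkpoint works with the generic
# seq2seq Auto classes.
ckpt = "mrm8488/switch-base-8-finetuned-samsum"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)

# Placeholder dialogue in the SAMSum style.
dialogue = (
    "Ann: Are we still on for lunch tomorrow?\n"
    "Tom: Yes, 12:30 at the usual place.\n"
    "Ann: Great, see you there!"
)

inputs = tokenizer(dialogue, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```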

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 6
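
For readers reproducing the run, here is a minimal sketch of how these values map onto transformers' Seq2SeqTrainingArguments. The output_dir, evaluation strategy, and predict_with_generate settings are illustrative assumptions, not taken from the original setup.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="switch-base-8-finetuned-samsum",  # assumed, for illustration
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    adam_beta1=0.9,                # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=6,
    evaluation_strategy="epoch",   # assumed: the table below reports per-epoch validation
    predict_with_generate=True,    # assumed: needed to compute ROUGE during evaluation
)
```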

Training results

Training Loss   Epoch   Step    Validation Loss   Rouge1    Rouge2    RougeL    RougeLsum   Gen Len
1.8874          1.0     3683    1.5210            45.7651   22.9379   38.8554   42.6269     17.2482
1.6301          2.0     7366    1.4628            47.2719   24.8976   40.3913   43.9285     16.8362
1.4326          3.0     11049   1.4402            47.8275   25.2262   40.617    44.2948     16.9523
1.2992          4.0     14732   1.4489            48.393    25.3888   40.9534   44.797      17.1504
1.2259          5.0     18415   1.4495            49.2186   26.312    41.721    45.5087     17.1956
1.1477          6.0     22098   1.4610            49.0018   26.3474   41.5217   45.4081     17.0782
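
The ROUGE columns follow the usual summarization evaluation convention. As a minimal sketch (not the original evaluation script), scores of this form can be computed with the evaluate library; the predictions and references below are placeholders.

```python
import evaluate

rouge = evaluate.load("rouge")

predictions = ["Ann and Tom will meet for lunch at 12:30."]            # placeholder
references = ["Ann and Tom are meeting for lunch tomorrow at 12:30."]  # placeholder

scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)
# Keys are rouge1, rouge2, rougeL, rougeLsum; values fall in [0, 1] and are
# reported multiplied by 100 in the table above.
print({k: round(v * 100, 4) for k, v in scores.items()})
```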

Framework versions

  • Transformers 4.25.1
  • Pytorch 1.13.0+cu116
  • Datasets 2.8.0
  • Tokenizers 0.13.2