md_mt5_base_boun_split_first_v2

This model is a fine-tuned version of google/mt5-small. The training dataset is not specified (the card records it as None). It achieves the following results on the evaluation set:

  • Loss: 0.4874
  • Bleu: 0.5931
  • Gen Len: 18.7836

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 15
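
As a rough illustration of what the linear scheduler implies for these settings, the learning rate at any step can be sketched in plain Python. This is not the training code. It assumes zero warmup steps (none are listed above) and 14625 total optimizer steps (the final Step value in the training results, i.e. 15 epochs of 975 steps). The helper name `linear_lr` is hypothetical.

```python
BASE_LR = 2e-05       # learning_rate from the list above
STEPS_PER_EPOCH = 975 # step count after epoch 1 in the training results
NUM_EPOCHS = 15       # num_epochs from the list above
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 14625
WARMUP_STEPS = 0      # assumption: the card lists no warmup


def linear_lr(step: int,
              base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Print the learning rate at the start and at a few epoch boundaries.
for epoch in (0, 5, 10, 15):
    print(f"epoch {epoch:2d}: lr = {linear_lr(epoch * STEPS_PER_EPOCH):.3e}")
```

Under the same assumptions, and if training ran on a single device without gradient accumulation (neither is stated), 975 steps per epoch at a batch size of 4 would correspond to roughly 3900 training examples.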

Training results

| Training Loss | Epoch | Step  | Validation Loss | Bleu   | Gen Len |
|---------------|-------|-------|-----------------|--------|---------|
| 14.1813       | 1.0   | 975   | 2.7812          | 1.1085 | 18.9862 |
| 2.9943        | 2.0   | 1950  | 1.4463          | 1.4366 | 18.7331 |
| 1.9645        | 3.0   | 2925  | 1.0962          | 0.5916 | 18.7738 |
| 1.5852        | 4.0   | 3900  | 0.8990          | 0.5837 | 18.6944 |
| 1.3504        | 5.0   | 4875  | 0.7589          | 0.5952 | 18.7164 |
| 1.1926        | 6.0   | 5850  | 0.6843          | 0.6057 | 18.7367 |
| 1.0963        | 7.0   | 6825  | 0.6291          | 0.5969 | 18.7197 |
| 1.0192        | 8.0   | 7800  | 0.5902          | 0.6007 | 18.7428 |
| 0.9537        | 9.0   | 8775  | 0.5614          | 0.5879 | 18.7492 |
| 0.9127        | 10.0  | 9750  | 0.5366          | 0.5871 | 18.7667 |
| 0.8705        | 11.0  | 10725 | 0.5166          | 0.5841 | 18.7718 |
| 0.8472        | 12.0  | 11700 | 0.5041          | 0.5869 | 18.7777 |
| 0.8312        | 13.0  | 12675 | 0.4963          | 0.5917 | 18.7821 |
| 0.8243        | 14.0  | 13650 | 0.4890          | 0.5944 | 18.7838 |
| 0.8099        | 15.0  | 14625 | 0.4874          | 0.5931 | 18.7836 |
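
The rows above can be checked programmatically, for example to see which epoch has the lowest validation loss and which has the highest Bleu. The sketch below is only a reading aid: the values are copied from the table, while the `EvalRow` type and `parse_rows` helper are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    bleu: float
    gen_len: float


# Columns: Training Loss, Epoch, Step, Validation Loss, Bleu, Gen Len
RAW = """\
14.1813 1.0 975 2.7812 1.1085 18.9862
2.9943 2.0 1950 1.4463 1.4366 18.7331
1.9645 3.0 2925 1.0962 0.5916 18.7738
1.5852 4.0 3900 0.8990 0.5837 18.6944
1.3504 5.0 4875 0.7589 0.5952 18.7164
1.1926 6.0 5850 0.6843 0.6057 18.7367
1.0963 7.0 6825 0.6291 0.5969 18.7197
1.0192 8.0 7800 0.5902 0.6007 18.7428
0.9537 9.0 8775 0.5614 0.5879 18.7492
0.9127 10.0 9750 0.5366 0.5871 18.7667
0.8705 11.0 10725 0.5166 0.5841 18.7718
0.8472 12.0 11700 0.5041 0.5869 18.7777
0.8312 13.0 12675 0.4963 0.5917 18.7821
0.8243 14.0 13650 0.4890 0.5944 18.7838
0.8099 15.0 14625 0.4874 0.5931 18.7836
"""


def parse_rows(raw: str) -> List[EvalRow]:
    """Turn whitespace-separated table rows into typed records."""
    rows = []
    for line in raw.strip().splitlines():
        t, e, s, v, b, g = line.split()
        rows.append(EvalRow(float(t), float(e), int(s), float(v), float(b), float(g)))
    return rows


rows = parse_rows(RAW)
best_loss = min(rows, key=lambda r: r.val_loss)
best_bleu = max(rows, key=lambda r: r.bleu)
# True when every epoch improves on the previous validation loss.
loss_strictly_decreasing = all(a.val_loss > b.val_loss for a, b in zip(rows, rows[1:]))

print(f"lowest validation loss: {best_loss.val_loss} at epoch {best_loss.epoch:.0f}")
print(f"highest Bleu: {best_bleu.bleu} at epoch {best_bleu.epoch:.0f}")
print(f"validation loss strictly decreasing: {loss_strictly_decreasing}")
```

The validation loss falls at every epoch, whereas the highest Bleu value is logged at epoch 2, so the two metrics do not move together in this run.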

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu118
  • Datasets 2.15.0
  • Tokenizers 0.15.0

Model size

  • Parameters: 300M
  • Tensor type: F32
  • Format: Safetensors

Model tree for Buseak/md_mt5_base_boun_split_first_v2

  • Base model: google/mt5-small
  • Fine-tunes of the base model: 370, including this model
  • Fine-tunes of this model: 1