mt5-gigatrue-tpb

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 19.8924

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 128
  • eval_batch_size: 128
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
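
The `linear` scheduler listed above (with no warmup steps reported) amounts to a straight-line decay from the base learning rate to zero over the run. A minimal sketch of that schedule, where `total_steps` is a hypothetical stand-in rather than a value from the training log:

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 3e-4) -> float:
    """Linearly decay the learning rate from base_lr at step 0 to 0 at total_steps."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return base_lr * remaining

# Example: halfway through training the rate has halved.
print(linear_lr(0, 100))    # 0.0003
print(linear_lr(50, 100))   # 0.00015
print(linear_lr(100, 100))  # 0.0
```

In the Transformers `Trainer`, this is the behavior selected by `lr_scheduler_type="linear"` when `warmup_steps` is 0.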

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 46.6558       | 0.1015 | 3000  | 21.7202         |
| 32.5002       | 0.2030 | 6000  | 21.1662         |
| 31.6686       | 0.3044 | 9000  | 20.6858         |
| 31.3231       | 0.4059 | 12000 | 20.5872         |
| 31.1165       | 0.5074 | 15000 | 20.3965         |
| 31.0253       | 0.6089 | 18000 | 20.2668         |
| 30.9446       | 0.7104 | 21000 | 20.1311         |
| 30.9031       | 0.8119 | 24000 | 20.0952         |
| 30.8587       | 0.9133 | 27000 | 20.0290         |
| 30.8229       | 1.0148 | 30000 | 20.1145         |
| 30.8210       | 1.1163 | 33000 | 19.9165         |
| 30.7911       | 1.2178 | 36000 | 19.8157         |
| 30.7840       | 1.3193 | 39000 | 19.9888         |
| 30.7765       | 1.4207 | 42000 | 19.9128         |
| 30.7740       | 1.5222 | 45000 | 19.8938         |
| 30.7461       | 1.6237 | 48000 | 19.7771         |
| 30.7454       | 1.7252 | 51000 | 19.9231         |
| 30.7240       | 1.8267 | 54000 | 19.7420         |
| 30.7310       | 1.9282 | 57000 | 19.8704         |
| 30.7417       | 2.0296 | 60000 | 19.9472         |
| 30.7423       | 2.1311 | 63000 | 19.9312         |
| 30.7266       | 2.2326 | 66000 | 19.9009         |
| 30.7288       | 2.3341 | 69000 | 19.9468         |
| 30.7245       | 2.4356 | 72000 | 19.8720         |
| 30.7434       | 2.5370 | 75000 | 19.9333         |
| 30.7310       | 2.6385 | 78000 | 19.8618         |
| 30.7104       | 2.7400 | 81000 | 19.8979         |
| 30.7213       | 2.8415 | 84000 | 19.8811         |
| 30.7303       | 2.9430 | 87000 | 19.8924         |

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1
  • Datasets 3.2.0
  • Tokenizers 0.20.3
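
To reproduce this environment, the versions above can be pinned in a requirements file. This is a sketch assuming the standard PyPI package names (`torch` for Pytorch):

```
transformers==4.45.2
torch==2.5.1
datasets==3.2.0
tokenizers==0.20.3
```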
Model details

  • Model size: 120M parameters
  • Tensor type: BF16
  • Format: Safetensors