duo-predict-gpt2-medium-wikitext

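This model is a fine-tuned version of an unspecified base model on an unknown dataset; the model name suggests gpt2-medium and WikiText, but the card does not confirm either. It achieves the following results on the evaluation set: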

  • Loss: 2.2546
  • Accuracy: 0.0073
  • Perplexity: 9.5311
  • Bleu: 1.0
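
The reported perplexity is consistent with the exponential of the reported loss, which is the usual relationship when the loss is a mean cross-entropy in nats. A minimal sketch of that check follows; the function name is illustrative and not taken from the training code.

```python
import math


def perplexity_from_loss(loss: float) -> float:
    """Return perplexity as exp(loss), assuming loss is mean cross-entropy in nats."""
    return math.exp(loss)


# Final evaluation loss reported in this card.
final_ppl = perplexity_from_loss(2.2546)
print(f"perplexity ~ {final_ppl:.2f}")
```

The result lands close to the reported 9.5311; the small gap is expected because the loss shown here is rounded to four decimals.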

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
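
The hyperparameters above map fairly directly onto `transformers.TrainingArguments`. The sketch below is a hypothetical reconstruction, not the author's training script: it assumes a single device (so that the per-device batch size equals the listed batch size of 64), and `TRAINING_KWARGS`, `build_training_args`, and the `output_dir` value are illustrative names.

```python
# Hypothetical reconstruction of the listed hyperparameters as keyword
# arguments for transformers.TrainingArguments (not the original script).
TRAINING_KWARGS = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 64,  # assumption: one device, no accumulation
    "per_device_eval_batch_size": 64,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.1,
    "num_train_epochs": 5,
}


def build_training_args(output_dir: str = "outputs"):
    """Build TrainingArguments from the kwargs above.

    The import is local so the dictionary can be inspected without
    transformers installed.
    """
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Calling `build_training_args()` requires the `transformers` package; the model, dataset, and metric functions are not described in this card and would have to be supplied separately.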

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Perplexity | Bleu |
|---|---|---|---|---|---|---|
| 7.6654 | 0.1403 | 500 | 3.7315 | 0.0073 | 41.7396 | 1.0 |
| 7.0276 | 0.2807 | 1000 | 3.4735 | 0.0073 | 32.2490 | 1.0 |
| 6.4629 | 0.4210 | 1500 | 3.1863 | 0.0073 | 24.1987 | 1.0 |
| 5.9671 | 0.5613 | 2000 | 2.9542 | 0.0073 | 19.1873 | 1.0 |
| 5.6969 | 0.7017 | 2500 | 2.8233 | 0.0073 | 16.8331 | 1.0 |
| 5.5077 | 0.8420 | 3000 | 2.7351 | 0.0073 | 15.4112 | 1.0 |
| 5.3536 | 0.9823 | 3500 | 2.6607 | 0.0073 | 14.3059 | 1.0 |
| 5.2099 | 1.1226 | 4000 | 2.6000 | 0.0073 | 13.4641 | 1.0 |
| 5.1158 | 1.2630 | 4500 | 2.5493 | 0.0073 | 12.7980 | 1.0 |
| 5.0453 | 1.4033 | 5000 | 2.5125 | 0.0073 | 12.3362 | 1.0 |
| 4.955 | 1.5436 | 5500 | 2.4806 | 0.0073 | 11.9489 | 1.0 |
| 4.9157 | 1.6840 | 6000 | 2.4537 | 0.0073 | 11.6310 | 1.0 |
| 4.8756 | 1.8243 | 6500 | 2.4300 | 0.0073 | 11.3584 | 1.0 |
| 4.844 | 1.9646 | 7000 | 2.4100 | 0.0073 | 11.1342 | 1.0 |
| 4.7136 | 2.1050 | 7500 | 2.3948 | 0.0073 | 10.9657 | 1.0 |
| 4.6911 | 2.2453 | 8000 | 2.3805 | 0.0073 | 10.8105 | 1.0 |
| 4.6741 | 2.3856 | 8500 | 2.3668 | 0.0073 | 10.6637 | 1.0 |
| 4.6485 | 2.5260 | 9000 | 2.3538 | 0.0073 | 10.5257 | 1.0 |
| 4.623 | 2.6663 | 9500 | 2.3416 | 0.0073 | 10.3976 | 1.0 |
| 4.6016 | 2.8066 | 10000 | 2.3303 | 0.0073 | 10.2806 | 1.0 |
| 4.5823 | 2.9470 | 10500 | 2.3202 | 0.0073 | 10.1776 | 1.0 |
| 4.4802 | 3.0873 | 11000 | 2.3143 | 0.0073 | 10.1182 | 1.0 |
| 4.4671 | 3.2276 | 11500 | 2.3073 | 0.0073 | 10.0469 | 1.0 |
| 4.4557 | 3.3679 | 12000 | 2.3006 | 0.0073 | 9.9800 | 1.0 |
| 4.4437 | 3.5083 | 12500 | 2.2928 | 0.0073 | 9.9023 | 1.0 |
| 4.4402 | 3.6486 | 13000 | 2.2862 | 0.0073 | 9.8375 | 1.0 |
| 4.4482 | 3.7889 | 13500 | 2.2800 | 0.0073 | 9.7763 | 1.0 |
| 4.4279 | 3.9293 | 14000 | 2.2752 | 0.0073 | 9.7303 | 1.0 |
| 4.3188 | 4.0696 | 14500 | 2.2730 | 0.0073 | 9.7087 | 1.0 |
| 4.3193 | 4.2099 | 15000 | 2.2691 | 0.0073 | 9.6704 | 1.0 |
| 4.3158 | 4.3503 | 15500 | 2.2652 | 0.0073 | 9.6329 | 1.0 |
| 4.3196 | 4.4906 | 16000 | 2.2619 | 0.0073 | 9.6012 | 1.0 |
| 4.2946 | 4.6309 | 16500 | 2.2589 | 0.0073 | 9.5722 | 1.0 |
| 4.3078 | 4.7713 | 17000 | 2.2564 | 0.0073 | 9.5487 | 1.0 |
| 4.2974 | 4.9116 | 17500 | 2.2546 | 0.0073 | 9.5311 | 1.0 |
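
The Step and Epoch values in the results can be used to estimate the number of optimizer steps per epoch, and from that the learning rate at any logged step under a linear schedule with 10% warmup. The sketch below is an approximation under stated assumptions: the total step count is inferred from the logged values rather than reported by the card, the warmup length is taken as the ceiling of 10% of that total, and the function names are illustrative.

```python
import math


def steps_per_epoch(step: int, epoch: float) -> float:
    """Estimate optimizer steps per epoch from one logged (step, epoch) pair."""
    return step / epoch


def linear_lr(step: int, total_steps: int,
              base_lr: float = 1e-4, warmup_ratio: float = 0.1) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# Inferred from the last logged row (step 17500 at epoch 4.9116); an estimate only.
per_epoch = round(steps_per_epoch(17500, 4.9116))
total = per_epoch * 5  # num_epochs: 5

for s in (500, 5000, 17500):
    print(s, linear_lr(s, total))
```

With these assumptions the first evaluation at step 500 falls inside the warmup phase, which may partly explain the high training loss on the first logged steps.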

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.3.2
  • Tokenizers 0.21.0
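
When trying to reproduce these results, it can help to compare the local environment against the versions listed above. The following is a minimal sketch using only the standard library; `EXPECTED` and `compare_versions` are illustrative names, and it assumes PyTorch is installed under the distribution name `torch`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card; "torch" is the assumed distribution name for PyTorch.
EXPECTED = {
    "transformers": "4.49.0",
    "torch": "2.6.0+cu124",
    "datasets": "3.3.2",
    "tokenizers": "0.21.0",
}


def compare_versions(expected=EXPECTED):
    """Return {package: (wanted, installed_or_None, matches)} for each package."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in compare_versions().items():
    print(f"{name}: wanted {wanted}, installed {installed}, match={ok}")
```

A mismatch does not necessarily break loading or inference; it only flags a possible source of numerical differences.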

Model size

  • Parameters: 354M
  • Tensor type: F32
  • Format: Safetensors