tfa_output_2025_m02_d02_t23h_28m_54s

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4602
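
No usage example is given on the card. Since the base model is gpt2 (a 124M-parameter causal language model), a minimal loading-and-generation sketch with the standard transformers API would look roughly as follows, assuming the repository id brando/tfa_output_2025_m02_d02_t23h_28m_54s:

```python
# Minimal sketch: load the fine-tuned checkpoint and generate text.
# The repo id below is an assumption based on the model page; adjust as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "brando/tfa_output_2025_m02_d02_t23h_28m_54s"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```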

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: OptimizerNames.PAGED_ADAMW with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 100
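
A minimal sketch of how these settings map onto transformers.TrainingArguments is given below; the output directory is an assumption, and the PAGED_ADAMW optimizer is selected via the optim string (which typically requires bitsandbytes):

```python
# Hedged reproduction sketch of the hyperparameters listed above.
# Names marked "assumed" are not from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tfa_output_2025_m02_d02_t23h_28m_54s",  # assumed output directory
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,   # effective total train batch size: 2 * 4 = 8
    optim="paged_adamw_32bit",       # assumed string for OptimizerNames.PAGED_ADAMW
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=100,
)
```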

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log | 0 | 0 | 3.1242 |
| 3.7239 | 0.5714 | 1 | 3.1242 |
| 4.5819 | 1.5714 | 2 | 3.1012 |
| 4.5655 | 2.5714 | 3 | 3.0775 |
| 4.5318 | 3.5714 | 4 | 3.0655 |
| 4.5864 | 4.5714 | 5 | 3.0469 |
| 4.4551 | 5.5714 | 6 | 3.0325 |
| 4.4845 | 6.5714 | 7 | 3.0158 |
| 4.5483 | 7.5714 | 8 | 3.0042 |
| 4.4145 | 8.5714 | 9 | 2.9926 |
| 4.4484 | 9.5714 | 10 | 2.9827 |
| 4.3074 | 10.5714 | 11 | 2.9709 |
| 4.3609 | 11.5714 | 12 | 2.9587 |
| 4.3821 | 12.5714 | 13 | 2.9485 |
| 4.386 | 13.5714 | 14 | 2.9399 |
| 4.3846 | 14.5714 | 15 | 2.9299 |
| 4.3531 | 15.5714 | 16 | 2.9202 |
| 4.3193 | 16.5714 | 17 | 2.9091 |
| 4.2898 | 17.5714 | 18 | 2.9001 |
| 4.3685 | 18.5714 | 19 | 2.8888 |
| 4.232 | 19.5714 | 20 | 2.8802 |
| 4.2805 | 20.5714 | 21 | 2.8718 |
| 4.275 | 21.5714 | 22 | 2.8589 |
| 4.2062 | 22.5714 | 23 | 2.8513 |
| 4.1492 | 23.5714 | 24 | 2.8427 |
| 4.1998 | 24.5714 | 25 | 2.8323 |
| 4.1638 | 25.5714 | 26 | 2.8219 |
| 4.1229 | 26.5714 | 27 | 2.8149 |
| 4.2027 | 27.5714 | 28 | 2.8057 |
| 4.1399 | 28.5714 | 29 | 2.7971 |
| 4.1457 | 29.5714 | 30 | 2.7907 |
| 4.1507 | 30.5714 | 31 | 2.7815 |
| 4.0924 | 31.5714 | 32 | 2.7740 |
| 4.1176 | 32.5714 | 33 | 2.7660 |
| 4.1109 | 33.5714 | 34 | 2.7583 |
| 3.9774 | 34.5714 | 35 | 2.7497 |
| 4.0628 | 35.5714 | 36 | 2.7429 |
| 4.0824 | 36.5714 | 37 | 2.7344 |
| 4.0686 | 37.5714 | 38 | 2.7263 |
| 4.0403 | 38.5714 | 39 | 2.7191 |
| 4.0444 | 39.5714 | 40 | 2.7140 |
| 3.9816 | 40.5714 | 41 | 2.7064 |
| 3.9371 | 41.5714 | 42 | 2.6999 |
| 3.9101 | 42.5714 | 43 | 2.6939 |
| 3.9853 | 43.5714 | 44 | 2.6860 |
| 3.9293 | 44.5714 | 45 | 2.6800 |
| 3.8705 | 45.5714 | 46 | 2.6748 |
| 3.9374 | 46.5714 | 47 | 2.6683 |
| 3.8989 | 47.5714 | 48 | 2.6611 |
| 3.9209 | 48.5714 | 49 | 2.6557 |
| 3.8378 | 49.5714 | 50 | 2.6503 |
| 3.9311 | 50.5714 | 51 | 2.6434 |
| 3.8503 | 51.5714 | 52 | 2.6379 |
| 3.7551 | 52.5714 | 53 | 2.6334 |
| 3.757 | 53.5714 | 54 | 2.6291 |
| 3.8337 | 54.5714 | 55 | 2.6228 |
| 3.8533 | 55.5714 | 56 | 2.6176 |
| 3.7737 | 56.5714 | 57 | 2.6125 |
| 3.7589 | 57.5714 | 58 | 2.6064 |
| 3.7929 | 58.5714 | 59 | 2.6018 |
| 3.7802 | 59.5714 | 60 | 2.5972 |
| 3.824 | 60.5714 | 61 | 2.5932 |
| 3.7761 | 61.5714 | 62 | 2.5883 |
| 3.7067 | 62.5714 | 63 | 2.5848 |
| 3.7647 | 63.5714 | 64 | 2.5791 |
| 3.6702 | 64.5714 | 65 | 2.5760 |
| 3.7744 | 65.5714 | 66 | 2.5721 |
| 3.7251 | 66.5714 | 67 | 2.5674 |
| 3.6592 | 67.5714 | 68 | 2.5618 |
| 3.8159 | 68.5714 | 69 | 2.5583 |
| 3.6529 | 69.5714 | 70 | 2.5554 |
| 3.6874 | 70.5714 | 71 | 2.5510 |
| 3.6516 | 71.5714 | 72 | 2.5466 |
| 3.5826 | 72.5714 | 73 | 2.5438 |
| 3.6663 | 73.5714 | 74 | 2.5397 |
| 3.6507 | 74.5714 | 75 | 2.5351 |
| 3.591 | 75.5714 | 76 | 2.5343 |
| 3.6226 | 76.5714 | 77 | 2.5294 |
| 3.5843 | 77.5714 | 78 | 2.5260 |
| 3.6361 | 78.5714 | 79 | 2.5216 |
| 3.5118 | 79.5714 | 80 | 2.5197 |
| 3.6315 | 80.5714 | 81 | 2.5154 |
| 3.5687 | 81.5714 | 82 | 2.5112 |
| 3.5679 | 82.5714 | 83 | 2.5103 |
| 3.4985 | 83.5714 | 84 | 2.5059 |
| 3.5778 | 84.5714 | 85 | 2.5034 |
| 3.5422 | 85.5714 | 86 | 2.5003 |
| 3.6483 | 86.5714 | 87 | 2.4969 |
| 3.5949 | 87.5714 | 88 | 2.4933 |
| 3.5475 | 88.5714 | 89 | 2.4904 |
| 3.5944 | 89.5714 | 90 | 2.4861 |
| 3.5698 | 90.5714 | 91 | 2.4841 |
| 3.5287 | 91.5714 | 92 | 2.4832 |
| 3.5029 | 92.5714 | 93 | 2.4792 |
| 3.4956 | 93.5714 | 94 | 2.4758 |
| 3.5941 | 94.5714 | 95 | 2.4739 |
| 3.4637 | 95.5714 | 96 | 2.4710 |
| 3.5336 | 96.5714 | 97 | 2.4683 |
| 3.4492 | 97.5714 | 98 | 2.4661 |
| 3.4548 | 98.5714 | 99 | 2.4624 |
| 3.5259 | 99.5714 | 100 | 2.4602 |
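
For reference, the final validation loss of 2.4602 corresponds to a perplexity of roughly 11.7, assuming the reported loss is the usual mean token-level cross-entropy in nats:

```python
import math

final_val_loss = 2.4602                # final validation loss from the table above
perplexity = math.exp(final_val_loss)  # assumes mean cross-entropy in nats
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 11.71
```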

Framework versions

  • Transformers 4.48.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0

Model size

  • 124M parameters (tensor type F32, Safetensors format)