distilgpt_new_0100

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Train Loss: 1.0286
  • Validation Loss: 0.9952
  • Epoch: 99

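These losses are presumably mean token-level cross-entropy in nats (the usual objective for TensorFlow causal-LM training), in which case perplexity is simply the exponential of the loss. A quick sketch under that assumption:

```python
import math

# Final losses reported above (epoch 99).
train_loss = 1.0286
val_loss = 0.9952

# If the loss is mean cross-entropy in nats, perplexity = exp(loss).
train_ppl = math.exp(train_loss)  # roughly 2.8
val_ppl = math.exp(val_loss)      # roughly 2.7
```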
Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 2e-05, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
  • training_precision: float32
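The optimizer dict above describes an Adam update with decoupled weight decay. As a minimal pure-Python sketch of that update rule, using the hyperparameters listed (this illustrates the math only; the actual training used `transformers`' `AdamWeightDecay`, whose decay masking and schedule handling are more involved):

```python
import math

def adamw_step(param, grad, m, v, t,
               lr=2e-5, beta1=0.9, beta2=0.999,
               eps=1e-7, weight_decay=0.01):
    """One Adam update with decoupled weight decay, using the
    hyperparameters from the card. t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    # Decoupled decay: the weight-decay term is added to the update
    # directly, not folded into the gradient.
    param = param - lr * (m_hat / (math.sqrt(v_hat) + eps)
                          + weight_decay * param)
    return param, m, v

# Usage: minimize f(x) = x^2 (gradient 2x) starting from x = 1.0.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 101):
    x, m, v = adamw_step(x, 2 * x, m, v, t)
# With lr = 2e-5, x creeps down by about lr per step.
```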

Training results

| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 3.5889 | 2.6197 | 0 |
| 2.4784 | 2.2040 | 1 |
| 2.1855 | 1.9980 | 2 |
| 2.0181 | 1.8643 | 3 |
| 1.9031 | 1.7652 | 4 |
| 1.8166 | 1.6924 | 5 |
| 1.7467 | 1.6360 | 6 |
| 1.6904 | 1.5843 | 7 |
| 1.6430 | 1.5421 | 8 |
| 1.6021 | 1.5059 | 9 |
| 1.5668 | 1.4761 | 10 |
| 1.5359 | 1.4481 | 11 |
| 1.5071 | 1.4220 | 12 |
| 1.4841 | 1.4020 | 13 |
| 1.4608 | 1.3797 | 14 |
| 1.4399 | 1.3595 | 15 |
| 1.4213 | 1.3426 | 16 |
| 1.4031 | 1.3266 | 17 |
| 1.3875 | 1.3113 | 18 |
| 1.3735 | 1.3024 | 19 |
| 1.3600 | 1.2871 | 20 |
| 1.3456 | 1.2753 | 21 |
| 1.3336 | 1.2648 | 22 |
| 1.3214 | 1.2539 | 23 |
| 1.3103 | 1.2451 | 24 |
| 1.3005 | 1.2335 | 25 |
| 1.2905 | 1.2258 | 26 |
| 1.2815 | 1.2179 | 27 |
| 1.2728 | 1.2123 | 28 |
| 1.2643 | 1.2029 | 29 |
| 1.2564 | 1.1980 | 30 |
| 1.2494 | 1.1877 | 31 |
| 1.2414 | 1.1806 | 32 |
| 1.2348 | 1.1788 | 33 |
| 1.2290 | 1.1699 | 34 |
| 1.2209 | 1.1654 | 35 |
| 1.2156 | 1.1575 | 36 |
| 1.2110 | 1.1537 | 37 |
| 1.2046 | 1.1499 | 38 |
| 1.1986 | 1.1436 | 39 |
| 1.1940 | 1.1408 | 40 |
| 1.1877 | 1.1356 | 41 |
| 1.1830 | 1.1314 | 42 |
| 1.1779 | 1.1278 | 43 |
| 1.1737 | 1.1211 | 44 |
| 1.1692 | 1.1192 | 45 |
| 1.1647 | 1.1163 | 46 |
| 1.1611 | 1.1107 | 47 |
| 1.1560 | 1.1066 | 48 |
| 1.1521 | 1.1060 | 49 |
| 1.1489 | 1.1002 | 50 |
| 1.1440 | 1.0960 | 51 |
| 1.1406 | 1.0931 | 52 |
| 1.1373 | 1.0897 | 53 |
| 1.1329 | 1.0855 | 54 |
| 1.1302 | 1.0842 | 55 |
| 1.1265 | 1.0818 | 56 |
| 1.1237 | 1.0784 | 57 |
| 1.1204 | 1.0737 | 58 |
| 1.1173 | 1.0714 | 59 |
| 1.1140 | 1.0694 | 60 |
| 1.1112 | 1.0691 | 61 |
| 1.1083 | 1.0668 | 62 |
| 1.1044 | 1.0611 | 63 |
| 1.1027 | 1.0607 | 64 |
| 1.0990 | 1.0586 | 65 |
| 1.0969 | 1.0545 | 66 |
| 1.0944 | 1.0522 | 67 |
| 1.0921 | 1.0517 | 68 |
| 1.0891 | 1.0496 | 69 |
| 1.0862 | 1.0457 | 70 |
| 1.0828 | 1.0448 | 71 |
| 1.0824 | 1.0439 | 72 |
| 1.0793 | 1.0389 | 73 |
| 1.0769 | 1.0375 | 74 |
| 1.0740 | 1.0362 | 75 |
| 1.0717 | 1.0358 | 76 |
| 1.0700 | 1.0299 | 77 |
| 1.0675 | 1.0312 | 78 |
| 1.0639 | 1.0288 | 79 |
| 1.0643 | 1.0270 | 80 |
| 1.0607 | 1.0258 | 81 |
| 1.0602 | 1.0233 | 82 |
| 1.0568 | 1.0225 | 83 |
| 1.0557 | 1.0198 | 84 |
| 1.0534 | 1.0179 | 85 |
| 1.0512 | 1.0165 | 86 |
| 1.0495 | 1.0170 | 87 |
| 1.0478 | 1.0124 | 88 |
| 1.0458 | 1.0134 | 89 |
| 1.0439 | 1.0104 | 90 |
| 1.0418 | 1.0092 | 91 |
| 1.0401 | 1.0057 | 92 |
| 1.0377 | 1.0035 | 93 |
| 1.0370 | 1.0037 | 94 |
| 1.0345 | 1.0029 | 95 |
| 1.0339 | 1.0014 | 96 |
| 1.0322 | 1.0016 | 97 |
| 1.0296 | 0.9986 | 98 |
| 1.0286 | 0.9952 | 99 |

Framework versions

  • Transformers 4.20.1
  • TensorFlow 2.8.2
  • Datasets 2.3.2
  • Tokenizers 0.12.1
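To reproduce this environment, the versions above can be pinned in a requirements file (a sketch; package names are assumed to match the PyPI distributions, and TensorFlow 2.8.2 constrains the usable Python version):

```text
transformers==4.20.1
tensorflow==2.8.2
datasets==2.3.2
tokenizers==0.12.1
```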