pixel-tiny-cont

This model was trained from scratch on the wikipedia + bookcorpus datasets. It achieves the following results on the evaluation set:

  • Loss: 0.8024

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch of the equivalent configuration follows the list:

  • learning_rate: 0.0006
  • train_batch_size: 128
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 1024
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • training_steps: 250000
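
As a minimal sketch (an assumption, not the original training script), these settings could be expressed with the Transformers 4.17 `TrainingArguments` API roughly as follows. The `output_dir` is a placeholder, and the model/dataset wiring is omitted:

```python
# Hedged sketch of the hyperparameters above as a TrainingArguments config.
# `output_dir` is a placeholder; this is not the card author's actual script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="pixel-tiny-cont",     # placeholder path
    learning_rate=6e-4,
    per_device_train_batch_size=128,  # 8 GPUs -> total train batch size 1024
    per_device_eval_batch_size=8,     # 8 GPUs -> total eval batch size 64
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,                # 5% of 250,000 steps = 12,500 warmup steps
    max_steps=250_000,
)
```

With `warmup_ratio=0.05` over 250,000 training steps, the learning rate warms up for the first 12,500 steps and then follows a cosine decay.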

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.7411 | 0.06 | 1000 | 0.9070 |
| 0.7395 | 0.12 | 2000 | 0.9064 |
| 0.7387 | 0.18 | 3000 | 0.9047 |
| 0.7382 | 0.25 | 4000 | 0.9015 |
| 0.7381 | 0.31 | 5000 | 0.9044 |
| 0.7379 | 0.37 | 6000 | 0.9042 |
| 0.7379 | 0.43 | 7000 | 0.9054 |
| 0.7378 | 0.49 | 8000 | 0.9035 |
| 0.7378 | 0.55 | 9000 | 0.9026 |
| 0.7371 | 0.61 | 10000 | 0.9038 |
| 0.7369 | 0.67 | 11000 | 0.9027 |
| 0.7368 | 0.74 | 12000 | 0.9022 |
| 0.7368 | 0.8 | 13000 | 0.8987 |
| 0.7374 | 0.86 | 14000 | 0.9014 |
| 0.7369 | 0.92 | 15000 | 0.9002 |
| 0.7369 | 0.98 | 16000 | 0.9002 |
| 0.7372 | 1.04 | 17000 | 0.9019 |
| 0.737 | 1.1 | 18000 | 0.9001 |
| 0.737 | 1.16 | 19000 | 0.9006 |
| 0.7369 | 1.23 | 20000 | 0.9007 |
| 0.7365 | 1.29 | 21000 | 0.8698 |
| 0.7363 | 1.35 | 22000 | 0.8700 |
| 0.7366 | 1.41 | 23000 | 0.9021 |
| 0.7362 | 1.47 | 24000 | 0.8763 |
| 0.7082 | 1.53 | 25000 | 0.8719 |
| 0.6774 | 1.59 | 26000 | 0.8876 |
| 0.6525 | 1.65 | 27000 | 0.8905 |
| 0.6022 | 1.72 | 28000 | 0.8856 |
| 0.5874 | 1.78 | 29000 | 0.8794 |
| 0.5765 | 1.84 | 30000 | 0.8806 |
| 0.5685 | 1.9 | 31000 | 0.8747 |
| 0.564 | 1.96 | 32000 | 0.8779 |
| 0.5606 | 2.02 | 33000 | 0.8762 |
| 0.5574 | 2.08 | 34000 | 0.8703 |
| 0.5528 | 2.14 | 35000 | 0.8664 |
| 0.5494 | 2.21 | 36000 | 0.8717 |
| 0.5448 | 2.27 | 37000 | 0.8673 |
| 0.5419 | 2.33 | 38000 | 0.8637 |
| 0.5385 | 2.39 | 39000 | 0.8634 |
| 0.536 | 2.45 | 40000 | 0.8661 |
| 0.5336 | 2.51 | 41000 | 0.8631 |
| 0.5316 | 2.57 | 42000 | 0.8606 |
| 0.5297 | 2.63 | 43000 | 0.8589 |
| 0.5305 | 2.7 | 44000 | 0.8570 |
| 0.5262 | 2.76 | 45000 | 0.8559 |
| 0.5247 | 2.82 | 46000 | 0.8634 |
| 0.5235 | 2.88 | 47000 | 0.8606 |
| 0.5227 | 2.94 | 48000 | 0.8610 |
| 0.5206 | 3.0 | 49000 | 0.8610 |
| 0.5194 | 3.06 | 50000 | 0.8611 |
| 0.5183 | 3.12 | 51000 | 0.8579 |
| 0.5175 | 3.19 | 52000 | 0.8598 |
| 0.5163 | 3.25 | 53000 | 0.8521 |
| 0.5156 | 3.31 | 54000 | 0.8550 |
| 0.5148 | 3.37 | 55000 | 0.8504 |
| 0.5139 | 3.43 | 56000 | 0.8530 |
| 0.5133 | 3.49 | 57000 | 0.8589 |
| 0.5126 | 3.55 | 58000 | 0.8561 |
| 0.5119 | 3.62 | 59000 | 0.8574 |
| 0.5127 | 3.68 | 60000 | 0.8624 |
| 0.5105 | 3.74 | 61000 | 0.8522 |
| 0.5099 | 3.8 | 62000 | 0.8550 |
| 0.5094 | 3.86 | 63000 | 0.8537 |
| 0.509 | 3.92 | 64000 | 0.8535 |
| 0.5091 | 3.98 | 65000 | 0.8592 |
| 0.5079 | 4.04 | 66000 | 0.8554 |
| 0.5074 | 4.11 | 67000 | 0.8516 |
| 0.5069 | 4.17 | 68000 | 0.8491 |
| 0.5066 | 4.23 | 69000 | 0.8571 |
| 0.5068 | 4.29 | 70000 | 0.8536 |
| 0.5066 | 4.35 | 71000 | 0.9288 |
| 0.5051 | 4.41 | 72000 | 0.8597 |
| 0.5045 | 4.47 | 73000 | 0.8555 |
| 0.5043 | 4.53 | 74000 | 0.8547 |
| 0.5039 | 4.6 | 75000 | 0.8561 |
| 0.504 | 4.66 | 76000 | 0.8541 |
| 0.5026 | 4.72 | 77000 | 0.8490 |
| 0.5024 | 4.78 | 78000 | 0.8499 |
| 0.5019 | 4.84 | 79000 | 0.8522 |
| 0.5014 | 4.9 | 80000 | 0.8508 |
| 0.5008 | 4.96 | 81000 | 0.8512 |
| 0.5002 | 5.02 | 82000 | 0.8470 |
| 0.4995 | 5.09 | 83000 | 0.8462 |
| 0.4991 | 5.15 | 84000 | 0.8455 |
| 0.4982 | 5.21 | 85000 | 0.8465 |
| 0.4978 | 5.27 | 86000 | 0.8434 |
| 0.4969 | 5.33 | 87000 | 0.8432 |
| 0.4964 | 5.39 | 88000 | 0.8417 |
| 0.4957 | 5.45 | 89000 | 0.8363 |
| 0.495 | 5.51 | 90000 | 0.8392 |
| 0.4946 | 5.58 | 91000 | 0.8401 |
| 0.4935 | 5.64 | 92000 | 0.8373 |
| 0.4929 | 5.7 | 93000 | 0.8401 |
| 0.492 | 5.76 | 94000 | 0.8356 |
| 0.4912 | 5.82 | 95000 | 0.8334 |
| 0.4904 | 5.88 | 96000 | 0.8281 |
| 0.4898 | 5.94 | 97000 | 0.8338 |
| 0.4891 | 6.0 | 98000 | 0.8300 |
| 0.4882 | 6.07 | 99000 | 0.8262 |
| 0.4876 | 6.13 | 100000 | 0.8172 |
| 0.4868 | 6.19 | 101000 | 0.8240 |
| 0.4861 | 6.25 | 102000 | 0.8212 |
| 0.4854 | 6.31 | 103000 | 0.8243 |
| 0.4847 | 6.37 | 104000 | 0.8228 |
| 0.4841 | 6.43 | 105000 | 0.8185 |
| 0.4837 | 6.5 | 106000 | 0.8177 |
| 0.4827 | 6.56 | 107000 | 0.8140 |
| 0.4819 | 6.62 | 108000 | 0.8147 |
| 0.4813 | 6.68 | 109000 | 0.8172 |
| 0.4807 | 6.74 | 110000 | 0.8149 |
| 0.4801 | 6.8 | 111000 | 0.8152 |
| 0.4792 | 6.86 | 112000 | 0.8089 |
| 0.4785 | 6.92 | 113000 | 0.8084 |
| 0.4777 | 6.99 | 114000 | 0.8103 |
| 0.477 | 7.05 | 115000 | 0.8104 |
| 0.4772 | 7.11 | 116000 | 0.8142 |
| 0.4754 | 7.17 | 117000 | 0.8159 |
| 0.4748 | 7.23 | 118000 | 0.8092 |
| 0.4738 | 7.29 | 119000 | 0.8036 |
| 0.473 | 7.35 | 120000 | 0.8085 |
| 0.4724 | 7.41 | 121000 | 0.8084 |
| 0.4714 | 7.48 | 122000 | 0.8066 |
| 0.4705 | 7.54 | 123000 | 0.8094 |
| 0.4699 | 7.6 | 124000 | 0.8095 |
| 0.4693 | 7.66 | 125000 | 0.8101 |
| 0.4685 | 7.72 | 126000 | 0.8092 |
| 0.4679 | 7.78 | 127000 | 0.8025 |
| 0.4672 | 7.84 | 128000 | 0.8000 |
| 0.4665 | 7.9 | 129000 | 0.8020 |
| 0.4659 | 7.97 | 130000 | 0.8022 |
| 0.4653 | 8.03 | 131000 | 0.8071 |
| 0.4647 | 8.09 | 132000 | 0.7994 |
| 0.4639 | 8.15 | 133000 | 0.8034 |
| 0.4634 | 8.21 | 134000 | 0.8022 |
| 0.4656 | 8.27 | 135000 | 0.8052 |
| 0.4623 | 8.33 | 136000 | 0.7989 |
| 0.4617 | 8.39 | 137000 | 0.7993 |
| 0.4612 | 8.46 | 138000 | 0.8003 |
| 0.4608 | 8.52 | 139000 | 0.7990 |
| 0.4603 | 8.58 | 140000 | 0.8074 |
| 0.4597 | 8.64 | 141000 | 0.8089 |
| 0.4591 | 8.7 | 142000 | 0.8040 |
| 0.4586 | 8.76 | 143000 | 0.7993 |
| 0.4584 | 8.82 | 144000 | 0.8004 |
| 0.4594 | 8.88 | 145000 | 0.7991 |
| 0.4574 | 8.95 | 146000 | 0.7956 |
| 0.4571 | 9.01 | 147000 | 0.7948 |
| 0.4565 | 9.07 | 148000 | 0.7982 |
| 0.4563 | 9.13 | 149000 | 0.7960 |
| 0.4555 | 9.19 | 150000 | 0.8043 |
| 0.4551 | 9.25 | 151000 | 0.8021 |
| 0.4549 | 9.31 | 152000 | 0.7972 |
| 0.4545 | 9.38 | 153000 | 0.8003 |
| 0.4542 | 9.44 | 154000 | 0.8000 |
| 0.4539 | 9.5 | 155000 | 0.7960 |
| 0.4533 | 9.56 | 156000 | 0.8035 |
| 0.453 | 9.62 | 157000 | 0.7953 |
| 0.4527 | 9.68 | 158000 | 0.7937 |
| 0.4524 | 9.74 | 159000 | 0.8021 |
| 0.4519 | 9.8 | 160000 | 0.8028 |
| 0.4517 | 9.87 | 161000 | 0.8006 |
| 0.4514 | 9.93 | 162000 | 0.8067 |
| 0.4512 | 9.99 | 163000 | 0.7990 |
| 0.4508 | 10.05 | 164000 | 0.8041 |
| 0.4504 | 10.11 | 165000 | 0.7995 |
| 0.4501 | 10.17 | 166000 | 0.7979 |
| 0.4499 | 10.23 | 167000 | 0.7969 |
| 0.4497 | 10.29 | 168000 | 0.8041 |
| 0.4495 | 10.36 | 169000 | 0.8050 |
| 0.4492 | 10.42 | 170000 | 0.7999 |
| 0.4494 | 10.48 | 171000 | 0.7992 |
| 0.4486 | 10.54 | 172000 | 0.8019 |
| 0.4485 | 10.6 | 173000 | 0.8026 |
| 0.4483 | 10.66 | 174000 | 0.8009 |
| 0.448 | 10.72 | 175000 | 0.8022 |
| 0.4479 | 10.78 | 176000 | 0.8016 |
| 0.4476 | 10.85 | 177000 | 0.7988 |
| 0.4474 | 10.91 | 178000 | 0.8025 |
| 0.4471 | 10.97 | 179000 | 0.8035 |
| 0.4471 | 11.03 | 180000 | 0.7983 |
| 0.4467 | 11.09 | 181000 | 0.8010 |
| 0.4463 | 11.15 | 182000 | 0.8035 |
| 0.4463 | 11.21 | 183000 | 0.8049 |
| 0.4462 | 11.27 | 184000 | 0.7998 |
| 0.4459 | 11.34 | 185000 | 0.7988 |
| 0.4457 | 11.4 | 186000 | 0.8064 |
| 0.4456 | 11.46 | 187000 | 0.8042 |
| 0.4454 | 11.52 | 188000 | 0.7998 |
| 0.4453 | 11.58 | 189000 | 0.8026 |
| 0.4449 | 11.64 | 190000 | 0.7993 |
| 0.4448 | 11.7 | 191000 | 0.8037 |
| 0.4448 | 11.76 | 192000 | 0.8038 |
| 0.4445 | 11.83 | 193000 | 0.8010 |
| 0.4442 | 11.89 | 194000 | 0.7977 |
| 0.4443 | 11.95 | 195000 | 0.8008 |
| 0.4441 | 12.01 | 196000 | 0.8048 |
| 0.4439 | 12.07 | 197000 | 0.8034 |
| 0.4438 | 12.13 | 198000 | 0.8052 |
| 0.4437 | 12.19 | 199000 | 0.8041 |
| 0.4434 | 12.25 | 200000 | 0.8001 |
| 0.4434 | 12.32 | 201000 | 0.8013 |
| 0.4432 | 12.38 | 202000 | 0.7987 |
| 0.443 | 12.44 | 203000 | 0.7962 |
| 0.443 | 12.5 | 204000 | 0.8017 |
| 0.4429 | 12.56 | 205000 | 0.7996 |
| 0.4428 | 12.62 | 206000 | 0.7997 |
| 0.4425 | 12.68 | 207000 | 0.8017 |
| 0.4424 | 12.75 | 208000 | 0.8008 |
| 0.4424 | 12.81 | 209000 | 0.8052 |
| 0.4422 | 12.87 | 210000 | 0.8004 |
| 0.4421 | 12.93 | 211000 | 0.8023 |
| 0.4421 | 12.99 | 212000 | 0.8014 |
| 0.442 | 13.05 | 213000 | 0.7999 |
| 0.4418 | 13.11 | 214000 | 0.8019 |
| 0.4417 | 13.17 | 215000 | 0.7996 |
| 0.4416 | 13.24 | 216000 | 0.8007 |
| 0.4414 | 13.3 | 217000 | 0.8029 |
| 0.4415 | 13.36 | 218000 | 0.7990 |
| 0.4413 | 13.42 | 219000 | 0.7997 |
| 0.4413 | 13.48 | 220000 | 0.7997 |
| 0.4412 | 13.54 | 221000 | 0.7996 |
| 0.4411 | 13.6 | 222000 | 0.8003 |
| 0.4411 | 13.66 | 223000 | 0.7993 |
| 0.4411 | 13.73 | 224000 | 0.8005 |
| 0.4409 | 13.79 | 225000 | 0.8013 |
| 0.4409 | 13.85 | 226000 | 0.8016 |
| 0.4409 | 13.91 | 227000 | 0.7994 |
| 0.4408 | 13.97 | 228000 | 0.8023 |
| 0.4407 | 14.03 | 229000 | 0.8013 |
| 0.4406 | 14.09 | 230000 | 0.8038 |
| 0.4408 | 14.15 | 231000 | 0.7994 |
| 0.4406 | 14.22 | 232000 | 0.8007 |
| 0.4404 | 14.28 | 233000 | 0.8006 |
| 0.4403 | 14.34 | 234000 | 0.7987 |
| 0.4405 | 14.4 | 235000 | 0.8010 |
| 0.4404 | 14.46 | 236000 | 0.7982 |
| 0.4404 | 14.52 | 237000 | 0.7985 |
| 0.4403 | 14.58 | 238000 | 0.8016 |
| 0.4402 | 14.64 | 239000 | 0.8025 |
| 0.4402 | 14.71 | 240000 | 0.8020 |
| 0.4401 | 14.77 | 241000 | 0.8009 |
| 0.4401 | 14.83 | 242000 | 0.8015 |
| 0.4401 | 14.89 | 243000 | 0.8010 |
| 0.44 | 14.95 | 244000 | 0.7996 |
| 0.4402 | 15.01 | 245000 | 0.8014 |
| 0.44 | 15.07 | 246000 | 0.8007 |
| 0.44 | 15.13 | 247000 | 0.7984 |
| 0.44 | 15.2 | 248000 | 0.8009 |
| 0.4399 | 15.26 | 249000 | 0.8006 |
| 0.4399 | 15.32 | 250000 | 0.8016 |

Framework versions

  • Transformers 4.17.0
  • Pytorch 1.11.0
  • Datasets 2.1.1.dev0
  • Tokenizers 0.12.1
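
For reproducibility, a small sketch (an addition, not part of the original card) that checks whether the installed libraries match the versions listed above; note that `torch.__version__` may carry a CUDA suffix such as `+cu113`, and Datasets 2.1.1.dev0 was a development build:

```python
# Hedged environment check against the framework versions listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    transformers: "4.17.0",
    torch: "1.11.0",          # may print a suffix like 1.11.0+cu113
    datasets: "2.1.1.dev0",   # development build; nearby releases may differ
    tokenizers: "0.12.1",
}
for module, version in expected.items():
    found = module.__version__
    status = "OK" if found == version else f"got {found}"
    print(f"{module.__name__}: expected {version} -> {status}")
```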