dit-tiny_rvl_cdip_100_examples_per_class_kd_MSE_lr_fix

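This model is a fine-tuned version of microsoft/dit-base. The training dataset is not recorded in this card (the auto-generated field reads "None"). It achieves the following results on the evaluation set, which are identical to the final-epoch (25) row of the training results: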

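  • Loss: 1.4358
  • Accuracy: 0.195
  • Brier Loss: 0.9035
  • Nll (negative log-likelihood): 12.0550
  • F1 Micro: 0.195 (equal to accuracy, as expected for single-label classification)
  • F1 Macro: 0.1471
  • Ece (expected calibration error): 0.1675
  • Aurc (area under the risk-coverage curve): 0.6988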
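The calibration-oriented metrics above are computed from predicted class probabilities. The card does not say which implementation produced its numbers, so the following is a minimal NumPy sketch using common textbook definitions; the function names, the 15 equal-width ECE bins, and the tie handling in AURC are assumptions, not the code used for this model.

```python
import numpy as np

def brier_score(probs, labels):
    """Multiclass Brier score: squared error against the one-hot label,
    summed over classes and averaged over samples."""
    onehot = np.eye(probs.shape[1])[labels]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))

def nll(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the true class (clipped to avoid log(0))."""
    p_true = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))

def ece(probs, labels, n_bins=15):
    """Expected calibration error: bin samples by top-1 confidence and average
    |accuracy - confidence| per bin, weighted by the fraction of samples in it.
    The bin count (15) is an assumption."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            total += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return float(total)

def aurc(probs, labels):
    """Area under the risk-coverage curve: accept samples in order of decreasing
    confidence and average the error rate of the accepted set over all coverages."""
    conf = probs.max(axis=1)
    errors = (probs.argmax(axis=1) != labels).astype(float)
    order = np.argsort(-conf, kind="stable")  # ties keep their original order (assumption)
    risk = np.cumsum(errors[order]) / np.arange(1, len(labels) + 1)
    return float(risk.mean())

# A uniform prediction over 16 classes: Brier = 1 - 1/16, NLL = log(16).
uniform = np.full((4, 16), 1.0 / 16)
labels = np.array([0, 1, 2, 3])
print(brier_score(uniform, labels), nll(uniform, labels))
```

For K classes a uniform prediction has a Brier score of 1 - 1/K. With K = 16 (RVL-CDIP has 16 classes, and the model name suggests that dataset) this is 0.9375, close to the 0.9368 logged after the first epoch, so early in training the predicted distributions are near uniform.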

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 25
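With lr_scheduler_type: linear and lr_scheduler_warmup_ratio: 0.1, the learning rate rises linearly from 0 to 0.0001 over the first 10% of optimizer steps and then decays linearly to 0. The training results show 25 steps per epoch and 625 steps in total. The pure-Python sketch that follows reproduces that shape; it mirrors, as we understand it, the multiplier used by get_linear_schedule_with_warmup in Transformers. The function name and the rounding of the warmup length up to a whole step are assumptions rather than this run's code.

```python
import math

def linear_warmup_lr(step, base_lr=1e-4, total_steps=625, warmup_ratio=0.1):
    """Learning rate after `step` optimizer updates: linear warmup from 0 to
    base_lr, then linear decay back to 0 at total_steps."""
    # Warmup length rounded up to whole steps (assumed Trainer behaviour).
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # 63 when total_steps = 625
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)

for s in (0, 31, 63, 344, 625):
    print(s, linear_warmup_lr(s))
```

Under these assumptions the peak rate of 0.0001 is reached at step 63 (partway through epoch 3), halves by step 344, and reaches 0 at step 625.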

Training results

Training Loss | Epoch | Step | Validation Loss | Accuracy | Brier Loss | Nll     | F1 Micro | F1 Macro | Ece    | Aurc
--------------+-------+------+-----------------+----------+------------+---------+----------+----------+--------+-------
No log        | 1.0   | 25   | 1.5167          | 0.07     | 0.9368     | 20.8948 | 0.07     | 0.0305   | 0.1106 | 0.8850
No log        | 2.0   | 50   | 1.5246          | 0.08     | 0.9362     | 21.4368 | 0.08     | 0.0346   | 0.1200 | 0.8659
No log        | 3.0   | 75   | 1.5053          | 0.1      | 0.9340     | 23.7241 | 0.1000   | 0.0522   | 0.1280 | 0.8087
No log        | 4.0   | 100  | 1.5097          | 0.0975   | 0.9322     | 17.3004 | 0.0975   | 0.0487   | 0.1220 | 0.8220
No log        | 5.0   | 125  | 1.4926          | 0.12     | 0.9296     | 16.3893 | 0.12     | 0.0600   | 0.1284 | 0.7752
No log        | 6.0   | 150  | 1.4838          | 0.105    | 0.9273     | 19.3692 | 0.1050   | 0.0356   | 0.1254 | 0.7955
No log        | 7.0   | 175  | 1.4729          | 0.0975   | 0.9229     | 18.6899 | 0.0975   | 0.0411   | 0.1134 | 0.7963
No log        | 8.0   | 200  | 1.4754          | 0.125    | 0.9196     | 17.7842 | 0.125    | 0.0676   | 0.1238 | 0.7778
No log        | 9.0   | 225  | 1.4725          | 0.1125   | 0.9193     | 16.6572 | 0.1125   | 0.0505   | 0.1254 | 0.7839
No log        | 10.0  | 250  | 1.4702          | 0.1175   | 0.9168     | 16.3975 | 0.1175   | 0.0556   | 0.1183 | 0.7638
No log        | 11.0  | 275  | 1.4648          | 0.1175   | 0.9169     | 18.4274 | 0.1175   | 0.0558   | 0.1219 | 0.7806
No log        | 12.0  | 300  | 1.4660          | 0.155    | 0.9166     | 15.6492 | 0.155    | 0.0791   | 0.1411 | 0.7512
No log        | 13.0  | 325  | 1.4684          | 0.16     | 0.9164     | 17.1698 | 0.16     | 0.1140   | 0.1519 | 0.7285
No log        | 14.0  | 350  | 1.4662          | 0.1175   | 0.9158     | 17.6999 | 0.1175   | 0.0501   | 0.1269 | 0.7637
No log        | 15.0  | 375  | 1.4602          | 0.1675   | 0.9143     | 13.2540 | 0.1675   | 0.1153   | 0.1515 | 0.7223
No log        | 16.0  | 400  | 1.4556          | 0.1325   | 0.9138     | 13.3868 | 0.1325   | 0.0881   | 0.1323 | 0.7558
No log        | 17.0  | 425  | 1.4527          | 0.175    | 0.9128     | 11.1983 | 0.175    | 0.1334   | 0.1596 | 0.7153
No log        | 18.0  | 450  | 1.4535          | 0.1625   | 0.9111     | 17.6046 | 0.1625   | 0.1021   | 0.1435 | 0.7379
No log        | 19.0  | 475  | 1.4453          | 0.1825   | 0.9086     | 11.8948 | 0.1825   | 0.1228   | 0.1594 | 0.7098
1.4614        | 20.0  | 500  | 1.4431          | 0.1525   | 0.9078     | 14.2631 | 0.1525   | 0.1115   | 0.1410 | 0.7293
1.4614        | 21.0  | 525  | 1.4392          | 0.1825   | 0.9063     | 10.7664 | 0.1825   | 0.1378   | 0.1567 | 0.7058
1.4614        | 22.0  | 550  | 1.4469          | 0.1775   | 0.9055     | 13.4724 | 0.1775   | 0.1212   | 0.1483 | 0.7107
1.4614        | 23.0  | 575  | 1.4356          | 0.17     | 0.9039     | 11.8141 | 0.17     | 0.1232   | 0.1515 | 0.7091
1.4614        | 24.0  | 600  | 1.4370          | 0.1875   | 0.9039     | 12.9338 | 0.1875   | 0.1384   | 0.1539 | 0.7017
1.4614        | 25.0  | 625  | 1.4358          | 0.195    | 0.9035     | 12.0550 | 0.195    | 0.1471   | 0.1675 | 0.6988

"No log" in the Training Loss column means no training loss had been logged yet. The value 1.4614 shown from step 500 onward is the most recent logged training loss, consistent with the Trainer's default logging interval of 500 steps.
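The headline evaluation results are those of the final epoch (25), which also has the highest accuracy, but the lowest validation loss (1.4356) occurs at epoch 23. A short sketch that picks an epoch under each criterion from the logged values; the HISTORY tuple layout and the two criteria are illustrative choices, and the card does not say which criterion, if any, was used to select the released checkpoint.

```python
# (epoch, validation loss, accuracy) copied from the training results table
HISTORY = [
    (1, 1.5167, 0.07), (2, 1.5246, 0.08), (3, 1.5053, 0.1), (4, 1.5097, 0.0975),
    (5, 1.4926, 0.12), (6, 1.4838, 0.105), (7, 1.4729, 0.0975), (8, 1.4754, 0.125),
    (9, 1.4725, 0.1125), (10, 1.4702, 0.1175), (11, 1.4648, 0.1175), (12, 1.4660, 0.155),
    (13, 1.4684, 0.16), (14, 1.4662, 0.1175), (15, 1.4602, 0.1675), (16, 1.4556, 0.1325),
    (17, 1.4527, 0.175), (18, 1.4535, 0.1625), (19, 1.4453, 0.1825), (20, 1.4431, 0.1525),
    (21, 1.4392, 0.1825), (22, 1.4469, 0.1775), (23, 1.4356, 0.17), (24, 1.4370, 0.1875),
    (25, 1.4358, 0.195),
]

best_by_loss = min(HISTORY, key=lambda row: row[1])  # lowest validation loss
best_by_acc = max(HISTORY, key=lambda row: row[2])   # highest accuracy
print("lowest validation loss:", best_by_loss)
print("highest accuracy:", best_by_acc)
```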

Framework versions

  • Transformers 4.28.0.dev0
  • Pytorch 1.12.1+cu113
  • Datasets 2.12.0
  • Tokenizers 0.12.1