dit-small_rvl_cdip_100_examples_per_class_kd_MSE_lr_fix
This model is a fine-tuned version of microsoft/dit-base. The card metadata does not record the training dataset (it is listed as None). It achieves the following results on the evaluation set:
- Loss: 1.8796
- Accuracy: 0.26
- Brier Loss: 0.8768
- Nll: 6.0962
- F1 Micro: 0.26
- F1 Macro: 0.2480
- Ece: 0.2002
- Aurc: 0.5815
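The card does not say how the calibration metrics above were computed. As a rough illustration only, the sketch below shows one common way to define Brier loss, NLL, and ECE for a multi-class classifier. The function names, the sum-over-classes Brier definition, and the 10 equal-width confidence bins are all assumptions, not the author's evaluation code. AURC is omitted.

```python
import math

def brier_loss(probs, labels):
    """Mean over samples of the squared error against the one-hot target,
    summed over classes (assumed definition; ranges from 0 to 2)."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)

def nll(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)

def ece(probs, labels, n_bins=10):
    """Expected calibration error with equal-width confidence bins (assumed binning)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))
    n = len(labels)
    score = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            avg_acc = sum(a for _, a in b) / len(b)
            score += len(b) / n * abs(avg_acc - avg_conf)
    return score

# Toy example: two samples, two classes, both with true label 0.
toy_probs = [[0.8, 0.2], [0.4, 0.6]]
toy_labels = [0, 0]
print(brier_loss(toy_probs, toy_labels), nll(toy_probs, toy_labels), ece(toy_probs, toy_labels))
```

Under these definitions, lower is better for all three metrics. A Brier loss close to 0.88 alongside an accuracy of 0.26 would mean the predicted distributions are still far from the one-hot targets.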
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 128
- eval_batch_size: 128
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 100
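To make the scheduler settings concrete, here is a minimal sketch of the learning rate that a linear schedule with warmup would produce at each optimizer step. It assumes 700 total steps (the final step in the training results table) and a warmup length of `ceil(0.1 * 700) = 70` steps. The constant and function names are hypothetical; this mirrors the usual shape of a linear-with-warmup schedule and is not the training script itself.

```python
import math

BASE_LR = 1e-4        # learning_rate from the list above
TOTAL_STEPS = 700     # assumption: last step reported in the training results table
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the list above
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 70 steps

def linear_schedule_lr(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)

# Print the rate at a few points along the run.
for s in (0, 35, 70, 385, 700):
    print(s, linear_schedule_lr(s))
```

Under these assumptions the rate peaks at 0.0001 at step 70 (the end of epoch 10) and decays linearly to zero by step 700.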
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy | Brier Loss | Nll | F1 Micro | F1 Macro | Ece | Aurc |
---|---|---|---|---|---|---|---|---|---|---|
No log | 1.0 | 7 | 1.5365 | 0.065 | 0.9398 | 10.2864 | 0.065 | 0.0116 | 0.1183 | 0.9536 |
No log | 2.0 | 14 | 1.5332 | 0.06 | 0.9374 | 9.8468 | 0.06 | 0.0269 | 0.1067 | 0.9096 |
No log | 3.0 | 21 | 1.5119 | 0.085 | 0.9352 | 9.1495 | 0.085 | 0.0355 | 0.1135 | 0.8759 |
No log | 4.0 | 28 | 1.5040 | 0.0825 | 0.9333 | 8.6549 | 0.0825 | 0.0439 | 0.1181 | 0.8618 |
No log | 5.0 | 35 | 1.5021 | 0.1 | 0.9301 | 8.9643 | 0.1000 | 0.0558 | 0.1318 | 0.8030 |
No log | 6.0 | 42 | 1.4885 | 0.1 | 0.9276 | 7.8684 | 0.1000 | 0.0505 | 0.1205 | 0.8190 |
No log | 7.0 | 49 | 1.4882 | 0.0975 | 0.9254 | 9.4095 | 0.0975 | 0.0584 | 0.1220 | 0.7847 |
No log | 8.0 | 56 | 1.4909 | 0.1275 | 0.9227 | 9.4274 | 0.1275 | 0.0827 | 0.1335 | 0.7445 |
No log | 9.0 | 63 | 1.4837 | 0.115 | 0.9217 | 10.2918 | 0.115 | 0.0546 | 0.1366 | 0.7932 |
No log | 10.0 | 70 | 1.4857 | 0.1125 | 0.9186 | 9.5039 | 0.1125 | 0.0510 | 0.1277 | 0.7749 |
No log | 11.0 | 77 | 1.4804 | 0.1125 | 0.9183 | 8.5178 | 0.1125 | 0.0515 | 0.1315 | 0.7831 |
No log | 12.0 | 84 | 1.4701 | 0.11 | 0.9177 | 8.2398 | 0.11 | 0.0655 | 0.1310 | 0.7754 |
No log | 13.0 | 91 | 1.4721 | 0.16 | 0.9160 | 7.2379 | 0.16 | 0.1155 | 0.1462 | 0.7370 |
No log | 14.0 | 98 | 1.4717 | 0.11 | 0.9159 | 8.1355 | 0.11 | 0.0633 | 0.1221 | 0.7579 |
No log | 15.0 | 105 | 1.4739 | 0.1325 | 0.9138 | 7.4037 | 0.1325 | 0.0790 | 0.1419 | 0.7358 |
No log | 16.0 | 112 | 1.4657 | 0.1425 | 0.9135 | 7.8063 | 0.1425 | 0.0821 | 0.1285 | 0.7269 |
No log | 17.0 | 119 | 1.4632 | 0.1375 | 0.9112 | 7.8852 | 0.1375 | 0.0948 | 0.1389 | 0.7342 |
No log | 18.0 | 126 | 1.4769 | 0.15 | 0.9081 | 8.5375 | 0.15 | 0.0894 | 0.1399 | 0.7113 |
No log | 19.0 | 133 | 1.4547 | 0.1775 | 0.9045 | 6.4114 | 0.1775 | 0.1174 | 0.1507 | 0.7007 |
No log | 20.0 | 140 | 1.4470 | 0.1725 | 0.9031 | 8.1696 | 0.1725 | 0.1246 | 0.1464 | 0.7079 |
No log | 21.0 | 147 | 1.4615 | 0.19 | 0.9021 | 6.0696 | 0.19 | 0.1390 | 0.1646 | 0.7023 |
No log | 22.0 | 154 | 1.4588 | 0.2 | 0.8996 | 6.0038 | 0.2000 | 0.1384 | 0.1628 | 0.6821 |
No log | 23.0 | 161 | 1.4646 | 0.1525 | 0.8988 | 7.0678 | 0.1525 | 0.1075 | 0.1458 | 0.7000 |
No log | 24.0 | 168 | 1.4491 | 0.2125 | 0.8933 | 5.9276 | 0.2125 | 0.1503 | 0.1533 | 0.6457 |
No log | 25.0 | 175 | 1.4526 | 0.205 | 0.8916 | 7.6108 | 0.205 | 0.1479 | 0.1603 | 0.6676 |
No log | 26.0 | 182 | 1.4510 | 0.17 | 0.8910 | 5.6337 | 0.17 | 0.1333 | 0.1396 | 0.6868 |
No log | 27.0 | 189 | 1.4567 | 0.19 | 0.8850 | 5.2038 | 0.19 | 0.1380 | 0.1637 | 0.6547 |
No log | 28.0 | 196 | 1.4570 | 0.2225 | 0.8846 | 6.5368 | 0.2225 | 0.1840 | 0.1701 | 0.6554 |
No log | 29.0 | 203 | 1.4701 | 0.2075 | 0.8820 | 5.0057 | 0.2075 | 0.1663 | 0.1719 | 0.6598 |
No log | 30.0 | 210 | 1.4693 | 0.2225 | 0.8755 | 7.4456 | 0.2225 | 0.1729 | 0.1626 | 0.6355 |
No log | 31.0 | 217 | 1.4670 | 0.23 | 0.8787 | 5.8938 | 0.23 | 0.1904 | 0.1717 | 0.6424 |
No log | 32.0 | 224 | 1.4540 | 0.2275 | 0.8756 | 6.6513 | 0.2275 | 0.1673 | 0.1676 | 0.6306 |
No log | 33.0 | 231 | 1.4641 | 0.2275 | 0.8649 | 5.5689 | 0.2275 | 0.1751 | 0.1746 | 0.6138 |
No log | 34.0 | 238 | 1.4710 | 0.2425 | 0.8640 | 7.0556 | 0.2425 | 0.1957 | 0.1809 | 0.6048 |
No log | 35.0 | 245 | 1.4685 | 0.23 | 0.8632 | 5.5735 | 0.23 | 0.1940 | 0.1609 | 0.6188 |
No log | 36.0 | 252 | 1.4665 | 0.2375 | 0.8592 | 5.8835 | 0.2375 | 0.1952 | 0.1727 | 0.6050 |
No log | 37.0 | 259 | 1.4668 | 0.235 | 0.8540 | 5.3502 | 0.235 | 0.1966 | 0.1746 | 0.6056 |
No log | 38.0 | 266 | 1.4855 | 0.27 | 0.8510 | 5.3781 | 0.27 | 0.2124 | 0.1692 | 0.5825 |
No log | 39.0 | 273 | 1.5279 | 0.265 | 0.8562 | 6.2426 | 0.265 | 0.2126 | 0.1772 | 0.5831 |
No log | 40.0 | 280 | 1.5433 | 0.2425 | 0.8551 | 5.9574 | 0.2425 | 0.1867 | 0.1499 | 0.5874 |
No log | 41.0 | 287 | 1.5955 | 0.2525 | 0.8597 | 6.1628 | 0.2525 | 0.2024 | 0.1479 | 0.5891 |
No log | 42.0 | 294 | 1.5528 | 0.2475 | 0.8541 | 6.3624 | 0.2475 | 0.1908 | 0.1566 | 0.5735 |
No log | 43.0 | 301 | 1.5858 | 0.2675 | 0.8504 | 6.1261 | 0.2675 | 0.2174 | 0.1706 | 0.5674 |
No log | 44.0 | 308 | 1.6013 | 0.2725 | 0.8496 | 5.8409 | 0.2725 | 0.2463 | 0.1846 | 0.5807 |
No log | 45.0 | 315 | 1.5632 | 0.2625 | 0.8472 | 5.9669 | 0.2625 | 0.2307 | 0.1689 | 0.5689 |
No log | 46.0 | 322 | 1.6520 | 0.2675 | 0.8509 | 5.8544 | 0.2675 | 0.2325 | 0.1779 | 0.5622 |
No log | 47.0 | 329 | 1.6135 | 0.2625 | 0.8476 | 5.5208 | 0.2625 | 0.2504 | 0.1565 | 0.5759 |
No log | 48.0 | 336 | 1.6565 | 0.275 | 0.8466 | 5.9254 | 0.275 | 0.2527 | 0.2026 | 0.5616 |
No log | 49.0 | 343 | 1.6807 | 0.2625 | 0.8531 | 6.1297 | 0.2625 | 0.2259 | 0.1813 | 0.5664 |
No log | 50.0 | 350 | 1.7266 | 0.255 | 0.8560 | 6.0828 | 0.255 | 0.2315 | 0.1817 | 0.5735 |
No log | 51.0 | 357 | 1.7038 | 0.2525 | 0.8579 | 5.6442 | 0.2525 | 0.2405 | 0.1861 | 0.5828 |
No log | 52.0 | 364 | 1.7954 | 0.255 | 0.8583 | 5.7016 | 0.255 | 0.2227 | 0.1722 | 0.5725 |
No log | 53.0 | 371 | 1.7567 | 0.275 | 0.8557 | 6.1586 | 0.275 | 0.2523 | 0.1577 | 0.5619 |
No log | 54.0 | 378 | 1.7589 | 0.2525 | 0.8565 | 5.3969 | 0.2525 | 0.2325 | 0.1840 | 0.5661 |
No log | 55.0 | 385 | 1.7778 | 0.265 | 0.8569 | 5.8559 | 0.265 | 0.2447 | 0.1835 | 0.5640 |
No log | 56.0 | 392 | 1.8044 | 0.275 | 0.8592 | 5.9942 | 0.275 | 0.2517 | 0.1783 | 0.5627 |
No log | 57.0 | 399 | 1.8327 | 0.2625 | 0.8628 | 6.0224 | 0.2625 | 0.2333 | 0.1801 | 0.5560 |
No log | 58.0 | 406 | 1.8184 | 0.25 | 0.8609 | 6.0769 | 0.25 | 0.2333 | 0.1941 | 0.5718 |
No log | 59.0 | 413 | 1.8318 | 0.2575 | 0.8639 | 5.9454 | 0.2575 | 0.2364 | 0.1965 | 0.5743 |
No log | 60.0 | 420 | 1.8081 | 0.2525 | 0.8641 | 6.0119 | 0.2525 | 0.2380 | 0.1818 | 0.5755 |
No log | 61.0 | 427 | 1.8405 | 0.2625 | 0.8775 | 6.2129 | 0.2625 | 0.2474 | 0.1767 | 0.5908 |
No log | 62.0 | 434 | 1.9012 | 0.2625 | 0.8728 | 6.1015 | 0.2625 | 0.2373 | 0.1881 | 0.5716 |
No log | 63.0 | 441 | 1.8500 | 0.26 | 0.8728 | 6.3885 | 0.26 | 0.2414 | 0.1933 | 0.5809 |
No log | 64.0 | 448 | 1.8771 | 0.2675 | 0.8733 | 6.2730 | 0.2675 | 0.2553 | 0.2035 | 0.5800 |
No log | 65.0 | 455 | 1.8744 | 0.2575 | 0.8677 | 5.9805 | 0.2575 | 0.2392 | 0.1918 | 0.5663 |
No log | 66.0 | 462 | 1.8366 | 0.255 | 0.8694 | 6.0073 | 0.255 | 0.2403 | 0.2048 | 0.5807 |
No log | 67.0 | 469 | 1.8758 | 0.2575 | 0.8743 | 6.1015 | 0.2575 | 0.2381 | 0.2071 | 0.5825 |
No log | 68.0 | 476 | 1.8796 | 0.2675 | 0.8711 | 5.9457 | 0.2675 | 0.2470 | 0.2100 | 0.5737 |
No log | 69.0 | 483 | 1.8635 | 0.2675 | 0.8721 | 5.9312 | 0.2675 | 0.2493 | 0.1788 | 0.5751 |
No log | 70.0 | 490 | 1.8801 | 0.2625 | 0.8710 | 5.9629 | 0.2625 | 0.2467 | 0.1974 | 0.5721 |
No log | 71.0 | 497 | 1.8936 | 0.26 | 0.8791 | 6.0358 | 0.26 | 0.2481 | 0.1922 | 0.5844 |
0.9216 | 72.0 | 504 | 1.8736 | 0.275 | 0.8715 | 6.0493 | 0.275 | 0.2569 | 0.2099 | 0.5710 |
0.9216 | 73.0 | 511 | 1.8784 | 0.2525 | 0.8760 | 6.1441 | 0.2525 | 0.2401 | 0.1978 | 0.5849 |
0.9216 | 74.0 | 518 | 1.8843 | 0.2725 | 0.8763 | 6.1948 | 0.2725 | 0.2533 | 0.2007 | 0.5801 |
0.9216 | 75.0 | 525 | 1.8785 | 0.2675 | 0.8784 | 5.9868 | 0.2675 | 0.2578 | 0.1975 | 0.5851 |
0.9216 | 76.0 | 532 | 1.8812 | 0.275 | 0.8725 | 5.9367 | 0.275 | 0.2594 | 0.2037 | 0.5744 |
0.9216 | 77.0 | 539 | 1.8956 | 0.27 | 0.8746 | 5.9038 | 0.27 | 0.2541 | 0.1816 | 0.5738 |
0.9216 | 78.0 | 546 | 1.8897 | 0.265 | 0.8802 | 5.9763 | 0.265 | 0.2493 | 0.2098 | 0.5866 |
0.9216 | 79.0 | 553 | 1.8728 | 0.275 | 0.8752 | 6.0806 | 0.275 | 0.2623 | 0.1874 | 0.5794 |
0.9216 | 80.0 | 560 | 1.8887 | 0.2725 | 0.8759 | 6.2762 | 0.2725 | 0.2520 | 0.2005 | 0.5768 |
0.9216 | 81.0 | 567 | 1.8987 | 0.2725 | 0.8787 | 6.2444 | 0.2725 | 0.2587 | 0.2183 | 0.5773 |
0.9216 | 82.0 | 574 | 1.8759 | 0.2625 | 0.8773 | 6.1643 | 0.2625 | 0.2541 | 0.1922 | 0.5805 |
0.9216 | 83.0 | 581 | 1.8766 | 0.27 | 0.8748 | 6.0036 | 0.27 | 0.2554 | 0.1784 | 0.5762 |
0.9216 | 84.0 | 588 | 1.8809 | 0.2625 | 0.8764 | 6.0488 | 0.2625 | 0.2469 | 0.2030 | 0.5833 |
0.9216 | 85.0 | 595 | 1.8982 | 0.26 | 0.8775 | 6.0747 | 0.26 | 0.2453 | 0.1998 | 0.5851 |
0.9216 | 86.0 | 602 | 1.8912 | 0.27 | 0.8798 | 6.1894 | 0.27 | 0.2566 | 0.1938 | 0.5839 |
0.9216 | 87.0 | 609 | 1.8847 | 0.2775 | 0.8769 | 6.2744 | 0.2775 | 0.2643 | 0.2019 | 0.5775 |
0.9216 | 88.0 | 616 | 1.8734 | 0.265 | 0.8741 | 6.1928 | 0.265 | 0.2526 | 0.1763 | 0.5820 |
0.9216 | 89.0 | 623 | 1.8760 | 0.2725 | 0.8768 | 6.0274 | 0.2725 | 0.2620 | 0.2039 | 0.5792 |
0.9216 | 90.0 | 630 | 1.8860 | 0.265 | 0.8771 | 6.0912 | 0.265 | 0.2518 | 0.1924 | 0.5810 |
0.9216 | 91.0 | 637 | 1.8865 | 0.2625 | 0.8750 | 6.2350 | 0.2625 | 0.2476 | 0.1844 | 0.5791 |
0.9216 | 92.0 | 644 | 1.8815 | 0.2725 | 0.8733 | 6.0962 | 0.2725 | 0.2563 | 0.2013 | 0.5721 |
0.9216 | 93.0 | 651 | 1.8794 | 0.27 | 0.8756 | 6.2535 | 0.27 | 0.2562 | 0.2028 | 0.5764 |
0.9216 | 94.0 | 658 | 1.8835 | 0.2675 | 0.8769 | 6.2039 | 0.2675 | 0.2562 | 0.1928 | 0.5773 |
0.9216 | 95.0 | 665 | 1.8904 | 0.27 | 0.8786 | 6.1504 | 0.27 | 0.2543 | 0.2034 | 0.5768 |
0.9216 | 96.0 | 672 | 1.8911 | 0.26 | 0.8788 | 6.1527 | 0.26 | 0.2465 | 0.2025 | 0.5829 |
0.9216 | 97.0 | 679 | 1.8871 | 0.265 | 0.8776 | 6.0994 | 0.265 | 0.2519 | 0.2126 | 0.5794 |
0.9216 | 98.0 | 686 | 1.8825 | 0.265 | 0.8769 | 6.1564 | 0.265 | 0.2516 | 0.1987 | 0.5776 |
0.9216 | 99.0 | 693 | 1.8803 | 0.2675 | 0.8766 | 6.1183 | 0.2675 | 0.2561 | 0.2095 | 0.5798 |
0.9216 | 100.0 | 700 | 1.8796 | 0.26 | 0.8768 | 6.0962 | 0.26 | 0.2480 | 0.2002 | 0.5815 |
Framework versions
- Transformers 4.26.1
- Pytorch 1.13.1.post200
- Datasets 2.9.0
- Tokenizers 0.13.2