# categorization-finetuned-20220721-164940-distilled-20220811-132317
This model is a fine-tuned version of [carted-nlp/categorization-finetuned-20220721-164940](https://huggingface.co/carted-nlp/categorization-finetuned-20220721-164940) on an unspecified dataset. It achieves the following results on the evaluation set (a minimal inference sketch follows the metrics):
- Loss: 0.1522
- Accuracy: 0.8783
- F1: 0.8779
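Since the descriptive sections below are placeholders, here is a minimal, hedged inference sketch. It assumes the checkpoint is hosted on the Hugging Face Hub under the repository id in the title and carries a standard sequence-classification head; the label set and expected input domain are not documented in this card.

```python
from transformers import pipeline

# Assumed Hub repository id (taken from the model card title).
model_id = "carted-nlp/categorization-finetuned-20220721-164940-distilled-20220811-132317"
classifier = pipeline("text-classification", model=model_id)

# Illustrative input only; the training data is not described in this card.
print(classifier("Stainless steel insulated water bottle, 32 oz"))
# -> [{'label': '<category>', 'score': 0.9...}]
```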
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 4e-05
- train_batch_size: 64
- eval_batch_size: 128
- seed: 314
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 2000
- num_epochs: 30.0
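The training script itself is not published; the sketch below only reconstructs the listed values as `TrainingArguments` (Transformers 4.17-era API). The `output_dir` is hypothetical, and the model, dataset, and `compute_metrics` wiring are omitted. Note that the effective train batch size of 256 follows from 64 per-device examples × 4 gradient-accumulation steps.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="categorization-distilled",  # hypothetical output path
    learning_rate=4e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=128,
    seed=314,
    gradient_accumulation_steps=4,   # 64 x 4 = 256 effective train batch
    lr_scheduler_type="cosine",
    warmup_steps=2000,
    num_train_epochs=30.0,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```

A `Trainer` constructed with these arguments, the base checkpoint, and an accuracy/F1 `compute_metrics` function would reproduce the schedule logged below.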
### Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 |
---|---|---|---|---|---|
0.5212 | 0.56 | 2500 | 0.2564 | 0.7953 | 0.7921 |
0.243 | 1.12 | 5000 | 0.2110 | 0.8270 | 0.8249 |
0.2105 | 1.69 | 7500 | 0.1925 | 0.8409 | 0.8391 |
0.1939 | 2.25 | 10000 | 0.1837 | 0.8476 | 0.8465 |
0.1838 | 2.81 | 12500 | 0.1771 | 0.8528 | 0.8517 |
0.1729 | 3.37 | 15000 | 0.1722 | 0.8564 | 0.8555 |
0.1687 | 3.94 | 17500 | 0.1684 | 0.8593 | 0.8576 |
0.1602 | 4.5 | 20000 | 0.1653 | 0.8614 | 0.8604 |
0.1572 | 5.06 | 22500 | 0.1629 | 0.8648 | 0.8638 |
0.1507 | 5.62 | 25000 | 0.1605 | 0.8654 | 0.8646 |
0.1483 | 6.19 | 27500 | 0.1602 | 0.8661 | 0.8653 |
0.1431 | 6.75 | 30000 | 0.1597 | 0.8669 | 0.8663 |
0.1393 | 7.31 | 32500 | 0.1581 | 0.8691 | 0.8687 |
0.1374 | 7.87 | 35000 | 0.1556 | 0.8704 | 0.8697 |
0.1321 | 8.43 | 37500 | 0.1558 | 0.8707 | 0.8700 |
0.1328 | 9.0 | 40000 | 0.1536 | 0.8719 | 0.8711 |
0.1261 | 9.56 | 42500 | 0.1544 | 0.8716 | 0.8708 |
0.1256 | 10.12 | 45000 | 0.1541 | 0.8731 | 0.8725 |
0.122 | 10.68 | 47500 | 0.1520 | 0.8741 | 0.8734 |
0.1196 | 11.25 | 50000 | 0.1529 | 0.8734 | 0.8728 |
0.1182 | 11.81 | 52500 | 0.1510 | 0.8758 | 0.8751 |
0.1145 | 12.37 | 55000 | 0.1526 | 0.8746 | 0.8737 |
0.1141 | 12.93 | 57500 | 0.1512 | 0.8765 | 0.8759 |
0.1094 | 13.5 | 60000 | 0.1517 | 0.8760 | 0.8753 |
0.1098 | 14.06 | 62500 | 0.1513 | 0.8771 | 0.8764 |
0.1058 | 14.62 | 65000 | 0.1506 | 0.8775 | 0.8768 |
0.1048 | 15.18 | 67500 | 0.1521 | 0.8774 | 0.8768 |
0.1028 | 15.74 | 70000 | 0.1520 | 0.8778 | 0.8773 |
0.1006 | 16.31 | 72500 | 0.1517 | 0.8780 | 0.8774 |
0.1001 | 16.87 | 75000 | 0.1505 | 0.8794 | 0.8790 |
0.0971 | 17.43 | 77500 | 0.1520 | 0.8784 | 0.8778 |
0.0973 | 17.99 | 80000 | 0.1514 | 0.8796 | 0.8790 |
0.0938 | 18.56 | 82500 | 0.1516 | 0.8795 | 0.8789 |
0.0942 | 19.12 | 85000 | 0.1522 | 0.8794 | 0.8789 |
0.0918 | 19.68 | 87500 | 0.1518 | 0.8799 | 0.8793 |
0.0909 | 20.24 | 90000 | 0.1528 | 0.8803 | 0.8796 |
0.0901 | 20.81 | 92500 | 0.1516 | 0.8799 | 0.8793 |
0.0882 | 21.37 | 95000 | 0.1519 | 0.8800 | 0.8794 |
0.088 | 21.93 | 97500 | 0.1517 | 0.8802 | 0.8798 |
0.086 | 22.49 | 100000 | 0.1530 | 0.8800 | 0.8795 |
0.0861 | 23.05 | 102500 | 0.1523 | 0.8806 | 0.8801 |
0.0846 | 23.62 | 105000 | 0.1524 | 0.8808 | 0.8802 |
0.0843 | 24.18 | 107500 | 0.1522 | 0.8805 | 0.8800 |
0.0836 | 24.74 | 110000 | 0.1525 | 0.8808 | 0.8803 |
0.083 | 25.3 | 112500 | 0.1528 | 0.8810 | 0.8803 |
0.0829 | 25.87 | 115000 | 0.1528 | 0.8808 | 0.8802 |
0.082 | 26.43 | 117500 | 0.1529 | 0.8808 | 0.8802 |
0.0818 | 26.99 | 120000 | 0.1525 | 0.8811 | 0.8805 |
0.0816 | 27.55 | 122500 | 0.1526 | 0.8811 | 0.8806 |
0.0809 | 28.12 | 125000 | 0.1528 | 0.8810 | 0.8805 |
0.0809 | 28.68 | 127500 | 0.1527 | 0.8810 | 0.8804 |
0.0814 | 29.24 | 130000 | 0.1528 | 0.8808 | 0.8802 |
0.0807 | 29.8 | 132500 | 0.1528 | 0.8808 | 0.8802 |
### Framework versions
- Transformers 4.17.0
- Pytorch 1.11.0+cu113
- Datasets 2.3.2
- Tokenizers 0.11.6