distily_bench_obj_cross_v2.12_gpt2

This student model was distilled from the teacher model gpt2; the training dataset is unspecified.

The Distily library was used for this distillation.
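The student can be loaded like any GPT-2 checkpoint with the Transformers library. A minimal usage sketch (the prompt and generation settings below are illustrative, not taken from the card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_obj_cross_v2.12_gpt2"
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Generate a short continuation to sanity-check the checkpoint.
inputs = tokenizer("Knowledge distillation is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```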

It achieves the following results on the evaluation set:

  • eval_enwikippl: 665.9925
  • eval_frwikippl: 995.4457
  • eval_zhwikippl: 405.3946
  • eval_tinystoriesppl: 1100.5725
  • eval_loss: 1.3024
  • eval_runtime: 12.5753 (seconds)
  • eval_samples_per_second: 47.713
  • eval_steps_per_second: 11.928
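The *ppl metrics are perplexities: the exponential of the mean token-level cross-entropy on the corresponding held-out corpus. A minimal sketch of that computation, using a stand-in text since the card does not name the exact evaluation splits:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_obj_cross_v2.12_gpt2"
model = AutoModelForCausalLM.from_pretrained(repo_id).eval()
tokenizer = AutoTokenizer.from_pretrained(repo_id)

text = "A sample passage standing in for the evaluation corpus."  # assumption
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels=input_ids, the model internally shifts the targets and
    # returns the mean cross-entropy over predicted tokens.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"perplexity: {torch.exp(loss).item():.4f}")
```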

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None)). In effect, the objective is KL divergence on the logits alone; the hidden-state and attention components have weight 0 and are unused (a sketch appears after this list).
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
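
A minimal sketch of the logits-only KL objective and the optimizer/scheduler settings listed above. This is not Distily's actual implementation, and the freshly initialized student is an assumption (consistent with the very high step-0 perplexities in the table below):

```python
import torch
import torch.nn.functional as F
from torch.optim import Adam
from transformers import AutoConfig, AutoModelForCausalLM, get_linear_schedule_with_warmup

teacher = AutoModelForCausalLM.from_pretrained("gpt2").eval()
# Assumption: a randomly initialized GPT-2-sized student (124M params).
student = AutoModelForCausalLM.from_config(AutoConfig.from_pretrained("gpt2"))

def logits_kl_loss(input_ids: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student) over the vocabulary at each position.

    Mirrors the configured objective: logits weight 1; hidden-state and
    attention components weight 0 (unused).
    """
    with torch.no_grad():
        t_logits = teacher(input_ids).logits
    s_logits = student(input_ids).logits
    return F.kl_div(
        F.log_softmax(s_logits, dim=-1),
        F.log_softmax(t_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )

# Adam and a linear schedule with 10% warmup, as configured above.
# 59400 total steps = one epoch at train_batch_size 1 (from the table below).
optimizer = Adam(student.parameters(), lr=1e-4, betas=(0.9, 0.999), eps=1e-8)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=int(0.1 * 59400), num_training_steps=59400
)
```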

Resource Usage

Peak GPU Memory: 3.9293 GB
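
Such a figure can be read from PyTorch's CUDA allocator; a minimal sketch of the measurement (not necessarily how Distily records it):

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run the training loop here ...
peak_gb = torch.cuda.max_memory_allocated() / 1e9
print(f"Peak GPU memory: {peak_gb:.4f} GB")
```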

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 270.2348 | 76.8142 | | | | | 671.1238 | 22.8030 |
| 0 | 0 | 147374.6094 | 4251118206976.0 | 19.8108 | 12.6652 | 47.374 | 11.843 | 74.6838 | 6171058503680.0 |
| 1500 | 0.0253 | 1012.5726 | 4501.9321 | 2.2064 | 12.5479 | 47.817 | 11.954 | 1084.7205 | 39061.2969 |
| 3000 | 0.0505 | 761.3547 | 2880.7776 | 1.7218 | 12.6141 | 47.566 | 11.891 | 932.5889 | 1552.8525 |
| 4500 | 0.0758 | 682.1792 | 1444.0309 | 1.5343 | 12.6458 | 47.447 | 11.862 | 963.2644 | 421.1599 |
| 6000 | 0.1010 | 673.6849 | 1216.2458 | 1.4424 | 12.6927 | 47.271 | 11.818 | 1035.7787 | 983.8034 |
| 7500 | 0.1263 | 630.5226 | 924.8793 | 1.3688 | 12.561 | 47.767 | 11.942 | 971.2607 | 351.8923 |
| 9000 | 0.1515 | 665.9925 | 995.4457 | 1.3024 | 12.5753 | 47.713 | 11.928 | 1100.5725 | 405.3946 |
| 10500 | 0.1768 | 649.4595 | 870.4929 | 1.2363 | 12.5912 | 47.652 | 11.913 | 1147.8689 | 379.8699 |
| 12000 | 0.2020 | 552.0709 | 756.2815 | 1.1687 | 12.5514 | 47.804 | 11.951 | 915.4786 | 247.3208 |
| 13500 | 0.2273 | 574.5076 | 775.2103 | 1.1446 | 12.6584 | 47.399 | 11.85 | 1022.3383 | 258.0553 |
| 15000 | 0.2525 | 570.0630 | 872.7639 | 1.1033 | 12.573 | 47.721 | 11.93 | 1034.7090 | 205.1337 |
| 16500 | 0.2778 | 524.1483 | 695.0405 | 1.0708 | 12.5445 | 47.83 | 11.957 | 960.6801 | 179.8155 |
| 18000 | 0.3030 | 558.0261 | 722.4153 | 1.0562 | 12.6414 | 47.463 | 11.866 | 1092.5500 | 238.2534 |
| 19500 | 0.3283 | 535.8491 | 646.8846 | 1.0133 | 12.5343 | 47.869 | 11.967 | 1038.2650 | 224.3871 |
| 21000 | 0.3535 | 498.7090 | 643.3860 | 0.9866 | 12.6044 | 47.602 | 11.901 | 945.8655 | 325.0199 |
| 22500 | 0.3788 | 501.5469 | 612.7169 | 0.9680 | 12.5367 | 47.86 | 11.965 | 979.3635 | 253.6864 |
| 24000 | 0.4040 | 376.6320 | 629.0483 | 0.9542 | 12.5557 | 47.787 | 11.947 | 639.3351 | 209.0216 |
| 25500 | 0.4293 | 481.3532 | 705.2970 | 0.9196 | 12.6849 | 47.3 | 11.825 | 966.3749 | 375.7875 |
| 27000 | 0.4545 | 459.1099 | 522.3182 | 0.8577 | 12.5747 | 47.715 | 11.929 | 958.1420 | 189.4054 |
| 28500 | 0.4798 | 413.4502 | 431.4271 | 0.7560 | 12.5416 | 47.841 | 11.96 | 891.3210 | 176.5119 |
| 30000 | 0.5051 | 403.5616 | 415.3713 | 0.7195 | 12.548 | 47.817 | 11.954 | 882.3771 | 152.6556 |
| 31500 | 0.5303 | 406.3142 | 383.7035 | 0.7008 | 12.7238 | 47.156 | 11.789 | 912.3057 | 155.9905 |
| 33000 | 0.5556 | 424.4844 | 373.8076 | 0.6957 | 12.5614 | 47.765 | 11.941 | 974.8803 | 171.0759 |
| 34500 | 0.5808 | 403.1555 | 398.5213 | 0.6867 | 12.5658 | 47.748 | 11.937 | 913.2111 | 178.8704 |
| 36000 | 0.6061 | 399.7424 | 356.4906 | 0.6771 | 12.5757 | 47.711 | 11.928 | 904.7578 | 169.4632 |
| 37500 | 0.6313 | 398.5905 | 372.6379 | 0.6750 | 12.652 | 47.423 | 11.856 | 912.7961 | 158.8251 |
| 39000 | 0.6566 | 392.1436 | 371.0796 | 0.6723 | 12.6742 | 47.34 | 11.835 | 882.8148 | 176.4061 |
| 40500 | 0.6818 | 393.4750 | 371.6812 | 0.6672 | 12.6703 | 47.355 | 11.839 | 901.9575 | 134.3779 |
| 42000 | 0.7071 | 399.2395 | 357.3452 | 0.6651 | 12.6545 | 47.414 | 11.853 | 913.0604 | 135.6295 |
| 43500 | 0.7323 | 391.1350 | 370.6879 | 0.6558 | 12.6748 | 47.338 | 11.834 | 896.4939 | 156.0113 |
| 45000 | 0.7576 | 382.1500 | 345.0898 | 0.6354 | 12.6893 | 47.284 | 11.821 | 884.7507 | 140.7350 |
| 46500 | 0.7828 | 379.9360 | 334.1126 | 0.6281 | 12.6503 | 47.43 | 11.857 | 877.5396 | 127.1069 |
| 48000 | 0.8081 | 379.3625 | 342.2339 | 0.6241 | 12.6749 | 47.338 | 11.834 | 882.8514 | 128.6507 |
| 49500 | 0.8333 | 379.1130 | 333.6659 | 0.6222 | 12.6951 | 47.262 | 11.816 | 881.2473 | 125.1969 |
| 51000 | 0.8586 | 378.2769 | 332.6569 | 0.6217 | 12.6252 | 47.524 | 11.881 | 883.0703 | 128.0856 |
| 52500 | 0.8838 | 377.0043 | 335.4331 | 0.6182 | 12.6655 | 47.373 | 11.843 | 880.3371 | 128.4364 |
| 54000 | 0.9091 | 376.5811 | 333.1023 | 0.6165 | 12.6459 | 47.446 | 11.862 | 877.0681 | 129.0633 |
| 55500 | 0.9343 | 377.9547 | 333.2431 | 0.6157 | 12.6412 | 47.464 | 11.866 | 883.1432 | 127.1832 |
| 57000 | 0.9596 | 378.2183 | 332.4462 | 0.6147 | 12.6477 | 47.439 | 11.86 | 884.0200 | 126.3209 |
| 58500 | 0.9848 | 377.9839 | 333.1023 | 0.6146 | 12.6522 | 47.422 | 11.856 | 883.7274 | 126.2198 |
| 59400 | 1.0 | 378.0425 | 333.0085 | 0.6147 | 12.651 | 47.427 | 11.857 | 883.7274 | 126.2198 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • PyTorch 2.3.0
  • Datasets 2.21.0