---
base_model: gpt2
library_name: distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_gpt2_batch_size
  results: []
---

# distily_bench_gpt2_batch_size

This student model was distilled from the teacher model gpt2 on an unspecified dataset.

The Distily library was used for this distillation.
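
Because the student keeps the gpt2 architecture, the checkpoint can be loaded with the standard `transformers` API. Below is a minimal usage sketch; the Hub repo id is an assumption based on the model name and may differ:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repo id (not confirmed by this card); adjust as needed.
model_id = "lapp0/distily_bench_gpt2_batch_size"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Knowledge distillation is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```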

It achieves the following results on the evaluation set (see the perplexity sketch after this list):

- eval_enwikippl: 929.2385
- eval_frwikippl: 4940.1445
- eval_zhwikippl: 14653.9951
- eval_loss: 7394.7842
- eval_runtime: 22.0822
- eval_samples_per_second: 45.285
- eval_steps_per_second: 11.321
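
The `eval_*ppl` metrics appear to be perplexities on English, French, and Chinese Wikipedia text. Distily's exact metric code is not shown in this card; the following sketch illustrates the standard way to compute causal-LM perplexity with `transformers`, using a placeholder passage:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lapp0/distily_bench_gpt2_batch_size"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

text = "A held-out passage of Wikipedia text."  # placeholder evaluation text
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing input_ids as labels returns the mean token cross-entropy.
    loss = model(**enc, labels=enc["input_ids"]).loss

ppl = torch.exp(loss)  # perplexity = exp(mean cross-entropy)
print(f"perplexity: {ppl.item():.4f}")
```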

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: distily.objectives.LegacyObjective (a hypothetical sketch of a typical distillation loss follows this list)
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 16
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
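
The distillation objective is recorded above only as an object repr, so its formulation is not documented in this card. As a purely hypothetical illustration (not Distily's actual `LegacyObjective`), a common logit-distillation loss minimizes the KL divergence between temperature-softened teacher and student token distributions:

```python
import torch.nn.functional as F
from torch import Tensor

def distillation_loss(student_logits: Tensor, teacher_logits: Tensor,
                      temperature: float = 1.0) -> Tensor:
    """Hypothetical stand-in for distily.objectives.LegacyObjective.

    KL(teacher || student) over the vocabulary at every token position,
    computed on temperature-softened distributions.
    """
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # The t**2 factor keeps gradient magnitudes comparable across
    # temperatures, following Hinton et al. (2015).
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t**2
```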

### Resource Usage

Peak GPU Memory: 15.7299 GB
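
The card does not say how this figure was measured; PyTorch's CUDA allocator statistics are the usual source for such a number. A minimal sketch:

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run the training loop here ...

peak_gb = torch.cuda.max_memory_allocated() / 1e9
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```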

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| teacher eval | | 30.2385 | 57.2728 | | | | | 18.1772 |
| 0 | 0 | 57512.4023 | 57507.9609 | 331030.5312 | 21.9079 | 45.646 | 11.411 | 56939.4805 |
| 500 | 0.0808 | 3373.4189 | 12600.2275 | 12434.4316 | 21.4223 | 46.68 | 11.67 | 43231.4297 |
| 1000 | 0.1616 | 2185.7129 | 8084.5122 | 10273.4082 | 21.7859 | 45.901 | 11.475 | 31395.9453 |
| 1500 | 0.2424 | 1793.0647 | 6660.5288 | 9647.7441 | 21.3865 | 46.759 | 11.69 | 23173.5977 |
| 2000 | 0.3232 | 1548.9072 | 6412.6167 | 9225.7920 | 21.433 | 46.657 | 11.664 | 18881.2090 |
| 2500 | 0.4040 | 1398.5220 | 5857.9961 | 8856.8965 | 21.3942 | 46.742 | 11.685 | 15137.3125 |
| 3000 | 0.4848 | 1300.4033 | 5470.7554 | 8459.7764 | 21.5101 | 46.49 | 11.622 | 18107.0938 |
| 3500 | 0.5656 | 1197.4879 | 5503.0571 | 8287.1680 | 21.4532 | 46.613 | 11.653 | 17759.8516 |
| 4000 | 0.6464 | 1128.0508 | 5481.5659 | 8032.7041 | 21.4263 | 46.672 | 11.668 | 16724.0586 |
| 4500 | 0.7272 | 1077.5973 | 5100.6572 | 7940.9922 | 21.4997 | 46.512 | 11.628 | 16846.2285 |
| 5000 | 0.8080 | 1003.7661 | 5090.5928 | 7673.4722 | 22.0409 | 45.37 | 11.343 | 13699.2832 |
| 5500 | 0.8888 | 983.7222 | 4890.5869 | 7606.2720 | 21.916 | 45.629 | 11.407 | 15087.8770 |
| 6000 | 0.9696 | 936.5551 | 4860.5083 | 7409.6958 | 21.5036 | 46.504 | 11.626 | 13386.4326 |
| 6188 | 1.0 | 929.2385 | 4940.1445 | 7394.7842 | 22.0822 | 45.285 | 11.321 | 14653.9951 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.20.0