lapp0 committed
Commit 2530bf6 · verified · parent: 8077334

Training in progress, step 6188
README.md CHANGED
@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
- - eval_enwikippl: 3588.0862
- - eval_frwikippl: 29491.5098
- - eval_zhwikippl: 52398.3594
- - eval_tinystoriesppl: 1160.5111
- - eval_loss: 5.1062
- - eval_runtime: 6.5853
- - eval_samples_per_second: 75.926
- - eval_steps_per_second: 9.567
+ - eval_enwikippl: 7370.6421
+ - eval_frwikippl: 36625.3633
+ - eval_zhwikippl: 69136.0938
+ - eval_tinystoriesppl: 3403.9065
+ - eval_loss: 5.1768
+ - eval_runtime: 6.4845
+ - eval_samples_per_second: 77.107
+ - eval_steps_per_second: 9.715
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -56,29 +56,29 @@ The following hyperparameters were used during training:
 - num_epochs: 1.0
 
 ### Resource Usage
- Peak GPU Memory: 8.0568 GB
+ Peak GPU Memory: 8.0557 GB
 
 ### Eval-Phase Metrics
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
- | 0 | 0 | 21321.3555 | 56774.5312 | 6.6010 | 6.5827 | 75.956 | 9.57 | 11289.9248 | 60744.7383 |
- | 500 | 0.0808 | 3639.8804 | 29520.6055 | 5.1093 | 6.602 | 75.735 | 9.543 | 1181.7124 | 52932.3008 |
- | 1000 | 0.1616 | 3605.0818 | 29512.3027 | 5.1083 | 6.5998 | 75.76 | 9.546 | 1168.1150 | 52622.5078 |
- | 1500 | 0.2424 | 3596.4351 | 29524.7734 | 5.1073 | 6.6103 | 75.64 | 9.531 | 1163.0084 | 52566.3789 |
- | 2000 | 0.3232 | 3585.8628 | 29491.5098 | 5.1077 | 6.6062 | 75.686 | 9.536 | 1158.5942 | 52426.3516 |
- | 2500 | 0.4040 | 3586.9744 | 29491.5098 | 5.1077 | 6.6186 | 75.544 | 9.519 | 1159.1688 | 52426.3516 |
- | 3000 | 0.4848 | 3585.8628 | 29491.5098 | 5.1077 | 6.5957 | 75.807 | 9.552 | 1158.2108 | 52398.3594 |
- | 3500 | 0.5656 | 3585.8628 | 29491.5098 | 5.1077 | 6.6105 | 75.638 | 9.53 | 1158.7859 | 52398.3594 |
- | 4000 | 0.6464 | 3585.8628 | 29491.5098 | 5.1077 | 6.6047 | 75.704 | 9.539 | 1158.5942 | 52398.3594 |
- | 4500 | 0.7272 | 3586.9744 | 29491.5098 | 5.1077 | 6.6182 | 75.55 | 9.519 | 1158.9771 | 52398.3594 |
- | 5000 | 0.8080 | 3585.8628 | 29491.5098 | 5.1077 | 6.594 | 75.827 | 9.554 | 1158.5942 | 52398.3594 |
- | 5500 | 0.8888 | 3588.0862 | 29508.1367 | 5.1068 | 6.5974 | 75.787 | 9.549 | 1159.9358 | 52398.3594 |
- | 6000 | 0.9696 | 3588.0862 | 29491.5098 | 5.1062 | 6.5958 | 75.805 | 9.551 | 1160.1277 | 52398.3594 |
- | 6188 | 1.0 | 3588.0862 | 29491.5098 | 5.1062 | 6.5853 | 75.926 | 9.567 | 1160.5111 | 52398.3594 |
+ | 0 | 0 | 43423.2812 | 70766.6328 | 6.6982 | 6.5086 | 76.821 | 9.679 | 33276.4844 | 75720.9297 |
+ | 500 | 0.0808 | 7652.8320 | 36775.2695 | 5.1768 | 6.4848 | 77.103 | 9.715 | 3608.4768 | 70684.1406 |
+ | 1000 | 0.1616 | 7409.5664 | 36723.5039 | 5.1768 | 6.491 | 77.03 | 9.706 | 3437.2712 | 69450.3828 |
+ | 1500 | 0.2424 | 7313.7646 | 36645.9766 | 5.1778 | 6.4918 | 77.02 | 9.705 | 3335.3831 | 68878.3828 |
+ | 2000 | 0.3232 | 7313.7646 | 36645.9766 | 5.1778 | 6.4851 | 77.099 | 9.715 | 3339.7979 | 68841.6016 |
+ | 2500 | 0.4040 | 7354.6680 | 36625.3633 | 5.1772 | 6.49 | 77.042 | 9.707 | 3393.2302 | 69062.3516 |
+ | 3000 | 0.4848 | 7388.9336 | 36656.3242 | 5.1762 | 6.5016 | 76.905 | 9.69 | 3415.7463 | 69173.0234 |
+ | 3500 | 0.5656 | 7393.5151 | 36676.9883 | 5.1762 | 6.4831 | 77.123 | 9.718 | 3418.0046 | 69173.0234 |
+ | 4000 | 0.6464 | 7359.2285 | 36645.9766 | 5.1772 | 6.4881 | 77.064 | 9.71 | 3393.2302 | 69062.3516 |
+ | 4500 | 0.7272 | 7320.5684 | 36645.9766 | 5.1772 | 6.486 | 77.089 | 9.713 | 3356.4048 | 69025.5469 |
+ | 5000 | 0.8080 | 7320.5684 | 36645.9766 | 5.1772 | 6.6011 | 75.745 | 9.544 | 3351.9680 | 68988.6953 |
+ | 5500 | 0.8888 | 7327.3711 | 36625.3633 | 5.1778 | 6.5132 | 76.767 | 9.673 | 3361.9597 | 69025.5469 |
+ | 6000 | 0.9696 | 7384.3545 | 36625.3633 | 5.1762 | 6.4883 | 77.062 | 9.71 | 3409.5400 | 69173.0234 |
+ | 6188 | 1.0 | 7370.6421 | 36625.3633 | 5.1768 | 6.4845 | 77.107 | 9.715 | 3403.9065 | 69136.0938 |
 
 ### Framework versions
 - Distily 0.2.0
 - Transformers 4.44.0
 - Pytorch 2.3.0
- - Datasets 2.20.0
+ - Datasets 2.21.0
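The `*ppl` columns in the README diff above are perplexities, which for model cards like this are typically the exponential of the mean per-token cross-entropy on the given corpus. A minimal sketch of that relationship (the helper name is ours, not part of Distily):

```python
import math

def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy)

# The final eval_loss of 5.1768 corresponds to a perplexity of roughly
# 177 on the eval split as a whole; the per-corpus columns (enwikippl,
# frwikippl, zhwikippl, tinystoriesppl) are the same quantity measured
# on each corpus separately, which is why they differ so widely.
print(round(perplexity(5.1768), 1))
```

This is also why the loss barely moving after step 500 shows up as near-constant perplexity columns in the table.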
logs/attn_loss_fn=None, attn_weight=0, gradient_accumulation_steps=1, hs_loss_fn=0, hs_weight=0, learning_rate=0.0004, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, num_warmup_steps=0, optim=p/events.out.tfevents.1723839050.5f530b1cf724 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:f31f94600fbce10692afe98d1d981038b3e5647d97e7b7a9d771b06cee49dbed
- size 307
+ oid sha256:701c98471a1b7101f30262a592c7e32461b08db16eff1d905b1cc67268ed24f7
+ size 578
logs/attn_loss_fn=None, attn_weight=0, gradient_accumulation_steps=1, hs_loss_fn=0, hs_weight=0, learning_rate=0.0004, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, num_warmup_steps=1000, opti/events.out.tfevents.1723839267.5f530b1cf724 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cbb63e796ae75f55df680ff655ee7d6a41b4c0b18ddcbf275072c65549cbcc81
+ size 2932929
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:b4cd16d7352f58b542ae2b31e986e20aa1ad58363876d0fdb552464f26eff300
+ oid sha256:704365b2903f4f9092aa0d2bac61b7186825189c62f717760b29a73900327a4a
 size 137033984
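The `version`/`oid`/`size` triplets in these file entries are Git LFS pointer files: the commit swaps the pointer (and thus the sha256) while the weight file stays at the same byte size. A minimal, self-contained sketch of parsing such a pointer, using the new model.safetensors pointer from this commit:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # `size` is the byte size of the real payload, not of the pointer itself
    fields["size"] = int(fields["size"])
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:704365b2903f4f9092aa0d2bac61b7186825189c62f717760b29a73900327a4a
size 137033984
"""
fields = parse_lfs_pointer(pointer)
print(fields["size"])  # 137033984 bytes, unchanged across the commit
```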
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:ae77ed9d5da881c82ec366e8d74e46f1a9fe6f68c6877f4450a9c37640920326
+ oid sha256:e27dca91af41039e47b1d6a0fb0b33c27148d47b49cfa77503c16cc0d5db6bdb
 size 1017948232