Fine-tuned GPT-2 on Wikitext-2

Files changed (4)

README.md +39 -16
model.safetensors +1 -1
runs/Jun30_21-23-50_Delta6112/events.out.tfevents.1719797030.Delta6112.24560.4 +2 -2
runs/Jun30_21-23-50_Delta6112/events.out.tfevents.1719799714.Delta6112.24560.5 +3 -0

README.md CHANGED

@@ -15,7 +15,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [cuba6112/orion](https://huggingface.co/cuba6112/orion) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.8471
 ## Model description
@@ -34,31 +34,54 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 5e-05
 - train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 500
-- num_epochs: 1
 - mixed_precision_training: Native AMP
 ### Training results
-| Training Loss | Epoch  | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| No log        | 0.0871 | 400  | 2.8882          |
-| 2.9006        | 0.1743 | 800  | 2.9229          |
-| 2.6909        | 0.2614 | 1200 | 2.9341          |
-| 2.6634        | 0.3486 | 1600 | 2.9170          |
-| 2.769         | 0.4357 | 2000 | 2.9012          |
-| 2.769         | 0.5229 | 2400 | 2.8874          |
-| 2.8258        | 0.6100 | 2800 | 2.8755          |
-| 2.8313        | 0.6972 | 3200 | 2.8689          |
-| 2.9336        | 0.7843 | 3600 | 2.8605          |
-| 2.9614        | 0.8715 | 4000 | 2.8522          |
-| 2.9614        | 0.9586 | 4400 | 2.8481          |
 ### Framework versions

 This model is a fine-tuned version of [cuba6112/orion](https://huggingface.co/cuba6112/orion) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 2.9051
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 2e-05
 - train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 500
+- num_epochs: 3
 - mixed_precision_training: Native AMP
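
Reviewer note: the `linear` scheduler with 500 warmup steps listed above ramps the learning rate from 0 up to the base rate and then decays it linearly back to 0. Below is a minimal sketch of that shape in plain Python, assuming the new run's base rate of 2e-05. `linear_warmup_lr` is a hypothetical helper, and `total_steps=13770` is an estimate extrapolated from the step and epoch columns of the results table, not a value stated in this commit.

```python
def linear_warmup_lr(step, base_lr=2e-5, warmup_steps=500, total_steps=13770):
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    total_steps is an assumed value (about 3 epochs at ~4590 steps each).
    """
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Sample the schedule at a few points of interest.
for step in (0, 250, 500, 4400, 13770):
    print(step, linear_warmup_lr(step))
```

Under these assumptions the rate peaks at step 500, which falls between the first two evaluation points (steps 400 and 800) in the table.
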
 ### Training results
+| Training Loss | Epoch  | Step  | Validation Loss |
+|:-------------:|:------:|:-----:|:---------------:|
+| No log        | 0.0871 | 400   | 2.9643          |
+| 2.3368        | 0.1743 | 800   | 3.0703          |
+| 1.9165        | 0.2614 | 1200  | 3.0753          |
+| 1.9558        | 0.3486 | 1600  | 3.0568          |
+| 2.2234        | 0.4357 | 2000  | 2.9665          |
+| 2.2234        | 0.5229 | 2400  | 2.9577          |
+| 2.3365        | 0.6100 | 2800  | 2.9338          |
+| 2.3944        | 0.6972 | 3200  | 2.9116          |
+| 2.5464        | 0.7843 | 3600  | 2.8948          |
+| 2.6256        | 0.8715 | 4000  | 2.8809          |
+| 2.6256        | 0.9586 | 4400  | 2.8674          |
+| 2.7123        | 1.0458 | 4800  | 2.9054          |
+| 2.5281        | 1.1329 | 5200  | 2.9048          |
+| 2.494         | 1.2200 | 5600  | 2.9072          |
+| 2.4786        | 1.3072 | 6000  | 2.9004          |
+| 2.4786        | 1.3943 | 6400  | 2.9019          |
+| 2.4863        | 1.4815 | 6800  | 2.8996          |
+| 2.4702        | 1.5686 | 7200  | 2.8983          |
+| 2.5037        | 1.6558 | 7600  | 2.9061          |
+| 2.4735        | 1.7429 | 8000  | 2.9015          |
+| 2.4735        | 1.8301 | 8400  | 2.8933          |
+| 2.5219        | 1.9172 | 8800  | 2.8995          |
+| 2.4786        | 2.0044 | 9200  | 2.9007          |
+| 2.4405        | 2.0915 | 9600  | 2.9105          |
+| 2.4194        | 2.1786 | 10000 | 2.9117          |
+| 2.4194        | 2.2658 | 10400 | 2.9109          |
+| 2.4449        | 2.3529 | 10800 | 2.9095          |
+| 2.4213        | 2.4401 | 11200 | 2.9069          |
+| 2.4322        | 2.5272 | 11600 | 2.9115          |
+| 2.4498        | 2.6144 | 12000 | 2.9072          |
+| 2.4498        | 2.7015 | 12400 | 2.9053          |
+| 2.4326        | 2.7887 | 12800 | 2.9054          |
+| 2.4407        | 2.8758 | 13200 | 2.9059          |
+| 2.4504        | 2.9630 | 13600 | 2.9053          |
 ### Framework versions
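
Reviewer note: to compare the two runs on a more familiar scale, the evaluation losses can be converted to perplexity. This is a minimal sketch, assuming the reported loss is mean token-level cross-entropy in nats (the usual convention for causal language model training, but not stated in this commit). The `eval_log` dictionary holds a few rows copied from the new results table, and the helper names are hypothetical.

```python
import math


def perplexity(loss):
    """Perplexity for a mean cross-entropy loss measured in nats (assumed)."""
    return math.exp(loss)


# Final evaluation losses reported in the old and new model cards.
old_loss, new_loss = 2.8471, 2.9051

# A subset of (step -> validation loss) rows from the new results table.
eval_log = {400: 2.9643, 1200: 3.0753, 4400: 2.8674,
            4800: 2.9054, 8400: 2.8933, 13600: 2.9053}

# Step with the lowest validation loss among the sampled rows.
best_step = min(eval_log, key=eval_log.get)

print(f"old perplexity: {perplexity(old_loss):.2f}")
print(f"new perplexity: {perplexity(new_loss):.2f}")
print(f"best sampled step: {best_step} (loss {eval_log[best_step]})")
```

On these numbers the reported evaluation loss rises from 2.8471 to 2.9051, a perplexity of roughly 17.2 versus 18.3. The lowest validation loss logged in the new run (2.8674) occurs at step 4400, near the end of the first epoch.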

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7d5c41b82f9cc483019ddbb4e75328a29a1fe216d64a61d47bc9a234f3349b32
 size 497774208

 version https://git-lfs.github.com/spec/v1
+oid sha256:cdcf918da9bee0cc4b60ad138e3e17a643ba3d0b82d990072fd93c0623999ee1
 size 497774208
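
Reviewer note: the two blocks above are Git LFS pointer files, not the weights themselves. Each one records a spec version, a SHA-256 object ID, and a byte size. Below is a minimal sketch of parsing and comparing the old and new pointers. `parse_lfs_pointer` is a hypothetical helper written for illustration and is not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid line has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


old = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:7d5c41b82f9cc483019ddbb4e75328a29a1fe216d64a61d47bc9a234f3349b32
size 497774208
""")
new = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:cdcf918da9bee0cc4b60ad138e3e17a643ba3d0b82d990072fd93c0623999ee1
size 497774208
""")

content_changed = old["digest"] != new["digest"]
same_size = old["size"] == new["size"]
print(f"content changed: {content_changed}, same size: {same_size}")
```

A changed digest with an identical size is what one would expect when the same architecture is retrained: the tensor shapes, and so the file length, stay fixed while the stored values differ.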

runs/Jun30_21-23-50_Delta6112/events.out.tfevents.1719797030.Delta6112.24560.4 CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:799abf9f1cb12162c2f61c45f5ed9ad5f1a55b74113242900d95c352bf046e2a
-size 19875

 version https://git-lfs.github.com/spec/v1
+oid sha256:95b55918e14083128b72c2b47536c7fc9f9b386802d27c15ca1cd4b7949c4175
+size 20229

runs/Jun30_21-23-50_Delta6112/events.out.tfevents.1719799714.Delta6112.24560.5 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a38f9eb13132da0c9957ea61b8874e37c5975c3c2563e30a60a0ce5d82f1c428
+size 359