lukasgrouleff/shawgpt-ft-lr1e-4

Files changed (3)

README.md CHANGED

@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [TheBloke/Mistral-7B-Instruct-v0.2-GPTQ](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GPTQ) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.4815
 ## Model description
@@ -35,30 +35,32 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 0.0002
 - train_batch_size: 4
 - eval_batch_size: 4
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 2
-- num_epochs: 10
 - mixed_precision_training: Native AMP
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 3.2757        | 1.0   | 13   | 2.0919          |
-| 1.4655        | 2.0   | 26   | 1.2813          |
-| 1.083         | 3.0   | 39   | 1.2278          |
-| 0.955         | 4.0   | 52   | 1.2262          |
-| 0.8728        | 5.0   | 65   | 1.2411          |
-| 0.7577        | 6.0   | 78   | 1.2721          |
-| 0.6645        | 7.0   | 91   | 1.3039          |
-| 0.6134        | 8.0   | 104  | 1.4319          |
-| 0.5539        | 9.0   | 117  | 1.4250          |
-| 0.5064        | 10.0  | 130  | 1.4815          |
 ### Framework versions

 This model is a fine-tuned version of [TheBloke/Mistral-7B-Instruct-v0.2-GPTQ](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GPTQ) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 1.3064
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 0.0001
 - train_batch_size: 4
 - eval_batch_size: 4
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 2
+- num_epochs: 12
 - mixed_precision_training: Native AMP
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
+| 3.6686        | 1.0   | 13   | 2.8302          |
+| 2.3256        | 2.0   | 26   | 1.6927          |
+| 1.3394        | 3.0   | 39   | 1.3108          |
+| 1.1095        | 4.0   | 52   | 1.2532          |
+| 1.0687        | 5.0   | 65   | 1.2385          |
+| 0.9773        | 6.0   | 78   | 1.2419          |
+| 0.9038        | 7.0   | 91   | 1.2381          |
+| 0.883         | 8.0   | 104  | 1.2653          |
+| 0.8353        | 9.0   | 117  | 1.2638          |
+| 0.7847        | 10.0  | 130  | 1.2803          |
+| 0.7793        | 11.0  | 143  | 1.3103          |
+| 0.7277        | 12.0  | 156  | 1.3064          |
 ### Framework versions
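
The two training-results tables in this README diff can be compared directly. The sketch below copies the validation losses from the removed (learning_rate 0.0002, 10 epochs) and added (learning_rate 0.0001, 12 epochs) tables and reports the epoch with the lowest validation loss in each run. The names `old_run`, `new_run`, and `best_epoch` are illustrative and do not come from the repository.

```python
# Validation loss per epoch, copied from the two tables in this diff.
old_run = [2.0919, 1.2813, 1.2278, 1.2262, 1.2411,
           1.2721, 1.3039, 1.4319, 1.4250, 1.4815]          # learning_rate 0.0002
new_run = [2.8302, 1.6927, 1.3108, 1.2532, 1.2385, 1.2419,
           1.2381, 1.2653, 1.2638, 1.2803, 1.3103, 1.3064]  # learning_rate 0.0001


def best_epoch(val_losses):
    """Return (epoch, loss) for the lowest validation loss; epochs are 1-based."""
    idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return idx + 1, val_losses[idx]


for name, losses in (("lr=2e-4", old_run), ("lr=1e-4", new_run)):
    epoch, loss = best_epoch(losses)
    print(f"{name}: best epoch {epoch} (val loss {loss:.4f}), "
          f"final val loss {losses[-1]:.4f}")
```

In both runs the lowest validation loss is reached before the final epoch while the training loss keeps falling, so the "Loss" value reported at the top of the model card (the final-epoch figure) is not the best validation loss seen during training.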

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f37c668bdb9a10cd49a1e281da9e564e5c3246d3111894d38fa83b67417389df
 size 13650608

 version https://git-lfs.github.com/spec/v1
+oid sha256:5f1ca2c516f486690d220c674e3c37eaee734999f4a1f4b7091b1b5140e472a3
 size 13650608

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:086d2c0e5102ad0aef3809c960a797cab9aa54ab34c46de47a5efe0c8c298e75
 size 5176

 version https://git-lfs.github.com/spec/v1
+oid sha256:ada0e7f1784c49223a31be4084a520694c2946f79b871a485c7f21e110a0464a
 size 5176
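
The two binary files above are stored as Git LFS pointers, so the diff shows only a changed `oid` with an unchanged `size`. As a minimal sketch, the pointer text can be parsed into its fields to confirm that; `parse_lfs_pointer` is a hypothetical helper written for this illustration, and the pointer contents are copied from the `training_args.bin` hunk.

```python
def parse_lfs_pointer(text):
    """Parse Git LFS pointer text into version, hash algorithm, digest, and size."""
    # Each pointer line has the form "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Old and new pointers for training_args.bin, as shown in the diff.
old = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:086d2c0e5102ad0aef3809c960a797cab9aa54ab34c46de47a5efe0c8c298e75
size 5176
""")
new = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:ada0e7f1784c49223a31be4084a520694c2946f79b871a485c7f21e110a0464a
size 5176
""")

print("same size:", old["size"] == new["size"])
print("content changed:", old["digest"] != new["digest"])
```

An identical size with a different SHA-256 digest means the file contents changed without changing the byte count.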