irenewds/shawgpt-ft

Files changed (3)

README.md +22 -19
runs/Oct24_01-54-24_89957c487371/events.out.tfevents.1729734872.89957c487371.1061.3 +3 -0
training_args.bin +1 -1

README.md CHANGED

@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [TheBloke/Mistral-7B-Instruct-v0.2-GPTQ](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GPTQ) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.1407
+- Loss: 1.3449
 ## Model description
@@ -39,8 +39,8 @@ The following hyperparameters were used during training:
 - train_batch_size: 4
 - eval_batch_size: 4
 - seed: 42
-- gradient_accumulation_steps: 5
-- total_train_batch_size: 20
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 5
@@ -51,22 +51,25 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch   | Step | Validation Loss |
 |:-------------:|:-------:|:----:|:---------------:|
-| 5.5727        | 0.7692  | 2    | 4.2207          |
-| 3.6433        | 1.9231  | 5    | 4.0730          |
-| 5.2005        | 2.6923  | 7    | 3.8684          |
-| 3.2231        | 3.8462  | 10   | 3.5776          |
-| 3.0225        | 5.0     | 13   | 3.2987          |
-| 4.1939        | 5.7692  | 15   | 3.1272          |
-| 2.6236        | 6.9231  | 18   | 2.9029          |
-| 3.6835        | 7.6923  | 20   | 2.7760          |
-| 2.3205        | 8.8462  | 23   | 2.6148          |
-| 2.2036        | 10.0    | 26   | 2.4778          |
-| 3.1613        | 10.7692 | 28   | 2.3953          |
-| 1.9944        | 11.9231 | 31   | 2.2868          |
-| 2.8896        | 12.6923 | 33   | 2.2336          |
-| 1.8602        | 13.8462 | 36   | 2.1767          |
-| 1.8298        | 15.0    | 39   | 2.1446          |
-| 2.1152        | 15.3846 | 40   | 2.1407          |
+| 4.6313        | 0.9231  | 3    | 4.1546          |
+| 4.3756        | 1.8462  | 6    | 3.7939          |
+| 3.8728        | 2.7692  | 9    | 3.3716          |
+| 2.5467        | 4.0     | 13   | 2.8737          |
+| 2.9976        | 4.9231  | 16   | 2.5724          |
+| 2.63          | 5.8462  | 19   | 2.3146          |
+| 2.2989        | 6.7692  | 22   | 2.0719          |
+| 1.5111        | 8.0     | 26   | 1.7852          |
+| 1.7837        | 8.9231  | 29   | 1.6516          |
+| 1.6008        | 9.8462  | 32   | 1.5426          |
+| 1.5227        | 10.7692 | 35   | 1.4755          |
+| 1.0621        | 12.0    | 39   | 1.4190          |
+| 1.3918        | 12.9231 | 42   | 1.3859          |
+| 1.332         | 13.8462 | 45   | 1.3720          |
+| 1.3299        | 14.7692 | 48   | 1.3635          |
+| 0.9924        | 16.0    | 52   | 1.3535          |
+| 1.2924        | 16.9231 | 55   | 1.3485          |
+| 1.2743        | 17.8462 | 58   | 1.3457          |
+| 0.8987        | 18.4615 | 60   | 1.3449          |
 ### Framework versions
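
The batch-size and epoch figures in the README change follow from simple arithmetic, sketched below. The function names are illustrative, and a single training device is assumed. The value of 13 micro-batches per epoch is not stated anywhere in the README; it is inferred here because it reproduces the logged Epoch column in both the old and new tables.

```python
def total_train_batch_size(train_batch_size, gradient_accumulation_steps, num_devices=1):
    """Examples contributing to one optimizer step (num_devices=1 is an assumption)."""
    return train_batch_size * gradient_accumulation_steps * num_devices


def epoch_at_step(step, gradient_accumulation_steps, micro_batches_per_epoch):
    """Fractional epoch reached after `step` optimizer steps."""
    return step * gradient_accumulation_steps / micro_batches_per_epoch


# ASSUMPTION: inferred from the logged Epoch/Step pairs, not stated in the README.
MICRO_BATCHES_PER_EPOCH = 13

# Previous run: train_batch_size=4, gradient_accumulation_steps=5
print(total_train_batch_size(4, 5))                               # 20
print(round(epoch_at_step(2, 5, MICRO_BATCHES_PER_EPOCH), 4))     # 0.7692

# Updated run: train_batch_size=4, gradient_accumulation_steps=4
print(total_train_batch_size(4, 4))                               # 16
print(round(epoch_at_step(3, 4, MICRO_BATCHES_PER_EPOCH), 4))     # 0.9231
print(round(epoch_at_step(60, 4, MICRO_BATCHES_PER_EPOCH), 4))    # 18.4615
```

Under these assumptions, lowering gradient accumulation from 5 to 4 means each optimizer step covers fewer examples, so the same number of passes over the data takes more steps.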

runs/Oct24_01-54-24_89957c487371/events.out.tfevents.1729734872.89957c487371.1061.3 ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3de0e3d861de306bb6c2040904fbd67edbdb26e53ebe300de42c88d8a73ebf35
+size 14918
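
The three added lines are a Git LFS pointer, not the TensorBoard event file itself. As a minimal sketch, such a pointer can be read as space-separated key/value lines. The helper name `parse_lfs_pointer` is hypothetical, and the pointer text is copied from the hunk above with the leading `+` diff markers removed.

```python
def parse_lfs_pointer(text):
    """Parse Git LFS pointer text into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size in bytes of the real file
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:3de0e3d861de306bb6c2040904fbd67edbdb26e53ebe300de42c88d8a73ebf35
size 14918
"""

info = parse_lfs_pointer(pointer)
print(info["algo"], info["size"])
```

The same reading applies to the `training_args.bin` change: only the `oid` differs while `size` stays at 5176, which suggests the file content changed without changing its length.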

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:75223f49f289519acc4885b96fa688542e9a214ddaaeba0ac7c8bd79460d1f22
+oid sha256:304f434190b1303d94734635ded020e88e6368a5deccfff5586e6c8287e56ba2
 size 5176