irenewds/shawgpt-ft


Files changed (3)

README.md +20 -23
runs/Oct24_01-49-11_89957c487371/events.out.tfevents.1729734558.89957c487371.1061.2 +3 -0
training_args.bin +1 -1

README.md CHANGED

@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [TheBloke/Mistral-7B-Instruct-v0.2-GPTQ](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GPTQ) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.3489
 ## Model description
@@ -39,11 +39,11 @@ The following hyperparameters were used during training:
 - train_batch_size: 4
 - eval_batch_size: 4
 - seed: 42
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 2
 - num_epochs: 20
 - mixed_precision_training: Native AMP
@@ -51,25 +51,22 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch   | Step | Validation Loss |
 |:-------------:|:-------:|:----:|:---------------:|
-| 4.6062        | 0.9231  | 3    | 4.0362          |
-| 4.1781        | 1.8462  | 6    | 3.6161          |
-| 3.6808        | 2.7692  | 9    | 3.2051          |
-| 2.4181        | 4.0     | 13   | 2.7329          |
-| 2.845         | 4.9231  | 16   | 2.4553          |
-| 2.4808        | 5.8462  | 19   | 2.1947          |
-| 2.1606        | 6.7692  | 22   | 1.9599          |
-| 1.425         | 8.0     | 26   | 1.7251          |
-| 1.7116        | 8.9231  | 29   | 1.6083          |
-| 1.5518        | 9.8462  | 32   | 1.5094          |
-| 1.4834        | 10.7692 | 35   | 1.4522          |
-| 1.0371        | 12.0    | 39   | 1.3994          |
-| 1.3682        | 12.9231 | 42   | 1.3823          |
-| 1.3207        | 13.8462 | 45   | 1.3714          |
-| 1.3218        | 14.7692 | 48   | 1.3640          |
-| 0.9873        | 16.0    | 52   | 1.3561          |
-| 1.2872        | 16.9231 | 55   | 1.3521          |
-| 1.2689        | 17.8462 | 58   | 1.3495          |
-| 0.8971        | 18.4615 | 60   | 1.3489          |
 ### Framework versions

 This model is a fine-tuned version of [TheBloke/Mistral-7B-Instruct-v0.2-GPTQ](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GPTQ) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 2.1407
 ## Model description
 - train_batch_size: 4
 - eval_batch_size: 4
 - seed: 42
+- gradient_accumulation_steps: 5
+- total_train_batch_size: 20
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 5
 - num_epochs: 20
 - mixed_precision_training: Native AMP
 | Training Loss | Epoch   | Step | Validation Loss |
 |:-------------:|:-------:|:----:|:---------------:|
+| 5.5727        | 0.7692  | 2    | 4.2207          |
+| 3.6433        | 1.9231  | 5    | 4.0730          |
+| 5.2005        | 2.6923  | 7    | 3.8684          |
+| 3.2231        | 3.8462  | 10   | 3.5776          |
+| 3.0225        | 5.0     | 13   | 3.2987          |
+| 4.1939        | 5.7692  | 15   | 3.1272          |
+| 2.6236        | 6.9231  | 18   | 2.9029          |
+| 3.6835        | 7.6923  | 20   | 2.7760          |
+| 2.3205        | 8.8462  | 23   | 2.6148          |
+| 2.2036        | 10.0    | 26   | 2.4778          |
+| 3.1613        | 10.7692 | 28   | 2.3953          |
+| 1.9944        | 11.9231 | 31   | 2.2868          |
+| 2.8896        | 12.6923 | 33   | 2.2336          |
+| 1.8602        | 13.8462 | 36   | 2.1767          |
+| 1.8298        | 15.0    | 39   | 2.1446          |
+| 2.1152        | 15.3846 | 40   | 2.1407          |
 ### Framework versions
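
The hyperparameter change above can be checked with a little arithmetic. The sketch below is a minimal illustration, not the training code: it assumes a single device, and it assumes a training set of 52 examples, a figure that is not stated anywhere in this commit but is inferred because it reproduces the logged `Epoch` values for both the old and the new run. The helper names are hypothetical.

```python
# Sketch: relate train_batch_size and gradient_accumulation_steps to the
# total_train_batch_size and Epoch columns shown in the diff.
# ASSUMPTION: single device; 52 training examples (inferred, not in the source).

ASSUMED_TRAIN_EXAMPLES = 52  # hypothetical value, inferred from the Epoch column


def effective_batch_size(per_device_batch: int, grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def epoch_at_step(step: int, total_batch: int, num_examples: int) -> float:
    """Fraction of epochs completed after `step` optimizer steps."""
    return step * total_batch / num_examples


old_total = effective_batch_size(4, 4)  # previous README: 16
new_total = effective_batch_size(4, 5)  # this commit: 20

# Final logged rows of the old run (step 60) and the new run (step 40).
old_final_epoch = round(epoch_at_step(60, old_total, ASSUMED_TRAIN_EXAMPLES), 4)
new_final_epoch = round(epoch_at_step(40, new_total, ASSUMED_TRAIN_EXAMPLES), 4)

print(old_total, new_total)              # 16 20
print(old_final_epoch, new_final_epoch)  # 18.4615 15.3846
```

Under these assumptions the new run takes 40 optimizer steps instead of 60 over the same `num_epochs: 20`, which may partly explain why its final validation loss (2.1407) is higher than the previous 1.3489.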

runs/Oct24_01-49-11_89957c487371/events.out.tfevents.1729734558.89957c487371.1061.2 ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:63881f956dbf3e1a6bc63243bbe4c1b800fa97dd6a9672ea0c71b6edb3ecd701
+size 13499
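
The three added lines are a Git LFS pointer, not the TensorBoard event file itself: the repository stores only the spec version, the SHA-256 object ID, and the byte size. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for illustration and is not part of any Git LFS tooling.

```python
# Sketch: parse the key/value lines of a Git LFS pointer file.
# parse_lfs_pointer is a hypothetical helper, not a Git LFS API.

def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line and break the oid into algorithm and digest."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer contents copied from the diff above (leading '+' markers removed).
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:63881f956dbf3e1a6bc63243bbe4c1b800fa97dd6a9672ea0c71b6edb3ecd701
size 13499
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])  # sha256 13499
```

The same reading applies to `training_args.bin` below: its size stays at 5176 bytes while the object ID changes, so the file was rewritten with content of identical length.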

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:00f503e49e741f1ddad3d08ba2dd98cfccccd8d44fbe62354a8313f41c36708e
 size 5176

 version https://git-lfs.github.com/spec/v1
+oid sha256:75223f49f289519acc4885b96fa688542e9a214ddaaeba0ac7c8bd79460d1f22
 size 5176