thorirhrafn committed (verified)
Commit: f352da5
1 Parent(s): 05a9d86

End of training

Files changed (1):
  1. README.md +39 -29
README.md CHANGED
@@ -18,15 +18,15 @@ should probably proofread and complete it, then remove this comment. -->
  
  This model is a fine-tuned version of [AI-Sweden-Models/gpt-sw3-1.3b](https://huggingface.co/AI-Sweden-Models/gpt-sw3-1.3b) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.0072
- - Rewards/chosen: 0.0941
- - Rewards/rejected: -6.5038
+ - Loss: 0.0057
+ - Rewards/chosen: 0.0954
+ - Rewards/rejected: -6.8658
  - Rewards/accuracies: 1.0
- - Rewards/margins: 6.5979
- - Logps/rejected: -286.6546
- - Logps/chosen: -129.1379
- - Logits/rejected: -2.8651
- - Logits/chosen: -3.0778
+ - Rewards/margins: 6.9612
+ - Logps/rejected: -290.2744
+ - Logps/chosen: -129.1244
+ - Logits/rejected: -2.8564
+ - Logits/chosen: -3.0705
  
  ## Model description
  
@@ -53,32 +53,42 @@ The following hyperparameters were used during training:
  - total_train_batch_size: 8
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
- - num_epochs: 2
+ - num_epochs: 3
  
  ### Training results
  
  | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
  |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
- | 0.4138 | 0.1 | 25 | 0.2662 | 0.1334 | -1.1557 | 0.9933 | 1.2891 | -233.1731 | -128.7441 | -3.1801 | -3.3393 |
- | 0.0737 | 0.2 | 50 | 0.0751 | 0.0936 | -3.0054 | 0.9933 | 3.0990 | -251.6701 | -129.1423 | -3.0398 | -3.2238 |
- | 0.0326 | 0.3 | 75 | 0.0306 | 0.0458 | -4.4598 | 0.9967 | 4.5056 | -266.2143 | -129.6201 | -2.9415 | -3.1395 |
- | 0.0152 | 0.4 | 100 | 0.0185 | 0.0487 | -5.1528 | 0.9967 | 5.2015 | -273.1442 | -129.5917 | -2.8904 | -3.0965 |
- | 0.012 | 0.5 | 125 | 0.0142 | 0.0364 | -5.6266 | 0.9967 | 5.6630 | -277.8825 | -129.7142 | -2.8681 | -3.0781 |
- | 0.0094 | 0.6 | 150 | 0.0117 | 0.0507 | -5.8471 | 1.0 | 5.8978 | -280.0875 | -129.5719 | -2.8740 | -3.0845 |
- | 0.0079 | 0.7 | 175 | 0.0100 | 0.0869 | -5.9291 | 1.0 | 6.0160 | -280.9072 | -129.2093 | -2.8778 | -3.0889 |
- | 0.0087 | 0.79 | 200 | 0.0087 | 0.1196 | -6.0755 | 1.0 | 6.1950 | -282.3711 | -128.8827 | -2.8810 | -3.0926 |
- | 0.0062 | 0.89 | 225 | 0.0084 | 0.1049 | -6.2234 | 1.0 | 6.3283 | -283.8507 | -129.0298 | -2.8740 | -3.0867 |
- | 0.0044 | 0.99 | 250 | 0.0081 | 0.0951 | -6.3294 | 1.0 | 6.4244 | -284.9102 | -129.1279 | -2.8684 | -3.0814 |
- | 0.0032 | 1.09 | 275 | 0.0075 | 0.0833 | -6.4222 | 1.0 | 6.5055 | -285.8390 | -129.2458 | -2.8659 | -3.0786 |
- | 0.0059 | 1.19 | 300 | 0.0074 | 0.0877 | -6.4574 | 1.0 | 6.5451 | -286.1902 | -129.2014 | -2.8653 | -3.0780 |
- | 0.0039 | 1.29 | 325 | 0.0075 | 0.0901 | -6.4657 | 1.0 | 6.5558 | -286.2732 | -129.1771 | -2.8652 | -3.0779 |
- | 0.0046 | 1.39 | 350 | 0.0072 | 0.0935 | -6.4795 | 1.0 | 6.5730 | -286.4112 | -129.1433 | -2.8657 | -3.0782 |
- | 0.0034 | 1.49 | 375 | 0.0071 | 0.0925 | -6.4899 | 1.0 | 6.5823 | -286.5153 | -129.1540 | -2.8656 | -3.0782 |
- | 0.0029 | 1.59 | 400 | 0.0072 | 0.0944 | -6.4945 | 1.0 | 6.5889 | -286.5615 | -129.1349 | -2.8652 | -3.0780 |
- | 0.0045 | 1.69 | 425 | 0.0071 | 0.0930 | -6.4944 | 1.0 | 6.5874 | -286.5602 | -129.1481 | -2.8651 | -3.0779 |
- | 0.0032 | 1.79 | 450 | 0.0073 | 0.0945 | -6.4961 | 1.0 | 6.5906 | -286.5771 | -129.1332 | -2.8650 | -3.0776 |
- | 0.0031 | 1.89 | 475 | 0.0072 | 0.0930 | -6.4987 | 1.0 | 6.5917 | -286.6037 | -129.1484 | -2.8651 | -3.0778 |
- | 0.0034 | 1.99 | 500 | 0.0072 | 0.0941 | -6.5038 | 1.0 | 6.5979 | -286.6546 | -129.1379 | -2.8651 | -3.0778 |
+ | 0.423 | 0.1 | 25 | 0.2732 | 0.1382 | -1.1127 | 0.9900 | 1.2509 | -232.7433 | -128.6963 | -3.1835 | -3.3419 |
+ | 0.0751 | 0.2 | 50 | 0.0774 | 0.1041 | -2.9483 | 0.9933 | 3.0524 | -251.0999 | -129.0378 | -3.0438 | -3.2267 |
+ | 0.0333 | 0.3 | 75 | 0.0306 | 0.0559 | -4.4302 | 0.9967 | 4.4861 | -265.9183 | -129.5195 | -2.9445 | -3.1419 |
+ | 0.0148 | 0.4 | 100 | 0.0184 | 0.0605 | -5.1453 | 0.9967 | 5.2058 | -273.0695 | -129.4732 | -2.8915 | -3.0970 |
+ | 0.0117 | 0.5 | 125 | 0.0137 | 0.0536 | -5.6291 | 1.0 | 5.6827 | -277.9078 | -129.5426 | -2.8703 | -3.0794 |
+ | 0.0089 | 0.6 | 150 | 0.0111 | 0.0653 | -5.8777 | 1.0 | 5.9429 | -280.3931 | -129.4260 | -2.8768 | -3.0866 |
+ | 0.0072 | 0.7 | 175 | 0.0094 | 0.1032 | -5.9748 | 1.0 | 6.0780 | -281.3649 | -129.0464 | -2.8814 | -3.0918 |
+ | 0.0086 | 0.79 | 200 | 0.0079 | 0.1381 | -6.1527 | 1.0 | 6.2909 | -283.1440 | -128.6971 | -2.8828 | -3.0942 |
+ | 0.0055 | 0.89 | 225 | 0.0075 | 0.1177 | -6.3321 | 1.0 | 6.4498 | -284.9379 | -128.9017 | -2.8750 | -3.0877 |
+ | 0.0039 | 0.99 | 250 | 0.0072 | 0.1005 | -6.4757 | 1.0 | 6.5762 | -286.3734 | -129.0737 | -2.8678 | -3.0809 |
+ | 0.0028 | 1.09 | 275 | 0.0067 | 0.0874 | -6.6119 | 1.0 | 6.6992 | -287.7352 | -129.2047 | -2.8641 | -3.0772 |
+ | 0.0049 | 1.19 | 300 | 0.0063 | 0.0960 | -6.6507 | 1.0 | 6.7467 | -288.1233 | -129.1187 | -2.8637 | -3.0768 |
+ | 0.0032 | 1.29 | 325 | 0.0064 | 0.0981 | -6.6773 | 1.0 | 6.7753 | -288.3892 | -129.0980 | -2.8629 | -3.0763 |
+ | 0.0035 | 1.39 | 350 | 0.0061 | 0.0994 | -6.7027 | 1.0 | 6.8021 | -288.6437 | -129.0850 | -2.8638 | -3.0770 |
+ | 0.0027 | 1.49 | 375 | 0.0059 | 0.0970 | -6.7348 | 1.0 | 6.8318 | -288.9645 | -129.1081 | -2.8629 | -3.0763 |
+ | 0.0024 | 1.59 | 400 | 0.0059 | 0.1007 | -6.7581 | 1.0 | 6.8588 | -289.1978 | -129.0716 | -2.8616 | -3.0752 |
+ | 0.0035 | 1.69 | 425 | 0.0058 | 0.1019 | -6.7665 | 1.0 | 6.8684 | -289.2819 | -129.0595 | -2.8609 | -3.0746 |
+ | 0.0025 | 1.79 | 450 | 0.0059 | 0.1000 | -6.7839 | 1.0 | 6.8839 | -289.4559 | -129.0789 | -2.8602 | -3.0739 |
+ | 0.0024 | 1.89 | 475 | 0.0056 | 0.0982 | -6.8036 | 1.0 | 6.9018 | -289.6526 | -129.0969 | -2.8595 | -3.0732 |
+ | 0.0026 | 1.99 | 500 | 0.0057 | 0.0978 | -6.8226 | 1.0 | 6.9204 | -289.8423 | -129.1003 | -2.8586 | -3.0724 |
+ | 0.0033 | 2.09 | 525 | 0.0057 | 0.0952 | -6.8383 | 1.0 | 6.9335 | -289.9999 | -129.1269 | -2.8571 | -3.0711 |
+ | 0.003 | 2.19 | 550 | 0.0056 | 0.0966 | -6.8546 | 1.0 | 6.9513 | -290.1629 | -129.1121 | -2.8571 | -3.0712 |
+ | 0.0024 | 2.29 | 575 | 0.0057 | 0.0957 | -6.8546 | 1.0 | 6.9503 | -290.1624 | -129.1215 | -2.8571 | -3.0712 |
+ | 0.0038 | 2.38 | 600 | 0.0057 | 0.0959 | -6.8568 | 1.0 | 6.9527 | -290.1844 | -129.1196 | -2.8572 | -3.0712 |
+ | 0.0026 | 2.48 | 625 | 0.0056 | 0.0943 | -6.8630 | 1.0 | 6.9574 | -290.2470 | -129.1351 | -2.8571 | -3.0710 |
+ | 0.0031 | 2.58 | 650 | 0.0056 | 0.0937 | -6.8627 | 1.0 | 6.9564 | -290.2435 | -129.1417 | -2.8565 | -3.0704 |
+ | 0.0024 | 2.68 | 675 | 0.0057 | 0.0961 | -6.8653 | 1.0 | 6.9614 | -290.2693 | -129.1175 | -2.8568 | -3.0709 |
+ | 0.0022 | 2.78 | 700 | 0.0057 | 0.0960 | -6.8628 | 1.0 | 6.9588 | -290.2445 | -129.1185 | -2.8567 | -3.0707 |
+ | 0.002 | 2.88 | 725 | 0.0057 | 0.0944 | -6.8626 | 1.0 | 6.9570 | -290.2426 | -129.1347 | -2.8565 | -3.0705 |
+ | 0.0023 | 2.98 | 750 | 0.0057 | 0.0954 | -6.8658 | 1.0 | 6.9612 | -290.2744 | -129.1244 | -2.8564 | -3.0705 |
  
  
  ### Framework versions
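
The reward, log-probability, and logit metrics in the diff above are the ones TRL's `DPOTrainer` logs during Direct Preference Optimization (DPO) training, so this commit appears to document a DPO fine-tune of gpt-sw3-1.3b. The sketch below shows how such a run could be set up with the hyperparameters stated in the card; it is a hypothetical reconstruction, not the training script from this commit. The dataset, learning rate, per-device batch size / gradient-accumulation split, and DPO `beta` are not part of this commit and appear only as placeholders, and the `DPOTrainer` call follows the TRL 0.7.x API, which differs in newer releases.

```python
# Hypothetical reproduction sketch -- not the training script from this commit.
# Assumes TRL's DPOTrainer (v0.7.x API), which logs exactly the metrics shown
# in the table above (rewards/*, logps/*, logits/*).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

BASE = "AI-Sweden-Models/gpt-sw3-1.3b"
model = AutoModelForCausalLM.from_pretrained(BASE)      # policy being trained
ref_model = AutoModelForCausalLM.from_pretrained(BASE)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(BASE)

# Placeholder: the card does not say which preference dataset was used.
# DPOTrainer expects "prompt", "chosen", and "rejected" text columns.
dataset = load_dataset("path/to/preference-dataset")

args = TrainingArguments(
    output_dir="gpt-sw3-1.3b-dpo",
    num_train_epochs=3,              # matches the updated card
    per_device_train_batch_size=2,   # placeholder split; card only gives total_train_batch_size=8
    gradient_accumulation_steps=4,
    learning_rate=5e-6,              # placeholder; not shown in this diff hunk
    lr_scheduler_type="linear",      # matches the card
    adam_beta1=0.9,                  # matches the card's Adam settings
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="steps",
    eval_steps=25,                   # the table logs validation metrics every 25 steps
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,                        # placeholder DPO temperature
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()

# For reference: rewards/chosen is beta * (policy logp - reference logp) on the
# chosen completions, rewards/rejected the same on the rejected ones,
# rewards/margins is their difference, and rewards/accuracies is the fraction
# of pairs where the chosen reward beats the rejected reward.
```

Only the epoch count, scheduler type, Adam settings, total batch size, and the 25-step evaluation cadence are taken from the card; everything else in the sketch is an assumption.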