mNLP-project/gpt2-dpo-with-cosine-lr-scheduler

@@ -17,15 +17,15 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [mNLP-project/gpt2-finetuned](https://huggingface.co/mNLP-project/gpt2-finetuned) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.5027
-- Rewards/chosen: 7.0965
-- Rewards/rejected: 5.7124
-- Rewards/accuracies: 0.6101
-- Rewards/margins: 1.3842
-- Logps/rejected: -736.1544
-- Logps/chosen: -878.4832
-- Logits/rejected: -37.8324
-- Logits/chosen: -32.9004
 ## Model description
@@ -44,7 +44,7 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 3e-05
 - train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
@@ -59,16 +59,16 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step  | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 1.2678        | 1.0   | 1337  | 1.5076          | 4.1438         | 3.2671           | 0.5687             | 0.8767          | -760.6065      | -908.0106    | -41.6142        | -35.6298      |
-| 0.8767        | 2.0   | 2674  | 1.5027          | 7.0965         | 5.7124           | 0.6101             | 1.3842          | -736.1544      | -878.4832    | -37.8324        | -32.9004      |
-| 0.431         | 3.0   | 4011  | 1.5905          | 6.0978         | 4.7517           | 0.5929             | 1.3462          | -745.7613      | -888.4703    | -38.2186        | -32.5863      |
-| 0.1242        | 4.0   | 5348  | 1.7672          | 8.8069         | 6.9080           | 0.6138             | 1.8988          | -724.1977      | -861.3801    | -35.3133        | -29.3914      |
-| 0.0166        | 5.0   | 6685  | 1.9424          | 8.8192         | 6.7038           | 0.6011             | 2.1154          | -726.2397      | -861.2565    | -39.2436        | -33.0524      |
-| 0.0031        | 6.0   | 8022  | 2.0099          | 7.4468         | 5.2575           | 0.6071             | 2.1894          | -740.7034      | -874.9804    | -39.1570        | -32.4214      |
-| 0.0111        | 7.0   | 9359  | 2.0798          | 6.9472         | 4.8187           | 0.6004             | 2.1285          | -745.0905      | -879.9766    | -39.8656        | -32.8297      |
-| 0.0176        | 8.0   | 10696 | 2.1751          | 6.9736         | 4.7371           | 0.6034             | 2.2364          | -745.9068      | -879.7130    | -39.2893        | -32.1535      |
-| 0.0089        | 9.0   | 12033 | 2.2161          | 6.6595         | 4.4256           | 0.6019             | 2.2339          | -749.0217      | -882.8531    | -39.3982        | -32.1973      |
-| 0.0045        | 10.0  | 13370 | 2.2229          | 6.5755         | 4.3480           | 0.6007             | 2.2275          | -749.7980      | -883.6937    | -39.5822        | -32.3730      |
 ### Framework versions

 This model is a fine-tuned version of [mNLP-project/gpt2-finetuned](https://huggingface.co/mNLP-project/gpt2-finetuned) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 1.1168
+- Rewards/chosen: 3.8849
+- Rewards/rejected: 3.2031
+- Rewards/accuracies: 0.5892
+- Rewards/margins: 0.6818
+- Logps/rejected: -761.2470
+- Logps/chosen: -910.5992
+- Logits/rejected: -36.5651
+- Logits/chosen: -30.3810
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 1e-05
 - train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
 | Training Loss | Epoch | Step  | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.9846        | 1.0   | 1337  | 1.1168          | 3.8849         | 3.2031           | 0.5892             | 0.6818          | -761.2470      | -910.5992    | -36.5651        | -30.3810      |
+| 0.6025        | 2.0   | 2674  | 1.1405          | 5.0060         | 4.0992           | 0.6175             | 0.9068          | -752.2864      | -899.3887    | -35.0528        | -28.9839      |
+| 0.2464        | 3.0   | 4011  | 1.1202          | 4.6754         | 3.6835           | 0.6160             | 0.9919          | -756.4427      | -902.6943    | -39.6513        | -33.3219      |
+| 0.1182        | 4.0   | 5348  | 1.3054          | 7.3114         | 5.8367           | 0.6131             | 1.4747          | -734.9108      | -876.3349    | -35.1974        | -28.6005      |
+| 0.0669        | 5.0   | 6685  | 1.3846          | 6.5378         | 5.0738           | 0.6093             | 1.4640          | -742.5399      | -884.0710    | -39.0355        | -31.8814      |
+| 0.0226        | 6.0   | 8022  | 1.4662          | 6.2901         | 4.6812           | 0.6052             | 1.6089          | -746.4659      | -886.5475    | -40.3811        | -32.9593      |
+| 0.0128        | 7.0   | 9359  | 1.5557          | 5.8081         | 4.1554           | 0.6108             | 1.6527          | -751.7241      | -891.3676    | -39.1744        | -31.2704      |
+| 0.019         | 8.0   | 10696 | 1.6676          | 5.5428         | 3.8458           | 0.6011             | 1.6970          | -754.8205      | -894.0207    | -40.5161        | -32.4700      |
+| 0.0101        | 9.0   | 12033 | 1.7100          | 5.5531         | 3.8215           | 0.6022             | 1.7315          | -755.0627      | -893.9178    | -40.7171        | -32.5929      |
+| 0.0053        | 10.0  | 13370 | 1.7177          | 5.4221         | 3.7030           | 0.6000             | 1.7191          | -756.2481      | -895.2274    | -40.8064        | -32.6689      |
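
As a sanity check on the added rows, the sketch below recomputes Rewards/margins as Rewards/chosen minus Rewards/rejected and finds the epoch with the lowest validation loss. The values are copied from the added table rows; the names `ROWS`, `margin_mismatches`, and `best_epoch` are illustrative, and the `5e-4` tolerance is an assumption that allows for rounding in the four-decimal figures.

```python
# Rows copied from the added table lines:
# (epoch, validation_loss, rewards_chosen, rewards_rejected, rewards_margin)
ROWS = [
    (1, 1.1168, 3.8849, 3.2031, 0.6818),
    (2, 1.1405, 5.0060, 4.0992, 0.9068),
    (3, 1.1202, 4.6754, 3.6835, 0.9919),
    (4, 1.3054, 7.3114, 5.8367, 1.4747),
    (5, 1.3846, 6.5378, 5.0738, 1.4640),
    (6, 1.4662, 6.2901, 4.6812, 1.6089),
    (7, 1.5557, 5.8081, 4.1554, 1.6527),
    (8, 1.6676, 5.5428, 3.8458, 1.6970),
    (9, 1.7100, 5.5531, 3.8215, 1.7315),
    (10, 1.7177, 5.4221, 3.7030, 1.7191),
]


def margin_mismatches(rows, tol=5e-4):
    """Return epochs whose reported margin differs from chosen - rejected by more than tol."""
    return [
        epoch
        for epoch, _, chosen, rejected, margin in rows
        if abs((chosen - rejected) - margin) > tol
    ]


def best_epoch(rows):
    """Return (epoch, validation_loss) for the row with the lowest validation loss."""
    epoch, loss, *_ = min(rows, key=lambda row: row[1])
    return epoch, loss


if __name__ == "__main__":
    print("margin mismatches:", margin_mismatches(ROWS))
    print("best epoch by validation loss:", best_epoch(ROWS))
```

With these values, the lowest validation loss falls on epoch 1 (1.1168), which is the row whose metrics appear in the added evaluation summary. Validation loss rises in later epochs while training loss falls towards zero.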
 ### Framework versions

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:26ab6777efa1486fbedaa67fd9f8ffc5db2b450dd61768a689397ed08eafa178
 size 497774208

 version https://git-lfs.github.com/spec/v1
+oid sha256:dfca1a44eee10523ba16f368b4b7c634d3a5869375730b11248d79f343712a2d
 size 497774208
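
Both pointer files report the same size (497774208 bytes) and differ only in the `oid`, which in a Git LFS pointer is the SHA-256 of the file contents. The following is a minimal sketch for checking a locally downloaded `model.safetensors` against the new pointer. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any Git LFS tooling, and the local file path is an assumption.

```python
import hashlib
from pathlib import Path

# New pointer contents, copied from the added side of the diff.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:dfca1a44eee10523ba16f368b4b7c634d3a5869375730b11248d79f343712a2d
size 497774208
"""


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into (sha256_hex, size_in_bytes)."""
    fields = dict(
        line.strip().split(" ", 1) for line in text.splitlines() if line.strip()
    )
    algo, _, oid = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo}")
    return oid, int(fields["size"])


def matches_pointer(path, pointer_text, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and SHA-256 named by the pointer."""
    oid, size = parse_lfs_pointer(pointer_text)
    path = Path(path)
    if path.stat().st_size != size:
        return False  # cheap check first, avoids hashing a file of the wrong size
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Hash in chunks so a ~500 MB weights file is not read into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == oid
```

For example, `matches_pointer("model.safetensors", NEW_POINTER)` should return `True` only if the local file is the one this commit points to.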