mNLP-project/gpt2-dpo-from_base_gpt2

@@ -1,31 +1,31 @@
 ---
 license: mit
-base_model: mNLP-project/gpt2-finetuned-mcqa
 tags:
 - trl
 - dpo
 - generated_from_trainer
 model-index:
-- name: gpt2-dpo-mcqa
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# gpt2-dpo-mcqa
-This model is a fine-tuned version of [mNLP-project/gpt2-finetuned-mcqa](https://huggingface.co/mNLP-project/gpt2-finetuned-mcqa) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6310
-- Rewards/chosen: 1.4580
-- Rewards/rejected: 1.1845
-- Rewards/accuracies: 0.6414
-- Rewards/margins: 0.2735
-- Logps/rejected: -659.0944
-- Logps/chosen: -787.4795
-- Logits/rejected: -14.9328
-- Logits/chosen: -11.6364
 ## Model description
@@ -59,16 +59,16 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6407        | 0.9993 | 668  | 0.6460          | 0.7721         | 0.6216           | 0.6295             | 0.1505          | -664.7236      | -794.3383    | -15.1273        | -11.7899      |
-| 0.6498        | 2.0    | 1337 | 0.6374          | 1.2927         | 1.0475           | 0.6325             | 0.2453          | -660.4651      | -789.1318    | -14.9517        | -11.6401      |
-| 0.6468        | 2.9993 | 2005 | 0.6342          | 1.3734         | 1.1102           | 0.6388             | 0.2632          | -659.8373      | -788.3249    | -14.9535        | -11.6481      |
-| 0.6113        | 4.0    | 2674 | 0.6332          | 1.3317         | 1.0769           | 0.6444             | 0.2548          | -660.1705      | -788.7426    | -14.9930        | -11.6897      |
-| 0.5826        | 4.9993 | 3342 | 0.6310          | 1.4580         | 1.1845           | 0.6414             | 0.2735          | -659.0944      | -787.4795    | -14.9328        | -11.6364      |
-| 0.5613        | 6.0    | 4011 | 0.6317          | 1.4979         | 1.2181           | 0.6407             | 0.2798          | -658.7584      | -787.0804    | -14.9234        | -11.6271      |
-| 0.581         | 6.9993 | 4679 | 0.6316          | 1.5084         | 1.2260           | 0.6437             | 0.2825          | -658.6798      | -786.9750    | -14.9319        | -11.6377      |
-| 0.571         | 8.0    | 5348 | 0.6320          | 1.4992         | 1.2184           | 0.6425             | 0.2808          | -658.7557      | -787.0676    | -14.9334        | -11.6373      |
-| 0.5943        | 8.9993 | 6016 | 0.6317          | 1.5126         | 1.2294           | 0.6437             | 0.2832          | -658.6454      | -786.9331    | -14.9226        | -11.6269      |
-| 0.5635        | 9.9925 | 6680 | 0.6317          | 1.5142         | 1.2308           | 0.6433             | 0.2835          | -658.6317      | -786.9168    | -14.9211        | -11.6256      |
 ### Framework versions
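In the removed card, the headline evaluation numbers (Loss 0.6310, Rewards/margins 0.2735) come from the step-3342 row, not from the final step 6680. That row has the lowest validation loss in the table above. The sketch below copies the validation losses from that table and picks the lowest one. It only illustrates the arithmetic. Whether training actually restored the best checkpoint (for example via `load_best_model_at_end=True` in `transformers.TrainingArguments`) is an assumption, because the card does not record it.

```python
# (epoch, step, validation_loss) rows copied from the removed gpt2-dpo-mcqa table.
OLD_RUN = [
    (0.9993, 668, 0.6460),
    (2.0, 1337, 0.6374),
    (2.9993, 2005, 0.6342),
    (4.0, 2674, 0.6332),
    (4.9993, 3342, 0.6310),
    (6.0, 4011, 0.6317),
    (6.9993, 4679, 0.6316),
    (8.0, 5348, 0.6320),
    (8.9993, 6016, 0.6317),
    (9.9925, 6680, 0.6317),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss (the earliest row wins ties)."""
    return min(rows, key=lambda row: row[2])


best = best_checkpoint(OLD_RUN)
final = OLD_RUN[-1]
print(f"best:  step {best[1]}, epoch {best[0]}, val loss {best[2]}")
print(f"final: step {final[1]}, epoch {final[0]}, val loss {final[2]}")
```

The validation losses in the table are rounded to four decimals, so ties are possible. `min` keeps the earliest tied row, which is one reasonable convention but not necessarily the one the Trainer applied.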

@@ -1,31 +1,31 @@
 ---
 license: mit
+base_model: openai-community/gpt2
 tags:
 - trl
 - dpo
 - generated_from_trainer
 model-index:
+- name: gpt2-dpo-from_base_gpt2
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# gpt2-dpo-from_base_gpt2
+This model is a fine-tuned version of [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.6406
+- Rewards/chosen: 1.1312
+- Rewards/rejected: 0.9208
+- Rewards/accuracies: 0.6373
+- Rewards/margins: 0.2103
+- Logps/rejected: -429.5498
+- Logps/chosen: -508.5024
+- Logits/rejected: -96.1598
+- Logits/chosen: -94.9073
 ## Model description
@@ -59,16 +59,16 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.6679        | 0.9993 | 668  | 0.6728          | 0.2747         | 0.2209           | 0.625              | 0.0538          | -436.5490      | -517.0669    | -96.0258        | -94.8005      |
+| 0.6697        | 2.0    | 1337 | 0.6545          | 0.6507         | 0.5283           | 0.6295             | 0.1224          | -433.4745      | -513.3065    | -96.0560        | -94.8147      |
+| 0.6516        | 2.9993 | 2005 | 0.6467          | 0.8424         | 0.6867           | 0.6336             | 0.1557          | -431.8912      | -511.3903    | -96.1361        | -94.8919      |
+| 0.6264        | 4.0    | 2674 | 0.6436          | 0.9803         | 0.7989           | 0.6336             | 0.1814          | -430.7686      | -510.0109    | -96.1278        | -94.8762      |
+| 0.6114        | 4.9993 | 3342 | 0.6420          | 1.0453         | 0.8518           | 0.6377             | 0.1935          | -430.2403      | -509.3612    | -96.1435        | -94.8917      |
+| 0.6016        | 6.0    | 4011 | 0.6412          | 1.0870         | 0.8859           | 0.6377             | 0.2011          | -429.8991      | -508.9442    | -96.1471        | -94.8941      |
+| 0.6115        | 6.9993 | 4679 | 0.6408          | 1.1137         | 0.9071           | 0.6384             | 0.2066          | -429.6871      | -508.6768    | -96.1587        | -94.9064      |
+| 0.6079        | 8.0    | 5348 | 0.6406          | 1.1274         | 0.9178           | 0.6388             | 0.2096          | -429.5802      | -508.5403    | -96.1573        | -94.9046      |
+| 0.6066        | 8.9993 | 6016 | 0.6406          | 1.1310         | 0.9207           | 0.6373             | 0.2103          | -429.5507      | -508.5036    | -96.1593        | -94.9068      |
+| 0.5968        | 9.9925 | 6680 | 0.6406          | 1.1312         | 0.9208           | 0.6373             | 0.2103          | -429.5498      | -508.5024    | -96.1598        | -94.9073      |
 ### Framework versions
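The Rewards/* columns in both tables are DPO's implicit rewards. The sketch below shows how the per-pair quantities relate. It assumes the sigmoid DPO loss with no label smoothing, which is TRL's default `loss_type`; the card does not state which loss was used. The `beta=0.1` value and the sample log-probabilities are hypothetical, because the card records neither. The logged metrics are means over evaluation pairs, so Rewards/margins should equal Rewards/chosen minus Rewards/rejected up to rounding. The last lines check this against the new table.

```python
import math


def neg_log_sigmoid(x):
    """Numerically stable -log(sigmoid(x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dpo_pair_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO quantities for one preference pair (sigmoid loss, no label smoothing).

    Arguments are summed log-probabilities of each completion under the trained
    policy and the frozen reference model. beta=0.1 is an assumed value.
    """
    # Implicit rewards: beta-scaled log-ratio of policy to reference model.
    reward_chosen = beta * (policy_chosen - ref_chosen)
    reward_rejected = beta * (policy_rejected - ref_rejected)
    margin = reward_chosen - reward_rejected
    return {
        "rewards/chosen": reward_chosen,
        "rewards/rejected": reward_rejected,
        "rewards/margins": margin,
        "rewards/accuracies": float(reward_chosen > reward_rejected),
        # Per-pair DPO loss: -log(sigmoid(beta * log-ratio difference)).
        "loss": neg_log_sigmoid(margin),
    }


# Hypothetical log-probabilities for a single pair, for illustration only.
example = dpo_pair_metrics(-50.0, -60.0, -52.0, -59.0)
print(example)

# (chosen, rejected, margins) columns copied from the new gpt2-dpo-from_base_gpt2 table.
NEW_RUN = [
    (0.2747, 0.2209, 0.0538),
    (0.6507, 0.5283, 0.1224),
    (0.8424, 0.6867, 0.1557),
    (0.9803, 0.7989, 0.1814),
    (1.0453, 0.8518, 0.1935),
    (1.0870, 0.8859, 0.2011),
    (1.1137, 0.9071, 0.2066),
    (1.1274, 0.9178, 0.2096),
    (1.1310, 0.9207, 0.2103),
    (1.1312, 0.9208, 0.2103),
]
# Each logged value is rounded to 4 decimals, so allow a small tolerance.
for chosen, rejected, margin in NEW_RUN:
    assert abs((chosen - rejected) - margin) < 2e-4, (chosen, rejected, margin)
print("margins match chosen - rejected within rounding")
```

The reported evaluation loss is a mean of per-pair losses. It is therefore not the same as `neg_log_sigmoid` applied to the mean margin, and it cannot be recovered from the reward columns alone.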