Commit ba5371c (verified), committed by Luca-Engel
1 Parent(s): e8d0b5f

do test run on scitas with ref_model

Files changed (1)
  1. README.md +23 -23
README.md CHANGED
@@ -1,31 +1,31 @@
  ---
  license: mit
- base_model: mNLP-project/gpt2-finetuned-mcqa
  tags:
  - trl
  - dpo
  - generated_from_trainer
  model-index:
- - name: gpt2-dpo-mcqa
  results: []
  ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
- # gpt2-dpo-mcqa
 
- This model is a fine-tuned version of [mNLP-project/gpt2-finetuned-mcqa](https://huggingface.co/mNLP-project/gpt2-finetuned-mcqa) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.6310
- - Rewards/chosen: 1.4580
- - Rewards/rejected: 1.1845
- - Rewards/accuracies: 0.6414
- - Rewards/margins: 0.2735
- - Logps/rejected: -659.0944
- - Logps/chosen: -787.4795
- - Logits/rejected: -14.9328
- - Logits/chosen: -11.6364
 
  ## Model description
 
@@ -59,16 +59,16 @@ The following hyperparameters were used during training:
 
  | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
  |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
- | 0.6407 | 0.9993 | 668 | 0.6460 | 0.7721 | 0.6216 | 0.6295 | 0.1505 | -664.7236 | -794.3383 | -15.1273 | -11.7899 |
- | 0.6498 | 2.0 | 1337 | 0.6374 | 1.2927 | 1.0475 | 0.6325 | 0.2453 | -660.4651 | -789.1318 | -14.9517 | -11.6401 |
- | 0.6468 | 2.9993 | 2005 | 0.6342 | 1.3734 | 1.1102 | 0.6388 | 0.2632 | -659.8373 | -788.3249 | -14.9535 | -11.6481 |
- | 0.6113 | 4.0 | 2674 | 0.6332 | 1.3317 | 1.0769 | 0.6444 | 0.2548 | -660.1705 | -788.7426 | -14.9930 | -11.6897 |
- | 0.5826 | 4.9993 | 3342 | 0.6310 | 1.4580 | 1.1845 | 0.6414 | 0.2735 | -659.0944 | -787.4795 | -14.9328 | -11.6364 |
- | 0.5613 | 6.0 | 4011 | 0.6317 | 1.4979 | 1.2181 | 0.6407 | 0.2798 | -658.7584 | -787.0804 | -14.9234 | -11.6271 |
- | 0.581 | 6.9993 | 4679 | 0.6316 | 1.5084 | 1.2260 | 0.6437 | 0.2825 | -658.6798 | -786.9750 | -14.9319 | -11.6377 |
- | 0.571 | 8.0 | 5348 | 0.6320 | 1.4992 | 1.2184 | 0.6425 | 0.2808 | -658.7557 | -787.0676 | -14.9334 | -11.6373 |
- | 0.5943 | 8.9993 | 6016 | 0.6317 | 1.5126 | 1.2294 | 0.6437 | 0.2832 | -658.6454 | -786.9331 | -14.9226 | -11.6269 |
- | 0.5635 | 9.9925 | 6680 | 0.6317 | 1.5142 | 1.2308 | 0.6433 | 0.2835 | -658.6317 | -786.9168 | -14.9211 | -11.6256 |
 
 
  ### Framework versions
 
  ---
  license: mit
+ base_model: openai-community/gpt2
  tags:
  - trl
  - dpo
  - generated_from_trainer
  model-index:
+ - name: gpt2-dpo-from_base_gpt2
  results: []
  ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
+ # gpt2-dpo-from_base_gpt2
 
+ This model is a fine-tuned version of [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) on the None dataset.
  It achieves the following results on the evaluation set:
+ - Loss: 0.6406
+ - Rewards/chosen: 1.1312
+ - Rewards/rejected: 0.9208
+ - Rewards/accuracies: 0.6373
+ - Rewards/margins: 0.2103
+ - Logps/rejected: -429.5498
+ - Logps/chosen: -508.5024
+ - Logits/rejected: -96.1598
+ - Logits/chosen: -94.9073
 
  ## Model description
 
 
 
  | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
  |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.6679 | 0.9993 | 668 | 0.6728 | 0.2747 | 0.2209 | 0.625 | 0.0538 | -436.5490 | -517.0669 | -96.0258 | -94.8005 |
+ | 0.6697 | 2.0 | 1337 | 0.6545 | 0.6507 | 0.5283 | 0.6295 | 0.1224 | -433.4745 | -513.3065 | -96.0560 | -94.8147 |
+ | 0.6516 | 2.9993 | 2005 | 0.6467 | 0.8424 | 0.6867 | 0.6336 | 0.1557 | -431.8912 | -511.3903 | -96.1361 | -94.8919 |
+ | 0.6264 | 4.0 | 2674 | 0.6436 | 0.9803 | 0.7989 | 0.6336 | 0.1814 | -430.7686 | -510.0109 | -96.1278 | -94.8762 |
+ | 0.6114 | 4.9993 | 3342 | 0.6420 | 1.0453 | 0.8518 | 0.6377 | 0.1935 | -430.2403 | -509.3612 | -96.1435 | -94.8917 |
+ | 0.6016 | 6.0 | 4011 | 0.6412 | 1.0870 | 0.8859 | 0.6377 | 0.2011 | -429.8991 | -508.9442 | -96.1471 | -94.8941 |
+ | 0.6115 | 6.9993 | 4679 | 0.6408 | 1.1137 | 0.9071 | 0.6384 | 0.2066 | -429.6871 | -508.6768 | -96.1587 | -94.9064 |
+ | 0.6079 | 8.0 | 5348 | 0.6406 | 1.1274 | 0.9178 | 0.6388 | 0.2096 | -429.5802 | -508.5403 | -96.1573 | -94.9046 |
+ | 0.6066 | 8.9993 | 6016 | 0.6406 | 1.1310 | 0.9207 | 0.6373 | 0.2103 | -429.5507 | -508.5036 | -96.1593 | -94.9068 |
+ | 0.5968 | 9.9925 | 6680 | 0.6406 | 1.1312 | 0.9208 | 0.6373 | 0.2103 | -429.5498 | -508.5024 | -96.1598 | -94.9073 |
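
A quick way to sanity-check the reward columns in both versions of the table is to compare `Rewards/chosen - Rewards/rejected` with the reported `Rewards/margins`. The sketch below is not part of the commit. It assumes the margin is logged as the mean of per-pair differences (as TRL's DPO trainer is generally understood to do), so the two values should agree up to 4-decimal rounding. The names `ROWS` and `margin_gap` are hypothetical, and the numbers are copied from the step 6680 rows of the removed and added tables.

```python
# Values (rewards/chosen, rewards/rejected, rewards/margins) copied from the
# step 6680 rows of the two tables in this diff. The dictionary keys are the
# model names from the old and new README; ROWS and margin_gap are
# illustrative names, not part of the commit.
ROWS = {
    "gpt2-dpo-mcqa": (1.5142, 1.2308, 0.2835),
    "gpt2-dpo-from_base_gpt2": (1.1312, 0.9208, 0.2103),
}


def margin_gap(chosen: float, rejected: float, reported_margin: float) -> float:
    """Absolute difference between (chosen - rejected) and the reported margin."""
    return abs((chosen - rejected) - reported_margin)


for name, (chosen, rejected, margin) in ROWS.items():
    gap = margin_gap(chosen, rejected, margin)
    print(
        f"{name}: chosen - rejected = {chosen - rejected:.4f}, "
        f"reported margin = {margin:.4f}, gap = {gap:.4f}"
    )
```

Under that assumption, a gap much larger than the rounding step (1e-4) would suggest a transcription error in the table rather than a property of the training run.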
 
 
  ### Framework versions