# gpt2-dpo

This model is a fine-tuned version of mNLP-project/gpt2-finetuned, trained with Direct Preference Optimization (DPO) on an unspecified preference dataset. It achieves the following results on the evaluation set:
- Loss: 0.6350
- Rewards/chosen: 1.6222
- Rewards/rejected: 1.3204
- Rewards/accuracies: 0.6496
- Rewards/margins: 0.3018
- Logps/rejected: -780.0735
- Logps/chosen: -933.2262
- Logits/rejected: -34.5449
- Logits/chosen: -28.7838
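
For reference, the reward figures above follow the standard DPO convention (as implemented in TRL's `DPOTrainer`); this note is an explanatory sketch added for context, not information from the original card. The implicit reward of a completion $y$ for a prompt $x$ is

$$
r_\theta(x, y) = \beta \left[\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\right],
$$

so Rewards/margins is the mean of Rewards/chosen minus Rewards/rejected (1.6222 − 1.3204 ≈ 0.3018 above), and Rewards/accuracies is the fraction of evaluation pairs for which the chosen completion's reward exceeds the rejected one's.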
## Model description
More information needed
## Intended uses & limitations
More information needed
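
As a minimal usage sketch (the repository id `mNLP-project/gpt2-dpo` is an assumption inferred from the model name and base checkpoint; adjust it if the actual id differs):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; not confirmed by this card.
model_id = "mNLP-project/gpt2-dpo"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Direct Preference Optimization is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```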
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):
- learning_rate: 1e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 10
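
A hedged reconstruction of these settings as Hugging Face `TrainingArguments` (the output directory is an assumption, and DPO-specific options such as beta are not documented in this card):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-dpo",            # assumed output directory
    learning_rate=1e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,    # 8 x 4 = 32 effective train batch size
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.2,
    seed=42,
)
```

In a typical TRL setup these arguments would be passed to `DPOTrainer` together with the policy model (initialised from mNLP-project/gpt2-finetuned), a frozen reference model, and the preference dataset; none of those specifics are given in the card.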
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6286 | 0.9993 | 668  | 0.6350 | 1.6222 | 1.3204 | 0.6496 | 0.3018 | -780.0735 | -933.2262 | -34.5449 | -28.7838 |
| 0.6387 | 2.0    | 1337 | 0.6662 | 1.8546 | 1.5416 | 0.6302 | 0.3130 | -777.8622 | -930.9024 | -34.5110 | -28.7424 |
| 0.5643 | 2.9993 | 2005 | 0.6635 | 2.0534 | 1.6918 | 0.6396 | 0.3616 | -776.3599 | -928.9147 | -34.5066 | -28.7168 |
| 0.4487 | 4.0    | 2674 | 0.6677 | 2.2748 | 1.8809 | 0.6451 | 0.3940 | -774.4694 | -926.7002 | -34.1409 | -28.2530 |
| 0.3831 | 4.9993 | 3342 | 0.6783 | 2.4765 | 2.0527 | 0.6418 | 0.4238 | -772.7513 | -924.6838 | -34.0051 | -28.0668 |
| 0.352  | 6.0    | 4011 | 0.6782 | 2.4441 | 2.0097 | 0.6440 | 0.4344 | -773.1808 | -925.0074 | -34.0868 | -28.1418 |
| 0.3189 | 6.9993 | 4679 | 0.6840 | 2.2310 | 1.8303 | 0.6343 | 0.4008 | -774.9752 | -927.1384 | -33.9525 | -27.9466 |
| 0.3006 | 8.0    | 5348 | 0.6882 | 2.4339 | 1.9918 | 0.6388 | 0.4422 | -773.3604 | -925.1093 | -33.7716 | -27.7551 |
| 0.3152 | 8.9993 | 6016 | 0.6891 | 2.4920 | 2.0457 | 0.6407 | 0.4462 | -772.8206 | -924.5289 | -33.6753 | -27.6463 |
| 0.2752 | 9.9925 | 6680 | 0.6892 | 2.4562 | 2.0151 | 0.6410 | 0.4411 | -773.1274 | -924.8871 | -33.6818 | -27.6538 |
### Framework versions
- Transformers 4.40.2
- Pytorch 2.1.0+cu118
- Datasets 2.19.1
- Tokenizers 0.19.1