gpt2-dpo-with-cosine-lr-scheduler

This model is a fine-tuned version of mNLP-project/gpt2-finetuned, trained with DPO on an unspecified preference dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1168
  • Rewards/chosen: 3.8849
  • Rewards/rejected: 3.2031
  • Rewards/accuracies: 0.5892
  • Rewards/margins: 0.6818
  • Logps/rejected: -761.2470
  • Logps/chosen: -910.5992
  • Logits/rejected: -36.5651
  • Logits/chosen: -30.3810
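
For reference, the margin above is the gap between the chosen and rejected rewards. In DPO each reward is the policy's log-probability ratio against the reference model scaled by β (the β used for this run is not stated in this card); as a worked check against the numbers above:

```latex
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{Rewards/margins} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
\approx 3.8849 - 3.2031 = 0.6818
```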

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
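
As a sketch of how the hyperparameters above could be wired together, assuming training used TRL's DPOTrainer (API as of TRL ~0.8, contemporaneous with the Transformers version listed below); the dataset, β, and sequence-length settings are not stated in this card and are placeholders here:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mNLP-project/gpt2-finetuned"  # base checkpoint named in this card
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)   # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token                 # GPT-2 has no pad token by default

# Toy placeholder data; the real preference dataset is not described in this card.
preference_dataset = Dataset.from_dict({
    "prompt": ["Example prompt"],
    "chosen": ["Preferred completion"],
    "rejected": ["Dispreferred completion"],
})

args = TrainingArguments(
    output_dir="gpt2-dpo-with-cosine-lr-scheduler",
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective train batch size 16
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    beta=0.1,                        # assumption: β is not reported in the card
    args=args,
    train_dataset=preference_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```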

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.9846 | 1.0 | 1337 | 1.1168 | 3.8849 | 3.2031 | 0.5892 | 0.6818 | -761.2470 | -910.5992 | -36.5651 | -30.3810 |
| 0.6025 | 2.0 | 2674 | 1.1405 | 5.0060 | 4.0992 | 0.6175 | 0.9068 | -752.2864 | -899.3887 | -35.0528 | -28.9839 |
| 0.2464 | 3.0 | 4011 | 1.1202 | 4.6754 | 3.6835 | 0.6160 | 0.9919 | -756.4427 | -902.6943 | -39.6513 | -33.3219 |
| 0.1182 | 4.0 | 5348 | 1.3054 | 7.3114 | 5.8367 | 0.6131 | 1.4747 | -734.9108 | -876.3349 | -35.1974 | -28.6005 |
| 0.0669 | 5.0 | 6685 | 1.3846 | 6.5378 | 5.0738 | 0.6093 | 1.4640 | -742.5399 | -884.0710 | -39.0355 | -31.8814 |
| 0.0226 | 6.0 | 8022 | 1.4662 | 6.2901 | 4.6812 | 0.6052 | 1.6089 | -746.4659 | -886.5475 | -40.3811 | -32.9593 |
| 0.0128 | 7.0 | 9359 | 1.5557 | 5.8081 | 4.1554 | 0.6108 | 1.6527 | -751.7241 | -891.3676 | -39.1744 | -31.2704 |
| 0.019 | 8.0 | 10696 | 1.6676 | 5.5428 | 3.8458 | 0.6011 | 1.6970 | -754.8205 | -894.0207 | -40.5161 | -32.4700 |
| 0.0101 | 9.0 | 12033 | 1.7100 | 5.5531 | 3.8215 | 0.6022 | 1.7315 | -755.0627 | -893.9178 | -40.7171 | -32.5929 |
| 0.0053 | 10.0 | 13370 | 1.7177 | 5.4221 | 3.7030 | 0.6000 | 1.7191 | -756.2481 | -895.2274 | -40.8064 | -32.6689 |
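
The cosine schedule named in the card title can be reproduced with transformers' get_cosine_schedule_with_warmup. A minimal sketch, assuming the 13370 total optimizer steps shown in the table (10 epochs × 1337 steps per epoch) and the 0.1 warmup ratio listed above:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Dummy parameter so the optimizer/scheduler pair can be stepped for illustration.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=1e-5, betas=(0.9, 0.999), eps=1e-8)

total_steps = 13370                    # 10 epochs x 1337 steps, from the table above
warmup_steps = int(0.1 * total_steps)  # warmup_ratio 0.1 -> 1337 warmup steps

scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_steps,
)

# LR rises linearly to 1e-5 during warmup, then decays to 0 along a cosine curve.
lrs = []
for _ in range(total_steps):
    optimizer.step()
    scheduler.step()
    lrs.append(scheduler.get_last_lr()[0])
```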

Framework versions

  • Transformers 4.40.2
  • PyTorch 2.1.0+cu118
  • Datasets 2.19.1
  • Tokenizers 0.19.1
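
A minimal usage sketch for loading this checkpoint for generation with the Transformers version listed above (the repo id is taken from this card's title; the prompt and sampling settings are purely illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "mNLP-project/gpt2-dpo-with-cosine-lr-scheduler"  # from the card title
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```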