---
library_name: transformers
license: llama3
base_model: tsavage68/IE_L3_1000steps_1e6rate_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: IE_L3_1000steps_1e7rate_03beta_cSFTDPO
    results: []
---

IE_L3_1000steps_1e7rate_03beta_cSFTDPO

This model is a DPO fine-tune of tsavage68/IE_L3_1000steps_1e6rate_SFT; the training dataset is not documented. It achieves the following results on the evaluation set (a consistency note on these metrics follows the list):

  • Loss: 0.1802
  • Rewards/chosen: -1.0922
  • Rewards/rejected: -10.0336
  • Rewards/accuracies: 0.7400
  • Rewards/margins: 8.9414
  • Logps/rejected: -109.0726
  • Logps/chosen: -86.4386
  • Logits/rejected: -0.8003
  • Logits/chosen: -0.7150
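
These metric names match TRL's DPOTrainer logging, where Rewards/chosen and Rewards/rejected are beta-scaled log-probability differences between the policy and the frozen reference model, and Rewards/margins is simply their difference. A quick self-consistency check on the reported numbers:

```python
# Rewards/margins should equal Rewards/chosen - Rewards/rejected.
rewards_chosen, rewards_rejected = -1.0922, -10.0336
print(round(rewards_chosen - rewards_rejected, 4))  # 8.9414, as reported above
```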

Model description

More information needed

Intended uses & limitations

More information needed
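
Since the card does not document usage, here is a minimal inference sketch. It assumes the checkpoint is public on the Hub; the prompt below is a placeholder, as the expected prompt format is not specified:

```python
# Hedged inference sketch; the prompt format is a placeholder assumption.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tsavage68/IE_L3_1000steps_1e7rate_03beta_cSFTDPO",
    device_map="auto",
)
output = generator("Your prompt here", max_new_tokens=128)
print(output[0]["generated_text"])
```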

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged training-script sketch follows the list):

  • learning_rate: 1e-07
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
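
The training script itself is not included in the card. The sketch below shows how these values map onto TRL's DPOTrainer, assuming a trl version contemporary with the listed Transformers release and a preference dataset with prompt/chosen/rejected columns. The dataset name is hypothetical, and beta=0.3 is inferred from "03beta" in the model name, not stated in the card:

```python
# Hedged sketch: reproduces the listed hyperparameters with TRL's DPOTrainer.
# Assumptions (not documented in this card): the trl version, the preference
# dataset, and beta=0.3 (inferred from "03beta" in the model name).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/IE_L3_1000steps_1e6rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Hypothetical dataset with "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("your/preference-dataset")

config = DPOConfig(
    output_dir="IE_L3_1000steps_1e7rate_03beta_cSFTDPO",
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size: 2 * 2 = 4
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    beta=0.3,  # assumption inferred from the model name
    # Adam with betas=(0.9, 0.999) and epsilon=1e-8 is the default optimizer.
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # TRL clones the policy as the frozen reference model
    args=config,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```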

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5177 | 0.4 | 50 | 0.4502 | -0.0545 | -0.6708 | 0.7400 | 0.6163 | -77.8632 | -82.9793 | -0.7959 | -0.7377 |
| 0.1418 | 0.8 | 100 | 0.1816 | -0.5946 | -5.5441 | 0.7400 | 4.9495 | -94.1076 | -84.7799 | -0.7931 | -0.7224 |
| 0.1388 | 1.2 | 150 | 0.1803 | -0.8790 | -8.0921 | 0.7400 | 7.2131 | -102.6009 | -85.7278 | -0.7962 | -0.7173 |
| 0.1733 | 1.6 | 200 | 0.1803 | -0.9325 | -8.7008 | 0.7400 | 7.7683 | -104.6298 | -85.9060 | -0.7980 | -0.7170 |
| 0.2253 | 2.0 | 250 | 0.1803 | -0.9653 | -8.8898 | 0.7400 | 7.9244 | -105.2598 | -86.0156 | -0.7979 | -0.7163 |
| 0.1387 | 2.4 | 300 | 0.1802 | -0.9837 | -9.1362 | 0.7400 | 8.1525 | -106.0812 | -86.0766 | -0.7975 | -0.7157 |
| 0.1213 | 2.8 | 350 | 0.1802 | -1.0210 | -9.4276 | 0.7400 | 8.4066 | -107.0527 | -86.2011 | -0.7989 | -0.7159 |
| 0.1906 | 3.2 | 400 | 0.1802 | -1.0245 | -9.5511 | 0.7400 | 8.5265 | -107.4642 | -86.2129 | -0.7991 | -0.7152 |
| 0.1906 | 3.6 | 450 | 0.1802 | -1.0419 | -9.6482 | 0.7400 | 8.6063 | -107.7879 | -86.2706 | -0.7995 | -0.7155 |
| 0.208 | 4.0 | 500 | 0.1802 | -1.0676 | -9.8319 | 0.7400 | 8.7643 | -108.4001 | -86.3564 | -0.7999 | -0.7153 |
| 0.156 | 4.4 | 550 | 0.1802 | -1.0697 | -9.9071 | 0.7400 | 8.8374 | -108.6509 | -86.3635 | -0.8011 | -0.7160 |
| 0.1213 | 4.8 | 600 | 0.1802 | -1.0716 | -9.9151 | 0.7400 | 8.8436 | -108.6776 | -86.3697 | -0.8002 | -0.7154 |
| 0.1906 | 5.2 | 650 | 0.1802 | -1.0758 | -9.9883 | 0.7400 | 8.9125 | -108.9217 | -86.3839 | -0.8005 | -0.7154 |
| 0.2426 | 5.6 | 700 | 0.1802 | -1.0847 | -10.0383 | 0.7400 | 8.9536 | -109.0882 | -86.4134 | -0.8003 | -0.7150 |
| 0.2599 | 6.0 | 750 | 0.1802 | -1.0957 | -10.0559 | 0.7400 | 8.9602 | -109.1469 | -86.4500 | -0.8008 | -0.7156 |
| 0.1213 | 6.4 | 800 | 0.1802 | -1.0865 | -10.0490 | 0.7400 | 8.9625 | -109.1239 | -86.4195 | -0.7997 | -0.7139 |
| 0.2426 | 6.8 | 850 | 0.1802 | -1.0859 | -10.0366 | 0.7400 | 8.9506 | -109.0825 | -86.4176 | -0.8000 | -0.7146 |
| 0.1733 | 7.2 | 900 | 0.1802 | -1.0860 | -10.0398 | 0.7400 | 8.9538 | -109.0932 | -86.4178 | -0.8002 | -0.7149 |
| 0.1386 | 7.6 | 950 | 0.1802 | -1.0922 | -10.0336 | 0.7400 | 8.9414 | -109.0726 | -86.4386 | -0.8003 | -0.7150 |
| 0.156 | 8.0 | 1000 | 0.1802 | -1.0922 | -10.0336 | 0.7400 | 8.9414 | -109.0726 | -86.4386 | -0.8003 | -0.7150 |

Framework versions

  • Transformers 4.44.2
  • PyTorch 2.0.0+cu117
  • Datasets 3.0.0
  • Tokenizers 0.19.1