IE_L3_1000steps_1e6rate_03beta_cSFTDPO

This model is a fine-tuned version of tsavage68/IE_L3_1000steps_1e6rate_SFT on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1802
  • Rewards/chosen: -1.3199
  • Rewards/rejected: -13.3530
  • Rewards/accuracies: 0.7400
  • Rewards/margins: 12.0331
  • Logps/rejected: -120.1372
  • Logps/chosen: -87.1973
  • Logits/rejected: -0.8052
  • Logits/chosen: -0.7124

Model description

More information needed

Intended uses & limitations

More information needed
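
Pending guidance from the author, the snippet below is a minimal inference sketch, assuming the model loads through the standard transformers causal-LM API; the prompt and generation settings are illustrative placeholders, not part of the original card.

```python
# Minimal inference sketch; the prompt and generation settings are
# illustrative placeholders, not from the original card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/IE_L3_1000steps_1e6rate_03beta_cSFTDPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the repository stores FP16 weights
    device_map="auto",
)

prompt = "Extract the entities from: ..."  # hypothetical prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```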

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TRL reconstruction follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
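
The sketch below shows how these settings might map onto a TRL DPOConfig/DPOTrainer run. It is a reconstruction, not the author's script: the preference dataset is undocumented (a placeholder is used), and beta=0.3 is inferred from the "03beta" in the model name rather than stated on the card.

```python
# Hedged reconstruction of the DPO run; the preference dataset shown here is a
# placeholder (the actual dataset is undocumented), and beta=0.3 is inferred
# from the model name, not stated on the card.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "tsavage68/IE_L3_1000steps_1e6rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Placeholder preference pairs in the prompt/chosen/rejected format DPOTrainer expects.
train_dataset = Dataset.from_dict({
    "prompt":   ["Extract the entities from: ..."],
    "chosen":   ["<preferred completion>"],
    "rejected": ["<dispreferred completion>"],
})

config = DPOConfig(
    output_dir="IE_L3_1000steps_1e6rate_03beta_cSFTDPO",
    beta=0.3,                       # assumption: "03beta" in the model name
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size of 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,            # older trl; newer versions use processing_class
)
trainer.train()
```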

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.1907        | 0.4   | 50   | 0.1802          | -1.0923        | -10.4680         | 0.7400             | 9.3757          | -110.5205      | -86.4386     | -0.7963         | -0.7114       |
| 0.1386        | 0.8   | 100  | 0.1802          | -1.2190        | -11.5716         | 0.7400             | 10.3526         | -114.1993      | -86.8611     | -0.7960         | -0.7088       |
| 0.1386        | 1.2   | 150  | 0.1802          | -1.2269        | -11.8797         | 0.7400             | 10.6528         | -115.2263      | -86.8875     | -0.7973         | -0.7092       |
| 0.1733        | 1.6   | 200  | 0.1802          | -1.2628        | -12.4562         | 0.7400             | 11.1934         | -117.1479      | -87.0072     | -0.7983         | -0.7088       |
| 0.2253        | 2.0   | 250  | 0.1802          | -1.2811        | -12.6109         | 0.7400             | 11.3298         | -117.6637      | -87.0682     | -0.8005         | -0.7100       |
| 0.1386        | 2.4   | 300  | 0.1802          | -1.2819        | -12.6821         | 0.7400             | 11.4002         | -117.9011      | -87.0709     | -0.8009         | -0.7104       |
| 0.1213        | 2.8   | 350  | 0.1802          | -1.2857        | -12.9252         | 0.7400             | 11.6395         | -118.7114      | -87.0834     | -0.8024         | -0.7110       |
| 0.1906        | 3.2   | 400  | 0.1802          | -1.2904        | -12.9929         | 0.7400             | 11.7024         | -118.9368      | -87.0992     | -0.8026         | -0.7109       |
| 0.1906        | 3.6   | 450  | 0.1802          | -1.2935        | -13.0320         | 0.7400             | 11.7385         | -119.0673      | -87.1095     | -0.8030         | -0.7112       |
| 0.2079        | 4.0   | 500  | 0.1802          | -1.3034        | -13.1728         | 0.7400             | 11.8694         | -119.5364      | -87.1423     | -0.8047         | -0.7126       |
| 0.156         | 4.4   | 550  | 0.1802          | -1.3085        | -13.2242         | 0.7400             | 11.9157         | -119.7078      | -87.1593     | -0.8035         | -0.7118       |
| 0.1213        | 4.8   | 600  | 0.1802          | -1.2992        | -13.2411         | 0.7400             | 11.9418         | -119.7642      | -87.1285     | -0.8054         | -0.7131       |
| 0.1906        | 5.2   | 650  | 0.1802          | -1.3144        | -13.3156         | 0.7400             | 12.0011         | -120.0125      | -87.1792     | -0.8048         | -0.7117       |
| 0.2426        | 5.6   | 700  | 0.1802          | -1.2925        | -13.3031         | 0.7400             | 12.0106         | -119.9710      | -87.1061     | -0.8043         | -0.7117       |
| 0.2599        | 6.0   | 750  | 0.1802          | -1.3084        | -13.3298         | 0.7400             | 12.0213         | -120.0597      | -87.1592     | -0.8052         | -0.7126       |
| 0.1213        | 6.4   | 800  | 0.1802          | -1.3118        | -13.3477         | 0.7400             | 12.0359         | -120.1197      | -87.1704     | -0.8039         | -0.7116       |
| 0.2426        | 6.8   | 850  | 0.1802          | -1.3228        | -13.3620         | 0.7400             | 12.0392         | -120.1673      | -87.2071     | -0.8052         | -0.7125       |
| 0.1733        | 7.2   | 900  | 0.1802          | -1.3137        | -13.3379         | 0.7400             | 12.0242         | -120.0870      | -87.1768     | -0.8052         | -0.7125       |
| 0.1386        | 7.6   | 950  | 0.1802          | -1.3070        | -13.3530         | 0.7400             | 12.0460         | -120.1374      | -87.1545     | -0.8053         | -0.7127       |
| 0.156         | 8.0   | 1000 | 0.1802          | -1.3199        | -13.3530         | 0.7400             | 12.0331         | -120.1372      | -87.1973     | -0.8052         | -0.7124       |
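
As a reading aid for the table above: in DPO, Rewards/margins is simply Rewards/chosen minus Rewards/rejected, which the final evaluation row confirms.

```python
# Sanity check on the final row: margin = chosen reward - rejected reward.
chosen, rejected = -1.3199, -13.3530
print(round(chosen - rejected, 4))  # 12.0331, matching the reported Rewards/margins
```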

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.0.0+cu117
  • Datasets 3.0.0
  • Tokenizers 0.19.1