---
library_name: transformers
license: llama3
base_model: tsavage68/IE_L3_1000steps_1e6rate_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: IE_L3_1000steps_1e7rate_05beta_cSFTDPO
    results: []
---

# IE_L3_1000steps_1e7rate_05beta_cSFTDPO

This model is a fine-tuned version of [tsavage68/IE_L3_1000steps_1e6rate_SFT](https://huggingface.co/tsavage68/IE_L3_1000steps_1e6rate_SFT), trained with DPO (per the `trl`/`dpo` tags) on an unknown dataset. It achieves the following results on the evaluation set (a minimal loading sketch follows the metrics):

- Loss: 0.1802
- Rewards/chosen: -1.1386
- Rewards/rejected: -10.9339
- Rewards/accuracies: 0.7400
- Rewards/margins: 9.7954
- Logps/rejected: -97.4951
- Logps/chosen: -85.0749
- Logits/rejected: -0.7939
- Logits/chosen: -0.7200
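
The card ships without usage instructions. Since `library_name` is `transformers`, loading should follow the standard causal-LM API; a minimal sketch, untested, in which the bf16 dtype and the extraction-style prompt are assumptions:

```python
# Minimal loading sketch -- not from the original card.
# Assumptions: this repo id hosts a standard causal-LM checkpoint, bf16 inference
# is acceptable for a Llama-3-class model, and the prompt below is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/IE_L3_1000steps_1e7rate_05beta_cSFTDPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Extract the entities from the following sentence: ..."  # hypothetical prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```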

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `trl` sketch reproducing them follows the list):

- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
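
The `trl`/`dpo` tags and the `05beta` in the model name suggest trl's `DPOTrainer` with `beta=0.5`, but the preference dataset and trl version are not recorded. A hedged sketch of how the listed hyperparameters would map onto a `DPOConfig`, with the dataset path and beta value as assumptions:

```python
# Hedged reconstruction: maps the hyperparameters above onto trl's DPOConfig/DPOTrainer.
# beta=0.5 is inferred from "05beta" in the model name; the dataset is a placeholder.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/IE_L3_1000steps_1e6rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

args = DPOConfig(
    output_dir="IE_L3_1000steps_1e7rate_05beta_cSFTDPO",
    beta=0.5,                       # assumption from the model name
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size: 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

# Placeholder: the actual preference dataset (prompt/chosen/rejected columns) is unknown.
train_dataset = load_dataset("some/preference-dataset", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=None,       # trl clones the SFT model as the frozen reference
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # newer trl versions name this parameter processing_class
)
trainer.train()
```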

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4416        | 0.4   | 50   | 0.3457          | -0.0969        | -1.1506          | 0.7400             | 1.0537          | -77.9284       | -82.9916     | -0.7954         | -0.7373       |
| 0.1388        | 0.8   | 100  | 0.1803          | -0.7835        | -7.7662          | 0.7400             | 6.9827          | -91.1596       | -84.3647     | -0.7936         | -0.7251       |
| 0.1387        | 1.2   | 150  | 0.1802          | -0.9415        | -9.2178          | 0.7400             | 8.2763          | -94.0629       | -84.6808     | -0.7940         | -0.7226       |
| 0.1733        | 1.6   | 200  | 0.1802          | -0.9618        | -9.5890          | 0.7400             | 8.6272          | -94.8052       | -84.7213     | -0.7940         | -0.7227       |
| 0.2253        | 2.0   | 250  | 0.1802          | -1.0365        | -9.8116          | 0.7400             | 8.7750          | -95.2504       | -84.8709     | -0.7938         | -0.7219       |
| 0.1386        | 2.4   | 300  | 0.1802          | -1.0393        | -10.0428         | 0.7400             | 9.0035          | -95.7128       | -84.8764     | -0.7938         | -0.7216       |
| 0.1213        | 2.8   | 350  | 0.1802          | -1.0477        | -10.3216         | 0.7400             | 9.2739          | -96.2705       | -84.8933     | -0.7934         | -0.7207       |
| 0.1906        | 3.2   | 400  | 0.1802          | -1.0921        | -10.5149         | 0.7400             | 9.4228          | -96.6571       | -84.9820     | -0.7947         | -0.7217       |
| 0.1906        | 3.6   | 450  | 0.1802          | -1.0970        | -10.5317         | 0.7400             | 9.4347          | -96.6906       | -84.9917     | -0.7945         | -0.7214       |
| 0.208         | 4.0   | 500  | 0.1802          | -1.1136        | -10.7153         | 0.7400             | 9.6017          | -97.0578       | -85.0249     | -0.7951         | -0.7219       |
| 0.156         | 4.4   | 550  | 0.1802          | -1.1237        | -10.8074         | 0.7400             | 9.6837          | -97.2419       | -85.0451     | -0.7948         | -0.7214       |
| 0.1213        | 4.8   | 600  | 0.1802          | -1.1291        | -10.8336         | 0.7400             | 9.7045          | -97.2944       | -85.0559     | -0.7943         | -0.7205       |
| 0.1906        | 5.2   | 650  | 0.1802          | -1.1297        | -10.8980         | 0.7400             | 9.7683          | -97.4233       | -85.0572     | -0.7939         | -0.7202       |
| 0.2426        | 5.6   | 700  | 0.1802          | -1.1277        | -10.8859         | 0.7400             | 9.7582          | -97.3990       | -85.0531     | -0.7953         | -0.7215       |
| 0.2599        | 6.0   | 750  | 0.1802          | -1.1398        | -10.9204         | 0.7400             | 9.7806          | -97.4681       | -85.0774     | -0.7944         | -0.7204       |
| 0.1213        | 6.4   | 800  | 0.1802          | -1.1496        | -10.9309         | 0.7400             | 9.7813          | -97.4891       | -85.0970     | -0.7947         | -0.7207       |
| 0.2426        | 6.8   | 850  | 0.1802          | -1.1208        | -10.9075         | 0.7400             | 9.7867          | -97.4422       | -85.0394     | -0.7944         | -0.7204       |
| 0.1733        | 7.2   | 900  | 0.1802          | -1.1302        | -10.9173         | 0.7400             | 9.7871          | -97.4618       | -85.0581     | -0.7939         | -0.7201       |
| 0.1386        | 7.6   | 950  | 0.1802          | -1.1386        | -10.9339         | 0.7400             | 9.7954          | -97.4951       | -85.0749     | -0.7939         | -0.7200       |
| 0.156         | 8.0   | 1000 | 0.1802          | -1.1386        | -10.9339         | 0.7400             | 9.7954          | -97.4951       | -85.0749     | -0.7939         | -0.7200       |
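
For reference, the Rewards/* columns above are trl's implicit DPO rewards: the β-scaled log-probability ratio between the policy and the frozen reference model (here, the SFT base):

$$
r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
$$

Rewards/margins is Rewards/chosen minus Rewards/rejected (e.g. -1.1386 − (-10.9339) = 9.7953 ≈ 9.7954 in the final row), and Rewards/accuracies is the fraction of evaluation pairs whose chosen completion receives the higher implicit reward.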

### Framework versions

- Transformers 4.44.2
- Pytorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1