---
library_name: transformers
license: llama3
base_model: tsavage68/IE_L3_1000steps_1e6rate_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: IE_L3_1000steps_1e6rate_05beta_cSFTDPO
    results: []
---

# IE_L3_1000steps_1e6rate_05beta_cSFTDPO

This model is a fine-tuned version of [tsavage68/IE_L3_1000steps_1e6rate_SFT](https://huggingface.co/tsavage68/IE_L3_1000steps_1e6rate_SFT) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.1802
- Rewards/chosen: -1.4168
- Rewards/rejected: -13.8543
- Rewards/accuracies: 0.7400
- Rewards/margins: 12.4374
- Logps/rejected: -103.3358
- Logps/chosen: -85.6314
- Logits/rejected: -0.7970
- Logits/chosen: -0.7188
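For context, the reward columns are the implicit DPO rewards that TRL logs (β-scaled log-probability ratios of the policy against the SFT reference), and the margin is simply their gap; a quick check on the final-step numbers, up to rounding in the averaged logs:

```latex
% Implicit DPO reward (Rafailov et al., 2023):
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
% Margin = chosen reward - rejected reward:
-1.4168 - (-13.8543) = 12.4375 \approx 12.4374
```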

## Model description

More information needed

## Intended uses & limitations

More information needed
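Pending a fuller description, here is a minimal inference sketch using the standard `transformers` causal-LM API. The prompt, dtype, and generation settings are illustrative assumptions (the "IE" in the model name suggests information extraction, but the intended task is not documented):

```python
# Minimal inference sketch; prompt and generation settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/IE_L3_1000steps_1e6rate_05beta_cSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a bf16-capable GPU; use float32 on CPU
    device_map="auto",
)

prompt = "Extract the entities from the following note: ..."  # hypothetical task prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```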

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
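These settings map onto a TRL `DPOTrainer` run roughly as sketched below. This is a reconstruction, not the actual training script: `beta=0.5` is an assumption inferred from the `05beta` in the model name, and the preference data is a placeholder since the real dataset is unknown.

```python
# Hypothetical reconstruction of the run from the hyperparameters above.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/IE_L3_1000steps_1e6rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder preference data in TRL's prompt/chosen/rejected format;
# the real dataset is not documented.
train_dataset = Dataset.from_dict({
    "prompt": ["..."],
    "chosen": ["..."],
    "rejected": ["..."],
})

args = DPOConfig(
    output_dir="IE_L3_1000steps_1e6rate_05beta_cSFTDPO",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size of 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    beta=0.5,  # assumption: inferred from "05beta" in the model name
)

trainer = DPOTrainer(
    model=model,  # ref_model omitted; TRL creates a frozen reference copy
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # newer TRL releases name this processing_class
)
trainer.train()
```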

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.1906        | 0.4   | 50   | 0.1802          | -1.0109        | -11.1903         | 0.7400             | 10.1794         | -98.0078       | -84.8196     | -0.7939         | -0.7206       |
| 0.1386        | 0.8   | 100  | 0.1802          | -1.2190        | -12.1625         | 0.7400             | 10.9435         | -99.9523       | -85.2358     | -0.7944         | -0.7197       |
| 0.1386        | 1.2   | 150  | 0.1802          | -1.2782        | -12.5852         | 0.7400             | 11.3070         | -100.7976      | -85.3541     | -0.7943         | -0.7189       |
| 0.1733        | 1.6   | 200  | 0.1802          | -1.3094        | -13.0296         | 0.7400             | 11.7202         | -101.6864      | -85.4166     | -0.7948         | -0.7186       |
| 0.2253        | 2.0   | 250  | 0.1802          | -1.3248        | -13.1625         | 0.7400             | 11.8377         | -101.9522      | -85.4473     | -0.7952         | -0.7186       |
| 0.1386        | 2.4   | 300  | 0.1802          | -1.3337        | -13.2622         | 0.7400             | 11.9285         | -102.1515      | -85.4652     | -0.7942         | -0.7174       |
| 0.1213        | 2.8   | 350  | 0.1802          | -1.3670        | -13.4507         | 0.7400             | 12.0837         | -102.5286      | -85.5317     | -0.7953         | -0.7178       |
| 0.1906        | 3.2   | 400  | 0.1802          | -1.3818        | -13.5334         | 0.7400             | 12.1517         | -102.6941      | -85.5613     | -0.7964         | -0.7189       |
| 0.1906        | 3.6   | 450  | 0.1802          | -1.3800        | -13.5899         | 0.7400             | 12.2099         | -102.8071      | -85.5577     | -0.7964         | -0.7189       |
| 0.2079        | 4.0   | 500  | 0.1802          | -1.3816        | -13.6722         | 0.7400             | 12.2906         | -102.9716      | -85.5610     | -0.7966         | -0.7187       |
| 0.156         | 4.4   | 550  | 0.1802          | -1.4142        | -13.7800         | 0.7400             | 12.3657         | -103.1872      | -85.6262     | -0.7956         | -0.7175       |
| 0.1213        | 4.8   | 600  | 0.1802          | -1.3864        | -13.7736         | 0.7400             | 12.3872         | -103.1744      | -85.5705     | -0.7974         | -0.7192       |
| 0.1906        | 5.2   | 650  | 0.1802          | -1.4252        | -13.8450         | 0.7400             | 12.4197         | -103.3172      | -85.6483     | -0.7969         | -0.7187       |
| 0.2426        | 5.6   | 700  | 0.1802          | -1.4087        | -13.8154         | 0.7400             | 12.4068         | -103.2581      | -85.6151     | -0.7974         | -0.7196       |
| 0.2599        | 6.0   | 750  | 0.1802          | -1.4077        | -13.8712         | 0.7400             | 12.4635         | -103.3696      | -85.6131     | -0.7977         | -0.7194       |
| 0.1213        | 6.4   | 800  | 0.1802          | -1.4158        | -13.9034         | 0.7400             | 12.4876         | -103.4339      | -85.6293     | -0.7977         | -0.7195       |
| 0.2426        | 6.8   | 850  | 0.1802          | -1.4105        | -13.8922         | 0.7400             | 12.4817         | -103.4116      | -85.6187     | -0.7979         | -0.7200       |
| 0.1733        | 7.2   | 900  | 0.1802          | -1.4075        | -13.8657         | 0.7400             | 12.4582         | -103.3587      | -85.6128     | -0.7970         | -0.7189       |
| 0.1386        | 7.6   | 950  | 0.1802          | -1.4138        | -13.8523         | 0.7400             | 12.4386         | -103.3319      | -85.6253     | -0.7971         | -0.7188       |
| 0.156         | 8.0   | 1000 | 0.1802          | -1.4168        | -13.8543         | 0.7400             | 12.4374         | -103.3358      | -85.6314     | -0.7970         | -0.7188       |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1