---
library_name: transformers
license: llama3
base_model: tsavage68/Na_L3_100steps_1e6rate_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: Na_L3_350steps_1e7rate_01beta_cSFTDPO
    results: []
---

# Na_L3_350steps_1e7rate_01beta_cSFTDPO

This model is a fine-tuned version of [tsavage68/Na_L3_100steps_1e6rate_SFT](https://huggingface.co/tsavage68/Na_L3_100steps_1e6rate_SFT) on an unknown dataset. It achieves the following results on the evaluation set (how these DPO reward metrics are computed is sketched after the list):

- Loss: 0.0088
- Rewards/chosen: 0.6901
- Rewards/rejected: -4.1623
- Rewards/accuracies: 1.0
- Rewards/margins: 4.8524
- Logps/rejected: -83.1246
- Logps/chosen: -17.9889
- Logits/rejected: -0.9502
- Logits/chosen: -0.8819
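
For context on the metrics above: in TRL's DPO implementation, `Rewards/chosen` and `Rewards/rejected` are the implicit rewards, i.e. the beta-scaled log-probability ratio between the trained policy and the frozen SFT reference, averaged over evaluation examples. A sketch of the standard DPO formulation, with `y_w` the chosen and `y_l` the rejected completion, and beta = 0.1 assumed from the "01beta" suffix in the model name:

```latex
% Implicit per-example reward under DPO (beta assumed to be 0.1)
r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

% DPO loss: push the chosen reward above the rejected one
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r(x, y_w) - r(x, y_l)\bigr)
```

Consistent with this, `Rewards/margins` is simply `Rewards/chosen` minus `Rewards/rejected` (0.6901 - (-4.1623) = 4.8524), and a `Rewards/accuracies` of 1.0 means the chosen completion received the higher implicit reward on every evaluation pair.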

## Model description

More information needed

## Intended uses & limitations

More information needed
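
Pending proper documentation, here is a minimal generation sketch using the standard `transformers` API; the prompt is a placeholder, since the model's intended domain is not described in this card.

```python
# Load the published checkpoint and generate text with the standard
# transformers causal-LM API. The prompt below is only a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/Na_L3_350steps_1e7rate_01beta_cSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Hello, how can I help?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```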

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a TRL sketch of this configuration follows the list):

- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 350
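
A minimal sketch of this configuration using TRL's `DPOConfig` and `DPOTrainer`. The dataset path, its splits, and beta=0.1 (implied by "01beta" in the model name) are assumptions; the actual training data is not documented in this card.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Start from the SFT checkpoint named in this card.
base_id = "tsavage68/Na_L3_100steps_1e6rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Hypothetical preference dataset with "prompt", "chosen", "rejected" columns.
dataset = load_dataset(
    "json", data_files={"train": "prefs_train.jsonl", "test": "prefs_eval.jsonl"}
)

args = DPOConfig(
    output_dir="Na_L3_350steps_1e7rate_01beta_cSFTDPO",
    beta=0.1,                       # assumed from the "01beta" suffix
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size of 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=350,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,   # TRL snapshots the policy as the frozen reference model
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,  # recent TRL versions take processing_class= instead
)
trainer.train()
```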

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6442        | 0.2667 | 50   | 0.6172          | 0.0288         | -0.1304          | 1.0                | 0.1591          | -42.8057       | -24.6026     | -0.9518         | -0.8851       |
| 0.2897        | 0.5333 | 100  | 0.2504          | 0.1177         | -1.1534          | 1.0                | 1.2711          | -53.0359       | -23.7135     | -0.9534         | -0.8871       |
| 0.0587        | 0.8    | 150  | 0.0469          | 0.4687         | -2.7071          | 1.0                | 3.1758          | -68.5731       | -20.2031     | -0.9553         | -0.8874       |
| 0.0185        | 1.0667 | 200  | 0.0155          | 0.6102         | -3.6824          | 1.0                | 4.2926          | -78.3254       | -18.7883     | -0.9531         | -0.8845       |
| 0.0097        | 1.3333 | 250  | 0.0096          | 0.6743         | -4.0935          | 1.0                | 4.7678          | -82.4367       | -18.1468     | -0.9518         | -0.8835       |
| 0.0083        | 1.6    | 300  | 0.0088          | 0.6862         | -4.1645          | 1.0                | 4.8507          | -83.1466       | -18.0285     | -0.9504         | -0.8819       |
| 0.0079        | 1.8667 | 350  | 0.0088          | 0.6901         | -4.1623          | 1.0                | 4.8524          | -83.1246       | -17.9889     | -0.9502         | -0.8819       |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1