---
license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
  - alignment-handbook
  - ndcg
  - trl
  - expo
  - generated_from_trainer
datasets:
  - hZzy/train_pairwise
model-index:
  - name: qwen2.5-0.5b-expo-DPO-ES-1
    results: []
---

Visualize in Weights & Biases

qwen2.5-0.5b-expo-DPO-ES-1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 2.3243
  • Logps: -83.2882
  • Logits: -0.6651
  • Objective: 2.2471
  • Dpo Loss: 2.2471
  • Regularize: 2.2471
  • Ranking Simple: 0.5378
  • Ranking Idealized: 0.5295
  • Ranking Idealized Expo: 0.5212
  • Wo Beta: 6.6815
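The card does not include a usage snippet. As a minimal sketch, the checkpoint named in this card could be loaded with `transformers` as below; the prompt, dtype, and generation settings are illustrative assumptions, not part of the original training setup.

```python
# Minimal loading sketch (assumes the checkpoint is published on the Hub
# under the model name shown in this card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-ES-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Illustrative prompt only; the card does not document an intended prompt format.
prompt = "Write a one-sentence news headline about renewable energy."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```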

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:-------:|:---------:|:--------:|:----------:|:--------------:|:-----------------:|:----------------------:|:-------:|
| 0.7017        | 0.1417 | 50  | 0.8470 | -93.0243 | -1.4582 | 0.8570 | 0.8570 | 0.8570 | 0.5238 | 0.5295 | 0.5212 | 7.8507 |
| 0.8112        | 0.2834 | 100 | 1.0529 | -86.6835 | -1.4382 | 1.0273 | 1.0273 | 1.0273 | 0.5285 | 0.5295 | 0.5212 | 7.4982 |
| 1.0895        | 0.4251 | 150 | 1.4497 | -84.4337 | -1.2965 | 1.4010 | 1.4010 | 1.4010 | 0.5321 | 0.5295 | 0.5212 | 7.2692 |
| 1.2363        | 0.5668 | 200 | 1.7035 | -77.7201 | -1.2956 | 1.6116 | 1.6116 | 1.6116 | 0.5321 | 0.5295 | 0.5212 | 7.2264 |
| 1.3152        | 0.7085 | 250 | 1.9222 | -92.7241 | -1.2565 | 1.8319 | 1.8319 | 1.8319 | 0.5311 | 0.5295 | 0.5212 | 7.1856 |
| 1.1899        | 0.8503 | 300 | 2.0298 | -90.9373 | -0.9785 | 1.9588 | 1.9588 | 1.9588 | 0.5367 | 0.5295 | 0.5212 | 6.9336 |
| 1.1443        | 0.9920 | 350 | 2.1654 | -82.1414 | -1.0214 | 2.0541 | 2.0541 | 2.0541 | 0.5435 | 0.5295 | 0.5212 | 7.0024 |
| 0.725         | 1.1337 | 400 | 2.2884 | -84.2526 | -0.7535 | 2.2360 | 2.2360 | 2.2360 | 0.5336 | 0.5295 | 0.5212 | 7.1525 |
| 0.7629        | 1.2754 | 450 | 2.1606 | -80.4165 | -0.8866 | 2.0671 | 2.0671 | 2.0671 | 0.5321 | 0.5295 | 0.5212 | 6.7949 |
| 0.8044        | 1.4171 | 500 | 2.2094 | -82.3927 | -0.7503 | 2.0981 | 2.0981 | 2.0981 | 0.5347 | 0.5295 | 0.5212 | 6.8050 |
| 0.7105        | 1.5588 | 550 | 2.1697 | -84.9780 | -0.6734 | 2.0733 | 2.0733 | 2.0733 | 0.5321 | 0.5295 | 0.5212 | 6.8722 |
| 0.6925        | 1.7005 | 600 | 2.1957 | -81.5342 | -0.7411 | 2.0558 | 2.0558 | 2.0558 | 0.5357 | 0.5295 | 0.5212 | 6.7186 |
| 0.6883        | 1.8422 | 650 | 2.2080 | -82.7303 | -0.6908 | 2.1330 | 2.1330 | 2.1330 | 0.5383 | 0.5295 | 0.5212 | 6.8081 |
| 0.6486        | 1.9839 | 700 | 2.3243 | -83.2882 | -0.6651 | 2.2471 | 2.2471 | 2.2471 | 0.5378 | 0.5295 | 0.5212 | 6.6815 |
| 0.3793        | 2.1256 | 750 | 2.2675 | -84.2296 | -0.7879 | 2.1825 | 2.1825 | 2.1825 | 0.5409 | 0.5295 | 0.5212 | 6.8794 |
| 0.3314        | 2.2674 | 800 | 2.2106 | -84.3675 | -0.6651 | 2.1041 | 2.1041 | 2.1041 | 0.5414 | 0.5295 | 0.5212 | 6.7463 |
| 0.3301        | 2.4091 | 850 | 2.2964 | -84.8913 | -0.6177 | 2.2221 | 2.2221 | 2.2221 | 0.5388 | 0.5295 | 0.5212 | 6.8020 |
| 0.3509        | 2.5508 | 900 | 2.2796 | -84.3833 | -0.6097 | 2.2099 | 2.2099 | 2.2099 | 0.5393 | 0.5295 | 0.5212 | 6.7934 |
| 0.321         | 2.6925 | 950 | 2.3403 | -83.2967 | -0.7158 | 2.2649 | 2.2649 | 2.2649 | 0.5331 | 0.5295 | 0.5212 | 6.8864 |
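For reference, the Dpo Loss column above corresponds to the standard DPO objective, assuming the usual sigmoid formulation (the card does not state the β value used):

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
- \beta \log \frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]
$$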

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1