
# qwen2.5-0.5b-expo-DPO-ES-TRY

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set (see the note on the DPO objective after the list):

- Loss: 0.8143
- Logps: -120.0593
- Logits: -2.4478
- Objective: 0.8312
- Dpo Loss: 0.8312
- Regularize: 0.8312
- Ranking Simple: 0.5839
- Ranking Idealized: 0.6046
- Ranking Idealized Expo: 0.5280
- Dpo Wo Beta: -5.3609
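The card does not define these metrics, but "Dpo Loss" and "Objective" presumably track the standard DPO objective (Rafailov et al., 2023); "Dpo Wo Beta" is, by its name, likely the same quantity computed without the β scaling, and the "Ranking" metrics read as pairwise-ranking accuracies. For reference, the standard DPO loss is

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
$$

where $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ is the frozen reference model (here presumably the SFT checkpoint hZzy/qwen2.5-0.5b-sft-news-IFT), $\sigma$ is the logistic function, and $(y_w, y_l)$ are the preferred and rejected responses for prompt $x$.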

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto a `TrainingArguments` object follows the list):

- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 6
- total_train_batch_size: 72
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
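As a minimal sketch, the reported settings correspond to the following `transformers` configuration. The `output_dir` is a placeholder, and any DPO-specific options (such as β) are not reported in this card:

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters. Launched across 6 GPUs, the
# effective batch sizes are 2 * 6 devices * 6 accumulation steps = 72
# for training and 2 * 6 devices = 12 for evaluation.
training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-DPO-ES-TRY",  # placeholder
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=6,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```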

### Training results

| Training Loss | Epoch  | Step | Dpo Loss | Dpo Wo Beta | Logits  | Logps     | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize |
|:-------------:|:------:|:----:|:--------:|:-----------:|:-------:|:---------:|:---------------:|:---------:|:-----------------:|:----------------------:|:--------------:|:----------:|
| 0.5954        | 0.3004 | 53   | 0.7113   | -2.2659     | -1.8928 | -101.3674 | 0.6816          | 0.7113    | 0.5888            | 0.5093                 | 0.5238         | 0.7113     |
| 0.4618        | 0.6009 | 106  | 0.6936   | -2.4624     | -1.9007 | -94.3571  | 0.6913          | 0.6936    | 0.5888            | 0.5093                 | 0.5351         | 0.6936     |
| 0.3986        | 0.9013 | 159  | 0.7215   | -3.1229     | -2.1450 | -95.6001  | 0.7014          | 0.7215    | 0.5888            | 0.5093                 | 0.5351         | 0.7215     |
| 0.2551        | 1.2017 | 212  | 0.7525   | -3.7750     | -2.2678 | -98.1427  | 0.7351          | 0.7525    | 0.5888            | 0.5093                 | 0.5372         | 0.7525     |
| 0.2623        | 1.5021 | 265  | 0.7739   | -4.1634     | -2.1478 | -100.8313 | 0.7400          | 0.7739    | 0.5888            | 0.5093                 | 0.5393         | 0.7739     |
| 0.2571        | 1.8026 | 318  | 0.7665   | -4.0950     | -1.9888 | -102.3712 | 0.7401          | 0.7665    | 0.5888            | 0.5093                 | 0.5393         | 0.7665     |
| 0.1227        | 2.1030 | 371  | 0.9224   | -6.4510     | -1.8645 | -122.0016 | 0.8844          | 0.9224    | 0.5888            | 0.5093                 | 0.5424         | 0.9224     |
| 0.1330        | 2.4034 | 424  | 0.8786   | -5.8878     | -2.0277 | -117.1217 | 0.8448          | 0.8786    | 0.5888            | 0.5093                 | 0.5413         | 0.8786     |
| 0.1211        | 2.7085 | 477  | 0.8739   | -5.8152     | -2.0272 | -116.4230 | 0.8371          | 0.8739    | 0.5888            | 0.5093                 | 0.5403         | 0.8739     |
| 0.0858        | 1.5045 | 530  | 0.8753   | -5.9229     | -2.4530 | -118.2529 | 0.8505          | 0.8753    | 0.6046            | 0.5280                 | 0.5683         | 0.8753     |
| 0.1274        | 1.6547 | 583  | 0.8264   | -5.2847     | -2.4380 | -119.5907 | 0.8086          | 0.8264    | 0.6046            | 0.5280                 | 0.5870         | 0.8264     |
| 0.1614        | 1.8049 | 636  | 0.8243   | -5.2813     | -2.4850 | -117.8585 | 0.8209          | 0.8243    | 0.6046            | 0.5280                 | 0.5818         | 0.8243     |
| 0.1616        | 1.9551 | 689  | 0.8576   | -5.7234     | -2.4656 | -119.3221 | 0.8383          | 0.8576    | 0.6046            | 0.5280                 | 0.5797         | 0.8576     |
| 0.1063        | 2.1053 | 742  | 0.9824   | -7.3310     | -2.2712 | -133.3637 | 0.9486          | 0.9824    | 0.6046            | 0.5280                 | 0.5518         | 0.9824     |
| 0.1017        | 2.2556 | 795  | 0.8904   | -6.2055     | -2.4490 | -123.5745 | 0.8711          | 0.8904    | 0.6046            | 0.5280                 | 0.5683         | 0.8904     |
| 0.1225        | 2.4058 | 848  | 0.9035   | -6.3529     | -2.4743 | -124.5336 | 0.8822          | 0.9035    | 0.6046            | 0.5280                 | 0.5569         | 0.9035     |
| 0.1157        | 2.5583 | 901  | 0.8941   | -6.2136     | -2.4886 | -124.4583 | 0.8718          | 0.8941    | 0.6046            | 0.5280                 | 0.5621         | 0.8941     |
| 0.1387        | 2.7085 | 954  | 0.8892   | -6.1610     | -2.5086 | -123.4143 | 0.8688          | 0.8892    | 0.6046            | 0.5280                 | 0.5580         | 0.8892     |
| 0.1219        | 2.8588 | 1007 | 0.8882   | -6.1537     | -2.5127 | -123.1454 | 0.8682          | 0.8882    | 0.6046            | 0.5280                 | 0.5600         | 0.8882     |

### Framework versions

- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
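A minimal loading sketch with these library versions, assuming the weights are published under the repository id hZzy/qwen2.5-0.5b-expo-DPO-ES-TRY (inferred from the model name above; adjust if the actual repo id differs):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id assumed from the model name; replace if needed.
model_id = "hZzy/qwen2.5-0.5b-expo-DPO-ES-TRY"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation as a smoke test.
inputs = tokenizer("The latest news from the markets:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```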
