
qwen2.5-0.5b-expo-DPO-ES-TRY

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8437
  • Logps: -117.1633
  • Logits: -2.0009
  • Objective: 0.8798
  • Dpo Loss: 0.8798
  • Regularize: 0.8798
  • Ranking Simple: 0.5403
  • Ranking Idealized: 0.5888
  • Ranking Idealized Expo: 0.5093
  • Dpo Wo Beta: -5.9006
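
In this card, Dpo Loss, Objective, and Regularize report the same value. For reference, the standard DPO objective (Rafailov et al., 2023) is shown below; whether this run uses it unmodified, and how exactly the Regularize and Dpo Wo Beta metrics are derived from it, is not documented here.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

Here \pi_\theta is the fine-tuned policy, \pi_{\mathrm{ref}} the frozen SFT reference, (y_w, y_l) the preferred and rejected responses, and \beta the KL-penalty strength (not stated in this card).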

Model description

More information needed

Intended uses & limitations

More information needed
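
Pending fuller documentation, here is a minimal inference sketch assuming standard transformers text-generation usage; the prompt and generation settings are illustrative and not taken from this card.

```python
# Minimal sketch: load the model and generate text.
# Assumes standard transformers APIs; prompt is illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-ES-TRY"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Summarize today's top technology news in one sentence.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```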

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a training-setup sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 6
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 288
  • total_eval_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
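
The sketch below shows one way these hyperparameters could map onto TRL's DPOTrainer (v0.9-era API, contemporaneous with Transformers 4.42). The card does not state the training framework, the DPO beta value, the dataset column names, or how the early stopping implied by "ES" in the model name was configured; all of those are assumptions here.

```python
# Hedged reconstruction of the training setup, NOT the authors' confirmed script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "hZzy/qwen2.5-0.5b-sft-news-IFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Assumes the pairwise dataset exposes prompt/chosen/rejected columns.
train_dataset = load_dataset("hZzy/train_pairwise", split="train")

# Effective train batch size: 4 per device x 6 GPUs x 12 accumulation steps = 288.
config = DPOConfig(
    output_dir="qwen2.5-0.5b-expo-DPO-ES-TRY",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,  # assumption: TRL default; the card does not report beta
)

# ref_model=None makes TRL create a frozen copy of the base policy.
trainer = DPOTrainer(model=model, ref_model=None, args=config,
                     train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()
```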

Training results

| Training Loss | Epoch | Step | Dpo Loss | Dpo Wo Beta | Logits | Logps | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize |
|:-------------:|:------:|:----:|:--------:|:-----------:|:-------:|:---------:|:---------------:|:---------:|:-----------------:|:----------------------:|:--------------:|:----------:|
| 0.5954        | 0.3004 | 53   | 0.7113   | -2.2659     | -1.8928 | -101.3674 | 0.6816          | 0.7113    | 0.5888            | 0.5093                 | 0.5238         | 0.7113     |
| 0.4618        | 0.6009 | 106  | 0.6936   | -2.4624     | -1.9007 | -94.3571  | 0.6913          | 0.6936    | 0.5888            | 0.5093                 | 0.5351         | 0.6936     |
| 0.3986        | 0.9013 | 159  | 0.7215   | -3.1229     | -2.1450 | -95.6001  | 0.7014          | 0.7215    | 0.5888            | 0.5093                 | 0.5351         | 0.7215     |
| 0.2551        | 1.2017 | 212  | 0.7525   | -3.7750     | -2.2678 | -98.1427  | 0.7351          | 0.7525    | 0.5888            | 0.5093                 | 0.5372         | 0.7525     |
| 0.2623        | 1.5021 | 265  | 0.7739   | -4.1634     | -2.1478 | -100.8313 | 0.7400          | 0.7739    | 0.5888            | 0.5093                 | 0.5393         | 0.7739     |
| 0.2571        | 1.8026 | 318  | 0.7665   | -4.0950     | -1.9888 | -102.3712 | 0.7401          | 0.7665    | 0.5888            | 0.5093                 | 0.5393         | 0.7665     |
| 0.1227        | 2.1030 | 371  | 0.9224   | -6.4510     | -1.8645 | -122.0016 | 0.8844          | 0.9224    | 0.5888            | 0.5093                 | 0.5424         | 0.9224     |
| 0.133         | 2.4034 | 424  | 0.8786   | -5.8878     | -2.0277 | -117.1217 | 0.8448          | 0.8786    | 0.5888            | 0.5093                 | 0.5413         | 0.8786     |
| 0.1211        | 2.7085 | 477  | 0.8739   | -5.8152     | -2.0272 | -116.4230 | 0.8371          | 0.8739    | 0.5888            | 0.5093                 | 0.5403         | 0.8739     |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1