
qwen2.5-0.5b-expo-DPO-ES-1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 2.3420
  • Logps: -83.4105
  • Logits: -0.6597
  • Objective: 2.2592
  • Dpo Loss: 2.2592
  • Regularize: 2.2592
  • Ranking Simple: 0.5404
  • Ranking Idealized: 0.5295
  • Ranking Idealized Expo: 0.5212
  • Wo Beta: 6.6836

Model description

More information needed

Intended uses & limitations

More information needed
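
Usage is not yet documented in this card. As a minimal sketch, assuming the checkpoint follows the standard Qwen2.5 causal-LM layout and loads with the stock `transformers` auto classes (not verified against this exact repository):

```python
# Minimal loading sketch; assumes a standard causal-LM checkpoint layout.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-ES-1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

prompt = "Write a one-sentence news headline about renewable energy."  # hypothetical prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```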

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative mapping to `TrainingArguments` follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
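
The sketch below shows one way these values could map onto the standard `transformers.TrainingArguments`; it is an illustration of the listed hyperparameters only, not the original training script, and DPO-specific settings (e.g. beta or loss type) are not recorded in this card.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# Not the original training script; DPO-specific options are not recorded in the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-DPO-ES-1",  # hypothetical output path
    learning_rate=5e-06,
    per_device_train_batch_size=4,   # train_batch_size (per device)
    per_device_eval_batch_size=4,    # eval_batch_size (per device)
    seed=42,
    gradient_accumulation_steps=12,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)

# Multi-GPU training (num_devices: 3) comes from the launcher (e.g. torchrun or accelerate),
# giving an effective train batch size of 4 * 3 * 12 = 144.
```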

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:-------:|:---------:|:--------:|:----------:|:--------------:|:-----------------:|:----------------------:|:-------:|
| 0.7017 | 0.1417 | 50  | 0.8470 | -93.0245 | -1.4582 | 0.8570 | 0.8570 | 0.8570 | 0.5238 | 0.5295 | 0.5212 | 7.8506 |
| 0.8112 | 0.2834 | 100 | 1.0529 | -86.6804 | -1.4383 | 1.0274 | 1.0274 | 1.0274 | 0.5285 | 0.5295 | 0.5212 | 7.4982 |
| 1.0895 | 0.4251 | 150 | 1.4498 | -84.4335 | -1.2965 | 1.4010 | 1.4010 | 1.4010 | 0.5321 | 0.5295 | 0.5212 | 7.2692 |
| 1.2362 | 0.5668 | 200 | 1.7035 | -77.7194 | -1.2955 | 1.6116 | 1.6116 | 1.6116 | 0.5321 | 0.5295 | 0.5212 | 7.2264 |
| 1.3151 | 0.7085 | 250 | 1.9222 | -92.7224 | -1.2565 | 1.8319 | 1.8319 | 1.8319 | 0.5311 | 0.5295 | 0.5212 | 7.1855 |
| 1.1899 | 0.8503 | 300 | 2.0297 | -90.9351 | -0.9786 | 1.9587 | 1.9587 | 1.9587 | 0.5367 | 0.5295 | 0.5212 | 6.9337 |
| 1.1441 | 0.9920 | 350 | 2.1653 | -82.1291 | -1.0211 | 2.0545 | 2.0545 | 2.0545 | 0.5424 | 0.5295 | 0.5212 | 7.0017 |
| 0.725  | 1.1337 | 400 | 2.2886 | -84.3458 | -0.7529 | 2.2360 | 2.2360 | 2.2360 | 0.5331 | 0.5295 | 0.5212 | 7.1541 |
| 0.7626 | 1.2754 | 450 | 2.1595 | -80.5955 | -0.8863 | 2.0657 | 2.0657 | 2.0657 | 0.5326 | 0.5295 | 0.5212 | 6.7939 |
| 0.8048 | 1.4171 | 500 | 2.2134 | -82.3489 | -0.7432 | 2.0975 | 2.0975 | 2.0975 | 0.5342 | 0.5295 | 0.5212 | 6.7984 |
| 0.7106 | 1.5588 | 550 | 2.1705 | -85.0673 | -0.6665 | 2.0696 | 2.0696 | 2.0696 | 0.5321 | 0.5295 | 0.5212 | 6.8614 |
| 0.6934 | 1.7005 | 600 | 2.2127 | -81.6773 | -0.7358 | 2.0693 | 2.0693 | 2.0693 | 0.5362 | 0.5295 | 0.5212 | 6.7265 |
| 0.6885 | 1.8422 | 650 | 2.2198 | -82.8870 | -0.6787 | 2.1432 | 2.1432 | 2.1432 | 0.5362 | 0.5295 | 0.5212 | 6.8202 |
| 0.6477 | 1.9839 | 700 | 2.3420 | -83.4105 | -0.6597 | 2.2592 | 2.2592 | 2.2592 | 0.5404 | 0.5295 | 0.5212 | 6.6836 |
| 0.3785 | 2.1256 | 750 | 2.2919 | -84.0369 | -0.7841 | 2.2005 | 2.2005 | 2.2005 | 0.5435 | 0.5295 | 0.5212 | 6.8514 |
| 0.3316 | 2.2674 | 800 | 2.2220 | -84.2990 | -0.6767 | 2.1123 | 2.1123 | 2.1123 | 0.5409 | 0.5295 | 0.5212 | 6.7663 |
| 0.3283 | 2.4091 | 850 | 2.3020 | -85.0834 | -0.6538 | 2.2212 | 2.2212 | 2.2212 | 0.5409 | 0.5295 | 0.5212 | 6.7773 |
| 0.3516 | 2.5508 | 900 | 2.2723 | -84.7564 | -0.6225 | 2.1911 | 2.1911 | 2.1911 | 0.5362 | 0.5295 | 0.5212 | 6.8162 |
| 0.3245 | 2.6925 | 950 | 2.3304 | -83.6421 | -0.7129 | 2.2523 | 2.2523 | 2.2523 | 0.5336 | 0.5295 | 0.5212 | 6.8942 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1