
qwen2.5-0.5b-expo-DPO-ES-10

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 21.3506
  • Logps: -80.2051
  • Logits: -0.6148
  • Objective: 20.5661
  • Dpo Loss: 20.5661
  • Regularize: 20.5661
  • Ranking Simple: 0.5383
  • Ranking Idealized: 0.5212
  • Ranking Idealized Expo: 0.5212
  • Wo Beta: 6.6513
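
The checkpoint can be loaded like any other causal language model. The snippet below is a minimal usage sketch, not part of the original card; the repository id is taken from the model name above and the prompt is arbitrary.

```python
# Minimal usage sketch (assumption: the checkpoint loads as a standard
# Qwen2.5 causal LM through the transformers auto classes).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-ES-10"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Write a one-sentence summary of today's market news."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```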

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
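
The listed values map onto standard transformers TrainingArguments fields. The sketch below is an illustrative reconstruction under that assumption, not the actual training script; the output directory is a placeholder, and DPO-specific settings (such as the beta coefficient) are not reported on the card and are therefore omitted.

```python
# Illustrative reconstruction of the listed hyperparameters (assumption:
# training used the standard transformers TrainingArguments interface).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-DPO-ES-10",  # placeholder path
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,  # 4 per device * 3 GPUs * 12 steps = 144 effective
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)
```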

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
|:-------------:|:-----:|:----:|:---------------:|:-----:|:------:|:---------:|:--------:|:----------:|:--------------:|:-----------------:|:----------------------:|:-------:|
| 2.0094 | 0.1417 | 50 | 3.1068 | -90.6242 | -1.4592 | 3.0980 | 3.0980 | 3.0980 | 0.5259 | 0.5212 | 0.5212 | 7.7179 |
| 5.9165 | 0.2834 | 100 | 7.1487 | -82.8335 | -1.4642 | 7.1399 | 7.1399 | 7.1399 | 0.5300 | 0.5212 | 0.5212 | 7.4498 |
| 9.9617 | 0.4251 | 150 | 11.8998 | -83.0745 | -1.3437 | 11.3536 | 11.3536 | 11.3536 | 0.5305 | 0.5212 | 0.5212 | 7.2609 |
| 12.4724 | 0.5668 | 200 | 17.0987 | -79.9360 | -1.3880 | 16.0617 | 16.0617 | 16.0617 | 0.5300 | 0.5212 | 0.5212 | 7.2290 |
| 13.2936 | 0.7085 | 250 | 18.5309 | -77.3150 | -1.3641 | 17.7971 | 17.7971 | 17.7971 | 0.5342 | 0.5212 | 0.5212 | 7.2078 |
| 11.5204 | 0.8503 | 300 | 19.4344 | -76.9798 | -0.9941 | 18.7017 | 18.7017 | 18.7017 | 0.5357 | 0.5212 | 0.5212 | 7.0136 |
| 11.3717 | 0.9920 | 350 | 20.3959 | -76.1623 | -1.0426 | 19.0398 | 19.0398 | 19.0398 | 0.5409 | 0.5212 | 0.5212 | 7.0261 |
| 7.0971 | 1.1337 | 400 | 21.9279 | -76.1458 | -0.6236 | 21.6902 | 21.6902 | 21.6902 | 0.5388 | 0.5212 | 0.5212 | 7.1227 |
| 7.5725 | 1.2754 | 450 | 20.9480 | -76.3924 | -0.8352 | 20.3853 | 20.3853 | 20.3853 | 0.5373 | 0.5212 | 0.5212 | 6.8500 |
| 7.6466 | 1.4171 | 500 | 20.9821 | -80.7806 | -0.7483 | 20.2651 | 20.2651 | 20.2651 | 0.5326 | 0.5212 | 0.5212 | 6.8824 |
| 6.9565 | 1.5588 | 550 | 21.3506 | -80.2051 | -0.6148 | 20.5661 | 20.5661 | 20.5661 | 0.5383 | 0.5212 | 0.5212 | 6.6513 |
| 6.7183 | 1.7005 | 600 | 21.1265 | -78.5344 | -0.6067 | 20.0027 | 20.0027 | 20.0027 | 0.5367 | 0.5212 | 0.5212 | 6.6768 |
| 6.9931 | 1.8422 | 650 | 22.2083 | -77.6509 | -0.5872 | 21.4455 | 21.4455 | 21.4455 | 0.5383 | 0.5212 | 0.5212 | 6.8190 |
| 6.1685 | 1.9839 | 700 | 22.3607 | -77.1493 | -0.5436 | 21.5512 | 21.5512 | 21.5512 | 0.5404 | 0.5212 | 0.5212 | 6.7299 |
| 3.4811 | 2.1256 | 750 | 21.8349 | -78.9312 | -0.7313 | 21.1379 | 21.1379 | 21.1379 | 0.5424 | 0.5212 | 0.5212 | 6.8213 |
| 3.3995 | 2.2674 | 800 | 21.3539 | -79.7115 | -0.5475 | 20.4532 | 20.4532 | 20.4532 | 0.5362 | 0.5212 | 0.5212 | 6.6867 |
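
The card does not define its metric names. Assuming "Ranking Simple" denotes the usual pairwise ranking accuracy reported for DPO-style training (the fraction of preference pairs where the policy assigns a higher sequence log-probability to the chosen response than to the rejected one), it could be computed as in the sketch below; the helper function and its inputs are hypothetical.

```python
# Hypothetical helper: pairwise ranking accuracy from per-example sequence
# log-probabilities of chosen vs. rejected responses (assumed interpretation
# of the "Ranking Simple" column).
import torch

def ranking_accuracy(chosen_logps: torch.Tensor, rejected_logps: torch.Tensor) -> float:
    """Fraction of pairs where the policy prefers the chosen response."""
    return (chosen_logps > rejected_logps).float().mean().item()

# Example over four preference pairs.
chosen = torch.tensor([-80.2, -75.1, -90.4, -60.0])
rejected = torch.tensor([-82.5, -74.0, -95.3, -61.2])
print(ranking_accuracy(chosen, rejected))  # 0.75
```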

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1