
qwen2.5-0.5b-expo-L2EXPO-W2-ES-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0580
  • Logps: -81.6322
  • Logits: -0.3472
  • Objective: 0.0564
  • Dpo Loss: 0.7177
  • Regularize: 0.6547
  • Ranking Simple: 0.5450
  • Ranking Idealized: 0.6030
  • Ranking Idealized Expo: 0.5223
  • Wo Beta: 14.1503
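
For quick experimentation, the checkpoint can be loaded like any other Transformers causal language model. The sketch below is a minimal example, assuming the repository contains a standard Qwen2.5-style causal LM in Transformers format; the prompt is purely illustrative, since the card does not specify an expected prompt format.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-W2-ES-0.1"

# Load tokenizer and model from the Hub (assumes a standard causal-LM checkpoint).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt only.
prompt = "Write a short news summary about renewable energy."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```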

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
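
As a rough illustration, the values above map onto a Hugging Face `TrainingArguments` configuration as sketched below. This is only a sketch under assumptions: the actual run used a preference-optimization (EXPO/DPO-style) trainer whose full configuration is not reproduced in this card, and the `output_dir` name is hypothetical.

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters expressed as TrainingArguments.
# Hypothetical output_dir; Adam betas/epsilon written out explicitly.
training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-W2-ES-0.1",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=12,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5,
)
# Effective train batch size: 4 per device x 3 GPUs x 12 accumulation steps = 144.
```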

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0387 | 0.1417 | 50 | 0.0387 | -89.7823 | -1.4878 | 0.0386 | 0.6849 | 0.4152 | 0.5259 | 0.6030 | 0.5223 | 16.3818 |
| 0.0376 | 0.2834 | 100 | 0.0400 | -87.3577 | -1.4813 | 0.0392 | 0.6794 | 0.4266 | 0.5326 | 0.6030 | 0.5223 | 16.1307 |
| 0.041 | 0.4251 | 150 | 0.0453 | -80.2072 | -1.3514 | 0.0446 | 0.6965 | 0.4931 | 0.5280 | 0.6030 | 0.5223 | 15.6134 |
| 0.0451 | 0.5668 | 200 | 0.0473 | -78.7022 | -0.9891 | 0.0464 | 0.6948 | 0.5121 | 0.5300 | 0.6030 | 0.5223 | 15.2852 |
| 0.0483 | 0.7085 | 250 | 0.0523 | -73.9930 | -0.9959 | 0.0507 | 0.7054 | 0.5778 | 0.5393 | 0.6030 | 0.5223 | 15.1111 |
| 0.0487 | 0.8503 | 300 | 0.0531 | -79.5956 | -1.0801 | 0.0509 | 0.7126 | 0.5977 | 0.5342 | 0.6030 | 0.5223 | 14.5847 |
| 0.0485 | 0.9920 | 350 | 0.0548 | -76.9095 | -0.8726 | 0.0533 | 0.7110 | 0.6159 | 0.5378 | 0.6030 | 0.5223 | 14.4121 |
| 0.0529 | 1.1337 | 400 | 0.0587 | -78.7635 | -0.4139 | 0.0575 | 0.7255 | 0.6577 | 0.5378 | 0.6030 | 0.5223 | 14.3951 |
| 0.0493 | 1.2754 | 450 | 0.0584 | -78.9623 | -0.4738 | 0.0572 | 0.7243 | 0.6702 | 0.5430 | 0.6030 | 0.5223 | 14.5363 |
| 0.0447 | 1.4171 | 500 | 0.0572 | -78.1551 | -0.4434 | 0.0565 | 0.7180 | 0.6433 | 0.5336 | 0.6030 | 0.5223 | 14.5089 |
| 0.0421 | 1.5588 | 550 | 0.0577 | -78.4112 | -0.3865 | 0.0563 | 0.7126 | 0.6425 | 0.5399 | 0.6030 | 0.5223 | 14.5141 |
| 0.0415 | 1.7005 | 600 | 0.0583 | -80.4593 | -0.2526 | 0.0569 | 0.7205 | 0.6520 | 0.5352 | 0.6030 | 0.5223 | 14.5863 |
| 0.0409 | 1.8422 | 650 | 0.0573 | -78.7705 | -0.3179 | 0.0556 | 0.7195 | 0.6460 | 0.5409 | 0.6030 | 0.5223 | 14.3763 |
| 0.0377 | 1.9839 | 700 | 0.0579 | -79.7789 | -0.4899 | 0.0557 | 0.7221 | 0.6579 | 0.5450 | 0.6030 | 0.5223 | 14.5156 |
| 0.0339 | 2.1256 | 750 | 0.0577 | -80.8265 | -0.4062 | 0.0555 | 0.7193 | 0.6551 | 0.5455 | 0.6030 | 0.5223 | 14.2194 |
| 0.0346 | 2.2674 | 800 | 0.0577 | -81.8186 | -0.2681 | 0.0559 | 0.7190 | 0.6534 | 0.5440 | 0.6030 | 0.5223 | 14.3033 |
| 0.0334 | 2.4091 | 850 | 0.0585 | -83.2126 | -0.2941 | 0.0564 | 0.7213 | 0.6627 | 0.5419 | 0.6030 | 0.5223 | 14.4189 |
| 0.032 | 2.5508 | 900 | 0.0580 | -82.8344 | -0.2672 | 0.0564 | 0.7173 | 0.6562 | 0.5404 | 0.6030 | 0.5223 | 14.2070 |
| 0.029 | 2.6925 | 950 | 0.0580 | -81.6322 | -0.3472 | 0.0564 | 0.7177 | 0.6547 | 0.5450 | 0.6030 | 0.5223 | 14.1503 |
| 0.0242 | 2.8342 | 1000 | 0.0572 | -81.8476 | -0.3613 | 0.0555 | 0.7141 | 0.6463 | 0.5435 | 0.6030 | 0.5223 | 14.2684 |
| 0.0262 | 2.9759 | 1050 | 0.0582 | -82.2240 | -0.3030 | 0.0566 | 0.7193 | 0.6593 | 0.5409 | 0.6030 | 0.5223 | 14.2806 |
| 0.0234 | 3.1176 | 1100 | 0.0584 | -83.6653 | -0.2790 | 0.0568 | 0.7198 | 0.6624 | 0.5404 | 0.6030 | 0.5223 | 14.3429 |
| 0.022 | 3.2593 | 1150 | 0.0581 | -83.5282 | -0.3076 | 0.0563 | 0.7167 | 0.6564 | 0.5440 | 0.6030 | 0.5223 | 14.3960 |
| 0.021 | 3.4010 | 1200 | 0.0574 | -82.2867 | -0.3495 | 0.0557 | 0.7152 | 0.6455 | 0.5393 | 0.6030 | 0.5223 | 14.2067 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1
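
To sanity-check that a local environment matches these versions, a small snippet like the following can be used (assuming all four packages are installed):

```python
import datasets
import tokenizers
import torch
import transformers

# Print installed versions; compare against the versions listed above.
print("transformers", transformers.__version__)  # expected 4.42.0
print("torch", torch.__version__)                # expected 2.3.0+cu121
print("datasets", datasets.__version__)          # expected 3.2.0
print("tokenizers", tokenizers.__version__)      # expected 0.19.1
```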