Visualize in Weights & Biases

qwen2.5-0.5b-expo-DPO-noES-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8493
  • Logps: -132.8567
  • Logits: -1.8165
  • Objective: 0.8653
  • Dpo Loss: 0.8653
  • Regularize: 0.8653
  • Ranking Simple: 0.5347
  • Wo Beta: 10.9418
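
The card includes no usage snippet. Below is a minimal loading and generation sketch, assuming the checkpoint works with the standard Transformers causal-LM API; the prompt is illustrative only (the base model's name suggests news-domain instruction tuning).

```python
# Minimal sketch: load the checkpoint named on this card and generate text.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-noES-0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt only; adjust to your task.
prompt = "Summarize the following news article in two sentences:\n..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```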

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
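
As a hedged illustration only (the card does not name the training framework), these hyperparameters could be expressed as a TRL `DPOConfig`; the field names below are TRL's and are an assumption, not taken from the actual run.

```python
# Hedged sketch: the reported hyperparameters mapped onto a TRL DPOConfig.
# Using TRL's DPOTrainer is an assumption; the card does not state the framework.
from trl import DPOConfig

config = DPOConfig(
    output_dir="qwen2.5-0.5b-expo-DPO-noES-0.1",  # repo name from this card
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # The DPO beta is not reported on the card, so it is left at the library default.
)

# Effective train batch size: 4 per device x 3 GPUs x 12 accumulation steps = 144,
# which matches total_train_batch_size above.
```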

Training results

| Training Loss | Epoch  | Step | Validation Loss | Logps     | Logits  | Objective | Dpo Loss | Regularize | Ranking Simple | Wo Beta |
|---------------|--------|------|-----------------|-----------|---------|-----------|----------|------------|----------------|---------|
| 0.6719        | 0.1417 | 50   | 0.6856          | -89.6776  | -1.4697 | 0.6879    | 0.6879   | 0.6879     | 0.5269         | 7.9221  |
| 0.6459        | 0.2834 | 100  | 0.6765          | -92.9954  | -1.6511 | 0.6793    | 0.6793   | 0.6793     | 0.5347         | 7.8727  |
| 0.5993        | 0.4251 | 150  | 0.6771          | -95.2729  | -1.6963 | 0.6805    | 0.6805   | 0.6805     | 0.5347         | 8.2155  |
| 0.5557        | 0.5668 | 200  | 0.6858          | -115.4680 | -1.8150 | 0.6866    | 0.6866   | 0.6866     | 0.5295         | 7.9607  |
| 0.5428        | 0.7085 | 250  | 0.6745          | -102.5668 | -1.8495 | 0.6741    | 0.6741   | 0.6741     | 0.5367         | 7.9891  |
| 0.4987        | 0.8503 | 300  | 0.7119          | -110.0949 | -1.9277 | 0.7203    | 0.7203   | 0.7203     | 0.5373         | 8.9267  |
| 0.4599        | 0.9920 | 350  | 0.6886          | -104.9833 | -1.8474 | 0.6912    | 0.6912   | 0.6912     | 0.5352         | 8.3749  |
| 0.3498        | 1.1337 | 400  | 0.7463          | -115.0889 | -1.8807 | 0.7518    | 0.7518   | 0.7518     | 0.5518         | 9.5505  |
| 0.3361        | 1.2754 | 450  | 0.7563          | -116.8004 | -1.8356 | 0.7673    | 0.7673   | 0.7673     | 0.5419         | 9.7252  |
| 0.3584        | 1.4171 | 500  | 0.7635          | -117.5167 | -1.8626 | 0.7695    | 0.7695   | 0.7695     | 0.5419         | 9.6319  |
| 0.3343        | 1.5588 | 550  | 0.7698          | -123.3863 | -1.8209 | 0.7814    | 0.7814   | 0.7814     | 0.5352         | 9.8258  |
| 0.3105        | 1.7005 | 600  | 0.7679          | -119.8231 | -1.7866 | 0.7761    | 0.7761   | 0.7761     | 0.5383         | 9.8031  |
| 0.3412        | 1.8422 | 650  | 0.7750          | -122.2944 | -1.8323 | 0.7848    | 0.7848   | 0.7848     | 0.5383         | 9.9494  |
| 0.3156        | 1.9839 | 700  | 0.8013          | -126.3939 | -1.8338 | 0.8139    | 0.8139   | 0.8139     | 0.5378         | 10.3247 |
| 0.2183        | 2.1256 | 750  | 0.8467          | -131.1257 | -1.7999 | 0.8604    | 0.8604   | 0.8604     | 0.5352         | 10.8931 |
| 0.2338        | 2.2674 | 800  | 0.8480          | -132.1160 | -1.8070 | 0.8641    | 0.8641   | 0.8641     | 0.5352         | 10.9810 |
| 0.2015        | 2.4091 | 850  | 0.8572          | -133.3811 | -1.8018 | 0.8720    | 0.8720   | 0.8720     | 0.5378         | 11.0252 |
| 0.2348        | 2.5508 | 900  | 0.8530          | -133.6796 | -1.8114 | 0.8675    | 0.8675   | 0.8675     | 0.5378         | 10.9423 |
| 0.2268        | 2.6925 | 950  | 0.8525          | -133.2829 | -1.8136 | 0.8684    | 0.8684   | 0.8684     | 0.5336         | 10.9785 |
| 0.2198        | 2.8342 | 1000 | 0.8493          | -132.8809 | -1.8167 | 0.8652    | 0.8652   | 0.8652     | 0.5342         | 10.9383 |
| 0.2221        | 2.9759 | 1050 | 0.8493          | -132.8567 | -1.8165 | 0.8653    | 0.8653   | 0.8653     | 0.5347         | 10.9418 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1