qwen2.5-0.5b-expo-DPO-ES-TRY

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6811
  • Logps: -89.5089
  • Logits: -2.2697
  • Objective: 0.6619
  • Dpo Loss: 0.6619
  • Regularize: 0.6619
  • Ranking Simple: 0.5735
  • Ranking Idealized: 0.6046
  • Ranking Idealized Expo: 0.5280
  • Dpo Wo Beta: -2.3796
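
To sanity-check the checkpoint, it can be loaded with the standard transformers causal-LM classes. This is a minimal sketch, assuming the repo follows the usual Qwen2 layout; the prompt and generation settings are illustrative, not taken from this card.

```python
# Minimal generation sketch. The repo id comes from this card; everything
# else (prompt, decoding settings) is an illustrative assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "hZzy/qwen2.5-0.5b-expo-DPO-ES-TRY"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.float32)

inputs = tokenizer("The central bank announced today that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```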

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
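
While the card leaves this section blank, the header names hZzy/train_pairwise as the training dataset. A minimal loading sketch follows; the available splits and column names are not documented here and would need to be inspected.

```python
# Load the pairwise preference dataset named in this card.
# Split and column names are unverified; print the dataset to inspect them.
from datasets import load_dataset

ds = load_dataset("hZzy/train_pairwise")
print(ds)
```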

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 6
  • gradient_accumulation_steps: 6
  • total_train_batch_size: 72
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
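
The card lists raw hyperparameters but not the training code. As a hedged illustration only, the sketch below shows one plausible mapping onto TRL's DPOConfig; the use of TRL, and any value not listed above, is an assumption.

```python
# Hypothetical mapping of the listed hyperparameters onto TRL's DPOConfig
# (which inherits these fields from transformers.TrainingArguments).
# Whether TRL was actually used is not stated in the card.
from trl import DPOConfig

config = DPOConfig(
    output_dir="qwen2.5-0.5b-expo-DPO-ES-TRY",  # assumed output directory
    learning_rate=5e-6,
    per_device_train_batch_size=2,   # train_batch_size
    per_device_eval_batch_size=2,    # eval_batch_size
    gradient_accumulation_steps=6,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # 6 GPUs x batch 2 x accumulation 6 = 72, matching total_train_batch_size;
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is already the default.
)
```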

Training results

| Training Loss | Epoch | Step | Dpo Loss | Dpo Wo Beta | Logits | Logps | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6857 | 0.0709 | 50 | 0.6927 | -1.2807 | -1.9606 | -88.9841 | 0.6914 | 0.6927 | 0.6046 | 0.5280 | 0.5362 | 0.6927 |
| 0.6524 | 0.1417 | 100 | 0.7010 | -1.8911 | -2.0579 | -98.6358 | 0.6922 | 0.7010 | 0.6046 | 0.5280 | 0.5269 | 0.7010 |
| 0.6123 | 0.2126 | 150 | 0.7015 | -2.1166 | -1.9033 | -102.8927 | 0.6967 | 0.7015 | 0.6046 | 0.5280 | 0.5280 | 0.7015 |
| 0.5779 | 0.2834 | 200 | 0.6816 | -2.1417 | -2.0716 | -106.4944 | 0.6794 | 0.6816 | 0.6046 | 0.5280 | 0.5507 | 0.6816 |
| 0.5709 | 0.3543 | 250 | 0.6817 | -2.2676 | -2.2470 | -87.7326 | 0.6883 | 0.6817 | 0.6046 | 0.5280 | 0.5424 | 0.6817 |
| 0.5563 | 0.4251 | 300 | 0.6619 | -2.3796 | -2.2697 | -89.5089 | 0.6811 | 0.6619 | 0.6046 | 0.5280 | 0.5735 | 0.6619 |
| 0.5321 | 0.4960 | 350 | 0.6773 | -2.6295 | -2.3683 | -99.0927 | 0.6926 | 0.6773 | 0.6046 | 0.5280 | 0.5735 | 0.6773 |
| 0.4963 | 0.5668 | 400 | 0.6836 | -2.6913 | -2.2508 | -106.7073 | 0.6914 | 0.6836 | 0.6046 | 0.5280 | 0.5673 | 0.6836 |
| 0.4745 | 0.6377 | 450 | 0.6815 | -2.6738 | -2.2347 | -105.8669 | 0.6938 | 0.6815 | 0.6046 | 0.5280 | 0.5631 | 0.6815 |
| 0.4867 | 0.7085 | 500 | 0.6995 | -2.7257 | -2.2182 | -105.1848 | 0.7040 | 0.6995 | 0.6046 | 0.5280 | 0.5507 | 0.6995 |
| 0.4582 | 0.7794 | 550 | 0.7027 | -3.1023 | -2.3855 | -102.6643 | 0.6995 | 0.7027 | 0.6046 | 0.5280 | 0.5683 | 0.7027 |
| 0.4339 | 0.8503 | 600 | 0.7050 | -3.2166 | -2.4456 | -103.5456 | 0.6965 | 0.7050 | 0.6046 | 0.5280 | 0.5735 | 0.7050 |
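
Note that the headline evaluation metrics at the top of this card match the step-300 row, and that training stopped at step 600 (roughly 0.85 of the first of 3 configured epochs). Both observations appear consistent with the early stopping ("ES") suggested by the model name selecting the step-300 checkpoint as the best one.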

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
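
To check a local environment against these versions, a small sketch (assuming the standard PyPI distributions of each package):

```python
# Compare installed versions against those listed above.
import datasets
import tokenizers
import torch
import transformers

print(transformers.__version__)  # expect 4.42.0
print(torch.__version__)         # expect 2.3.0+cu121
print(datasets.__version__)      # expect 2.19.1
print(tokenizers.__version__)    # expect 0.19.1
```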