Visualize in Weights & Biases

qwen2.5-0.5b-expo-DPO-L2EXPO-W0-noES2-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:

  • Loss: 324.6227
  • Logps: -88.5336
  • Logits: -1.1616
  • Objective: 321.3043
  • DPO Loss: 0.6772
  • Ranking Simple: 0.5471
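
The "DPO Loss" reported above is the standard Direct Preference Optimization objective. For reference, below is a minimal sketch of how such a value is computed from policy and reference sequence log-probabilities; the function name, argument names, and the beta value are illustrative assumptions, not taken from the actual training script.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Inputs are per-example sequence log-probabilities (summed over tokens).
    beta=0.1 is an assumed value; the card does not state the coefficient used.
    """
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()
```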

Model description

More information needed

Intended uses & limitations

More information needed
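
No usage guidance is provided, but the checkpoint can presumably be loaded like any causal language model on the Hub. A minimal inference sketch follows; the repo id is taken from the title of this card, while the prompt and generation settings are purely illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-L2EXPO-W0-noES2-0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Example prompt only; the base model was instruction-tuned on news data.
prompt = "Summarize the following news article in two sentences:\n..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```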

Training and evaluation data

More information needed
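
The preference data named in the description, hZzy/train_pairwise_weighted, is hosted on the Hub. Below is a minimal sketch for inspecting it with the datasets library; the split name is an assumption, since the card does not document the dataset's structure.

```python
from datasets import load_dataset

# Dataset id taken from the model description; split names may differ.
ds = load_dataset("hZzy/train_pairwise_weighted")
print(ds)              # available splits and row counts
print(ds["train"][0])  # first example, to inspect the pairwise preference fields
```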

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
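
The effective batch size follows from the settings above: 4 per device × 3 GPUs × 12 accumulation steps = 144, matching the reported total. A sketch of how these hyperparameters might map onto a Hugging Face TrainingArguments object is shown below; the actual run presumably used a DPO-specific training script with additional options not listed on this card, so this is illustrative only.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-DPO-L2EXPO-W0-noES2-0.1",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # x 3 GPUs x 12 accumulation steps = 144 effective
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```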

Training results

| Training Loss | Epoch  | Step | Validation Loss | Logps    | Logits  | Objective | DPO Loss | Ranking Simple |
|:-------------:|:------:|:----:|:---------------:|:--------:|:-------:|:---------:|:--------:|:--------------:|
| 298.227       | 0.1417 | 50   | 317.0191        | -89.8084 | -1.5244 | 315.7606  | 0.6789   | 0.5311         |
| 275.3632      | 0.2834 | 100  | 315.6042        | -94.9782 | -1.5619 | 311.9210  | 0.6685   | 0.5430         |
| 263.5541      | 0.4251 | 150  | 323.5653        | -86.5190 | -1.4200 | 317.0345  | 0.6707   | 0.5466         |
| 251.2096      | 0.5668 | 200  | 331.8273        | -95.4989 | -1.6021 | 324.8527  | 0.6841   | 0.5455         |
| 260.8009      | 0.7085 | 250  | 335.3750        | -89.0524 | -1.3996 | 327.2677  | 0.6886   | 0.5383         |
| 241.17        | 0.8503 | 300  | 326.9033        | -86.8043 | -1.3637 | 321.4521  | 0.6751   | 0.5518         |
| 217.6821      | 0.9920 | 350  | 330.4503        | -84.6762 | -1.1931 | 320.4708  | 0.6764   | 0.5497         |
| 196.3838      | 1.1337 | 400  | 331.8378        | -85.2363 | -1.1204 | 324.6776  | 0.6834   | 0.5492         |
| 196.8245      | 1.2754 | 450  | 328.4219        | -87.7967 | -1.1766 | 321.0422  | 0.6758   | 0.5502         |
| 206.0427      | 1.4171 | 500  | 327.5086        | -86.1759 | -1.1174 | 321.6342  | 0.6782   | 0.5502         |
| 185.0637      | 1.5588 | 550  | 325.0724        | -92.8798 | -1.1304 | 320.1057  | 0.6694   | 0.5554         |
| 183.7322      | 1.7005 | 600  | 324.3962        | -89.2066 | -0.9798 | 319.9032  | 0.6742   | 0.5554         |
| 206.6587      | 1.8422 | 650  | 324.2286        | -88.9897 | -1.1055 | 319.4041  | 0.6736   | 0.5533         |
| 188.4019      | 1.9839 | 700  | 323.3723        | -89.3024 | -1.0999 | 318.8545  | 0.6727   | 0.5497         |
| 165.7588      | 2.1256 | 750  | 324.4387        | -89.4604 | -1.1334 | 320.7416  | 0.6772   | 0.5502         |
| 164.7524      | 2.2674 | 800  | 323.6566        | -89.3705 | -1.1091 | 320.3278  | 0.6756   | 0.5497         |
| 160.4428      | 2.4091 | 850  | 324.1759        | -88.9819 | -1.1590 | 321.1419  | 0.6765   | 0.5461         |
| 164.2802      | 2.5508 | 900  | 324.5910        | -88.9856 | -1.1737 | 321.6709  | 0.6779   | 0.5461         |
| 168.6074      | 2.6925 | 950  | 324.7719        | -88.5500 | -1.1605 | 321.5456  | 0.6777   | 0.5461         |
| 165.6921      | 2.8342 | 1000 | 324.6280        | -88.5341 | -1.1614 | 321.3309  | 0.6773   | 0.5461         |
| 160.429       | 2.9759 | 1050 | 324.6226        | -88.5336 | -1.1616 | 321.3043  | 0.6772   | 0.5471         |

Framework versions

  • Transformers 4.42.0
  • PyTorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1