qwen2.5-0.5b-expo-L2EXPO-ES-0.001

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3942
  • Logps: -573.5075
  • Logits: -8.8910
  • Objective: 0.3931
  • Dpo Loss: 0.6728
  • Regularize: 0.3931
  • Ranking Simple: 0.6102
  • Ranking Idealized: 0.9871
  • Ranking Idealized Expo: 0.6320
  • Wo Beta: 160.3578
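
The card does not ship a usage snippet, so the following is a minimal inference sketch, assuming the repository hosts a standard Transformers causal-LM checkpoint; the prompt text is illustrative only.

```python
# Minimal loading/generation sketch (assumes a standard causal-LM checkpoint).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-ES-0.001"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

# Illustrative prompt; whether a chat template should be applied is not
# specified in this card.
inputs = tokenizer("Summarize today's top headline:", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```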

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
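
For orientation, here is a sketch of how these hyperparameters might map onto transformers.TrainingArguments. The actual training script is not part of this card, so the output path is hypothetical and field names are illustrative of a standard Trainer-style setup.

```python
# Hypothetical mapping of the reported hyperparameters onto TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-ES-0.001",  # hypothetical path
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # reported train_batch_size (per device)
    per_device_eval_batch_size=4,    # reported eval_batch_size (per device)
    gradient_accumulation_steps=12,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# The reported total_train_batch_size follows from the per-device batch size,
# the number of GPUs, and gradient accumulation:
# 4 per device * 3 devices * 12 accumulation steps = 144.
assert 4 * 3 * 12 == 144
```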

Training results

| Training Loss | Epoch | Step | Dpo Loss | Logits | Logps | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize | Wo Beta |
|:-------------:|:------:|:----:|:--------:|:--------:|:---------:|:---------------:|:---------:|:-----------------:|:----------------------:|:--------------:|:----------:|:--------:|
| 0.418         | 0.1417 | 50   | 0.6927   | -1.8105  | -107.4392 | 0.4150          | 0.4128    | 0.9871            | 0.6320                 | 0.5352         | 0.4128     | 22.0269  |
| 0.416         | 0.2834 | 100  | 0.6896   | -2.0485  | -230.8855 | 0.4087          | 0.4081    | 0.9871            | 0.6320                 | 0.5559         | 0.4081     | 52.9494  |
| 0.387         | 0.4251 | 150  | 0.6844   | -3.9840  | -343.5519 | 0.4032          | 0.4021    | 0.9871            | 0.6320                 | 0.5766         | 0.4021     | 90.2215  |
| 0.3587        | 0.5668 | 200  | 0.6754   | -6.1681  | -390.3867 | 0.3917          | 0.3893    | 0.9871            | 0.6320                 | 0.6004         | 0.3893     | 124.6577 |
| 0.3299        | 0.7085 | 250  | 0.6765   | -7.7444  | -474.0688 | 0.3968          | 0.3968    | 0.9871            | 0.6320                 | 0.5958         | 0.3968     | 147.7626 |
| 0.294         | 0.8503 | 300  | 0.6728   | -8.8910  | -573.5075 | 0.3942          | 0.3931    | 0.9871            | 0.6320                 | 0.6102         | 0.3931     | 160.3578 |
| 0.2753        | 0.9920 | 350  | 0.6731   | -9.9981  | -593.1101 | 0.3965          | 0.3960    | 0.9871            | 0.6320                 | 0.5937         | 0.3960     | 171.5761 |
| 0.2316        | 1.1337 | 400  | 0.6718   | -9.6479  | -564.7661 | 0.3966          | 0.3956    | 0.9871            | 0.6320                 | 0.5875         | 0.3956     | 171.6054 |
| 0.2205        | 1.2754 | 450  | 0.6725   | -10.9673 | -599.2516 | 0.3962          | 0.3983    | 0.9871            | 0.6320                 | 0.5859         | 0.3983     | 182.4877 |
| 0.2058        | 1.4171 | 500  | 0.6741   | -9.6175  | -589.5045 | 0.4005          | 0.4029    | 0.9871            | 0.6320                 | 0.5797         | 0.4029     | 188.1013 |
| 0.2027        | 1.5588 | 550  | 0.6730   | -10.3937 | -622.4691 | 0.3995          | 0.4000    | 0.9871            | 0.6320                 | 0.5947         | 0.4000     | 185.8620 |
| 0.1897        | 1.7029 | 600  | 0.6716   | -11.5540 | -755.1119 | 0.4028          | 0.4023    | 0.9871            | 0.6320                 | 0.5952         | 0.4023     | 201.2357 |
| 0.1797        | 1.8446 | 650  | 0.6730   | -10.8193 | -673.7770 | 0.3997          | 0.3992    | 0.9871            | 0.6320                 | 0.5942         | 0.3992     | 188.3079 |
| 0.1689        | 1.9863 | 700  | 0.6713   | -11.0772 | -653.8336 | 0.3985          | 0.3970    | 0.9871            | 0.6320                 | 0.5911         | 0.3970     | 182.3852 |
| 0.1492        | 2.1280 | 750  | 0.6708   | -11.4717 | -624.3672 | 0.3959          | 0.3956    | 0.9871            | 0.6320                 | 0.6025         | 0.3956     | 182.7602 |
| 0.143         | 2.2697 | 800  | 0.6701   | -11.2559 | -657.3067 | 0.3955          | 0.3958    | 0.9871            | 0.6320                 | 0.6009         | 0.3958     | 190.5371 |
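
The Dpo Loss column presumably tracks the standard DPO objective of Rafailov et al. (2023); how it interacts with the L2EXPO "Regularize" term is not specified in this card. For reference, under that assumption the tracked quantity would be

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[
  \log \sigma\!\left(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right)
\right],
$$

where $y_w$ and $y_l$ are the chosen and rejected responses in a preference pair, $\pi_{\mathrm{ref}}$ is the reference (SFT) policy, and $\beta$ is the preference-temperature coefficient.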

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1