Visualize in Weights & Biases

qwen2.5-0.5b-expo-L2EXPO-W0-noES2-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set (a loading sketch follows the list):

  • Loss: 187.9473
  • Logps: -88.4161
  • Logits: -1.2945
  • Objective: 183.8510
  • Dpo Loss: 0.6799
  • Regularize: 0.4168
  • Ranking Simple: 0.5326
  • Ranking Idealized: 0.6025
  • Ranking Idealized Expo: 0.5233
  • Wo Beta: 15.9953
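
The checkpoint can be loaded with the standard transformers text-generation API. The following is a minimal sketch; the prompt and generation settings are illustrative assumptions, not values taken from the training setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-W0-noES2-0.1"

# Load the fine-tuned checkpoint; the published weights are float32 safetensors (~494M params).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Example prompt (hypothetical); the model was preference-tuned on pairwise data per the dataset name.
prompt = "Summarize the following news article:\n..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```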

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto transformers.TrainingArguments follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
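
As a rough guide, these settings map onto transformers.TrainingArguments as sketched below. This is an assumption-laden sketch: output_dir is a placeholder, the multi-GPU launch (3 devices) is handled by the launcher rather than these arguments, and the EXPO/DPO-specific trainer options are not documented on this card and are omitted. The effective train batch size of 144 comes from 4 per device × 3 GPUs × 12 accumulation steps.

```python
from transformers import TrainingArguments

# Sketch only: values are copied from the list above; anything else is a placeholder.
training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-W0-noES2-0.1",  # placeholder path
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,  # 4 per device x 3 GPUs x 12 = 144 effective train batch
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```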

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
|:-------------:|:-----:|:----:|:---------------:|:-----:|:------:|:---------:|:--------:|:----------:|:--------------:|:-----------------:|:----------------------:|:-------:|
| 181.5404 | 0.1417 | 50 | 182.3598 | -90.9189 | -1.4233 | 180.3279 | 0.6890 | 0.4088 | 0.5264 | 0.6025 | 0.5233 | 16.2974 |
| 156.9096 | 0.2834 | 100 | 181.8641 | -91.4510 | -1.4702 | 180.3150 | 0.6855 | 0.4101 | 0.5316 | 0.6025 | 0.5233 | 16.3731 |
| 145.838 | 0.4251 | 150 | 180.6479 | -90.7049 | -1.4504 | 178.1705 | 0.6790 | 0.4023 | 0.5383 | 0.6025 | 0.5233 | 16.5880 |
| 140.9398 | 0.5668 | 200 | 184.4987 | -90.3330 | -1.3895 | 181.5451 | 0.6803 | 0.4101 | 0.5326 | 0.6025 | 0.5233 | 16.1415 |
| 131.1439 | 0.7085 | 250 | 182.2077 | -91.2246 | -1.4789 | 178.4409 | 0.6797 | 0.4059 | 0.5326 | 0.6025 | 0.5233 | 16.3683 |
| 118.3192 | 0.8503 | 300 | 183.4459 | -92.5771 | -1.4552 | 180.4714 | 0.6817 | 0.4123 | 0.5326 | 0.6025 | 0.5233 | 16.4041 |
| 108.5029 | 0.9920 | 350 | 183.9593 | -92.1804 | -1.4317 | 180.1151 | 0.6782 | 0.4095 | 0.5321 | 0.6025 | 0.5233 | 16.3367 |
| 104.6813 | 1.1337 | 400 | 183.8759 | -89.7261 | -1.3930 | 180.2840 | 0.6801 | 0.4094 | 0.5311 | 0.6025 | 0.5233 | 16.2209 |
| 90.5585 | 1.2754 | 450 | 184.0673 | -91.2037 | -1.3663 | 180.6296 | 0.6795 | 0.4105 | 0.5357 | 0.6025 | 0.5233 | 16.2889 |
| 91.2372 | 1.4171 | 500 | 185.7194 | -89.4298 | -1.3281 | 180.8790 | 0.6782 | 0.4110 | 0.5347 | 0.6025 | 0.5233 | 16.0443 |
| 85.7307 | 1.5588 | 550 | 186.2241 | -91.6866 | -1.3683 | 182.1382 | 0.6799 | 0.4147 | 0.5336 | 0.6025 | 0.5233 | 16.1863 |
| 79.9458 | 1.7005 | 600 | 186.2137 | -91.0847 | -1.3519 | 181.8686 | 0.6794 | 0.4135 | 0.5373 | 0.6025 | 0.5233 | 16.1060 |
| 86.7578 | 1.8422 | 650 | 186.7196 | -89.4070 | -1.3403 | 182.4970 | 0.6797 | 0.4141 | 0.5316 | 0.6025 | 0.5233 | 16.0270 |
| 76.2665 | 1.9839 | 700 | 186.2802 | -89.5857 | -1.3223 | 182.3933 | 0.6800 | 0.4136 | 0.5311 | 0.6025 | 0.5233 | 16.1117 |
| 65.1575 | 2.1256 | 750 | 188.1571 | -90.2454 | -1.3253 | 184.2076 | 0.6806 | 0.4179 | 0.5321 | 0.6025 | 0.5233 | 15.9179 |
| 66.0375 | 2.2674 | 800 | 186.7221 | -88.5874 | -1.3137 | 181.9355 | 0.6781 | 0.4137 | 0.5336 | 0.6025 | 0.5233 | 15.9879 |
| 55.6773 | 2.4091 | 850 | 189.5397 | -88.2689 | -1.3112 | 185.2096 | 0.6809 | 0.4203 | 0.5300 | 0.6025 | 0.5233 | 15.9311 |
| 54.3682 | 2.5508 | 900 | 188.2381 | -88.3611 | -1.3259 | 184.1678 | 0.6793 | 0.4167 | 0.5311 | 0.6025 | 0.5233 | 15.9680 |
| 50.3775 | 2.6925 | 950 | 189.5419 | -88.8005 | -1.3091 | 185.0044 | 0.6802 | 0.4183 | 0.5331 | 0.6025 | 0.5233 | 15.9986 |
| 45.9449 | 2.8342 | 1000 | 187.7079 | -87.8148 | -1.2990 | 183.5676 | 0.6792 | 0.4161 | 0.5300 | 0.6025 | 0.5233 | 15.9960 |
| 49.0003 | 2.9759 | 1050 | 188.0040 | -88.3004 | -1.2778 | 184.0016 | 0.6792 | 0.4173 | 0.5342 | 0.6025 | 0.5233 | 16.0403 |
| 40.2428 | 3.1176 | 1100 | 188.7166 | -88.6470 | -1.2988 | 184.3815 | 0.6801 | 0.4181 | 0.5326 | 0.6025 | 0.5233 | 15.9981 |
| 37.177 | 3.2593 | 1150 | 188.2563 | -87.9239 | -1.2843 | 184.3351 | 0.6804 | 0.4183 | 0.5357 | 0.6025 | 0.5233 | 16.0123 |
| 34.9809 | 3.4010 | 1200 | 189.1705 | -88.1129 | -1.2900 | 184.8718 | 0.6806 | 0.4193 | 0.5326 | 0.6025 | 0.5233 | 15.9531 |
| 34.073 | 3.5427 | 1250 | 188.2203 | -88.5617 | -1.2892 | 184.1966 | 0.6802 | 0.4177 | 0.5336 | 0.6025 | 0.5233 | 16.0022 |
| 28.4565 | 3.6845 | 1300 | 188.3189 | -88.0836 | -1.2942 | 184.1293 | 0.6803 | 0.4178 | 0.5331 | 0.6025 | 0.5233 | 16.0081 |
| 27.4636 | 3.8262 | 1350 | 188.3022 | -88.4586 | -1.2973 | 184.2191 | 0.6803 | 0.4178 | 0.5321 | 0.6025 | 0.5233 | 15.9996 |
| 27.3902 | 3.9679 | 1400 | 187.9691 | -88.3135 | -1.2974 | 183.7816 | 0.6798 | 0.4168 | 0.5321 | 0.6025 | 0.5233 | 15.9788 |
| 21.2906 | 4.1096 | 1450 | 187.8985 | -88.1212 | -1.2976 | 183.6546 | 0.6796 | 0.4164 | 0.5321 | 0.6025 | 0.5233 | 15.9853 |
| 19.8787 | 4.2513 | 1500 | 188.0825 | -88.3078 | -1.2942 | 183.8684 | 0.6799 | 0.4169 | 0.5321 | 0.6025 | 0.5233 | 15.9839 |
| 18.4741 | 4.3930 | 1550 | 188.0407 | -88.4855 | -1.2951 | 184.0446 | 0.6802 | 0.4173 | 0.5326 | 0.6025 | 0.5233 | 15.9950 |
| 20.4794 | 4.5347 | 1600 | 187.9061 | -88.4381 | -1.2950 | 183.8276 | 0.6799 | 0.4168 | 0.5331 | 0.6025 | 0.5233 | 16.0004 |
| 17.2115 | 4.6764 | 1650 | 187.9504 | -88.4174 | -1.2938 | 183.8566 | 0.6798 | 0.4168 | 0.5326 | 0.6025 | 0.5233 | 15.9942 |
| 16.5799 | 4.8181 | 1700 | 187.9360 | -88.4220 | -1.2946 | 183.8405 | 0.6799 | 0.4168 | 0.5326 | 0.6025 | 0.5233 | 15.9963 |
| 16.689 | 4.9598 | 1750 | 187.9473 | -88.4162 | -1.2945 | 183.8510 | 0.6799 | 0.4168 | 0.5326 | 0.6025 | 0.5233 | 15.9953 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1
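
To reproduce the environment, the pinned versions above can be checked at runtime. A minimal sketch (expected values are copied from the list; other versions may work but are untested here):

```python
import transformers, torch, datasets, tokenizers

# Versions reported on this card.
expected = {
    "transformers": "4.42.0",
    "torch": "2.3.0",       # card reports 2.3.0+cu121 (CUDA 12.1 build)
    "datasets": "3.2.0",
    "tokenizers": "0.19.1",
}

installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}

for name, want in expected.items():
    have = installed[name]
    status = "OK" if have.startswith(want) else f"expected {want}"
    print(f"{name}: {have} ({status})")
```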