qwen2.5-0.5b-expo-L2EXPO-ES-0.1-W0

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set (these values match the step 850 row of the training results):

  • Loss: 283.0078
  • Logps: -81.0689
  • Logits: -0.5212
  • Objective: 277.3703
  • Dpo Loss: 0.7209
  • Regularize: 0.6310
  • Ranking Simple: 0.5331
  • Ranking Idealized: 0.6030
  • Ranking Idealized Expo: 0.5223
  • Wo Beta: 14.2695
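
The card gives no usage instructions, so the following is only a minimal sketch of how a checkpoint like this is typically loaded with the Transformers library. It assumes the repository id shown on this page (hZzy/qwen2.5-0.5b-expo-L2EXPO-W0-ES-0.1) resolves on the Hugging Face Hub, that the repository ships a tokenizer, and that the model loads through `AutoModelForCausalLM`; none of this is confirmed by the card. The helper names `load_model_and_tokenizer` and `generate` are hypothetical.

```python
# Assumption: repository id as shown on the model page, not verified here.
MODEL_ID = "hZzy/qwen2.5-0.5b-expo-L2EXPO-W0-ES-0.1"


def load_model_and_tokenizer(model_id: str = MODEL_ID):
    """Download (or read from cache) the model and its tokenizer."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return model, tokenizer


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation for `prompt` with default decoding settings."""
    model, tokenizer = load_model_and_tokenizer()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # The decoded text includes the prompt followed by the generated tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Summarize today's headlines:")` would download the weights on first use. The prompt format expected by this fine-tune is not documented, so the plain-string prompt here is a placeholder.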

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
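
The batch-size totals above follow from the per-device values, and the learning-rate schedule can be sketched from the listed settings. The snippet below is an illustration, not the training code: `cosine_lr` is a hypothetical helper that mirrors the usual linear-warmup plus half-cosine decay shape, and `ASSUMED_TOTAL_STEPS` is a rough estimate derived from the results table (step 350 is about 0.992 epochs), not a value stated in the card.

```python
import math

# Values copied from the hyperparameter list.
PER_DEVICE_TRAIN_BATCH = 4
PER_DEVICE_EVAL_BATCH = 4
NUM_DEVICES = 3
GRAD_ACCUM_STEPS = 12
BASE_LR = 5e-6
WARMUP_RATIO = 0.1

# Assumption: roughly 353 optimizer steps per epoch times 5 epochs.
ASSUMED_TOTAL_STEPS = 1764


def effective_batch_sizes():
    """Return (total_train_batch_size, total_eval_batch_size)."""
    train = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
    evaluation = PER_DEVICE_EVAL_BATCH * NUM_DEVICES
    return train, evaluation


def cosine_lr(step, total_steps, base_lr=BASE_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(effective_batch_sizes())  # (144, 12)
    for step in (0, 177, 850, 1100, ASSUMED_TOTAL_STEPS):
        print(step, cosine_lr(step, ASSUMED_TOTAL_STEPS))
```

Because the results table ends at step 1100, the run appears to have stopped well before the schedule would have decayed to zero under this assumed total.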

Training results

| Training Loss | Epoch | Step | Dpo Loss | Logits | Logps | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize | Wo Beta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 176.8183 | 0.1417 | 50 | 0.6874 | -1.4745 | -93.3291 | 185.6754 | 183.7365 | 0.6030 | 0.5223 | 0.5212 | 0.4178 | 16.4509 |
| 168.1755 | 0.2834 | 100 | 0.6819 | -1.4487 | -93.7829 | 195.2546 | 190.2426 | 0.6030 | 0.5223 | 0.5342 | 0.4320 | 16.2920 |
| 182.7148 | 0.4251 | 150 | 0.6969 | -1.2730 | -89.7329 | 218.5299 | 213.4884 | 0.6030 | 0.5223 | 0.5336 | 0.4859 | 15.8045 |
| 203.0993 | 0.5668 | 200 | 0.7051 | -1.0447 | -79.7062 | 251.9243 | 242.6406 | 0.6030 | 0.5223 | 0.5326 | 0.5518 | 14.6949 |
| 207.5481 | 0.7085 | 250 | 0.7055 | -1.0362 | -80.0940 | 251.9905 | 244.3510 | 0.6030 | 0.5223 | 0.5305 | 0.5542 | 14.8158 |
| 193.4843 | 0.8503 | 300 | 0.7150 | -0.7137 | -80.4296 | 266.7107 | 258.3957 | 0.6030 | 0.5223 | 0.5290 | 0.5881 | 14.5431 |
| 182.6922 | 0.9920 | 350 | 0.7073 | -0.6448 | -76.3638 | 262.3346 | 254.6360 | 0.6030 | 0.5223 | 0.5357 | 0.5802 | 14.6176 |
| 166.9683 | 1.1337 | 400 | 0.7152 | -0.6392 | -78.3482 | 272.3288 | 264.9111 | 0.6030 | 0.5223 | 0.5274 | 0.6056 | 14.6513 |
| 155.9364 | 1.2754 | 450 | 0.7186 | -0.4207 | -80.5230 | 275.0490 | 268.8637 | 0.6030 | 0.5223 | 0.5321 | 0.6129 | 14.7777 |
| 143.4724 | 1.4171 | 500 | 0.7209 | -0.5141 | -80.5587 | 275.9663 | 270.0383 | 0.6030 | 0.5223 | 0.5269 | 0.6150 | 14.4364 |
| 141.3444 | 1.5588 | 550 | 0.7139 | -0.6338 | -81.1271 | 275.0851 | 269.2189 | 0.6030 | 0.5223 | 0.5378 | 0.6159 | 14.6425 |
| 136.172 | 1.7029 | 600 | 0.7111 | -0.5857 | -79.4221 | 273.6681 | 264.6510 | 0.6030 | 0.5223 | 0.5373 | 0.6012 | 14.5631 |
| 130.7133 | 1.8446 | 650 | 0.7193 | -0.4215 | -80.2130 | 276.3609 | 269.6939 | 0.6030 | 0.5223 | 0.5342 | 0.6141 | 14.5456 |
| 122.624 | 1.9863 | 700 | 0.7178 | -0.5263 | -80.9968 | 278.4690 | 271.4757 | 0.6030 | 0.5223 | 0.5378 | 0.6190 | 14.4664 |
| 108.7022 | 2.1280 | 750 | 0.7207 | -0.4657 | -84.0088 | 282.5668 | 276.0201 | 0.6030 | 0.5223 | 0.5347 | 0.6302 | 14.4517 |
| 104.1923 | 2.2697 | 800 | 0.7166 | -0.4640 | -81.6313 | 278.0555 | 272.7622 | 0.6030 | 0.5223 | 0.5383 | 0.6210 | 14.4307 |
| 99.0867 | 2.4114 | 850 | 0.7209 | -0.5212 | -81.0689 | 283.0078 | 277.3703 | 0.6030 | 0.5223 | 0.5331 | 0.6310 | 14.2695 |
| 91.7475 | 2.5531 | 900 | 0.7200 | -0.5149 | -81.6144 | 279.6676 | 275.1769 | 0.6030 | 0.5223 | 0.5373 | 0.6279 | 14.3570 |
| 87.8681 | 2.6949 | 950 | 0.7191 | -0.4428 | -81.8544 | 281.5718 | 275.7560 | 0.6030 | 0.5223 | 0.5362 | 0.6277 | 14.3509 |
| 81.742 | 2.8366 | 1000 | 0.7197 | -0.4951 | -81.4412 | 279.1324 | 274.5647 | 0.6030 | 0.5223 | 0.5336 | 0.6257 | 14.3551 |
| 76.4372 | 2.9783 | 1050 | 0.7184 | -0.4502 | -82.3960 | 279.1884 | 273.9026 | 0.6030 | 0.5223 | 0.5336 | 0.6249 | 14.3203 |
| 67.4698 | 3.1200 | 1100 | 0.7169 | -0.4190 | -82.9107 | 280.5317 | 274.7932 | 0.6030 | 0.5223 | 0.5326 | 0.6260 | 14.3418 |
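
The card does not say how the reported checkpoint was chosen. As a small sanity check, the sketch below copies the Step, Training Loss, and Wo Beta columns from the table and finds the step with the lowest Wo Beta. Treating lower Wo Beta as better is an assumption, and `best_by_wo_beta` is a hypothetical helper.

```python
# (step, training_loss, wo_beta) copied from the training results table.
ROWS = [
    (50, 176.8183, 16.4509), (100, 168.1755, 16.2920),
    (150, 182.7148, 15.8045), (200, 203.0993, 14.6949),
    (250, 207.5481, 14.8158), (300, 193.4843, 14.5431),
    (350, 182.6922, 14.6176), (400, 166.9683, 14.6513),
    (450, 155.9364, 14.7777), (500, 143.4724, 14.4364),
    (550, 141.3444, 14.6425), (600, 136.172, 14.5631),
    (650, 130.7133, 14.5456), (700, 122.624, 14.4664),
    (750, 108.7022, 14.4517), (800, 104.1923, 14.4307),
    (850, 99.0867, 14.2695), (900, 91.7475, 14.3570),
    (950, 87.8681, 14.3509), (1000, 81.742, 14.3551),
    (1050, 76.4372, 14.3203), (1100, 67.4698, 14.3418),
]


def best_by_wo_beta(rows):
    """Return the (step, training_loss, wo_beta) row with the lowest Wo Beta."""
    return min(rows, key=lambda row: row[2])


if __name__ == "__main__":
    step, train_loss, wo_beta = best_by_wo_beta(ROWS)
    print(f"lowest Wo Beta {wo_beta} at step {step} (training loss {train_loss})")
```

The minimum falls at step 850, the same row whose evaluation metrics are reported at the top of the card. That is consistent with checkpoint selection on Wo Beta, though the card does not confirm it.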

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1

Model details

  • Repository: hZzy/qwen2.5-0.5b-expo-L2EXPO-W0-ES-0.1
  • Model size: 494M params
  • Tensor type: F32 (Safetensors)