
qwen2.5-0.5b-expo-L1EXPO-ES-10

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 51.6794
  • Logps: -80.6186
  • Logits: -0.5140
  • Objective: 51.5628
  • Dpo Loss: 26.2794
  • Regularize: 51.5628
  • Ranking Simple: 0.5248
  • Ranking Idealized: 0.5212
  • Ranking Idealized Expo: 0.5212
  • Wo Beta: 14.0744
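
Since the base model is a Qwen2.5 0.5B causal language model, the checkpoint can presumably be loaded with the standard `transformers` causal-LM classes. The snippet below is a minimal sketch: the repository id is taken from this card, while the prompt and generation settings are purely illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from this model card; everything else is illustrative.
model_id = "hZzy/qwen2.5-0.5b-expo-L1EXPO-ES-10"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt; the card does not specify a prompt format.
inputs = tokenizer("Summarize today's top news story:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```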

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
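
For reference, these settings map roughly onto a `transformers.TrainingArguments` configuration as sketched below. The per-device batch sizes and gradient accumulation reproduce the effective train batch size of 144 across 3 GPUs (4 × 12 × 3). The output directory and the use of `TrainingArguments` directly are assumptions; the actual training script (not shown in this card) may wrap these options differently.

```python
from transformers import TrainingArguments

# Hedged sketch: values copied from the hyperparameter list above.
# output_dir is a placeholder; the real run may use a TRL/alignment
# config object instead of raw TrainingArguments.
training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L1EXPO-ES-10",  # placeholder
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
# Effective train batch size: 4 per device x 12 accumulation steps x 3 GPUs = 144
```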

Training results

| Training Loss | Epoch | Step | Dpo Loss | Logits | Logps | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize | Wo Beta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4.2778 | 0.1417 | 50 | 2.8787 | -1.4301 | -91.7813 | 5.6511 | 5.5786 | 0.5212 | 0.5212 | 0.5243 | 5.5786 | 16.1070 |
| 17.3516 | 0.2834 | 100 | 7.9687 | -1.3171 | -86.6635 | 15.6835 | 15.7548 | 0.5212 | 0.5212 | 0.5280 | 15.7548 | 15.6261 |
| 28.6009 | 0.4251 | 150 | 15.0002 | -1.1259 | -81.4986 | 29.0753 | 28.9045 | 0.5212 | 0.5212 | 0.5243 | 28.9045 | 15.2369 |
| 35.0698 | 0.5668 | 200 | 21.3918 | -0.8776 | -82.1578 | 41.1263 | 40.4593 | 0.5212 | 0.5212 | 0.5124 | 40.4593 | 14.9112 |
| 37.7822 | 0.7085 | 250 | 21.9288 | -0.6419 | -83.0039 | 44.0746 | 43.3933 | 0.5212 | 0.5212 | 0.5280 | 43.3933 | 14.6204 |
| 35.2811 | 0.8503 | 300 | 21.4307 | -0.5316 | -83.8429 | 43.6626 | 43.4643 | 0.5212 | 0.5212 | 0.5321 | 43.4643 | 14.5447 |
| 33.8034 | 0.9920 | 350 | 23.3301 | -0.5934 | -84.0573 | 45.2649 | 45.3586 | 0.5212 | 0.5212 | 0.5238 | 45.3586 | 14.6023 |
| 30.8702 | 1.1337 | 400 | 23.8270 | -0.6271 | -82.2022 | 47.2698 | 47.2674 | 0.5212 | 0.5212 | 0.5248 | 47.2674 | 14.3367 |
| 29.5027 | 1.2754 | 450 | 25.1794 | -0.5508 | -82.7233 | 49.3412 | 49.4737 | 0.5212 | 0.5212 | 0.5202 | 49.4737 | 14.3433 |
| 27.7693 | 1.4171 | 500 | 24.6274 | -0.5208 | -83.1404 | 48.4138 | 48.5616 | 0.5212 | 0.5212 | 0.5181 | 48.5616 | 14.3259 |
| 26.3455 | 1.5588 | 550 | 24.8876 | -0.5377 | -81.6711 | 49.4754 | 49.7513 | 0.5212 | 0.5212 | 0.5264 | 49.7513 | 14.2335 |
| 25.3777 | 1.7005 | 600 | 24.6279 | -0.5633 | -81.3699 | 48.8078 | 49.2645 | 0.5212 | 0.5212 | 0.5238 | 49.2645 | 14.1972 |
| 24.4429 | 1.8422 | 650 | 25.3419 | -0.4757 | -81.6565 | 49.7105 | 49.8172 | 0.5212 | 0.5212 | 0.5192 | 49.8172 | 14.3368 |
| 22.5358 | 1.9839 | 700 | 26.2794 | -0.5140 | -80.6186 | 51.6794 | 51.5628 | 0.5212 | 0.5212 | 0.5248 | 51.5628 | 14.0744 |
| 20.6864 | 2.1256 | 750 | 25.7920 | -0.4511 | -83.9474 | 50.9028 | 51.1398 | 0.5212 | 0.5212 | 0.5274 | 51.1398 | 14.2847 |
| 19.5881 | 2.2674 | 800 | 26.2232 | -0.4519 | -84.1413 | 51.4440 | 51.8351 | 0.5212 | 0.5212 | 0.5274 | 51.8351 | 14.2120 |
| 18.5246 | 2.4091 | 850 | 26.5269 | -0.5061 | -82.9639 | 52.2825 | 52.2313 | 0.5212 | 0.5212 | 0.5285 | 52.2313 | 14.1205 |
| 17.4115 | 2.5508 | 900 | 26.5477 | -0.5079 | -83.9889 | 52.2686 | 52.2795 | 0.5212 | 0.5212 | 0.5290 | 52.2795 | 14.1975 |
| 16.2052 | 2.6925 | 950 | 26.6571 | -0.4691 | -83.1267 | 52.4042 | 52.3891 | 0.5212 | 0.5212 | 0.5238 | 52.3891 | 14.2985 |
| 15.0384 | 2.8389 | 1000 | 26.1645 | -0.4551 | -82.8277 | 51.7636 | 51.6447 | 0.5212 | 0.5212 | 0.5264 | 51.6447 | 14.2036 |
| 14.381 | 2.9806 | 1050 | 26.5043 | -0.4122 | -83.0540 | 51.8214 | 51.9024 | 0.5212 | 0.5212 | 0.5248 | 51.9024 | 14.1669 |
| 12.5437 | 3.1223 | 1100 | 26.1851 | -0.4408 | -83.8731 | 51.6017 | 51.8998 | 0.5212 | 0.5212 | 0.5254 | 51.8998 | 14.1769 |
| 11.3828 | 3.2641 | 1150 | 26.2023 | -0.4506 | -84.2104 | 51.5869 | 51.7268 | 0.5212 | 0.5212 | 0.5259 | 51.7268 | 14.1768 |
| 10.5152 | 3.4058 | 1200 | 26.3073 | -0.4568 | -84.1485 | 51.5859 | 51.6626 | 0.5212 | 0.5212 | 0.5254 | 51.6626 | 14.1450 |

Framework versions

  • Transformers 4.42.0
  • PyTorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
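
A quick way to confirm a local environment matches these versions is to print the installed package versions. This is a minimal sketch; the package names are the standard PyPI distributions corresponding to the list above.

```python
import transformers, torch, datasets, tokenizers

# Versions listed in this card: Transformers 4.42.0, PyTorch 2.3.0+cu121,
# Datasets 2.19.1, Tokenizers 0.19.1.
for name, module in [("transformers", transformers), ("torch", torch),
                     ("datasets", datasets), ("tokenizers", tokenizers)]:
    print(f"{name}: {module.__version__}")
```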