license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
  - alignment-handbook
  - ndcg
  - trl
  - expo
  - generated_from_trainer
datasets:
  - hZzy/train_pairwise
model-index:
  - name: qwen2.5-0.5b-expo-L1EXPO-ES-0.1
    results: []


qwen2.5-0.5b-expo-L1EXPO-ES-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5234
  • Logps: -82.5192
  • Logits: -0.4757
  • Objective: 0.5225
  • Dpo Loss: 0.7512
  • Regularize: 0.5225
  • Ranking Simple: 0.5254
  • Ranking Idealized: 0.6030
  • Ranking Idealized Expo: 0.5223
  • Wo Beta: 14.0055

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
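
The batch-size entries above are mutually consistent: 4 samples per device × 3 devices × 12 gradient-accumulation steps gives 144, and 4 × 3 gives 12 for evaluation. The sketch below checks that arithmetic and illustrates the general shape of a cosine schedule with a 10% linear warmup. It is a hand-written approximation, not the trainer's own code. `total_steps` is a hypothetical value, estimated from the training log (50 steps ≈ 0.1417 epochs, so about 353 optimizer steps per epoch over 5 epochs), and `cosine_lr` is an illustrative helper name.

```python
import math

# Values copied from the hyperparameter list.
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
num_devices = 3
gradient_accumulation_steps = 12
peak_lr = 5e-6
warmup_ratio = 0.1

# Effective batch sizes: per-device size times devices (times accumulation for training).
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices

# ASSUMPTION: the planned number of optimizer steps is not stated in the card.
# 1765 is an estimate derived from the logged epoch/step ratio.
total_steps = 1765
warmup_steps = math.ceil(total_steps * warmup_ratio)


def cosine_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    print("total_eval_batch_size:", total_eval_batch_size)
    for s in (0, warmup_steps, total_steps // 2, total_steps):
        print(f"step {s:5d}: lr = {cosine_lr(s):.3e}")
```

Under these assumptions the learning rate rises linearly to 5e-06 over the warmup steps and then decays along a half cosine toward zero at the final planned step.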

Training results

| Training Loss | Epoch | Step | Dpo Loss | Logits | Logps | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize | Wo Beta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0448 | 0.1417 | 50 | 0.6936 | -1.4299 | -90.3888 | 0.0622 | 0.0621 | 0.6030 | 0.5223 | 0.5243 | 0.0621 | 16.0768 |
| 0.1716 | 0.2834 | 100 | 0.6982 | -1.3597 | -88.7675 | 0.1556 | 0.1559 | 0.6030 | 0.5223 | 0.5274 | 0.1559 | 15.9436 |
| 0.2858 | 0.4251 | 150 | 0.7183 | -1.2546 | -79.5067 | 0.2912 | 0.2923 | 0.6030 | 0.5223 | 0.5228 | 0.2923 | 15.0570 |
| 0.3544 | 0.5668 | 200 | 0.7309 | -0.8432 | -83.8485 | 0.3898 | 0.3890 | 0.6030 | 0.5223 | 0.5228 | 0.3890 | 14.7122 |
| 0.375 | 0.7085 | 250 | 0.7353 | -0.6734 | -81.2900 | 0.4398 | 0.4375 | 0.6030 | 0.5223 | 0.5243 | 0.4375 | 14.4729 |
| 0.3592 | 0.8503 | 300 | 0.7348 | -0.5501 | -84.4144 | 0.4422 | 0.4388 | 0.6030 | 0.5223 | 0.5233 | 0.4388 | 14.4403 |
| 0.3351 | 0.9920 | 350 | 0.7354 | -0.5360 | -82.9375 | 0.4676 | 0.4602 | 0.6030 | 0.5223 | 0.5342 | 0.4602 | 14.2722 |
| 0.3056 | 1.1337 | 400 | 0.7470 | -0.5686 | -80.5606 | 0.4842 | 0.4804 | 0.6030 | 0.5223 | 0.5254 | 0.4804 | 14.2812 |
| 0.2932 | 1.2754 | 450 | 0.7439 | -0.5565 | -83.6231 | 0.4805 | 0.4755 | 0.6030 | 0.5223 | 0.5280 | 0.4755 | 14.4640 |
| 0.2864 | 1.4171 | 500 | 0.7510 | -0.6557 | -82.9178 | 0.4964 | 0.4971 | 0.6030 | 0.5223 | 0.5274 | 0.4971 | 14.2823 |
| 0.2635 | 1.5588 | 550 | 0.7503 | -0.6184 | -81.1614 | 0.5023 | 0.5043 | 0.6030 | 0.5223 | 0.5228 | 0.5043 | 14.0632 |
| 0.2561 | 1.7005 | 600 | 0.7487 | -0.5805 | -84.7039 | 0.4980 | 0.4964 | 0.6030 | 0.5223 | 0.5233 | 0.4964 | 14.3352 |
| 0.2448 | 1.8422 | 650 | 0.7503 | -0.4274 | -83.4629 | 0.5171 | 0.5191 | 0.6030 | 0.5223 | 0.5233 | 0.5191 | 14.2153 |
| 0.2235 | 1.9839 | 700 | 0.7483 | -0.5057 | -81.7196 | 0.4963 | 0.4949 | 0.6030 | 0.5223 | 0.5233 | 0.4949 | 14.2026 |
| 0.21 | 2.1256 | 750 | 0.7512 | -0.4757 | -82.5192 | 0.5234 | 0.5225 | 0.6030 | 0.5223 | 0.5254 | 0.5225 | 14.0055 |
| 0.1988 | 2.2674 | 800 | 0.7496 | -0.5578 | -81.0564 | 0.5140 | 0.5114 | 0.6030 | 0.5223 | 0.5295 | 0.5114 | 14.1030 |
| 0.1845 | 2.4091 | 850 | 0.7516 | -0.5129 | -82.6326 | 0.5205 | 0.5186 | 0.6030 | 0.5223 | 0.5311 | 0.5186 | 14.1518 |
| 0.1741 | 2.5508 | 900 | 0.7507 | -0.4790 | -82.9809 | 0.5132 | 0.5118 | 0.6030 | 0.5223 | 0.5238 | 0.5118 | 14.2459 |
| 0.1659 | 2.6925 | 950 | 0.7500 | -0.4840 | -83.8330 | 0.5189 | 0.5193 | 0.6030 | 0.5223 | 0.5238 | 0.5193 | 14.3029 |
| 0.1539 | 2.8342 | 1000 | 0.7499 | -0.4671 | -82.8831 | 0.5137 | 0.5127 | 0.6030 | 0.5223 | 0.5269 | 0.5127 | 14.1925 |
| 0.1445 | 2.9806 | 1050 | 0.7478 | -0.5531 | -83.1677 | 0.5116 | 0.5112 | 0.6030 | 0.5223 | 0.5248 | 0.5112 | 14.2141 |
| 0.1261 | 3.1223 | 1100 | 0.7515 | -0.5488 | -83.5954 | 0.5157 | 0.5165 | 0.6030 | 0.5223 | 0.5233 | 0.5165 | 14.1783 |
| 0.1146 | 3.2641 | 1150 | 0.7487 | -0.5372 | -83.4265 | 0.5175 | 0.5161 | 0.6030 | 0.5223 | 0.5264 | 0.5161 | 14.1956 |
| 0.1076 | 3.4058 | 1200 | 0.7492 | -0.4946 | -83.9912 | 0.5169 | 0.5160 | 0.6030 | 0.5223 | 0.5274 | 0.5160 | 14.1241 |
| 0.0981 | 3.5475 | 1250 | 0.7500 | -0.5087 | -83.3791 | 0.5175 | 0.5185 | 0.6030 | 0.5223 | 0.5311 | 0.5185 | 14.2158 |
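
The evaluation metrics reported at the top of this card can be traced to a specific logged checkpoint by comparing them with the rows of the training log. The sketch below does that for a handful of rows copied from the log. The dictionary keys and the helper name `matching_steps` are illustrative choices, not part of any training code.

```python
import math

# Evaluation metrics reported at the top of the card.
reported = {
    "dpo_loss": 0.7512,
    "logits": -0.4757,
    "logps": -82.5192,
    "validation_loss": 0.5234,
    "wo_beta": 14.0055,
}

# A subset of rows copied from the training log:
# (step, dpo_loss, logits, logps, validation_loss, wo_beta)
log_rows = [
    (550, 0.7503, -0.6184, -81.1614, 0.5023, 14.0632),
    (700, 0.7483, -0.5057, -81.7196, 0.4963, 14.2026),
    (750, 0.7512, -0.4757, -82.5192, 0.5234, 14.0055),
    (800, 0.7496, -0.5578, -81.0564, 0.5140, 14.1030),
    (1200, 0.7492, -0.4946, -83.9912, 0.5169, 14.1241),
    (1250, 0.7500, -0.5087, -83.3791, 0.5175, 14.2158),
]

FIELDS = ("dpo_loss", "logits", "logps", "validation_loss", "wo_beta")


def matching_steps(rows, target, tol=1e-9):
    """Return the steps whose logged metrics all equal the reported ones."""
    steps = []
    for step, *values in rows:
        row = dict(zip(FIELDS, values))
        if all(math.isclose(row[k], target[k], abs_tol=tol) for k in FIELDS):
            steps.append(step)
    return steps


if __name__ == "__main__":
    print(matching_steps(log_rows, reported))
```

For the rows shown, only step 750 (epoch 2.1256) matches the reported figures. That row also has the lowest Wo Beta value in the log, although the card does not state which criterion was used to select the final checkpoint.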

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1