license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
  - alignment-handbook
  - ndcg
  - trl
  - expo
  - generated_from_trainer
datasets:
  - hZzy/train_pairwise_weighted
model-index:
  - name: qwen2.5-0.5b-expo-L2EXPO-ES-10
    results: []


qwen2.5-0.5b-expo-L2EXPO-ES-10

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set; the values match the step-200 evaluation (epoch 0.5668) in the training results:

  • Loss: 38.5263
  • Logps: -75.2715
  • Logits: -0.8236
  • Objective: 37.4366
  • Dpo Loss: 18.9116
  • Regularize: 37.4366
  • Ranking Simple: 0.5295
  • Ranking Idealized: 0.5212
  • Ranking Idealized Expo: 0.5212
  • Wo Beta: 14.6816

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
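
The total batch sizes listed above follow from the per-device sizes, the device count, and the gradient accumulation steps. The sketch below reproduces that arithmetic and outlines a linear-warmup, cosine-decay learning-rate curve of the kind that `lr_scheduler_type: cosine` with `lr_scheduler_warmup_ratio: 0.1` usually denotes. The function names are hypothetical, and the total step count is an estimate inferred from the logged epoch fractions (step 50 at epoch 0.1417), not a value stated in this card.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 4
PER_DEVICE_EVAL_BATCH = 4
NUM_DEVICES = 3
GRAD_ACCUM_STEPS = 12
PEAK_LR = 5e-6
WARMUP_RATIO = 0.1
NUM_EPOCHS = 5


def effective_batch_sizes():
    """Return (total_train_batch_size, total_eval_batch_size)."""
    train = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
    evaluation = PER_DEVICE_EVAL_BATCH * NUM_DEVICES
    return train, evaluation


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then cosine decay towards zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# ASSUMPTION: optimizer steps for the full 5 epochs, estimated from the
# logged pair (step 50, epoch 0.1417). The card does not state this number.
ESTIMATED_TOTAL_STEPS = round(NUM_EPOCHS * 50 / 0.1417)

if __name__ == "__main__":
    print("effective batch sizes:", effective_batch_sizes())
    for step in (0, 50, 200, 600, ESTIMATED_TOTAL_STEPS):
        print(step, cosine_lr(step, ESTIMATED_TOTAL_STEPS))
```

Under these assumptions the first logged evaluation (step 50) would still fall inside the warmup phase, which is worth keeping in mind when comparing early and late rows of the training results.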

Training results

| Training Loss | Epoch | Step | Dpo Loss | Logits | Logps | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize | Wo Beta |
|:-------------:|:------:|:----:|:--------:|:-------:|:--------:|:---------------:|:---------:|:-----------------:|:----------------------:|:--------------:|:----------:|:-------:|
| 4.4544 | 0.1417 | 50 | 3.0769 | -1.4000 | -90.0695 | 6.1132 | 6.2354 | 0.5212 | 0.5212 | 0.5243 | 6.2354 | 16.0705 |
| 17.3779 | 0.2834 | 100 | 7.9374 | -1.3238 | -85.5257 | 16.1760 | 16.0037 | 0.5212 | 0.5212 | 0.5259 | 16.0037 | 15.7780 |
| 28.1478 | 0.4251 | 150 | 14.7239 | -1.0824 | -82.4808 | 28.7309 | 28.1308 | 0.5212 | 0.5212 | 0.5228 | 28.1308 | 15.4096 |
| 35.2522 | 0.5668 | 200 | 18.9116 | -0.8236 | -75.2715 | 38.5263 | 37.4366 | 0.5212 | 0.5212 | 0.5295 | 37.4366 | 14.6816 |
| 37.8556 | 0.7085 | 250 | 22.7495 | -0.6024 | -76.2798 | 44.8164 | 44.5795 | 0.5212 | 0.5212 | 0.5223 | 44.5795 | 14.3182 |
| 36.0351 | 0.8503 | 300 | 22.1457 | -0.7057 | -79.1833 | 44.3831 | 43.8777 | 0.5212 | 0.5212 | 0.5254 | 43.8777 | 14.2675 |
| 32.9882 | 0.9920 | 350 | 23.0098 | -0.6345 | -80.3166 | 46.6946 | 45.5953 | 0.5212 | 0.5212 | 0.5248 | 45.5953 | 14.1690 |
| 30.7247 | 1.1337 | 400 | 24.6183 | -0.4810 | -82.4111 | 48.3805 | 48.0656 | 0.5212 | 0.5212 | 0.5166 | 48.0656 | 14.1059 |
| 29.6491 | 1.2754 | 450 | 24.9495 | -0.5861 | -81.5285 | 48.5237 | 48.8411 | 0.5212 | 0.5212 | 0.5243 | 48.8411 | 14.4793 |
| 28.3933 | 1.4171 | 500 | 24.8156 | -0.5585 | -79.8843 | 47.8150 | 47.9210 | 0.5212 | 0.5212 | 0.5212 | 47.9210 | 14.3458 |
| 26.3026 | 1.5588 | 550 | 24.4583 | -0.5594 | -79.5567 | 48.0081 | 48.2215 | 0.5212 | 0.5212 | 0.5228 | 48.2215 | 14.1587 |
| 25.1162 | 1.7005 | 600 | 25.2219 | -0.4875 | -79.4245 | 49.7428 | 49.7428 | 0.5212 | 0.5212 | 0.5259 | 49.7428 | 14.1923 |
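
The evaluation summary near the top of this card coincides with the step-200 row. The card does not say how that checkpoint was chosen, so the sketch below is only one way to compare candidate criteria against the logged values. It assumes that, for steps 400 to 600, Ranking Simple is the ranking column that varies between rows (Ranking Idealized is 0.5212 everywhere else in the card) and that the first metric after the step number is the validation loss. The helper names are hypothetical.

```python
# (step, validation_loss, ranking_simple) copied from the training results.
# ASSUMPTION: for steps 400-600 the varying ranking value is Ranking Simple
# and the first metric after the step number is the validation loss.
EVAL_LOG = [
    (50, 6.1132, 0.5243),
    (100, 16.1760, 0.5259),
    (150, 28.7309, 0.5228),
    (200, 38.5263, 0.5295),
    (250, 44.8164, 0.5223),
    (300, 44.3831, 0.5254),
    (350, 46.6946, 0.5248),
    (400, 48.3805, 0.5166),
    (450, 48.5237, 0.5243),
    (500, 47.8150, 0.5212),
    (550, 48.0081, 0.5228),
    (600, 49.4271, 0.5259),
]


def best_by_ranking_simple(log):
    """Row with the highest Ranking Simple score."""
    return max(log, key=lambda row: row[2])


def best_by_validation_loss(log):
    """Row with the lowest validation loss."""
    return min(log, key=lambda row: row[1])


if __name__ == "__main__":
    print("highest Ranking Simple:", best_by_ranking_simple(EVAL_LOG))
    print("lowest validation loss:", best_by_validation_loss(EVAL_LOG))
```

The two criteria pick different rows. The reported checkpoint matches the row with the highest Ranking Simple, not the one with the lowest validation loss, which suggests (but does not confirm) that a ranking metric was used for model selection.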

Framework versions

  • Transformers 4.42.0
  • PyTorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1