metadata
license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
  - alignment-handbook
  - ndcg
  - trl
  - expo
  - generated_from_trainer
datasets:
  - hZzy/train_pairwise
model-index:
  - name: qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-50-5e6
    results: []

qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-50-5e6

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 223.1421
  • Logps: -81.8519
  • Logits: -0.6524
  • Objective: 224.3911
  • Dpo Loss: 114.2648
  • Regularize: 224.3911
  • Ranking Simple: 0.5083
  • Ranking Idealized: 0.5093
  • Ranking Idealized Expo: 0.5093
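The checkpoint evaluated above can presumably be loaded like any other causal language model with the Transformers library. The following is a minimal sketch, not a documented usage recipe: the repository ID is an assumption built from the uploader namespace and the model name on this card, and the prompt and generation settings are purely illustrative.

```python
# ASSUMPTION: repository ID inferred from the namespace and model name on this
# card; verify it on the Hub before relying on it.
MODEL_ID = "hZzy/qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-50-5e6"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the checkpoint and return a completion for `prompt`."""
    # Imported lazily so this module can be imported without the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Illustrative prompt only; the card does not specify a prompt format.
    print(generate("Summarize today's top news story in one sentence."))
```

Because the card does not state a prompt template or intended use, treat any output from this sketch as exploratory.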

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 6
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 288
  • total_eval_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
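
The batch-size and scheduler settings listed above are related by simple arithmetic, which the sketch below reproduces. It assumes the usual form of a cosine schedule with linear warmup (ramp from 0 to the peak rate, then half-cosine decay to 0). The total step count of 882 is an estimate inferred from the training log, not a value stated on this card, and the function names are hypothetical.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Examples consumed per optimizer step (or per eval pass when grad_accum_steps=1)."""
    return per_device * num_devices * grad_accum_steps


def cosine_lr(step: int, total_steps: int, peak_lr: float = 5e-6, warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step` under linear warmup followed by cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Half-cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# ASSUMPTION: roughly 882 optimizer steps in total (about 176.4 steps/epoch x 5 epochs).
TOTAL_STEPS = 882

train_total = effective_batch_size(4, 6, 12)  # matches total_train_batch_size
eval_total = effective_batch_size(4, 6)       # matches total_eval_batch_size

if __name__ == "__main__":
    print("total_train_batch_size:", train_total)
    print("total_eval_batch_size:", eval_total)
    for step in (0, 50, 100, 450, 850):
        print(step, cosine_lr(step, TOTAL_STEPS))
```

With these settings the learning rate would peak near step 89 and decay toward zero by the end of the fifth epoch, which is consistent with the training loss flattening late in the log.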

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 72.7249 | 0.2834 | 50 | 49.7663 | -92.8163 | -1.3016 | 48.7266 | 25.8845 | 48.7266 | 0.5083 | 0.5093 | 0.5093 |
| 152.2211 | 0.5668 | 100 | 146.6511 | -80.6726 | -1.2458 | 149.0543 | 74.5413 | 149.0543 | 0.5124 | 0.5093 | 0.5093 |
| 149.0411 | 0.8503 | 150 | 179.0229 | -81.4258 | -0.9511 | 179.4755 | 89.5257 | 179.4755 | 0.5124 | 0.5093 | 0.5093 |
| 135.6758 | 1.1337 | 200 | 190.7774 | -83.1371 | -0.8760 | 195.4946 | 98.7297 | 195.4946 | 0.5083 | 0.5093 | 0.5093 |
| 122.9397 | 1.4171 | 250 | 204.8156 | -81.1880 | -0.8410 | 206.5414 | 104.7900 | 206.5414 | 0.4990 | 0.5093 | 0.5093 |
| 109.8686 | 1.7005 | 300 | 216.4334 | -82.2344 | -0.6658 | 216.9471 | 109.1882 | 216.9471 | 0.5083 | 0.5093 | 0.5093 |
| 97.6956 | 1.9839 | 350 | 218.2887 | -81.0804 | -0.6323 | 217.4291 | 109.8084 | 217.4291 | 0.5072 | 0.5093 | 0.5093 |
| 86.0309 | 2.2674 | 400 | 221.7113 | -83.6082 | -0.5904 | 225.3389 | 115.8749 | 225.3389 | 0.5052 | 0.5093 | 0.5093 |
| 78.4362 | 2.5508 | 450 | 221.3732 | -82.0743 | -0.6173 | 224.4839 | 116.2117 | 224.4839 | 0.5114 | 0.5093 | 0.5093 |
| 65.179 | 2.8342 | 500 | 223.8012 | -82.3425 | -0.6892 | 227.1755 | 114.9871 | 227.1755 | 0.5083 | 0.5093 | 0.5093 |
| 52.3116 | 3.1176 | 550 | 223.6770 | -81.8433 | -0.6290 | 226.7591 | 114.9252 | 226.7591 | 0.5103 | 0.5093 | 0.5093 |
| 45.9426 | 3.4010 | 600 | 222.4720 | -81.3168 | -0.6183 | 223.1873 | 113.6331 | 223.1873 | 0.5072 | 0.5093 | 0.5093 |
| 37.3789 | 3.6845 | 650 | 223.4119 | -81.7013 | -0.6355 | 225.2157 | 114.6103 | 225.2157 | 0.5072 | 0.5093 | 0.5093 |
| 32.7043 | 3.9679 | 700 | 223.5499 | -81.8343 | -0.6585 | 224.4542 | 114.2602 | 224.4542 | 0.5062 | 0.5093 | 0.5093 |
| 22.8627 | 4.2513 | 750 | 223.7742 | -81.7547 | -0.6564 | 224.6748 | 114.4499 | 224.6748 | 0.5072 | 0.5093 | 0.5093 |
| 19.3618 | 4.5347 | 800 | 223.2886 | -81.8898 | -0.6540 | 224.4371 | 114.3485 | 224.4371 | 0.5083 | 0.5093 | 0.5093 |
| 18.3796 | 4.8181 | 850 | 223.1902 | -81.8524 | -0.6522 | 224.4282 | 114.2867 | 224.4282 | 0.5083 | 0.5093 | 0.5093 |
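
The Epoch and Step columns in the log above allow a rough estimate of the steps per epoch and, combined with the total train batch size of 288, of the training set size. The sketch below is back-of-the-envelope arithmetic only: the epoch fractions are rounded in the log, and how the final partial batch of each epoch is handled is not stated, so the result should be read as approximate.

```python
# (step, epoch) pairs copied from the first and last rows of the training log.
CHECKPOINTS = [(50, 0.2834), (850, 4.8181)]

# total_train_batch_size from the hyperparameter list on this card.
TOTAL_TRAIN_BATCH_SIZE = 288


def steps_per_epoch(step: int, epoch: float) -> float:
    """Optimizer steps needed to complete one epoch, inferred from one log row."""
    return step / epoch


# Average the estimates from both rows to smooth out rounding in the log.
estimates = [steps_per_epoch(s, e) for s, e in CHECKPOINTS]
avg_steps_per_epoch = sum(estimates) / len(estimates)

# Approximate number of training examples seen per epoch.
approx_examples_per_epoch = avg_steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

if __name__ == "__main__":
    print(f"steps per epoch    ~ {avg_steps_per_epoch:.1f}")
    print(f"examples per epoch ~ {approx_examples_per_epoch:.0f}")
```

Under these assumptions the run comes out at roughly 176 optimizer steps per epoch, which would put the training split of hZzy/train_pairwise at about 51,000 examples.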

Framework versions

  • Transformers 4.42.0
  • PyTorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1