---
license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
  - alignment-handbook
  - ndcg
  - trl
  - expo
  - generated_from_trainer
datasets:
  - hZzy/train_pairwise
model-index:
  - name: qwen2.5-0.5b-expo-L2EXPO-ES-0.1
    results: []
---


qwen2.5-0.5b-expo-L2EXPO-ES-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set (an inference sketch follows the list):

  • Loss: 0.4217
  • Logps: -89.1060
  • Logits: -1.3837
  • Objective: 0.4142
  • Dpo Loss: 0.6791
  • Regularize: 0.4142
  • Ranking Simple: 0.5347
  • Ranking Idealized: 0.6030
  • Ranking Idealized Expo: 0.5223
  • Wo Beta: 15.9847
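
For reference, here is a minimal inference sketch. It assumes the checkpoint is published on the Hugging Face Hub under a repository id matching the model name above (hZzy/qwen2.5-0.5b-expo-L2EXPO-ES-0.1) and that it loads through the standard causal-LM classes, as the Qwen2.5 base model does; the prompt is only an illustration.

```python
# Minimal inference sketch. The Hub id below is inferred from the
# model-index name and may differ from the actual repository id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-ES-0.1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Summarize today's top technology headline in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```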

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
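
The list above maps onto a standard transformers training configuration. Below is a hedged sketch of what that configuration could look like; the actual run used a TRL / alignment-handbook "expo" recipe whose trainer class and loss are not reproduced here, and the output_dir value is illustrative.

```python
# Hedged sketch: TrainingArguments mirroring the hyperparameters listed above.
# The effective train batch size of 144 comes from 4 (per device) x 12 (grad
# accumulation) x 3 (GPUs) when launched with a multi-GPU launcher such as
# accelerate; the beta/epsilon values match the optimizer settings listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-ES-0.1",  # illustrative
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```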

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4117 | 0.1417 | 50 | 0.4102 | -90.8535 | -1.4691 | 0.4090 | 0.6893 | 0.4090 | 0.5248 | 0.6030 | 0.5223 | 16.3208 |
| 0.3871 | 0.2834 | 100 | 0.4049 | -91.2757 | -1.5346 | 0.4029 | 0.6833 | 0.4029 | 0.5316 | 0.6030 | 0.5223 | 16.2699 |
| 0.3451 | 0.4251 | 150 | 0.4013 | -91.1637 | -1.4902 | 0.3996 | 0.6789 | 0.3996 | 0.5347 | 0.6030 | 0.5223 | 16.5907 |
| 0.3166 | 0.5668 | 200 | 0.4148 | -93.2695 | -1.4523 | 0.4132 | 0.6811 | 0.4132 | 0.5316 | 0.6030 | 0.5223 | 16.3512 |
| 0.2939 | 0.7085 | 250 | 0.4131 | -90.5537 | -1.5465 | 0.4077 | 0.6790 | 0.4077 | 0.5342 | 0.6030 | 0.5223 | 16.4807 |
| 0.2655 | 0.8503 | 300 | 0.4126 | -91.3521 | -1.4553 | 0.4082 | 0.6806 | 0.4082 | 0.5311 | 0.6030 | 0.5223 | 16.4429 |
| 0.2513 | 0.9920 | 350 | 0.4110 | -91.2408 | -1.4532 | 0.4044 | 0.6782 | 0.4044 | 0.5352 | 0.6030 | 0.5223 | 16.3768 |
| 0.2206 | 1.1337 | 400 | 0.4128 | -87.3470 | -1.4764 | 0.4049 | 0.6769 | 0.4049 | 0.5336 | 0.6030 | 0.5223 | 16.2024 |
| 0.2077 | 1.2754 | 450 | 0.4144 | -89.8793 | -1.4177 | 0.4106 | 0.6788 | 0.4106 | 0.5331 | 0.6030 | 0.5223 | 16.1977 |
| 0.1943 | 1.4171 | 500 | 0.4169 | -87.6699 | -1.4544 | 0.4092 | 0.6782 | 0.4092 | 0.5352 | 0.6030 | 0.5223 | 16.0510 |
| 0.1879 | 1.5588 | 550 | 0.4173 | -89.0111 | -1.4268 | 0.4102 | 0.6787 | 0.4102 | 0.5347 | 0.6030 | 0.5223 | 16.0707 |
| 0.1768 | 1.7005 | 600 | 0.4190 | -87.0605 | -1.4411 | 0.4116 | 0.6796 | 0.4116 | 0.5352 | 0.6030 | 0.5223 | 16.0697 |
| 0.1736 | 1.8422 | 650 | 0.4219 | -90.0508 | -1.4601 | 0.4144 | 0.6802 | 0.4144 | 0.5347 | 0.6030 | 0.5223 | 16.1057 |
| 0.1598 | 1.9839 | 700 | 0.4217 | -90.5630 | -1.4110 | 0.4148 | 0.6799 | 0.4148 | 0.5362 | 0.6030 | 0.5223 | 16.0493 |
| 0.1454 | 2.1256 | 750 | 0.4215 | -89.5433 | -1.3859 | 0.4151 | 0.6797 | 0.4151 | 0.5316 | 0.6030 | 0.5223 | 16.0459 |
| 0.1333 | 2.2674 | 800 | 0.4217 | -89.1060 | -1.3837 | 0.4142 | 0.6791 | 0.4142 | 0.5347 | 0.6030 | 0.5223 | 15.9847 |
| 0.1287 | 2.4091 | 850 | 0.4241 | -88.6145 | -1.3856 | 0.4153 | 0.6795 | 0.4153 | 0.5357 | 0.6030 | 0.5223 | 15.9979 |
| 0.12 | 2.5508 | 900 | 0.4207 | -88.6663 | -1.3921 | 0.4129 | 0.6795 | 0.4129 | 0.5331 | 0.6030 | 0.5223 | 16.0698 |
| 0.1148 | 2.6925 | 950 | 0.4215 | -88.2854 | -1.3690 | 0.4149 | 0.6792 | 0.4149 | 0.5336 | 0.6030 | 0.5223 | 16.0513 |
| 0.1068 | 2.8342 | 1000 | 0.4229 | -89.1782 | -1.3724 | 0.4168 | 0.6809 | 0.4168 | 0.5321 | 0.6030 | 0.5223 | 16.0722 |
| 0.0991 | 2.9759 | 1050 | 0.4210 | -88.9607 | -1.3982 | 0.4141 | 0.6792 | 0.4141 | 0.5336 | 0.6030 | 0.5223 | 16.0444 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
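
To check a local environment against these versions, a small sketch like the following works (package names are the standard PyPI distributions; the +cu121 suffix on PyTorch depends on which wheel is installed):

```python
# Compare installed package versions with those listed on this card.
from importlib.metadata import version, PackageNotFoundError

expected = {
    "transformers": "4.42.0",
    "torch": "2.3.0",       # card lists 2.3.0+cu121; the suffix depends on the wheel
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}

for package, wanted in expected.items():
    try:
        installed = version(package)
    except PackageNotFoundError:
        print(f"{package}: not installed (expected {wanted})")
        continue
    status = "OK" if installed.startswith(wanted) else f"expected {wanted}"
    print(f"{package}: {installed} ({status})")
```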