---
license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
  - alignment-handbook
  - ndcg
  - trl
  - expo
  - generated_from_trainer
datasets:
  - hZzy/train_pairwise
model-index:
  - name: qwen2.5-0.5b-expo-DPO-ES-0.1
    results: []
---


# qwen2.5-0.5b-expo-DPO-ES-0.1

This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-news-IFT](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-news-IFT) on the [hZzy/train_pairwise](https://huggingface.co/datasets/hZzy/train_pairwise) dataset. It achieves the following results on the evaluation set:

- Loss: 0.6923
- Logps: -90.9491
- Logits: -2.1209
- Objective: 0.6894
- Dpo Loss: 0.6894 (see the formula sketch below)
- Regularize: 0.6894
- Ranking Simple: 0.5564
- Ranking Idealized: 0.6030
- Ranking Idealized Expo: 0.5223
- Wo Beta: 7.4232
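
For reference, the "Dpo Loss" above appears to be the standard DPO objective of Rafailov et al. (2023); written out below, where β is the DPO temperature, which this card does not state. How the "Objective" and "Regularize" values relate to it is likewise undocumented here.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```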

## Model description

More information needed

## Intended uses & limitations

More information needed
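
Pending fuller documentation, here is a minimal loading sketch using the standard transformers causal-LM API; the prompt and sampling settings are illustrative assumptions, not values from this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-ES-0.1"

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt; the sampling settings are assumptions, not card values.
inputs = tokenizer("Write a one-sentence news headline:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```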

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged training-setup sketch follows the list):

- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 144 (4 per device × 3 devices × 12 accumulation steps)
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
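
The training script itself is not included in this card. As a rough reconstruction, the hyperparameters above map onto trl's `DPOConfig`/`DPOTrainer` roughly as follows; the β value, dataset split, and column format are assumptions, and keyword names vary slightly across trl versions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Base SFT checkpoint and pairwise dataset named in this card.
model = AutoModelForCausalLM.from_pretrained("hZzy/qwen2.5-0.5b-sft-news-IFT")
tokenizer = AutoTokenizer.from_pretrained("hZzy/qwen2.5-0.5b-sft-news-IFT")
train_dataset = load_dataset("hZzy/train_pairwise", split="train")  # split name assumed

# Mirrors the list above; launched on 3 GPUs this yields the reported
# total train batch size: 4 (per device) x 3 (devices) x 12 (accumulation) = 144.
config = DPOConfig(
    output_dir="qwen2.5-0.5b-expo-DPO-ES-0.1",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,  # assumption: the "0.1" in the model name may be beta; not stated in the card
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,  # expects prompt/chosen/rejected columns
    tokenizer=tokenizer,  # newer trl versions take processing_class= instead
)
trainer.train()
```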

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Logps     | Logits  | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
|:-------------:|:------:|:----:|:---------------:|:---------:|:-------:|:---------:|:--------:|:----------:|:--------------:|:-----------------:|:----------------------:|:-------:|
| 0.6785        | 0.1417 | 50   | 0.6814          | -90.8721  | -1.6022 | 0.6843    | 0.6843   | 0.6843     | 0.5259         | 0.6030            | 0.5223                 | 7.8749  |
| 0.618         | 0.2834 | 100  | 0.6733          | -98.8900  | -1.7799 | 0.6766    | 0.6766   | 0.6766     | 0.5399         | 0.6030            | 0.5223                 | 7.7840  |
| 0.5667        | 0.4251 | 150  | 0.6867          | -99.1217  | -1.8072 | 0.6829    | 0.6829   | 0.6829     | 0.5409         | 0.6030            | 0.5223                 | 7.8537  |
| 0.5214        | 0.5668 | 200  | 0.6902          | -99.5153  | -1.8895 | 0.6905    | 0.6905   | 0.6905     | 0.5445         | 0.6030            | 0.5223                 | 7.7013  |
| 0.4922        | 0.7085 | 250  | 0.6976          | -82.8384  | -1.9887 | 0.6914    | 0.6914   | 0.6914     | 0.5481         | 0.6030            | 0.5223                 | 7.8784  |
| 0.4535        | 0.8503 | 300  | 0.6923          | -90.9491  | -2.1209 | 0.6894    | 0.6894   | 0.6894     | 0.5564         | 0.6030            | 0.5223                 | 7.4232  |
| 0.4228        | 0.9920 | 350  | 0.7064          | -87.7231  | -1.9803 | 0.6968    | 0.6968   | 0.6968     | 0.5538         | 0.6030            | 0.5223                 | 8.0253  |
| 0.2845        | 1.1337 | 400  | 0.7305          | -101.3180 | -2.0805 | 0.7269    | 0.7269   | 0.7269     | 0.5430         | 0.6030            | 0.5223                 | 8.6164  |
| 0.2989        | 1.2754 | 450  | 0.7005          | -93.1955  | -1.8646 | 0.6974    | 0.6974   | 0.6974     | 0.5606         | 0.6030            | 0.5223                 | 8.2386  |
| 0.3065        | 1.4171 | 500  | 0.7179          | -97.0137  | -1.9983 | 0.7147    | 0.7147   | 0.7147     | 0.5549         | 0.6030            | 0.5223                 | 8.2760  |
| 0.2885        | 1.5588 | 550  | 0.7091          | -107.9610 | -1.9041 | 0.7134    | 0.7134   | 0.7134     | 0.5616         | 0.6030            | 0.5223                 | 8.1968  |

The evaluation results reported at the top of this card match the step-300 checkpoint (epoch 0.85), which attains the lowest Wo Beta (7.4232) in this table, consistent with the early-stopping ("ES") selection suggested by the model name.

### Framework versions

- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1