metadata
license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
  - alignment-handbook
  - ndcg
  - trl
  - expo
  - generated_from_trainer
datasets:
  - hZzy/train_pairwise
model-index:
  - name: qwen2.5-0.5b-expo-L1EXPO
    results: []


qwen2.5-0.5b-expo-L1EXPO

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0088
  • Logps: -98.4573
  • Logits: -1.9894
  • Objective: 0.0088
  • DPO Loss: 0.6929
  • Regularize: 0.0088
  • Ranking Simple: 0.5180
  • Ranking Idealized: 0.6022
  • Ranking Idealized Expo: 0.5207
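The DPO loss stays near 0.693 throughout training. Assuming this metric is the standard sigmoid DPO loss, -log σ(β·m), where m is the policy's chosen-minus-rejected log-probability margin measured relative to the reference model, 0.6929 sits just below ln 2 ≈ 0.6931, which is exactly what the loss equals at m = 0. The following is a minimal single-pair sketch of that formula. `dpo_loss`, `neg_log_sigmoid`, their arguments, and `beta=0.1` are hypothetical illustrations. They are not the EXPO trainer's code or this run's configuration.

```python
import math


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x)) = log(1 + exp(-x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    # For negative x, factor out exp(-x) so exp() never overflows.
    return -x + math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one preference pair (hypothetical helper).

    beta=0.1 is an illustrative value; the card does not state beta.
    """
    # Log-ratio of policy to reference for the chosen and rejected responses.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Implicit reward margin: how much more the policy prefers "chosen".
    margin = chosen_logratio - rejected_logratio
    return neg_log_sigmoid(beta * margin)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so loss equals ln 2.
    print(dpo_loss(-98.46, -98.46, -98.46, -98.46), math.log(2))
```

When every log-probability is the same (here the evaluation Logps of about -98.46 is reused as a hypothetical input), the function returns ln 2. A positive margin pushes the loss below ln 2, and a negative margin pushes it above.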

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-07
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 48
  • total_eval_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
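The batch-size totals above come from multiplying the per-device batch size by the number of devices and, for training, by the gradient accumulation steps: 1 × 4 × 12 = 48 and 1 × 4 = 4. The following sketch reproduces that arithmetic and a cosine learning-rate schedule with linear warmup. It assumes the `cosine` scheduler has the shape of `transformers.get_cosine_schedule_with_warmup` with its default half cosine. It also assumes warmup lasts `ceil(warmup_ratio × num_training_steps)` steps. The card does not list the total number of optimizer steps, so `num_training_steps=2000` below is a hypothetical value.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-7
TRAIN_BATCH_SIZE = 1          # per device
EVAL_BATCH_SIZE = 1           # per device
NUM_DEVICES = 4
GRADIENT_ACCUMULATION_STEPS = 12
WARMUP_RATIO = 0.1


def total_train_batch_size() -> int:
    # Examples contributing to one optimizer step across all devices.
    return TRAIN_BATCH_SIZE * NUM_DEVICES * GRADIENT_ACCUMULATION_STEPS


def total_eval_batch_size() -> int:
    # Evaluation accumulates no gradients, so only the devices multiply.
    return EVAL_BATCH_SIZE * NUM_DEVICES


def cosine_lr(step: int, num_training_steps: int,
              base_lr: float = LEARNING_RATE,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup, then a half-cosine decay to zero.

    Assumption: mirrors the shape of transformers'
    get_cosine_schedule_with_warmup with num_cycles=0.5.
    """
    warmup_steps = math.ceil(warmup_ratio * num_training_steps)
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, num_training_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(total_train_batch_size())  # 48
    print(total_eval_batch_size())   # 4
    # num_training_steps=2000 is hypothetical, for illustration only.
    for s in (0, 100, 200, 1100, 2000):
        print(s, cosine_lr(s, num_training_steps=2000))
```

Under the hypothetical 2000-step horizon, warmup ends at step 200, where the rate peaks at 1e-07. The rate falls to half that at step 1100 and reaches zero at step 2000.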

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | DPO Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.006 | 0.0472 | 50 | 0.0060 | -98.6152 | -1.9958 | 0.0061 | 0.6930 | 0.0061 | 0.5180 | 0.6022 | 0.5207 |
| 0.0092 | 0.0945 | 100 | 0.0073 | -98.7889 | -1.9954 | 0.0073 | 0.6929 | 0.0073 | 0.5186 | 0.6022 | 0.5207 |
| 0.0142 | 0.1417 | 150 | 0.0092 | -98.6620 | -1.9986 | 0.0093 | 0.6930 | 0.0093 | 0.5186 | 0.6022 | 0.5207 |
| 0.0173 | 0.1890 | 200 | 0.0097 | -98.7946 | -1.9957 | 0.0098 | 0.6929 | 0.0098 | 0.5173 | 0.6022 | 0.5207 |
| 0.0245 | 0.2362 | 250 | 0.0121 | -98.6416 | -1.9951 | 0.0121 | 0.6929 | 0.0121 | 0.5186 | 0.6022 | 0.5207 |
| 0.0234 | 0.2834 | 300 | 0.0136 | -98.3321 | -1.9940 | 0.0140 | 0.6932 | 0.0140 | 0.5166 | 0.6022 | 0.5207 |
| 0.0262 | 0.3307 | 350 | 0.0178 | -98.3457 | -1.9947 | 0.0181 | 0.6926 | 0.0181 | 0.5200 | 0.6022 | 0.5207 |
| 0.0315 | 0.3779 | 400 | 0.0165 | -98.1128 | -1.9941 | 0.0164 | 0.6926 | 0.0164 | 0.5200 | 0.6022 | 0.5207 |
| 0.0294 | 0.4252 | 450 | 0.0145 | -98.3787 | -1.9950 | 0.0148 | 0.6924 | 0.0148 | 0.5186 | 0.6022 | 0.5207 |
| 0.032 | 0.4724 | 500 | 0.0139 | -98.6457 | -1.9920 | 0.0139 | 0.6925 | 0.0139 | 0.5193 | 0.6022 | 0.5207 |
| 0.0314 | 0.5196 | 550 | 0.0136 | -98.9689 | -1.9943 | 0.0135 | 0.6927 | 0.0135 | 0.5186 | 0.6022 | 0.5207 |
| 0.0311 | 0.5669 | 600 | 0.0142 | -98.1223 | -1.9968 | 0.0144 | 0.6925 | 0.0144 | 0.5186 | 0.6022 | 0.5207 |
| 0.0333 | 0.6141 | 650 | 0.0145 | -98.6917 | -1.9935 | 0.0146 | 0.6926 | 0.0146 | 0.5180 | 0.6022 | 0.5207 |
| 0.028 | 0.6614 | 700 | 0.0138 | -98.6777 | -1.9953 | 0.0140 | 0.6930 | 0.0140 | 0.5193 | 0.6022 | 0.5207 |
| 0.0319 | 0.7086 | 750 | 0.0147 | -98.7712 | -1.9952 | 0.0145 | 0.6926 | 0.0145 | 0.5180 | 0.6022 | 0.5207 |
| 0.0297 | 0.7558 | 800 | 0.0157 | -98.1348 | -1.9950 | 0.0163 | 0.6929 | 0.0163 | 0.5186 | 0.6022 | 0.5207 |
| 0.0286 | 0.8031 | 850 | 0.0124 | -98.5940 | -1.9954 | 0.0125 | 0.6928 | 0.0125 | 0.5173 | 0.6022 | 0.5207 |
| 0.0285 | 0.8503 | 900 | 0.0117 | -98.9422 | -1.9931 | 0.0118 | 0.6929 | 0.0118 | 0.5166 | 0.6022 | 0.5207 |
| 0.0248 | 0.8976 | 950 | 0.0156 | -98.6447 | -1.9902 | 0.0155 | 0.6932 | 0.0155 | 0.5173 | 0.6022 | 0.5207 |
| 0.0272 | 0.9448 | 1000 | 0.0126 | -98.1242 | -1.9906 | 0.0128 | 0.6931 | 0.0128 | 0.5180 | 0.6022 | 0.5207 |
| 0.0215 | 0.9920 | 1050 | 0.0133 | -98.3357 | -1.9911 | 0.0135 | 0.6927 | 0.0135 | 0.5180 | 0.6022 | 0.5207 |
| 0.0242 | 1.0393 | 1100 | 0.0128 | -98.5121 | -1.9881 | 0.0127 | 0.6927 | 0.0127 | 0.5180 | 0.6022 | 0.5207 |
| 0.0248 | 1.0865 | 1150 | 0.0121 | -98.3740 | -1.9900 | 0.0124 | 0.6929 | 0.0124 | 0.5180 | 0.6022 | 0.5207 |
| 0.0238 | 1.1338 | 1200 | 0.0131 | -98.6523 | -1.9881 | 0.0132 | 0.6931 | 0.0132 | 0.5186 | 0.6022 | 0.5207 |
| 0.0213 | 1.1810 | 1250 | 0.0116 | -98.3820 | -1.9892 | 0.0118 | 0.6929 | 0.0118 | 0.5186 | 0.6022 | 0.5207 |
| 0.0213 | 1.2282 | 1300 | 0.0101 | -98.3519 | -1.9901 | 0.0103 | 0.6930 | 0.0103 | 0.5180 | 0.6022 | 0.5207 |
| 0.0191 | 1.2755 | 1350 | 0.0105 | -98.1708 | -1.9895 | 0.0107 | 0.6929 | 0.0107 | 0.5186 | 0.6022 | 0.5207 |
| 0.0183 | 1.3227 | 1400 | 0.0098 | -98.2989 | -1.9896 | 0.0099 | 0.6928 | 0.0099 | 0.5180 | 0.6022 | 0.5207 |
| 0.0173 | 1.3700 | 1450 | 0.0120 | -98.4475 | -1.9888 | 0.0120 | 0.6929 | 0.0120 | 0.5193 | 0.6022 | 0.5207 |
| 0.0171 | 1.4172 | 1500 | 0.0093 | -98.4978 | -1.9892 | 0.0093 | 0.6929 | 0.0093 | 0.5186 | 0.6022 | 0.5207 |
| 0.0164 | 1.4645 | 1550 | 0.0100 | -98.4887 | -1.9898 | 0.0101 | 0.6928 | 0.0101 | 0.5180 | 0.6022 | 0.5207 |
| 0.0165 | 1.5117 | 1600 | 0.0097 | -98.4418 | -1.9892 | 0.0096 | 0.6929 | 0.0096 | 0.5186 | 0.6022 | 0.5207 |
| 0.0128 | 1.5589 | 1650 | 0.0100 | -98.3605 | -1.9889 | 0.0101 | 0.6927 | 0.0101 | 0.5180 | 0.6022 | 0.5207 |
| 0.0132 | 1.6062 | 1700 | 0.0090 | -98.4055 | -1.9891 | 0.0089 | 0.6928 | 0.0089 | 0.5180 | 0.6022 | 0.5207 |
| 0.0133 | 1.6534 | 1750 | 0.0094 | -98.4174 | -1.9885 | 0.0094 | 0.6928 | 0.0094 | 0.5180 | 0.6022 | 0.5207 |
| 0.0138 | 1.7007 | 1800 | 0.0096 | -98.3598 | -1.9886 | 0.0097 | 0.6928 | 0.0097 | 0.5180 | 0.6022 | 0.5207 |
| 0.0122 | 1.7479 | 1850 | 0.0090 | -98.4157 | -1.9888 | 0.0091 | 0.6929 | 0.0091 | 0.5180 | 0.6022 | 0.5207 |
| 0.0128 | 1.7951 | 1900 | 0.0089 | -98.4291 | -1.9891 | 0.0090 | 0.6929 | 0.0089 | 0.5180 | 0.6022 | 0.5207 |
| 0.0133 | 1.8424 | 1950 | 0.0089 | -98.4530 | -1.9892 | 0.0090 | 0.6929 | 0.0090 | 0.5180 | 0.6022 | 0.5207 |
| 0.012 | 1.8896 | 2000 | 0.0087 | -98.4584 | -1.9894 | 0.0088 | 0.6929 | 0.0088 | 0.5180 | 0.6022 | 0.5207 |
| 0.0119 | 1.9369 | 2050 | 0.0088 | -98.4571 | -1.9894 | 0.0088 | 0.6929 | 0.0088 | 0.5180 | 0.6022 | 0.5207 |
| 0.0116 | 1.9841 | 2100 | 0.0088 | -98.4573 | -1.9894 | 0.0088 | 0.6929 | 0.0088 | 0.5180 | 0.6022 | 0.5207 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1