qwen2.5-0.5b-expo-DPO-ES-TRY
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the metrics):
- Loss: 0.8143
- Logps: -120.0593
- Logits: -2.4478
- Objective: 0.8312
- Dpo Loss: 0.8312
- Regularize: 0.8312
- Ranking Simple: 0.5839
- Ranking Idealized: 0.6046
- Ranking Idealized Expo: 0.5280
- Dpo Wo Beta: -5.3609
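
The card does not yet include a usage example, so the snippet below is only a minimal inference sketch. It assumes the checkpoint loads with the standard transformers causal-LM classes and uses the Hub id from the model tree at the end of this card; adjust the id, dtype, and device for your setup.

```python
# Minimal inference sketch; the Hub id is taken from the model tree below and may need adjusting.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-ES-TRY3"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Summarize today's top news story in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```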
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 6
- total_train_batch_size: 72
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
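
The training script itself is not included in this card. As a non-authoritative illustration, the sketch below maps the listed hyperparameters onto TRL's DPOTrainer; the actual EXPO/early-stopping training code, the DPO beta, and the auxiliary objectives reported above (Regularize, Dpo Wo Beta) are not documented here, so treat it only as a starting point.

```python
# Hypothetical sketch: maps the listed hyperparameters onto TRL's DPOTrainer.
# The real training code, beta value, and extra regularization terms are not given in this card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "hZzy/qwen2.5-0.5b-sft-news-IFT"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Split name and column layout (prompt/chosen/rejected) are assumptions about hZzy/train_pairwise.
train_dataset = load_dataset("hZzy/train_pairwise", split="train")

config = DPOConfig(
    output_dir="qwen2.5-0.5b-expo-DPO-ES-TRY",
    learning_rate=5e-6,
    per_device_train_batch_size=2,   # 2 per device x 6 GPUs x 6 accumulation steps = 72 effective
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=6,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,          # TRL clones the policy as the frozen reference model
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,     # newer TRL versions call this argument processing_class
)
trainer.train()
```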
Training results
Training Loss | Epoch | Step | Dpo Loss | Dpo Wo Beta | Logits | Logps | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.5954 | 0.3004 | 53 | 0.7113 | -2.2659 | -1.8928 | -101.3674 | 0.6816 | 0.7113 | 0.5888 | 0.5093 | 0.5238 | 0.7113 |
0.4618 | 0.6009 | 106 | 0.6936 | -2.4624 | -1.9007 | -94.3571 | 0.6913 | 0.6936 | 0.5888 | 0.5093 | 0.5351 | 0.6936 |
0.3986 | 0.9013 | 159 | 0.7215 | -3.1229 | -2.1450 | -95.6001 | 0.7014 | 0.7215 | 0.5888 | 0.5093 | 0.5351 | 0.7215 |
0.2551 | 1.2017 | 212 | 0.7525 | -3.7750 | -2.2678 | -98.1427 | 0.7351 | 0.7525 | 0.5888 | 0.5093 | 0.5372 | 0.7525 |
0.2623 | 1.5021 | 265 | 0.7739 | -4.1634 | -2.1478 | -100.8313 | 0.7400 | 0.7739 | 0.5888 | 0.5093 | 0.5393 | 0.7739 |
0.2571 | 1.8026 | 318 | 0.7665 | -4.0950 | -1.9888 | -102.3712 | 0.7401 | 0.7665 | 0.5888 | 0.5093 | 0.5393 | 0.7665 |
0.1227 | 2.1030 | 371 | 0.9224 | -6.4510 | -1.8645 | -122.0016 | 0.8844 | 0.9224 | 0.5888 | 0.5093 | 0.5424 | 0.9224 |
0.133 | 2.4034 | 424 | 0.8786 | -5.8878 | -2.0277 | -117.1217 | 0.8448 | 0.8786 | 0.5888 | 0.5093 | 0.5413 | 0.8786 |
0.1211 | 2.7085 | 477 | 0.8739 | -5.8152 | -2.0272 | -116.4230 | 0.8371 | 0.8739 | 0.5888 | 0.5093 | 0.5403 | 0.8739 |
0.0858 | 1.5045 | 530 | 0.8753 | -5.9229 | -2.4530 | -118.2529 | 0.8505 | 0.8753 | 0.6046 | 0.5280 | 0.5683 | 0.8753 |
0.1274 | 1.6547 | 583 | 0.8264 | -5.2847 | -2.4380 | -119.5907 | 0.8086 | 0.8264 | 0.6046 | 0.5280 | 0.5870 | 0.8264 |
0.1614 | 1.8049 | 636 | 0.8243 | -5.2813 | -2.4850 | -117.8585 | 0.8209 | 0.8243 | 0.6046 | 0.5280 | 0.5818 | 0.8243 |
0.1616 | 1.9551 | 689 | 0.8576 | -5.7234 | -2.4656 | -119.3221 | 0.8383 | 0.8576 | 0.6046 | 0.5280 | 0.5797 | 0.8576 |
0.1063 | 2.1053 | 742 | 0.9824 | -7.3310 | -2.2712 | -133.3637 | 0.9486 | 0.9824 | 0.6046 | 0.5280 | 0.5518 | 0.9824 |
0.1017 | 2.2556 | 795 | 0.8904 | -6.2055 | -2.4490 | -123.5745 | 0.8711 | 0.8904 | 0.6046 | 0.5280 | 0.5683 | 0.8904 |
0.1225 | 2.4058 | 848 | 0.9035 | -6.3529 | -2.4743 | -124.5336 | 0.8822 | 0.9035 | 0.6046 | 0.5280 | 0.5569 | 0.9035 |
0.1157 | 2.5583 | 901 | 0.8941 | -6.2136 | -2.4886 | -124.4583 | 0.8718 | 0.8941 | 0.6046 | 0.5280 | 0.5621 | 0.8941 |
0.1387 | 2.7085 | 954 | 0.8892 | -6.1610 | -2.5086 | -123.4143 | 0.8688 | 0.8892 | 0.6046 | 0.5280 | 0.5580 | 0.8892 |
0.1219 | 2.8588 | 1007 | 0.8882 | -6.1537 | -2.5127 | -123.1454 | 0.8682 | 0.8882 | 0.6046 | 0.5280 | 0.5600 | 0.8882 |
Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
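
For reproducibility, the snippet below compares the local environment against the versions listed above (the CUDA build suffix on the PyTorch version is ignored):

```python
# Quick environment check against the framework versions listed in this card.
import datasets, tokenizers, torch, transformers

expected = {"transformers": "4.42.0", "torch": "2.3.0", "datasets": "2.19.1", "tokenizers": "0.19.1"}
found = {
    "transformers": transformers.__version__,
    "torch": torch.__version__.split("+")[0],  # drop the +cu121 build suffix
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    print(f"{name}: found {found[name]}, card lists {want}")
```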
Model tree for hZzy/qwen2.5-0.5b-expo-DPO-ES-TRY3
- Base model: hZzy/qwen2.5-0.5b-sft-news-IFT