# qwen2.5-0.5b-expo-DPO-ES-TRY

This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-news-IFT](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-news-IFT) on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:
- Loss: 0.8437
- Logps: -117.1633
- Logits: -2.0009
- Objective: 0.8798
- Dpo Loss: 0.8798
- Regularize: 0.8798
- Ranking Simple: 0.5403
- Ranking Idealized: 0.5888
- Ranking Idealized Expo: 0.5093
- Dpo Wo Beta: -5.9006
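The card does not define these metrics, but the names suggest the standard DPO objective (Rafailov et al., 2023). Under that assumption, "Dpo Loss" is the sigmoid loss below, and "Dpo Wo Beta" is presumably the same pairwise quantity computed without the β scaling:

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[ \log \sigma\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
$$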
## Model description

More information needed
## Intended uses & limitations

More information needed
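Until this section is filled in, the sketch below shows one way to load the checkpoint for inference. It assumes only the standard `transformers` causal-LM API and the repository id from this card; the prompt and generation settings are illustrative, not prescribed by the author.

```python
# Minimal inference sketch (assumption: the checkpoint loads through the
# standard transformers causal-LM classes, as Qwen2.5-based models do).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-ES-TRY"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Summarize the following news article in one sentence:\n..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```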
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch reconstructing them follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 288
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
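As a reading aid, here is a hedged reconstruction of the list above as `transformers.TrainingArguments`. The actual training script (likely a TRL-style DPO recipe) is not shown in this card, so only the field values are grounded; the surrounding setup, including the output directory name, is assumed.

```python
# Hedged reconstruction of the reported hyperparameters. The real run used
# 6 GPUs (distributed_type: multi-GPU), which TrainingArguments picks up
# from the launcher rather than from an explicit field.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-DPO-ES-TRY",  # assumed name
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# Effective batch sizes reported in the card:
#   total_train_batch_size = 4 per device * 6 GPUs * 12 accumulation steps = 288
#   total_eval_batch_size  = 4 per device * 6 GPUs                         = 24
```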
### Training results

| Training Loss | Epoch | Step | Dpo Loss | Dpo Wo Beta | Logits | Logps | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize |
|:-------------:|:------:|:----:|:--------:|:-----------:|:-------:|:---------:|:---------------:|:---------:|:-----------------:|:----------------------:|:--------------:|:----------:|
| 0.5954 | 0.3004 | 53 | 0.7113 | -2.2659 | -1.8928 | -101.3674 | 0.6816 | 0.7113 | 0.5888 | 0.5093 | 0.5238 | 0.7113 |
| 0.4618 | 0.6009 | 106 | 0.6936 | -2.4624 | -1.9007 | -94.3571 | 0.6913 | 0.6936 | 0.5888 | 0.5093 | 0.5351 | 0.6936 |
| 0.3986 | 0.9013 | 159 | 0.7215 | -3.1229 | -2.1450 | -95.6001 | 0.7014 | 0.7215 | 0.5888 | 0.5093 | 0.5351 | 0.7215 |
| 0.2551 | 1.2017 | 212 | 0.7525 | -3.7750 | -2.2678 | -98.1427 | 0.7351 | 0.7525 | 0.5888 | 0.5093 | 0.5372 | 0.7525 |
| 0.2623 | 1.5021 | 265 | 0.7739 | -4.1634 | -2.1478 | -100.8313 | 0.7400 | 0.7739 | 0.5888 | 0.5093 | 0.5393 | 0.7739 |
| 0.2571 | 1.8026 | 318 | 0.7665 | -4.0950 | -1.9888 | -102.3712 | 0.7401 | 0.7665 | 0.5888 | 0.5093 | 0.5393 | 0.7665 |
| 0.1227 | 2.1030 | 371 | 0.9224 | -6.4510 | -1.8645 | -122.0016 | 0.8844 | 0.9224 | 0.5888 | 0.5093 | 0.5424 | 0.9224 |
| 0.133 | 2.4034 | 424 | 0.8786 | -5.8878 | -2.0277 | -117.1217 | 0.8448 | 0.8786 | 0.5888 | 0.5093 | 0.5413 | 0.8786 |
| 0.1211 | 2.7085 | 477 | 0.8739 | -5.8152 | -2.0272 | -116.4230 | 0.8371 | 0.8739 | 0.5888 | 0.5093 | 0.5403 | 0.8739 |
### Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1