# qwen2.5-0.5b-expo-DPO-L2EXPO-W0-noES2-0.1
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:
- Loss: 324.6227
- Logps: -88.5336
- Logits: -1.1616
- Objective: 321.3043
- Dpo Loss: 0.6772
- Ranking Simple: 0.5471
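
A minimal usage sketch with the standard `transformers` causal-LM API (the prompt is illustrative; the card does not document the prompt format used during training):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-L2EXPO-W0-noES2-0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Plain-text prompting; the actual prompt format expected by this checkpoint
# is not documented in the card, so treat the prompt below as an assumption.
inputs = tokenizer("Summarize today's top news story:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```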
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 144
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
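
The total train batch size above follows from 4 per-device × 3 GPUs × 12 gradient-accumulation steps = 144. The sketch below shows how these settings map onto TRL's `DPOTrainer`; it is an assumption-laden reconstruction rather than the original training script: it uses vanilla DPO instead of the L2EXPO/weighted variant implied by the model name, guesses `beta=0.1` (unlisted; possibly the `-0.1` suffix), and assumes the dataset exposes `prompt`/`chosen`/`rejected` columns in a `train` split.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "hZzy/qwen2.5-0.5b-sft-news-IFT"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Mirrors the listed hyperparameters; effective batch = 4 * 3 GPUs * 12 accumulation = 144.
args = DPOConfig(
    output_dir="qwen2.5-0.5b-expo-dpo",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,  # assumption: beta is not listed in the card
)

dataset = load_dataset("hZzy/train_pairwise_weighted")  # assumed split/column layout
trainer = DPOTrainer(
    model=model,              # reference model is cloned from `model` when not given
    args=args,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,      # newer TRL versions use processing_class= instead
)
trainer.train()
```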
### Training results
| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Ranking Simple |
|---|---|---|---|---|---|---|---|---|
| 298.227 | 0.1417 | 50 | 317.0191 | -89.8084 | -1.5244 | 315.7606 | 0.6789 | 0.5311 |
| 275.3632 | 0.2834 | 100 | 315.6042 | -94.9782 | -1.5619 | 311.9210 | 0.6685 | 0.5430 |
| 263.5541 | 0.4251 | 150 | 323.5653 | -86.5190 | -1.4200 | 317.0345 | 0.6707 | 0.5466 |
| 251.2096 | 0.5668 | 200 | 331.8273 | -95.4989 | -1.6021 | 324.8527 | 0.6841 | 0.5455 |
| 260.8009 | 0.7085 | 250 | 335.3750 | -89.0524 | -1.3996 | 327.2677 | 0.6886 | 0.5383 |
| 241.17 | 0.8503 | 300 | 326.9033 | -86.8043 | -1.3637 | 321.4521 | 0.6751 | 0.5518 |
| 217.6821 | 0.9920 | 350 | 330.4503 | -84.6762 | -1.1931 | 320.4708 | 0.6764 | 0.5497 |
| 196.3838 | 1.1337 | 400 | 331.8378 | -85.2363 | -1.1204 | 324.6776 | 0.6834 | 0.5492 |
| 196.8245 | 1.2754 | 450 | 328.4219 | -87.7967 | -1.1766 | 321.0422 | 0.6758 | 0.5502 |
| 206.0427 | 1.4171 | 500 | 327.5086 | -86.1759 | -1.1174 | 321.6342 | 0.6782 | 0.5502 |
| 185.0637 | 1.5588 | 550 | 325.0724 | -92.8798 | -1.1304 | 320.1057 | 0.6694 | 0.5554 |
| 183.7322 | 1.7005 | 600 | 324.3962 | -89.2066 | -0.9798 | 319.9032 | 0.6742 | 0.5554 |
| 206.6587 | 1.8422 | 650 | 324.2286 | -88.9897 | -1.1055 | 319.4041 | 0.6736 | 0.5533 |
| 188.4019 | 1.9839 | 700 | 323.3723 | -89.3024 | -1.0999 | 318.8545 | 0.6727 | 0.5497 |
| 165.7588 | 2.1256 | 750 | 324.4387 | -89.4604 | -1.1334 | 320.7416 | 0.6772 | 0.5502 |
| 164.7524 | 2.2674 | 800 | 323.6566 | -89.3705 | -1.1091 | 320.3278 | 0.6756 | 0.5497 |
| 160.4428 | 2.4091 | 850 | 324.1759 | -88.9819 | -1.1590 | 321.1419 | 0.6765 | 0.5461 |
| 164.2802 | 2.5508 | 900 | 324.5910 | -88.9856 | -1.1737 | 321.6709 | 0.6779 | 0.5461 |
| 168.6074 | 2.6925 | 950 | 324.7719 | -88.5500 | -1.1605 | 321.5456 | 0.6777 | 0.5461 |
| 165.6921 | 2.8342 | 1000 | 324.6280 | -88.5341 | -1.1614 | 321.3309 | 0.6773 | 0.5461 |
| 160.429 | 2.9759 | 1050 | 324.6226 | -88.5336 | -1.1616 | 321.3043 | 0.6772 | 0.5471 |
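
For reference, the "Dpo Loss" column presumably tracks the standard DPO objective (the card does not define it, and the L2EXPO weighting may modify it):

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
    - \beta \log \frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]
$$

When the policy equals the reference this loss is $\ln 2 \approx 0.693$, so values in the 0.67–0.69 range, together with the roughly 0.55 "Ranking Simple" accuracy, suggest a modest preference margin over the SFT reference.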
### Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1