# qwen2.5-0.5b-expo-L2EXPO-25-2
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-25-1 on the hZzy/train_pairwise_all_new dataset. It achieves the following results on the evaluation set (a short note on the log-probability metrics follows the list):
- Loss: 0.3732
- Objective: 0.3661
- Ranking Simple: 0.5272
- Reward Accuracy: 0.6184
- Logp Accuracy: 0.5272
- Log Diff Policy: 1.4964
- Chosen Logps: -93.9669
- Rejected Logps: -95.4632
- Chosen Rewards: 0.0189
- Rejected Rewards: -0.0484
- Logits: -1.0973
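The chosen and rejected log-probabilities above appear to determine the policy log-difference directly; the sketch below spells out that relationship. It is inferred from the logged numbers and is not documented by the training script, so treat it as an assumption.

```latex
% Assumption: "Log Diff Policy" is the gap between the chosen and rejected
% sequence log-probabilities under the trained policy. The logged values are
% consistent with this reading (1.4963 vs. the reported 1.4964, up to rounding).
\[
\text{Log Diff Policy}
  = \log \pi_\theta(y_{\text{chosen}} \mid x) - \log \pi_\theta(y_{\text{rejected}} \mid x)
  \approx -93.9669 - (-95.4632) = 1.4963
\]
```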
## Model description
More information needed
## Intended uses & limitations
More information needed
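Although no intended uses are documented, the checkpoint should load like any other causal language model in transformers. The sketch below assumes the repository id is hZzy/qwen2.5-0.5b-expo-L2EXPO-25-2 (the owner is inferred from the base model hZzy/qwen2.5-0.5b-sft-25-1); adjust it if the model lives elsewhere.

```python
# Minimal loading sketch. The repository id below is an assumption inferred from
# the base model's owner; replace it with the actual repo id if it differs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-25-2"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Briefly explain preference optimization for language models."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```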
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a rough TrainingArguments sketch follows the list):
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 288
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
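The list above maps onto a standard transformers TrainingArguments configuration roughly as sketched below. This is illustrative only, assuming a Trainer-style setup; the actual training script, trainer class, output directory, and precision setting are not part of this card.

```python
# Rough TrainingArguments sketch reconstructed from the hyperparameter list above.
# Anything not listed in the card (output_dir, bf16) is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-25-2",  # assumed output path
    learning_rate=1e-6,
    per_device_train_batch_size=4,   # 4 per device x 6 GPUs x 12 accumulation steps = 288 effective
    per_device_eval_batch_size=4,    # 4 per device x 6 GPUs = 24 effective
    gradient_accumulation_steps=12,
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption; precision is not stated in the card
)
```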
### Training results
Training Loss | Epoch | Step | Validation Loss | Objective | Ranking Simple | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.3775 | 0.1413 | 50 | 0.3834 | 0.3808 | 0.5163 | 0.5785 | 0.5163 | 1.0022 | -94.6418 | -95.6439 | -0.0486 | -0.0664 | -1.1440 |
0.3567 | 0.2826 | 100 | 0.3814 | 0.3756 | 0.5193 | 0.6141 | 0.5193 | 1.2191 | -95.2743 | -96.4934 | -0.1119 | -0.1514 | -1.1248 |
0.3556 | 0.4238 | 150 | 0.3829 | 0.3778 | 0.5187 | 0.6008 | 0.5187 | 1.3120 | -97.2010 | -98.5130 | -0.3045 | -0.3534 | -1.1497 |
0.3116 | 0.5651 | 200 | 0.3788 | 0.3741 | 0.5236 | 0.6178 | 0.5236 | 1.3794 | -95.2373 | -96.6166 | -0.1082 | -0.1637 | -1.1020 |
0.3111 | 0.7064 | 250 | 0.3802 | 0.3731 | 0.5217 | 0.6081 | 0.5217 | 1.3607 | -95.5764 | -96.9371 | -0.1421 | -0.1958 | -1.1156 |
0.2888 | 0.8477 | 300 | 0.3775 | 0.3719 | 0.5254 | 0.6178 | 0.5254 | 1.4271 | -95.7193 | -97.1464 | -0.1564 | -0.2167 | -1.0972 |
0.2742 | 0.9889 | 350 | 0.3778 | 0.3731 | 0.5278 | 0.6310 | 0.5278 | 1.4149 | -92.7176 | -94.1325 | 0.1438 | 0.0847 | -1.1577 |
0.2295 | 1.1302 | 400 | 0.3764 | 0.3696 | 0.5272 | 0.6171 | 0.5272 | 1.5014 | -94.6456 | -96.1470 | -0.0490 | -0.1168 | -1.1084 |
0.2234 | 1.2715 | 450 | 0.3742 | 0.3703 | 0.5248 | 0.6069 | 0.5248 | 1.4271 | -93.7809 | -95.2079 | 0.0375 | -0.0228 | -1.1391 |
0.2144 | 1.4128 | 500 | 0.3741 | 0.3682 | 0.5248 | 0.6220 | 0.5248 | 1.4393 | -93.1956 | -94.6349 | 0.0960 | 0.0345 | -1.0827 |
0.2186 | 1.5540 | 550 | 0.3751 | 0.3683 | 0.5260 | 0.6220 | 0.5260 | 1.4528 | -92.7123 | -94.1651 | 0.1443 | 0.0814 | -1.1178 |
0.205 | 1.6953 | 600 | 0.3762 | 0.3692 | 0.5266 | 0.6232 | 0.5266 | 1.4922 | -93.6128 | -95.1051 | 0.0543 | -0.0126 | -1.1120 |
0.1908 | 1.8366 | 650 | 0.3754 | 0.3680 | 0.5223 | 0.6159 | 0.5223 | 1.4726 | -93.8479 | -95.3205 | 0.0308 | -0.0341 | -1.1085 |
0.1851 | 1.9779 | 700 | 0.3740 | 0.3671 | 0.5242 | 0.6220 | 0.5242 | 1.4626 | -94.0915 | -95.5541 | 0.0064 | -0.0575 | -1.0983 |
0.1453 | 2.1191 | 750 | 0.3738 | 0.3702 | 0.5242 | 0.6178 | 0.5242 | 1.4582 | -92.8502 | -94.3084 | 0.1305 | 0.0671 | -1.0918 |
0.1490 | 2.2604 | 800 | 0.3734 | 0.3662 | 0.5290 | 0.6250 | 0.5290 | 1.5033 | -94.1187 | -95.6221 | 0.0037 | -0.0643 | -1.0989 |
0.1548 | 2.4017 | 850 | 0.3725 | 0.3662 | 0.5236 | 0.6184 | 0.5236 | 1.4822 | -94.0088 | -95.4911 | 0.0147 | -0.0512 | -1.0865 |
0.1333 | 2.5430 | 900 | 0.3721 | 0.3650 | 0.5260 | 0.6202 | 0.5260 | 1.4965 | -94.1236 | -95.6201 | 0.0032 | -0.0641 | -1.1158 |
0.1414 | 2.6842 | 950 | 0.3729 | 0.3671 | 0.5266 | 0.6214 | 0.5266 | 1.4965 | -94.4185 | -95.9149 | -0.0263 | -0.0935 | -1.0838 |
0.1371 | 2.8255 | 1000 | 0.3739 | 0.3688 | 0.5248 | 0.6147 | 0.5248 | 1.4881 | -93.8768 | -95.3649 | 0.0279 | -0.0385 | -1.0965 |
0.1193 | 2.9668 | 1050 | 0.3736 | 0.3660 | 0.5266 | 0.6153 | 0.5266 | 1.4860 | -93.4251 | -94.9111 | 0.0730 | 0.0068 | -1.0944 |
0.1002 | 3.1081 | 1100 | 0.3729 | 0.3656 | 0.5260 | 0.6178 | 0.5260 | 1.4959 | -93.4099 | -94.9058 | 0.0746 | 0.0074 | -1.0990 |
0.1031 | 3.2494 | 1150 | 0.3733 | 0.3665 | 0.5266 | 0.6208 | 0.5266 | 1.4998 | -94.1445 | -95.6443 | 0.0011 | -0.0665 | -1.0853 |
0.095 | 3.3906 | 1200 | 0.3732 | 0.3659 | 0.5260 | 0.6208 | 0.5260 | 1.4867 | -93.9840 | -95.4707 | 0.0172 | -0.0491 | -1.0953 |
0.1014 | 3.5319 | 1250 | 0.3734 | 0.3665 | 0.5272 | 0.6226 | 0.5272 | 1.4976 | -94.1020 | -95.5996 | 0.0054 | -0.0620 | -1.0973 |
0.0949 | 3.6732 | 1300 | 0.3734 | 0.3664 | 0.5272 | 0.6178 | 0.5272 | 1.4947 | -93.9755 | -95.4702 | 0.0180 | -0.0491 | -1.0977 |
0.096 | 3.8145 | 1350 | 0.3733 | 0.3661 | 0.5272 | 0.6190 | 0.5272 | 1.4969 | -93.9574 | -95.4542 | 0.0198 | -0.0475 | -1.0971 |
0.1032 | 3.9557 | 1400 | 0.3732 | 0.3661 | 0.5272 | 0.6184 | 0.5272 | 1.4964 | -93.9669 | -95.4632 | 0.0189 | -0.0484 | -1.0973 |
### Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1