# qwen2.5-0.5b-expo-L1EXPO-25-3
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-25-1 on the hZzy/train_pairwise_strong_new dataset. It achieves the following results on the evaluation set:
- Loss: 0.0449
- Objective: 0.0464
- Ranking Simple: 0.4851
- Reward Accuracy: 0.5163
- Logp Accuracy: 0.4851
- Log Diff Policy: -0.0199
- Chosen Logps: -94.0044
- Rejected Logps: -93.9845
- Chosen Rewards: 0.1334
- Rejected Rewards: 0.1295
- Logits: -1.1553
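Note that Log Diff Policy above is the difference between the chosen and rejected log-probabilities (-94.0044 - (-93.9845) = -0.0199), so at the final checkpoint the policy assigns marginally higher average likelihood to the rejected responses. A minimal loading sketch follows; it assumes the repository id is `hZzy/qwen2.5-0.5b-expo-L1EXPO-25-3` (the namespace is inferred from the base model) and that the checkpoint keeps the standard Qwen2 causal-LM layout of `hZzy/qwen2.5-0.5b-sft-25-1` — neither is stated explicitly in this card.

```python
# Minimal sketch: load the checkpoint as a causal LM with transformers.
# Assumptions: repo id "hZzy/qwen2.5-0.5b-expo-L1EXPO-25-3" and the standard
# Qwen2 causal-LM architecture inherited from the SFT base model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L1EXPO-25-3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```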
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 288
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
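The per-device and aggregate batch sizes are consistent: 4 (per device) × 6 (GPUs) × 12 (gradient accumulation steps) = 288 training examples per optimizer step, and 4 × 6 = 24 per evaluation step. Below is a sketch of how these settings would map onto `transformers.TrainingArguments`; the actual training script and framework are not included in this card, so this is an illustration rather than the author's configuration.

```python
# Illustrative only: maps the hyperparameters listed above onto
# TrainingArguments. The real training setup is not published in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L1EXPO-25-3",
    learning_rate=1e-6,
    per_device_train_batch_size=4,   # x 6 GPUs x 12 accumulation steps = 288
    per_device_eval_batch_size=4,    # x 6 GPUs = 24
    gradient_accumulation_steps=12,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3,
    adam_beta1=0.9,                  # Adam settings matching the card
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```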
### Training results
| Training Loss | Epoch | Step | Validation Loss | Objective | Ranking Simple | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0443 | 0.3212 | 50 | 0.0391 | 0.0396 | 0.4878 | 0.5421 | 0.4878 | -0.0011 | -94.2958 | -94.2947 | 0.1042 | 0.0985 | -1.1971 |
| 0.0742 | 0.6424 | 100 | 0.0500 | 0.0519 | 0.4918 | 0.5027 | 0.4918 | -0.0428 | -95.4319 | -95.3890 | -0.0094 | -0.0110 | -1.1971 |
| 0.0772 | 0.9636 | 150 | 0.0503 | 0.0518 | 0.4918 | 0.5136 | 0.4918 | -0.0491 | -95.7999 | -95.7508 | -0.0462 | -0.0471 | -1.1257 |
| 0.0949 | 1.2848 | 200 | 0.0694 | 0.0724 | 0.4918 | 0.5109 | 0.4918 | -0.0124 | -94.8558 | -94.8434 | 0.0482 | 0.0436 | -1.1415 |
| 0.0873 | 1.6060 | 250 | 0.0575 | 0.0589 | 0.4932 | 0.5367 | 0.4932 | -0.0147 | -93.7249 | -93.7102 | 0.1613 | 0.1569 | -1.1543 |
| 0.0782 | 1.9272 | 300 | 0.0538 | 0.0566 | 0.4891 | 0.5204 | 0.4891 | -0.0493 | -93.3568 | -93.3074 | 0.1981 | 0.1972 | -1.1650 |
| 0.0549 | 2.2484 | 350 | 0.0482 | 0.0500 | 0.4891 | 0.5082 | 0.4891 | -0.0244 | -93.5386 | -93.5143 | 0.1800 | 0.1765 | -1.1559 |
| 0.0487 | 2.5696 | 400 | 0.0464 | 0.0477 | 0.4864 | 0.5258 | 0.4864 | -0.0271 | -93.9310 | -93.9039 | 0.1407 | 0.1376 | -1.1521 |
| 0.0512 | 2.8908 | 450 | 0.0449 | 0.0464 | 0.4864 | 0.5149 | 0.4864 | -0.0210 | -94.0002 | -93.9792 | 0.1338 | 0.1300 | -1.1554 |
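Ranking Simple and Logp Accuracy are identical at every checkpoint, so they appear to track the same quantity. A plausible reading (an assumption; the metric definitions are not given in this card) is the fraction of evaluation pairs where the policy assigns a higher log-probability to the chosen response, as in this sketch:

```python
# Sketch of a pairwise log-probability accuracy, assuming "Logp Accuracy"
# means: fraction of pairs where the chosen response out-scores the rejected one.
# This interpretation is not confirmed by the card.
import torch

def logp_accuracy(chosen_logps: torch.Tensor, rejected_logps: torch.Tensor) -> float:
    return (chosen_logps > rejected_logps).float().mean().item()

# Toy example with three pairs, one ranked correctly:
chosen = torch.tensor([-94.0, -93.5, -95.1])
rejected = torch.tensor([-93.9, -93.8, -95.0])
print(logp_accuracy(chosen, rejected))  # 0.333...
```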
### Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1