metadata
license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-25-1
tags:
- alignment-handbook
- ndcg
- trl
- expo
- generated_from_trainer
datasets:
- hZzy/train_pairwise_strong_new
model-index:
- name: qwen2.5-0.5b-expo-L2EXPO-25-3
  results: []
qwen2.5-0.5b-expo-L2EXPO-25-3
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-25-1 on the hZzy/train_pairwise_strong_new dataset. It achieves the following results on the evaluation set:
- Loss: 0.4184
- Objective: 0.4334
- Ranking Simple: 0.4905
- Reward Accuracy: 0.6291
- Logp Accuracy: 0.4905
- Log Diff Policy: 0.6270
- Chosen Logps: -93.7722
- Rejected Logps: -94.3992
- Chosen Rewards: 0.1566
- Rejected Rewards: 0.0880
- Logits: -1.2008
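The reported numbers appear internally consistent: Log Diff Policy matches the gap between Chosen Logps and Rejected Logps. The check below is a minimal sketch of that arithmetic. Reading Log Diff Policy as this difference, and the reward margin as chosen minus rejected reward, is an inference from the values and not something the card states.

```python
# Final evaluation metrics copied from the list above.
chosen_logps = -93.7722
rejected_logps = -94.3992
chosen_reward = 0.1566
rejected_reward = 0.0880
reported_log_diff_policy = 0.6270

# Assumption: "Log Diff Policy" is the chosen-minus-rejected log-probability gap.
log_diff = chosen_logps - rejected_logps

# Assumption: the reward margin is the chosen-minus-rejected reward.
reward_margin = chosen_reward - rejected_reward

print(f"log-prob gap:  {log_diff:.4f}")
print(f"reward margin: {reward_margin:.4f}")
print("matches reported Log Diff Policy:",
      abs(log_diff - reported_log_diff_policy) < 5e-4)
```

The same relation holds for each row of the training results table, for example -96.1308 - (-96.5160) = 0.3852 at step 50.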
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 288
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
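The batch-size and scheduler settings above can be sanity-checked with a few lines of Python. This is a sketch under stated assumptions, not the trainer's own code: `cosine_lr` is a hypothetical helper that implements the usual linear-warmup-then-cosine-decay shape, and the total step count of 467 is extrapolated from the results table (step 450 at epoch 2.8908), since the card does not state it.

```python
import math

# Values copied from the hyperparameter list above.
per_device_batch = 4
num_devices = 6
grad_accum_steps = 12
peak_lr = 1e-6
warmup_ratio = 0.1

# Effective batch size per optimizer step.
total_train_batch = per_device_batch * num_devices * grad_accum_steps
print(total_train_batch)  # 288, matching total_train_batch_size


def cosine_lr(step, total_steps, peak=peak_lr, ratio=warmup_ratio):
    """Linear warmup followed by cosine decay to zero (illustrative helper)."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: roughly 467 optimizer steps in total (not stated in the card).
TOTAL_STEPS = 467
for s in (0, 47, 250, 467):
    print(s, f"{cosine_lr(s, TOTAL_STEPS):.3e}")
```

Under these assumptions the learning rate climbs from zero to 1e-06 over roughly the first 47 steps and decays back to zero by the final step.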
Training results
Training Loss | Epoch | Step | Validation Loss | Objective | Ranking Simple | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.3806 | 0.3212 | 50 | 0.4419 | 0.4517 | 0.4864 | 0.6264 | 0.4864 | 0.3852 | -96.1308 | -96.5160 | -0.0793 | -0.1237 | -1.2623 |
0.3564 | 0.6424 | 100 | 0.4331 | 0.4486 | 0.4932 | 0.6114 | 0.4932 | 0.5639 | -96.4497 | -97.0136 | -0.1111 | -0.1734 | -1.2708 |
0.3184 | 0.9636 | 150 | 0.4229 | 0.4380 | 0.4973 | 0.6236 | 0.4973 | 0.6348 | -93.3135 | -93.9483 | 0.2025 | 0.1331 | -1.2473 |
0.2504 | 1.2848 | 200 | 0.4181 | 0.4328 | 0.4918 | 0.6454 | 0.4918 | 0.6747 | -93.7666 | -94.4414 | 0.1572 | 0.0838 | -1.2087 |
0.2565 | 1.6060 | 250 | 0.4203 | 0.4386 | 0.4946 | 0.6277 | 0.4946 | 0.6352 | -92.0965 | -92.7317 | 0.3242 | 0.2548 | -1.2579 |
0.2468 | 1.9272 | 300 | 0.4177 | 0.4317 | 0.4918 | 0.6250 | 0.4918 | 0.6116 | -93.9391 | -94.5507 | 0.1399 | 0.0729 | -1.2024 |
0.1956 | 2.2484 | 350 | 0.4182 | 0.4315 | 0.4918 | 0.6304 | 0.4918 | 0.6462 | -93.9020 | -94.5482 | 0.1436 | 0.0731 | -1.2089 |
0.1909 | 2.5696 | 400 | 0.4186 | 0.4326 | 0.4918 | 0.6359 | 0.4918 | 0.6469 | -93.7824 | -94.4293 | 0.1556 | 0.0850 | -1.1996 |
0.1873 | 2.8908 | 450 | 0.4185 | 0.4335 | 0.4918 | 0.6264 | 0.4918 | 0.6265 | -93.7758 | -94.4023 | 0.1562 | 0.0877 | -1.2008 |
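To see which evaluation step scored best on a given metric, the table can be scanned programmatically. The snippet below is a small sketch that copies three columns from the table. Whether checkpoints were saved at these steps is not stated in the card, so the result indicates only where the logged metrics peaked.

```python
# (step, validation loss, reward accuracy) copied from the table above.
rows = [
    (50, 0.4419, 0.6264),
    (100, 0.4331, 0.6114),
    (150, 0.4229, 0.6236),
    (200, 0.4181, 0.6454),
    (250, 0.4203, 0.6277),
    (300, 0.4177, 0.6250),
    (350, 0.4182, 0.6304),
    (400, 0.4186, 0.6359),
    (450, 0.4185, 0.6264),
]

# Lower validation loss is better; higher reward accuracy is better.
best_loss = min(rows, key=lambda r: r[1])
best_acc = max(rows, key=lambda r: r[2])

print("lowest validation loss at step", best_loss[0], "->", best_loss[1])
print("highest reward accuracy at step", best_acc[0], "->", best_acc[2])
```

Validation loss is nearly flat from step 200 onward (0.4177 to 0.4203), so the differences between late evaluation steps are small.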
Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1