---
license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-25-1
tags:
- alignment-handbook
- ndcg
- trl
- expo
- generated_from_trainer
datasets:
- hZzy/train_pairwise_strong_new
model-index:
- name: qwen2.5-0.5b-expo-L2EXPO-25-3
  results: []
---

[Visualize in Weights & Biases](https://wandb.ai/zhiyuzha-university-of-florida/huggingface/runs/vdeeps75)

# qwen2.5-0.5b-expo-L2EXPO-25-3

This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-25-1](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-25-1) on the hZzy/train_pairwise_strong_new dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4184
- Objective: 0.4334
- Ranking Simple: 0.4905
- Reward Accuracy: 0.6291
- Logp Accuracy: 0.4905
- Log Diff Policy: 0.6270
- Chosen Logps: -93.7722
- Rejected Logps: -94.3992
- Chosen Rewards: 0.1566
- Rejected Rewards: 0.0880
- Logits: -1.2008

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 288
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Objective | Ranking Simple | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits  |
|:-------------:|:------:|:----:|:---------------:|:---------:|:--------------:|:---------------:|:-------------:|:---------------:|:------------:|:--------------:|:--------------:|:----------------:|:-------:|
| 0.3806        | 0.3212 | 50   | 0.4419          | 0.4517    | 0.4864         | 0.6264          | 0.4864        | 0.3852          | -96.1308     | -96.5160       | -0.0793        | -0.1237          | -1.2623 |
| 0.3564        | 0.6424 | 100  | 0.4331          | 0.4486    | 0.4932         | 0.6114          | 0.4932        | 0.5639          | -96.4497     | -97.0136       | -0.1111        | -0.1734          | -1.2708 |
| 0.3184        | 0.9636 | 150  | 0.4229          | 0.4380    | 0.4973         | 0.6236          | 0.4973        | 0.6348          | -93.3135     | -93.9483       | 0.2025         | 0.1331           | -1.2473 |
| 0.2504        | 1.2848 | 200  | 0.4181          | 0.4328    | 0.4918         | 0.6454          | 0.4918        | 0.6747          | -93.7666     | -94.4414       | 0.1572         | 0.0838           | -1.2087 |
| 0.2565        | 1.6060 | 250  | 0.4203          | 0.4386    | 0.4946         | 0.6277          | 0.4946        | 0.6352          | -92.0965     | -92.7317       | 0.3242         | 0.2548           | -1.2579 |
| 0.2468        | 1.9272 | 300  | 0.4177          | 0.4317    | 0.4918         | 0.625           | 0.4918        | 0.6116          | -93.9391     | -94.5507       | 0.1399         | 0.0729           | -1.2024 |
| 0.1956        | 2.2484 | 350  | 0.4182          | 0.4315    | 0.4918         | 0.6304          | 0.4918        | 0.6462          | -93.9020     | -94.5482       | 0.1436         | 0.0731           | -1.2089 |
| 0.1909        | 2.5696 | 400  | 0.4186          | 0.4326    | 0.4918         | 0.6359          | 0.4918        | 0.6469          | -93.7824     | -94.4293       | 0.1556         | 0.0850           | -1.1996 |
| 0.1873        | 2.8908 | 450  | 0.4185          | 0.4335    | 0.4918         | 0.6264          | 0.4918        | 0.6265          | -93.7758     | -94.4023       | 0.1562         | 0.0877           | -1.2008 |

### Framework versions

- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1
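
As a sanity check on the numbers reported in this card, the short sketch below recomputes the effective batch sizes from the per-device settings and derives two quantities from the final evaluation metrics. It assumes the usual convention that the total train batch size is per-device batch size × number of devices × gradient accumulation steps. It also assumes that Log Diff Policy is the difference between the chosen and rejected log-probabilities, which is consistent with the reported values but is not stated in the card. The reward margin is a derived quantity, not a reported metric.

```python
# Hyperparameters copied from the "Training hyperparameters" list.
train_batch_size = 4              # per device
eval_batch_size = 4               # per device
num_devices = 6
gradient_accumulation_steps = 12

# Effective batch sizes under the usual multi-GPU convention (assumption).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# Final evaluation metrics copied from the summary list.
chosen_logps = -93.7722
rejected_logps = -94.3992
chosen_rewards = 0.1566
rejected_rewards = 0.0880

# Assumption: Log Diff Policy = chosen log-prob minus rejected log-prob.
log_diff_policy = round(chosen_logps - rejected_logps, 4)

# Derived here for illustration only; the card does not report a margin.
reward_margin = round(chosen_rewards - rejected_rewards, 4)

print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
print("log_diff_policy:", log_diff_policy)
print("reward_margin:", reward_margin)
```

The recomputed batch sizes match the reported `total_train_batch_size` of 288 and `total_eval_batch_size` of 24, and the log-probability difference matches the reported Log Diff Policy of 0.6270.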