---
license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
- alignment-handbook
- ndcg
- trl
- expo
- generated_from_trainer
datasets:
- hZzy/train_pairwise_weighted
model-index:
- name: qwen2.5-0.5b-expo-L2EXPO-ES-0.1-W0
  results: []
---

[Visualize in Weights & Biases](https://wandb.ai/zhiyuzha-university-of-florida/huggingface/runs/ogsqjyxu)

# qwen2.5-0.5b-expo-L2EXPO-ES-0.1-W0

This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-news-IFT](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-news-IFT) on the hZzy/train_pairwise_weighted dataset.
It achieves the following results on the evaluation set:
- Loss: 283.0078
- Logps: -81.0689
- Logits: -0.5212
- Objective: 277.3703
- Dpo Loss: 0.7209
- Regularize: 0.6310
- Ranking Simple: 0.5331
- Ranking Idealized: 0.6030
- Ranking Idealized Expo: 0.5223
- Wo Beta: 14.2695

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 144
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
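The total train batch size follows from the per-device batch size times the device count times the accumulation steps: 4 × 3 × 12 = 144. As a rough sketch only, the same configuration could be expressed with the standard `transformers.TrainingArguments` API; the actual training script for this run is not part of this card, and `output_dir` is illustrative:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-ES-0.1-W0",  # illustrative path
    learning_rate=5e-06,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=4,    # eval_batch_size
    gradient_accumulation_steps=12,  # 4 per device * 3 GPUs * 12 = 144 total
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5,
)
```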
### Training results

| Training Loss | Epoch | Step | Dpo Loss | Logits | Logps | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize | Wo Beta |
|:-------------:|:------:|:----:|:--------:|:-------:|:--------:|:---------------:|:---------:|:-----------------:|:----------------------:|:--------------:|:----------:|:-------:|
| 176.8183      | 0.1417 | 50   | 0.6874   | -1.4745 | -93.3291 | 185.6754        | 183.7365  | 0.6030            | 0.5223                 | 0.5212         | 0.4178     | 16.4509 |
| 168.1755      | 0.2834 | 100  | 0.6819   | -1.4487 | -93.7829 | 195.2546        | 190.2426  | 0.6030            | 0.5223                 | 0.5342         | 0.4320     | 16.2920 |
| 182.7148      | 0.4251 | 150  | 0.6969   | -1.2730 | -89.7329 | 218.5299        | 213.4884  | 0.6030            | 0.5223                 | 0.5336         | 0.4859     | 15.8045 |
| 203.0993      | 0.5668 | 200  | 0.7051   | -1.0447 | -79.7062 | 251.9243        | 242.6406  | 0.6030            | 0.5223                 | 0.5326         | 0.5518     | 14.6949 |
| 207.5481      | 0.7085 | 250  | 0.7055   | -1.0362 | -80.0940 | 251.9905        | 244.3510  | 0.6030            | 0.5223                 | 0.5305         | 0.5542     | 14.8158 |
| 193.4843      | 0.8503 | 300  | 0.7150   | -0.7137 | -80.4296 | 266.7107        | 258.3957  | 0.6030            | 0.5223                 | 0.5290         | 0.5881     | 14.5431 |
| 182.6922      | 0.9920 | 350  | 0.7073   | -0.6448 | -76.3638 | 262.3346        | 254.6360  | 0.6030            | 0.5223                 | 0.5357         | 0.5802     | 14.6176 |
| 166.9683      | 1.1337 | 400  | 0.7152   | -0.6392 | -78.3482 | 272.3288        | 264.9111  | 0.6030            | 0.5223                 | 0.5274         | 0.6056     | 14.6513 |
| 155.9364      | 1.2754 | 450  | 0.7186   | -0.4207 | -80.5230 | 275.0490        | 268.8637  | 0.6030            | 0.5223                 | 0.5321         | 0.6129     | 14.7777 |
| 143.4724      | 1.4171 | 500  | 0.7209   | -0.5141 | -80.5587 | 275.9663        | 270.0383  | 0.6030            | 0.5223                 | 0.5269         | 0.6150     | 14.4364 |
| 141.3444      | 1.5588 | 550  | 0.7139   | -0.6338 | -81.1271 | 275.0851        | 269.2189  | 0.6030            | 0.5223                 | 0.5378         | 0.6159     | 14.6425 |
| 136.172       | 1.7029 | 600  | 0.7111   | -0.5857 | -79.4221 | 273.6681        | 264.6510  | 0.6030            | 0.5223                 | 0.5373         | 0.6012     | 14.5631 |
| 130.7133      | 1.8446 | 650  | 0.7193   | -0.4215 | -80.2130 | 276.3609        | 269.6939  | 0.6030            | 0.5223                 | 0.5342         | 0.6141     | 14.5456 |
| 122.624       | 1.9863 | 700  | 0.7178   | -0.5263 | -80.9968 | 278.4690        | 271.4757  | 0.6030            | 0.5223                 | 0.5378         | 0.6190     | 14.4664 |
| 108.7022      | 2.1280 | 750  | 0.7207   | -0.4657 | -84.0088 | 282.5668        | 276.0201  | 0.6030            | 0.5223                 | 0.5347         | 0.6302     | 14.4517 |
| 104.1923      | 2.2697 | 800  | 0.7166   | -0.4640 | -81.6313 | 278.0555        | 272.7622  | 0.6030            | 0.5223                 | 0.5383         | 0.6210     | 14.4307 |
| 99.0867       | 2.4114 | 850  | 0.7209   | -0.5212 | -81.0689 | 283.0078        | 277.3703  | 0.6030            | 0.5223                 | 0.5331         | 0.6310     | 14.2695 |
| 91.7475       | 2.5531 | 900  | 0.7200   | -0.5149 | -81.6144 | 279.6676        | 275.1769  | 0.6030            | 0.5223                 | 0.5373         | 0.6279     | 14.3570 |
| 87.8681       | 2.6949 | 950  | 0.7191   | -0.4428 | -81.8544 | 281.5718        | 275.7560  | 0.6030            | 0.5223                 | 0.5362         | 0.6277     | 14.3509 |
| 81.742        | 2.8366 | 1000 | 0.7197   | -0.4951 | -81.4412 | 279.1324        | 274.5647  | 0.6030            | 0.5223                 | 0.5336         | 0.6257     | 14.3551 |
| 76.4372       | 2.9783 | 1050 | 0.7184   | -0.4502 | -82.3960 | 279.1884        | 273.9026  | 0.6030            | 0.5223                 | 0.5336         | 0.6249     | 14.3203 |
| 67.4698       | 3.1200 | 1100 | 0.7169   | -0.4190 | -82.9107 | 280.5317        | 274.7932  | 0.6030            | 0.5223                 | 0.5326         | 0.6260     | 14.3418 |

### Framework versions

- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1
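A minimal sketch of loading this checkpoint for inference with the standard Transformers AutoModel classes; the repository id is assumed from the model name above, and the prompt and generation settings are illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-ES-0.1-W0"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Illustrative prompt; the card does not document an intended prompt format.
inputs = tokenizer("Summarize today's top news story:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```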