---
license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
- alignment-handbook
- ndcg
- trl
- expo
- generated_from_trainer
datasets:
- hZzy/train_pairwise_weighted
model-index:
- name: qwen2.5-0.5b-expo-L2EXPO-ES-0.001
  results: []
---

[Visualize in Weights & Biases](https://wandb.ai/zhiyuzha-university-of-florida/huggingface/runs/1mxe21by)

# qwen2.5-0.5b-expo-L2EXPO-ES-0.001

This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-news-IFT](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-news-IFT) on the hZzy/train_pairwise_weighted dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3942
- Logps: -573.5075
- Logits: -8.8910
- Objective: 0.3931
- Dpo Loss: 0.6728
- Regularize: 0.3931
- Ranking Simple: 0.6102
- Ranking Idealized: 0.9871
- Ranking Idealized Expo: 0.6320
- Wo Beta: 160.3578

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 144 (4 per device × 3 devices × 12 accumulation steps)
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5

### Training results

| Training Loss | Epoch | Step | Dpo Loss | Logits | Logps | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize | Wo Beta |
|:-------------:|:------:|:----:|:--------:|:--------:|:---------:|:---------------:|:---------:|:-----------------:|:----------------------:|:--------------:|:----------:|:--------:|
| 0.418 | 0.1417 | 50 | 0.6927 | -1.8105 | -107.4392 | 0.4150 | 0.4128 | 0.9871 | 0.6320 | 0.5352 | 0.4128 | 22.0269 |
| 0.416 | 0.2834 | 100 | 0.6896 | -2.0485 | -230.8855 | 0.4087 | 0.4081 | 0.9871 | 0.6320 | 0.5559 | 0.4081 | 52.9494 |
| 0.387 | 0.4251 | 150 | 0.6844 | -3.9840 | -343.5519 | 0.4032 | 0.4021 | 0.9871 | 0.6320 | 0.5766 | 0.4021 | 90.2215 |
| 0.3587 | 0.5668 | 200 | 0.6754 | -6.1681 | -390.3867 | 0.3917 | 0.3893 | 0.9871 | 0.6320 | 0.6004 | 0.3893 | 124.6577 |
| 0.3299 | 0.7085 | 250 | 0.6765 | -7.7444 | -474.0688 | 0.3968 | 0.3968 | 0.9871 | 0.6320 | 0.5958 | 0.3968 | 147.7626 |
| 0.294 | 0.8503 | 300 | 0.6728 | -8.8910 | -573.5075 | 0.3942 | 0.3931 | 0.9871 | 0.6320 | 0.6102 | 0.3931 | 160.3578 |
| 0.2753 | 0.9920 | 350 | 0.6731 | -9.9981 | -593.1101 | 0.3965 | 0.3960 | 0.9871 | 0.6320 | 0.5937 | 0.3960 | 171.5761 |
| 0.2316 | 1.1337 | 400 | 0.6718 | -9.6479 | -564.7661 | 0.3966 | 0.3956 | 0.9871 | 0.6320 | 0.5875 | 0.3956 | 171.6054 |
| 0.2205 | 1.2754 | 450 | 0.6725 | -10.9673 | -599.2516 | 0.3962 | 0.3983 | 0.9871 | 0.6320 | 0.5859 | 0.3983 | 182.4877 |
| 0.2058 | 1.4171 | 500 | 0.6741 | -9.6175 | -589.5045 | 0.4005 | 0.4029 | 0.9871 | 0.6320 | 0.5797 | 0.4029 | 188.1013 |
| 0.2027 | 1.5588 | 550 | 0.6730 | -10.3937 | -622.4691 | 0.3995 | 0.4000 | 0.9871 | 0.6320 | 0.5947 | 0.4000 | 185.8620 |
| 0.1897 | 1.7029 | 600 | 0.6716 | -11.5540 | -755.1119 | 0.4028 | 0.4023 | 0.9871 | 0.6320 | 0.5952 | 0.4023 | 201.2357 |
| 0.1797 | 1.8446 | 650 | 0.6730 | -10.8193 | -673.7770 | 0.3997 | 0.3992 | 0.9871 | 0.6320 | 0.5942 | 0.3992 | 188.3079 |
| 0.1689 | 1.9863 | 700 | 0.6713 | -11.0772 | -653.8336 | 0.3985 | 0.3970 | 0.9871 | 0.6320 | 0.5911 | 0.3970 | 182.3852 |
| 0.1492 | 2.1280 | 750 | 0.6708 | -11.4717 | -624.3672 | 0.3959 | 0.3956 | 0.9871 | 0.6320 | 0.6025 | 0.3956 | 182.7602 |
| 0.143 | 2.2697 | 800 | 0.6701 | -11.2559 | -657.3067 | 0.3955 | 0.3958 | 0.9871 | 0.6320 | 0.6009 | 0.3958 | 190.5371 |

### Framework versions

- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1
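
### Training configuration (sketch)

For reference, the hyperparameters listed above map onto a standard `transformers` `TrainingArguments` object. This is a minimal sketch under the assumption that the run used the Hugging Face `Trainer` stack; the EXPO/L2EXPO trainer class and its pairwise loss are not documented in this card.

```python
# Sketch only: restates the documented hyperparameters as TrainingArguments.
# Assumption: the run used the Hugging Face Trainer stack; the EXPO-specific
# trainer class and pairwise loss are not documented in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-ES-0.001",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=4,    # eval_batch_size: 4
    gradient_accumulation_steps=12,  # 4 x 3 GPUs x 12 = 144 effective batch
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```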
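
## How to use

The base model is a Qwen2.5-0.5B causal language model, so the fine-tuned checkpoint should load through the standard `transformers` auto classes. A minimal inference sketch, assuming the checkpoint is published on the Hub under the repo id in the title:

```python
# Minimal inference sketch. Assumes the checkpoint is hosted on the Hub under
# the repo id below and loads as a standard causal language model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-ES-0.001"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Summarize the following news article:\n"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```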