---
license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
- alignment-handbook
- ndcg
- trl
- expo
- generated_from_trainer
datasets:
- hZzy/train_pairwise
model-index:
- name: qwen2.5-0.5b-expo-DPO-EXPERIMENT-1K-5e6
  results: []
---

[Visualize in Weights & Biases](https://wandb.ai/zhiyuzha-university-of-florida/huggingface/runs/lxeilq1n)

# qwen2.5-0.5b-expo-DPO-EXPERIMENT-1K-5e6

This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-news-IFT](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-news-IFT) on the hZzy/train_pairwise dataset.
It achieves the following results on the evaluation set:
- Loss: 1521.1873
- Logps: -79.1168
- Logits: -1.0707
- Objective: 1520.4889
- Dpo Loss: 1520.4889
- Regularize: 1520.4889
- Ranking Simple: 0.5258
- Ranking Idealized: 0.5093
- Ranking Idealized Expo: 0.5093

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 288
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Logps    | Logits  | Objective | Dpo Loss  | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo |
|:-------------:|:------:|:----:|:---------------:|:--------:|:-------:|:---------:|:---------:|:----------:|:--------------:|:-----------------:|:----------------------:|
| 932.0532      | 0.2834 | 50   | 928.5909        | -90.5890 | -1.5321 | 972.7211  | 972.7211  | 972.7211   | 0.5103         | 0.5093            | 0.5093                 |
| 1035.9887     | 0.5668 | 100  | 1589.5358       | -80.1508 | -1.3577 | 1629.7952 | 1629.7952 | 1629.7952  | 0.5145         | 0.5093            | 0.5093                 |
| 835.8459      | 0.8503 | 150  | 1554.2150       | -79.1304 | -1.1902 | 1554.7245 | 1554.7245 | 1554.7245  | 0.5238         | 0.5093            | 0.5093                 |
| 353.4232      | 1.1337 | 200  | 1601.4404       | -77.8605 | -1.1493 | 1618.9882 | 1618.9882 | 1618.9882  | 0.5279         | 0.5093            | 0.5093                 |
| 363.333       | 1.4171 | 250  | 1571.6953       | -78.8053 | -1.0661 | 1577.6245 | 1577.6245 | 1577.6245  | 0.5227         | 0.5093            | 0.5093                 |
| 267.0769      | 1.7005 | 300  | 1533.1350       | -79.3922 | -1.0587 | 1538.6410 | 1538.6410 | 1538.6410  | 0.5227         | 0.5093            | 0.5093                 |
| 287.4463      | 1.9839 | 350  | 1521.1865       | -79.1168 | -1.0707 | 1520.4884 | 1520.4884 | 1520.4884  | 0.5258         | 0.5093            | 0.5093                 |

### Framework versions

- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
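The hyperparameter list above corresponds to a standard TRL DPO run. The code below is a minimal reproduction sketch, assuming TRL's `DPOConfig`/`DPOTrainer` API (roughly TRL 0.9, contemporaneous with Transformers 4.42.0); the `beta` value, dataset split name, and reference-model handling are not documented in this card, so they are left at defaults or marked as assumptions.

```python
# Minimal reproduction sketch. Assumptions: TRL's DPOConfig/DPOTrainer API
# (~TRL 0.9), the "train" split name, and the default beta -- none of these
# are specified in this card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "hZzy/qwen2.5-0.5b-sft-news-IFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

dataset = load_dataset("hZzy/train_pairwise")  # pairwise preference data

# Hyperparameters from the "Training hyperparameters" section above.
args = DPOConfig(
    output_dir="qwen2.5-0.5b-expo-DPO-EXPERIMENT-1K-5e6",
    learning_rate=5e-6,
    per_device_train_batch_size=4,  # x 6 GPUs x 12 accumulation steps = 288
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,          # ref model is cloned from this when not given
    args=args,
    train_dataset=dataset["train"],  # split name assumed
    tokenizer=tokenizer,
)
trainer.train()
```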
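For inference, the model loads through the standard `transformers` API. A minimal sketch follows; the prompt is a hypothetical example, since the card does not document an expected prompt format:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-EXPERIMENT-1K-5e6"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical prompt; not taken from the training data.
prompt = "Summarize the following news article in one sentence:\n..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```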