---
license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
- alignment-handbook
- ndcg
- trl
- expo
- generated_from_trainer
datasets:
- hZzy/train_pairwise_weighted
model-index:
- name: qwen2.5-0.5b-expo-DPO-noES-0.1
  results: []
---


[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/zhiyuzha-university-of-florida/huggingface/runs/5jjpvn9b)
# qwen2.5-0.5b-expo-DPO-noES-0.1

This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-news-IFT](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-news-IFT) on the hZzy/train_pairwise_weighted dataset.
At the final evaluation (step 1050, epoch 2.9759), it achieves the following results on the evaluation set:
- Loss: 0.8493
- Logps: -132.8567
- Logits: -1.8165
- Objective: 0.8653
- Dpo Loss: 0.8653
- Regularize: 0.8653
- Ranking Simple: 0.5347
- Wo Beta: 10.9418

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 144
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
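
The relationship between the per-device and total batch sizes above, and the shape of a cosine schedule with a 0.1 warmup ratio, can be sketched as follows. This is a minimal illustration, not the trainer's own code: `cosine_lr` is a hypothetical helper implementing one common form of linear warmup plus cosine decay, and `assumed_total_steps` is an estimate derived from the last logged step and epoch in the results table, since the card does not state the total number of optimizer steps.

```python
import math

# Values copied from the hyperparameter list.
train_batch_size = 4
eval_batch_size = 4
num_devices = 3
gradient_accumulation_steps = 12
learning_rate = 1e-6
warmup_ratio = 0.1

# Effective batch sizes: per-device size x devices (x accumulation steps for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr(step, total_steps, base_lr=learning_rate, ratio=warmup_ratio):
    """Hypothetical helper: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# ASSUMPTION: total steps are estimated from the last logged row of the
# results table (step 1050 at epoch 2.9759), scaled to 3 epochs.
assumed_total_steps = round(1050 / 2.9759 * 3)

print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
for step in (0, 50, 250, 500, 1050):
    print(step, cosine_lr(step, assumed_total_steps))
```

With these inputs the computed totals match the `total_train_batch_size` and `total_eval_batch_size` entries in the list.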

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Logps     | Logits  | Objective | Dpo Loss | Regularize | Ranking Simple | Wo Beta |
|:-------------:|:------:|:----:|:---------------:|:---------:|:-------:|:---------:|:--------:|:----------:|:--------------:|:-------:|
| 0.6719        | 0.1417 | 50   | 0.6856          | -89.6776  | -1.4697 | 0.6879    | 0.6879   | 0.6879     | 0.5269         | 7.9221  |
| 0.6459        | 0.2834 | 100  | 0.6765          | -92.9954  | -1.6511 | 0.6793    | 0.6793   | 0.6793     | 0.5347         | 7.8727  |
| 0.5993        | 0.4251 | 150  | 0.6771          | -95.2729  | -1.6963 | 0.6805    | 0.6805   | 0.6805     | 0.5347         | 8.2155  |
| 0.5557        | 0.5668 | 200  | 0.6858          | -115.4680 | -1.8150 | 0.6866    | 0.6866   | 0.6866     | 0.5295         | 7.9607  |
| 0.5428        | 0.7085 | 250  | 0.6745          | -102.5668 | -1.8495 | 0.6741    | 0.6741   | 0.6741     | 0.5367         | 7.9891  |
| 0.4987        | 0.8503 | 300  | 0.7119          | -110.0949 | -1.9277 | 0.7203    | 0.7203   | 0.7203     | 0.5373         | 8.9267  |
| 0.4599        | 0.9920 | 350  | 0.6886          | -104.9833 | -1.8474 | 0.6912    | 0.6912   | 0.6912     | 0.5352         | 8.3749  |
| 0.3498        | 1.1337 | 400  | 0.7463          | -115.0889 | -1.8807 | 0.7518    | 0.7518   | 0.7518     | 0.5518         | 9.5505  |
| 0.3361        | 1.2754 | 450  | 0.7563          | -116.8004 | -1.8356 | 0.7673    | 0.7673   | 0.7673     | 0.5419         | 9.7252  |
| 0.3584        | 1.4171 | 500  | 0.7635          | -117.5167 | -1.8626 | 0.7695    | 0.7695   | 0.7695     | 0.5419         | 9.6319  |
| 0.3343        | 1.5588 | 550  | 0.7698          | -123.3863 | -1.8209 | 0.7814    | 0.7814   | 0.7814     | 0.5352         | 9.8258  |
| 0.3105        | 1.7005 | 600  | 0.7679          | -119.8231 | -1.7866 | 0.7761    | 0.7761   | 0.7761     | 0.5383         | 9.8031  |
| 0.3412        | 1.8422 | 650  | 0.7750          | -122.2944 | -1.8323 | 0.7848    | 0.7848   | 0.7848     | 0.5383         | 9.9494  |
| 0.3156        | 1.9839 | 700  | 0.8013          | -126.3939 | -1.8338 | 0.8139    | 0.8139   | 0.8139     | 0.5378         | 10.3247 |
| 0.2183        | 2.1256 | 750  | 0.8467          | -131.1257 | -1.7999 | 0.8604    | 0.8604   | 0.8604     | 0.5352         | 10.8931 |
| 0.2338        | 2.2674 | 800  | 0.8480          | -132.1160 | -1.8070 | 0.8641    | 0.8641   | 0.8641     | 0.5352         | 10.9810 |
| 0.2015        | 2.4091 | 850  | 0.8572          | -133.3811 | -1.8018 | 0.8720    | 0.8720   | 0.8720     | 0.5378         | 11.0252 |
| 0.2348        | 2.5508 | 900  | 0.8530          | -133.6796 | -1.8114 | 0.8675    | 0.8675   | 0.8675     | 0.5378         | 10.9423 |
| 0.2268        | 2.6925 | 950  | 0.8525          | -133.2829 | -1.8136 | 0.8684    | 0.8684   | 0.8684     | 0.5336         | 10.9785 |
| 0.2198        | 2.8342 | 1000 | 0.8493          | -132.8809 | -1.8167 | 0.8652    | 0.8652   | 0.8652     | 0.5342         | 10.9383 |
| 0.2221        | 2.9759 | 1050 | 0.8493          | -132.8567 | -1.8165 | 0.8653    | 0.8653   | 0.8653     | 0.5347         | 10.9418 |
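
The table can also be scanned programmatically, for example to locate the evaluation step with the lowest validation loss or the highest Ranking Simple score. The sketch below copies four columns from the table into a plain list; the variable names are illustrative and not part of any training script.

```python
# (step, training_loss, validation_loss, ranking_simple) copied from the table.
rows = [
    (50, 0.6719, 0.6856, 0.5269),
    (100, 0.6459, 0.6765, 0.5347),
    (150, 0.5993, 0.6771, 0.5347),
    (200, 0.5557, 0.6858, 0.5295),
    (250, 0.5428, 0.6745, 0.5367),
    (300, 0.4987, 0.7119, 0.5373),
    (350, 0.4599, 0.6886, 0.5352),
    (400, 0.3498, 0.7463, 0.5518),
    (450, 0.3361, 0.7563, 0.5419),
    (500, 0.3584, 0.7635, 0.5419),
    (550, 0.3343, 0.7698, 0.5352),
    (600, 0.3105, 0.7679, 0.5383),
    (650, 0.3412, 0.7750, 0.5383),
    (700, 0.3156, 0.8013, 0.5378),
    (750, 0.2183, 0.8467, 0.5352),
    (800, 0.2338, 0.8480, 0.5352),
    (850, 0.2015, 0.8572, 0.5378),
    (900, 0.2348, 0.8530, 0.5378),
    (950, 0.2268, 0.8525, 0.5336),
    (1000, 0.2198, 0.8493, 0.5342),
    (1050, 0.2221, 0.8493, 0.5347),
]

best_val = min(rows, key=lambda r: r[2])   # lowest validation loss
best_rank = max(rows, key=lambda r: r[3])  # highest Ranking Simple
final = rows[-1]                           # last logged evaluation

print("lowest validation loss:", best_val[2], "at step", best_val[0])
print("highest Ranking Simple:", best_rank[3], "at step", best_rank[0])
print("final validation - training loss gap:", round(final[2] - final[1], 4))
```

In this table the validation loss is lowest at step 250 and rises afterwards while the training loss keeps falling, so the final-step metrics are not the best ones logged during the run.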


### Framework versions

- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1