---
license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
- alignment-handbook
- ndcg
- trl
- expo
- generated_from_trainer
datasets:
- hZzy/train_pairwise
model-index:
- name: qwen2.5-0.5b-expo-L2EXPO-ES-0.1
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/zhiyuzha-university-of-florida/huggingface/runs/z6ixm6bo)
# qwen2.5-0.5b-expo-L2EXPO-ES-0.1

This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-news-IFT](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-news-IFT) on the hZzy/train_pairwise dataset.
It achieves the following results on the evaluation set (the values match the step 800 checkpoint in the training results table):
- Loss: 0.4217
- Logps: -89.1060
- Logits: -1.3837
- Objective: 0.4142
- Dpo Loss: 0.6791
- Regularize: 0.4142
- Ranking Simple: 0.5347
- Ranking Idealized: 0.6030
- Ranking Idealized Expo: 0.5223
- Wo Beta: 15.9847
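
The card does not say how to load the checkpoint. The following is a minimal sketch, assuming the model is published on the Hub under the repo id `hZzy/qwen2.5-0.5b-expo-L2EXPO-ES-0.1` (inferred from the base model's namespace and the model-index name, not confirmed here) and that it loads as a standard causal LM. The helper name `generate` and the greedy decoding settings are illustrative choices, not the author's.

```python
# ASSUMPTION: the Hub repo id is the author's namespace plus the model-index name.
MODEL_ID = "hZzy/qwen2.5-0.5b-expo-L2EXPO-ES-0.1"


def generate(prompt, model_id=MODEL_ID, max_new_tokens=64):
    """Load the checkpoint and return a greedy continuation of `prompt`."""
    # Imported lazily so this file can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Tokenize to PyTorch tensors and decode greedily (no sampling).
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate("Write a one-sentence summary of today's news."))
```

The prompt format the SFT base model expects is not documented in this card, so the example prompt is only a placeholder.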

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 144
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
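
As a sanity check on the list above, the sketch below recomputes the effective batch size and illustrates the shape of a cosine schedule with 10% linear warmup. The total step count is an assumption: it is estimated from the results table (50 steps at epoch 0.1417) scaled to 5 epochs. The function `cosine_lr` is a hand-written illustration of the schedule shape, not the Trainer's own code.

```python
import math

# Values copied from the hyperparameter list.
train_batch_size = 4
num_devices = 3
gradient_accumulation_steps = 12
peak_lr = 1e-06
warmup_ratio = 0.1

# Samples consumed per optimizer step across all devices.
total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)


def cosine_lr(step, total_steps, peak=peak_lr, warmup=warmup_ratio):
    """Linear warmup to `peak`, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup)
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


# ASSUMPTION: estimated from the results table, not reported in the card.
assumed_total_steps = round(50 / 0.1417 * 5)

print(total_train_batch_size)
print(assumed_total_steps)
print(cosine_lr(800, assumed_total_steps))
```

The product 4 × 3 × 12 agrees with the reported `total_train_batch_size` of 144.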

### Training results

| Training Loss | Epoch  | Step | Dpo Loss | Logits  | Logps    | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize | Wo Beta |
|:-------------:|:------:|:----:|:--------:|:-------:|:--------:|:---------------:|:---------:|:-----------------:|:----------------------:|:--------------:|:----------:|:-------:|
| 0.4117        | 0.1417 | 50   | 0.6893   | -1.4691 | -90.8535 | 0.4102          | 0.4090    | 0.6030            | 0.5223                 | 0.5248         | 0.4090     | 16.3208 |
| 0.3871        | 0.2834 | 100  | 0.6833   | -1.5346 | -91.2757 | 0.4049          | 0.4029    | 0.6030            | 0.5223                 | 0.5316         | 0.4029     | 16.2699 |
| 0.3451        | 0.4251 | 150  | 0.6789   | -1.4902 | -91.1637 | 0.4013          | 0.3996    | 0.6030            | 0.5223                 | 0.5347         | 0.3996     | 16.5907 |
| 0.3166        | 0.5668 | 200  | 0.6811   | -1.4523 | -93.2695 | 0.4148          | 0.4132    | 0.6030            | 0.5223                 | 0.5316         | 0.4132     | 16.3512 |
| 0.2939        | 0.7085 | 250  | 0.6790   | -1.5465 | -90.5537 | 0.4131          | 0.4077    | 0.6030            | 0.5223                 | 0.5342         | 0.4077     | 16.4807 |
| 0.2655        | 0.8503 | 300  | 0.6806   | -1.4553 | -91.3521 | 0.4126          | 0.4082    | 0.6030            | 0.5223                 | 0.5311         | 0.4082     | 16.4429 |
| 0.2513        | 0.9920 | 350  | 0.6782   | -1.4532 | -91.2408 | 0.4110          | 0.4044    | 0.6030            | 0.5223                 | 0.5352         | 0.4044     | 16.3768 |
| 0.2206        | 1.1337 | 400  | 0.6769   | -1.4764 | -87.3470 | 0.4128          | 0.4049    | 0.6030            | 0.5223                 | 0.5336         | 0.4049     | 16.2024 |
| 0.2077        | 1.2754 | 450  | 0.6788   | -1.4177 | -89.8793 | 0.4144          | 0.4106    | 0.6030            | 0.5223                 | 0.5331         | 0.4106     | 16.1977 |
| 0.1943        | 1.4171 | 500  | 0.6782   | -1.4544 | -87.6699 | 0.4169          | 0.4092    | 0.6030            | 0.5223                 | 0.5352         | 0.4092     | 16.0510 |
| 0.1879        | 1.5588 | 550  | 0.6787   | -1.4268 | -89.0111 | 0.4173          | 0.4102    | 0.6030            | 0.5223                 | 0.5347         | 0.4102     | 16.0707 |
| 0.1768        | 1.7005 | 600  | 0.6796   | -1.4411 | -87.0605 | 0.4190          | 0.4116    | 0.6030            | 0.5223                 | 0.5352         | 0.4116     | 16.0697 |
| 0.1736        | 1.8422 | 650  | 0.6802   | -1.4601 | -90.0508 | 0.4219          | 0.4144    | 0.6030            | 0.5223                 | 0.5347         | 0.4144     | 16.1057 |
| 0.1598        | 1.9839 | 700  | 0.6799   | -1.4110 | -90.5630 | 0.4217          | 0.4148    | 0.6030            | 0.5223                 | 0.5362         | 0.4148     | 16.0493 |
| 0.1454        | 2.1256 | 750  | 0.6797   | -1.3859 | -89.5433 | 0.4215          | 0.4151    | 0.6030            | 0.5223                 | 0.5316         | 0.4151     | 16.0459 |
| 0.1333        | 2.2674 | 800  | 0.6791   | -1.3837 | -89.1060 | 0.4217          | 0.4142    | 0.6030            | 0.5223                 | 0.5347         | 0.4142     | 15.9847 |
| 0.1287        | 2.4091 | 850  | 0.6795   | -1.3856 | -88.6145 | 0.4241          | 0.4153    | 0.6030            | 0.5223                 | 0.5357         | 0.4153     | 15.9979 |
| 0.12          | 2.5508 | 900  | 0.6795   | -1.3921 | -88.6663 | 0.4207          | 0.4129    | 0.6030            | 0.5223                 | 0.5331         | 0.4129     | 16.0698 |
| 0.1148        | 2.6925 | 950  | 0.6792   | -1.3690 | -88.2854 | 0.4215          | 0.4149    | 0.6030            | 0.5223                 | 0.5336         | 0.4149     | 16.0513 |
| 0.1068        | 2.8342 | 1000 | 0.6809   | -1.3724 | -89.1782 | 0.4229          | 0.4168    | 0.6030            | 0.5223                 | 0.5321         | 0.4168     | 16.0722 |
| 0.0991        | 2.9759 | 1050 | 0.6792   | -1.3982 | -88.9607 | 0.4210          | 0.4141    | 0.6030            | 0.5223                 | 0.5336         | 0.4141     | 16.0444 |
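
With thirteen columns, the table is easy to misread by position. The sketch below shows one way to parse it by header name instead. The helper `parse_markdown_table` is hypothetical, and the embedded excerpt contains only the first three rows, copied from the table above.

```python
def parse_markdown_table(text):
    """Turn a pipe-delimited Markdown table into a list of {header: cell} dicts."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]

    def split(ln):
        # Drop the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in ln.strip("|").split("|")]

    header = split(lines[0])
    # lines[1] is the |:---:| alignment row, so data starts at index 2.
    return [dict(zip(header, split(ln))) for ln in lines[2:]]


# Excerpt: header, alignment row, and the first three data rows.
table = """
| Training Loss | Epoch  | Step | Dpo Loss | Logits  | Logps    | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize | Wo Beta |
|:-------------:|:------:|:----:|:--------:|:-------:|:--------:|:---------------:|:---------:|:-----------------:|:----------------------:|:--------------:|:----------:|:-------:|
| 0.4117        | 0.1417 | 50   | 0.6893   | -1.4691 | -90.8535 | 0.4102          | 0.4090    | 0.6030            | 0.5223                 | 0.5248         | 0.4090     | 16.3208 |
| 0.3871        | 0.2834 | 100  | 0.6833   | -1.5346 | -91.2757 | 0.4049          | 0.4029    | 0.6030            | 0.5223                 | 0.5316         | 0.4029     | 16.2699 |
| 0.3451        | 0.4251 | 150  | 0.6789   | -1.4902 | -91.1637 | 0.4013          | 0.3996    | 0.6030            | 0.5223                 | 0.5347         | 0.3996     | 16.5907 |
"""

rows = parse_markdown_table(table)

# Look up metrics by name rather than by column position.
best = min(rows, key=lambda r: float(r["Validation Loss"]))
print(best["Step"], best["Validation Loss"])
```

Keying on the header makes any row whose cells are out of order stand out, since a value such as a log-probability would appear under a loss column.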


### Framework versions

- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1