---
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
- generation/UF
- generation/UFfull2
library_name: peft
license: apache-2.0
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-dpo-qlora-uf-ours-uffull-5e-7
  results: []
---

# zephyr-dpo-qlora-uf-ours-uffull-5e-7

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the generation/UF and the generation/UFfull2 datasets.
It achieves the following results on the evaluation set:
- Loss: 0.5926
- Rewards/chosen: -0.2031
- Rewards/rejected: -0.5182
- Rewards/accuracies: 0.7065
- Rewards/margins: 0.3151
- Rewards/margins Max: 1.1834
- Rewards/margins Min: -0.5406
- Rewards/margins Std: 0.5821
- Logps/rejected: -317.6757
- Logps/chosen: -304.7639
- Logits/rejected: -2.5747
- Logits/chosen: -2.6051
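
For context, the reward columns follow the standard DPO convention (an assumption based on the `trl`/`dpo` tags): the implicit reward is the β-scaled log-probability ratio between the policy and the frozen reference model, and the margin is the chosen-minus-rejected difference:

$$
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right],
\qquad
\text{margin} = r_\theta(x, y_\text{chosen}) - r_\theta(x, y_\text{rejected})
$$

Under this reading, the negative Rewards/chosen (-0.2031) means the policy assigns lower probability to chosen completions than the reference does, but it pushes rejected completions down further (-0.5182), yielding the positive margin of 0.3151.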

## Model description

This repository contains a QLoRA (PEFT/LoRA) adapter for [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full), trained with Direct Preference Optimization (DPO) via `trl` on the generation/UF and generation/UFfull2 preference datasets. Only the adapter weights are stored here; the base model must be loaded separately.

## Intended uses & limitations

The adapter is primarily intended for research on preference optimization (e.g., comparing DPO runs across learning rates and datasets, as the model name suggests). It has only been aligned via DPO on the listed preference data and has no additional safety tuning, so outputs may be inaccurate or otherwise inappropriate; it is not intended for production use without further evaluation.
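
A minimal loading sketch with `transformers` and `peft` follows. The hub id for this adapter is not stated in the card, so the repo name below is a placeholder:

```python
# Minimal usage sketch: load the base model, then attach this QLoRA adapter.
# "REPO_ID" is a placeholder; substitute the actual hub path of this adapter.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "alignment-handbook/zephyr-7b-sft-full"
REPO_ID = "<user>/zephyr-dpo-qlora-uf-ours-uffull-5e-7"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, REPO_ID)

# The base model ships a chat template, so apply_chat_template can be used.
messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```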

## Training and evaluation data

Training used preference pairs (chosen/rejected completions, as DPO requires) from the generation/UF and generation/UFfull2 datasets; the names suggest UltraFeedback-derived data, but this is not documented in the card. The metrics above are computed on the held-out evaluation split used during training.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
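
As a reference, here is a sketch of how these settings map onto `transformers.TrainingArguments`. The DPO β and the LoRA/quantization configuration are not recorded in this card and are omitted; the mixed-precision setting is likewise an assumption:

```python
# Sketch of the reported hyperparameters as TrainingArguments.
# Not the exact training script; DPO beta and the QLoRA config are not
# recorded in this card and are therefore not reproduced here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-dpo-qlora-uf-ours-uffull-5e-7",
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    gradient_accumulation_steps=2,   # 2 GPUs x 4 x 2 = total train batch 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,                  # as reported
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                       # assumption: precision not stated
)
```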

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6931        | 0.02  | 100  | 0.6930          | 0.0002         | -0.0000          | 0.5130             | 0.0002          | 0.0047              | -0.0043             | 0.0030              | -265.8543      | -284.4388    | -2.7682         | -2.8031       |
| 0.692         | 0.05  | 200  | 0.6923          | 0.0017         | 0.0000           | 0.6220             | 0.0017          | 0.0099              | -0.0057             | 0.0051              | -265.8525      | -284.2892    | -2.7668         | -2.8017       |
| 0.6903        | 0.07  | 300  | 0.6908          | 0.0067         | 0.0019           | 0.6520             | 0.0048          | 0.0253              | -0.0125             | 0.0125              | -265.6623      | -283.7856    | -2.7627         | -2.7978       |
| 0.6888        | 0.1   | 400  | 0.6880          | 0.0104         | -0.0004          | 0.6645             | 0.0108          | 0.0545              | -0.0264             | 0.0268              | -265.8943      | -283.4167    | -2.7573         | -2.7924       |
| 0.6827        | 0.12  | 500  | 0.6834          | 0.0345         | 0.0138           | 0.6820             | 0.0207          | 0.0989              | -0.0454             | 0.0479              | -264.4715      | -281.0052    | -2.7529         | -2.7877       |
| 0.6831        | 0.14  | 600  | 0.6776          | 0.0296         | -0.0039          | 0.6910             | 0.0335          | 0.1552              | -0.0696             | 0.0745              | -266.2422      | -281.4937    | -2.7479         | -2.7827       |
| 0.6652        | 0.17  | 700  | 0.6700          | 0.0086         | -0.0427          | 0.6820             | 0.0513          | 0.2350              | -0.1057             | 0.1128              | -270.1202      | -283.5948    | -2.7382         | -2.7726       |
| 0.6486        | 0.19  | 800  | 0.6615          | -0.0198        | -0.0921          | 0.6805             | 0.0723          | 0.3237              | -0.1470             | 0.1565              | -275.0622      | -286.4378    | -2.7367         | -2.7702       |
| 0.6457        | 0.22  | 900  | 0.6531          | -0.0599        | -0.1549          | 0.6755             | 0.0950          | 0.4216              | -0.1947             | 0.2059              | -281.3418      | -290.4436    | -2.7168         | -2.7500       |
| 0.6356        | 0.24  | 1000 | 0.6449          | -0.0625        | -0.1814          | 0.6785             | 0.1188          | 0.5225              | -0.2486             | 0.2583              | -283.9890      | -290.7086    | -2.7042         | -2.7362       |
| 0.6465        | 0.26  | 1100 | 0.6378          | -0.0291        | -0.1702          | 0.6775             | 0.1411          | 0.6108              | -0.2946             | 0.3031              | -282.8690      | -287.3659    | -2.6982         | -2.7301       |
| 0.6121        | 0.29  | 1200 | 0.6317          | -0.0658        | -0.2261          | 0.6780             | 0.1603          | 0.6847              | -0.3354             | 0.3418              | -288.4626      | -291.0350    | -2.6893         | -2.7208       |
| 0.6113        | 0.31  | 1300 | 0.6287          | -0.1819        | -0.3556          | 0.6820             | 0.1737          | 0.7287              | -0.3416             | 0.3621              | -301.4144      | -302.6470    | -2.6941         | -2.7251       |
| 0.6058        | 0.34  | 1400 | 0.6234          | -0.1290        | -0.3204          | 0.6775             | 0.1914          | 0.7908              | -0.3943             | 0.3995              | -297.8902      | -297.3538    | -2.6823         | -2.7135       |
| 0.6169        | 0.36  | 1500 | 0.6194          | -0.1244        | -0.3286          | 0.6790             | 0.2042          | 0.8341              | -0.4094             | 0.4197              | -298.7180      | -296.9003    | -2.6648         | -2.6957       |
| 0.5809        | 0.38  | 1600 | 0.6163          | -0.1125        | -0.3291          | 0.6800             | 0.2167          | 0.8823              | -0.4243             | 0.4399              | -298.7659      | -295.7021    | -2.6547         | -2.6853       |
| 0.5979        | 0.41  | 1700 | 0.6161          | -0.2126        | -0.4403          | 0.6805             | 0.2276          | 0.9153              | -0.4469             | 0.4624              | -309.8821      | -305.7201    | -2.6466         | -2.6773       |
| 0.6034        | 0.43  | 1800 | 0.6124          | -0.1652        | -0.4014          | 0.6805             | 0.2362          | 0.9410              | -0.4507             | 0.4726              | -305.9889      | -300.9712    | -2.6365         | -2.6672       |
| 0.5983        | 0.45  | 1900 | 0.6144          | -0.0531        | -0.2743          | 0.6900             | 0.2212          | 0.8923              | -0.3931             | 0.4327              | -293.2797      | -289.7628    | -2.6389         | -2.6689       |
| 0.5822        | 0.48  | 2000 | 0.6049          | -0.1502        | -0.4096          | 0.6885             | 0.2593          | 1.0070              | -0.4697             | 0.4998              | -306.8109      | -299.4801    | -2.6378         | -2.6679       |
| 0.6013        | 0.5   | 2100 | 0.6034          | -0.1787        | -0.4453          | 0.6870             | 0.2666          | 1.0331              | -0.4819             | 0.5137              | -310.3860      | -302.3300    | -2.6289         | -2.6593       |
| 0.6018        | 0.53  | 2200 | 0.6019          | -0.1572        | -0.4295          | 0.6925             | 0.2723          | 1.0473              | -0.4896             | 0.5205              | -308.8055      | -300.1773    | -2.6287         | -2.6585       |
| 0.6121        | 0.55  | 2300 | 0.6010          | -0.2434        | -0.5217          | 0.6905             | 0.2783          | 1.0633              | -0.4893             | 0.5289              | -318.0273      | -308.7991    | -2.6178         | -2.6476       |
| 0.5698        | 0.57  | 2400 | 0.5979          | -0.1902        | -0.4780          | 0.6920             | 0.2878          | 1.0879              | -0.4939             | 0.5369              | -313.6557      | -303.4752    | -2.6092         | -2.6389       |
| 0.5656        | 0.6   | 2500 | 0.5992          | -0.2708        | -0.5597          | 0.6985             | 0.2889          | 1.0980              | -0.5097             | 0.5454              | -321.8217      | -311.5382    | -2.5991         | -2.6291       |
| 0.5795        | 0.62  | 2600 | 0.5950          | -0.2109        | -0.5113          | 0.6950             | 0.3003          | 1.1206              | -0.5079             | 0.5533              | -316.9805      | -305.5476    | -2.5944         | -2.6244       |
| 0.5909        | 0.65  | 2700 | 0.5945          | -0.2006        | -0.5044          | 0.6950             | 0.3038          | 1.1335              | -0.5150             | 0.5598              | -316.2979      | -304.5152    | -2.5934         | -2.6235       |
| 0.6097        | 0.67  | 2800 | 0.5938          | -0.2035        | -0.5091          | 0.6975             | 0.3055          | 1.1391              | -0.5171             | 0.5610              | -316.7604      | -304.8101    | -2.5909         | -2.6210       |
| 0.5776        | 0.69  | 2900 | 0.5929          | -0.2142        | -0.5232          | 0.7040             | 0.3091          | 1.1530              | -0.5251             | 0.5673              | -318.1778      | -305.8716    | -2.5874         | -2.6177       |
| 0.575         | 0.72  | 3000 | 0.5948          | -0.1848        | -0.4886          | 0.6980             | 0.3039          | 1.1465              | -0.5243             | 0.5647              | -314.7165      | -302.9333    | -2.5861         | -2.6165       |
| 0.5767        | 0.74  | 3100 | 0.5936          | -0.1972        | -0.5061          | 0.7010             | 0.3089          | 1.1551              | -0.5276             | 0.5690              | -316.4648      | -304.1734    | -2.5862         | -2.6166       |
| 0.5642        | 0.77  | 3200 | 0.5937          | -0.1943        | -0.5034          | 0.7010             | 0.3091          | 1.1615              | -0.5332             | 0.5726              | -316.1906      | -303.8846    | -2.5867         | -2.6170       |
| 0.5767        | 0.79  | 3300 | 0.5914          | -0.2376        | -0.5569          | 0.7050             | 0.3193          | 1.1828              | -0.5330             | 0.5823              | -321.5458      | -308.2144    | -2.5828         | -2.6131       |
| 0.5685        | 0.81  | 3400 | 0.5914          | -0.2246        | -0.5434          | 0.7045             | 0.3188          | 1.1858              | -0.5380             | 0.5834              | -320.1958      | -306.9150    | -2.5800         | -2.6103       |
| 0.5687        | 0.84  | 3500 | 0.5909          | -0.2343        | -0.5556          | 0.7045             | 0.3214          | 1.1905              | -0.5370             | 0.5855              | -321.4169      | -307.8832    | -2.5779         | -2.6082       |
| 0.5598        | 0.86  | 3600 | 0.5924          | -0.2063        | -0.5212          | 0.7060             | 0.3150          | 1.1819              | -0.5400             | 0.5817              | -317.9754      | -305.0805    | -2.5781         | -2.6084       |
| 0.5639        | 0.89  | 3700 | 0.5921          | -0.2090        | -0.5258          | 0.7055             | 0.3168          | 1.1849              | -0.5399             | 0.5831              | -318.4354      | -305.3578    | -2.5751         | -2.6056       |
| 0.5931        | 0.91  | 3800 | 0.5930          | -0.1985        | -0.5119          | 0.7060             | 0.3134          | 1.1790              | -0.5399             | 0.5802              | -317.0424      | -304.3084    | -2.5778         | -2.6081       |
| 0.5542        | 0.93  | 3900 | 0.5929          | -0.1989        | -0.5128          | 0.7060             | 0.3139          | 1.1807              | -0.5398             | 0.5808              | -317.1321      | -304.3491    | -2.5760         | -2.6064       |
| 0.5713        | 0.96  | 4000 | 0.5926          | -0.2022        | -0.5175          | 0.7050             | 0.3153          | 1.1831              | -0.5407             | 0.5823              | -317.6028      | -304.6741    | -2.5743         | -2.6048       |
| 0.5725        | 0.98  | 4100 | 0.5925          | -0.2025        | -0.5175          | 0.7060             | 0.3149          | 1.1833              | -0.5415             | 0.5824              | -317.5993      | -304.7070    | -2.5752         | -2.6056       |
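
Validation loss decreases steadily and largely flattens after roughly step 3,000. A quick sketch to visualize a few checkpoints from the table above (values copied verbatim; matplotlib assumed available):

```python
# Plot validation loss at selected checkpoints from the table above.
import matplotlib.pyplot as plt

steps = [100, 500, 1000, 2000, 3000, 4100]
val_loss = [0.6930, 0.6834, 0.6449, 0.6049, 0.5948, 0.5925]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("zephyr-dpo-qlora-uf-ours-uffull-5e-7")
plt.tight_layout()
plt.show()
```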


### Framework versions

- PEFT 0.7.1
- Transformers 4.39.0.dev0
- PyTorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2