---
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
- generation/UF
library_name: peft
license: apache-2.0
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-dpop-qlora-uf-ours-5e-6
  results: []
---

# zephyr-dpop-qlora-uf-ours-5e-6

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the generation/UF dataset.
It achieves the following results on the evaluation set:
- Loss: 5.1264
- Positive Losses: 43.1884
- Dpo Losses: 0.6101
- Rewards/chosen: -0.3903
- Rewards/rejected: -0.7274
- Rewards/accuracies: 0.6670
- Rewards/margins: 0.3370
- Rewards/margins Max: 1.4167
- Rewards/margins Min: -0.8378
- Rewards/margins Std: 0.7707
- Logps/rejected: -331.3143
- Logps/chosen: -323.6263
- Logits/rejected: -2.4808
- Logits/chosen: -2.5277
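
For context, the reward columns above follow the standard DPO convention as implemented in TRL: a response's implicit reward is β times the gap between its log-probability under the policy and under the reference model, and Rewards/margins is simply the chosen reward minus the rejected reward. A minimal sketch (the β value and log-probabilities below are illustrative placeholders, not values from this run):

```python
import math

def dpo_reward(beta, logp_policy, logp_ref):
    # Implicit DPO reward: beta * (log-prob under policy - log-prob under reference)
    return beta * (logp_policy - logp_ref)

def dpo_pair_loss(beta, logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected):
    # Sigmoid DPO loss for one preference pair: -log sigmoid(reward margin)
    margin = (dpo_reward(beta, logp_chosen, ref_logp_chosen)
              - dpo_reward(beta, logp_rejected, ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The reported Rewards/margins is chosen-minus-rejected reward:
margin = -0.3903 - (-0.7274)  # ~0.337, matching the eval Rewards/margins
```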

## Model description

This is a PEFT (QLoRA) adapter for [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full), trained with TRL on the generation/UF dataset using a DPO-style preference objective (the "Positive Losses" metric reported above suggests a DPOP-style variant). No further details are available.

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
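
To make the batch-size and scheduler settings concrete: the total train batch size of 16 is the per-device batch × num_devices × gradient_accumulation_steps, and the learning rate warms up linearly over the first 10% of steps before decaying to zero on a cosine curve. A minimal sketch (the exact `transformers` scheduler may differ slightly in edge handling):

```python
import math

# Effective batch size: per-device batch * devices * accumulation steps
total_train_batch_size = 4 * 2 * 2  # = 16, as reported above

def lr_at(step, total_steps, peak_lr=5e-6, warmup_ratio=0.1):
    # Linear warmup for the first warmup_ratio of steps, then cosine decay to 0
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```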

### Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:---------------:|:----------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6302        | 0.28  | 100  | 0.8170          | 1.2658          | 0.6732     | 0.0877         | 0.0421           | 0.5920             | 0.0456          | 0.2717              | -0.1472             | 0.1389              | -254.3717      | -275.8216    | -2.6655         | -2.7015       |
| 0.5709        | 0.56  | 200  | 2.1527          | 14.1341         | 0.6518     | -0.0932        | -0.2071          | 0.6360             | 0.1139          | 0.6302              | -0.3647             | 0.3297              | -279.2877      | -293.9101    | -2.6591         | -2.6989       |
| 0.4758        | 0.85  | 300  | 2.2508          | 15.0103         | 0.6396     | -0.0829        | -0.2324          | 0.6590             | 0.1495          | 0.7147              | -0.4138             | 0.3813              | -281.8231      | -292.8875    | -2.6866         | -2.7294       |
| 0.4857        | 1.13  | 400  | 2.8413          | 20.4422         | 0.6295     | -0.1464        | -0.3473          | 0.6540             | 0.2010          | 0.9605              | -0.5524             | 0.5026              | -293.3139      | -299.2286    | -2.5810         | -2.6240       |
| 0.6015        | 1.41  | 500  | 2.4297          | 16.2472         | 0.6215     | -0.0798        | -0.3011          | 0.6660             | 0.2213          | 0.9834              | -0.5416             | 0.5125              | -288.6871      | -292.5703    | -2.5803         | -2.6246       |
| 0.4849        | 1.69  | 600  | 3.8077          | 30.0769         | 0.6153     | -0.2435        | -0.5155          | 0.6630             | 0.2721          | 1.1651              | -0.6779             | 0.6337              | -310.1338      | -308.9421    | -2.5659         | -2.6120       |
| 0.4012        | 1.97  | 700  | 4.4359          | 36.7814         | 0.6160     | -0.3161        | -0.6003          | 0.6660             | 0.2841          | 1.2285              | -0.7320             | 0.6759              | -318.6039      | -316.2043    | -2.5208         | -2.5672       |
| 0.3245        | 2.25  | 800  | 4.9873          | 41.8073         | 0.6123     | -0.3752        | -0.6988          | 0.6660             | 0.3236          | 1.3768              | -0.8214             | 0.7506              | -328.4567      | -322.1156    | -2.4952         | -2.5421       |
| 0.3018        | 2.54  | 900  | 5.0342          | 42.1224         | 0.6084     | -0.3810        | -0.7194          | 0.6680             | 0.3383          | 1.4141              | -0.8336             | 0.7645              | -330.5147      | -322.6951    | -2.4804         | -2.5276       |
| 0.4364        | 2.82  | 1000 | 5.0975          | 42.8746         | 0.6098     | -0.3872        | -0.7242          | 0.6680             | 0.3370          | 1.4157              | -0.8369             | 0.7695              | -331.0000      | -323.3101    | -2.4816         | -2.5285       |


### Framework versions

- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2