---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
- generation/GPT4
model-index:
- name: zephyr-dpop-qlora-gpt4-5e-6-epoch3
  results: []
---


# zephyr-dpop-qlora-gpt4-5e-6-epoch3

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the generation/GPT4 dataset.
It achieves the following results on the evaluation set:
- Loss: 14.3852
- Positive Losses: 141.6597
- Dpo Losses: 0.6849
- Rewards/chosen: -1.4061
- Rewards/rejected: -2.0012
- Rewards/accuracies: 0.6667
- Rewards/margins: 0.5951
- Rewards/margins Max: 2.2885
- Rewards/margins Min: -1.0995
- Rewards/margins Std: 1.4978
- Logps/rejected: -459.3069
- Logps/chosen: -425.8328
- Logits/rejected: -2.2783
- Logits/chosen: -2.3207
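
The reward and margin figures above are the implicit DPO rewards: beta-scaled log-probability ratios of the policy against the frozen reference model. A minimal pure-Python sketch of how these quantities and the DPO loss relate (assuming summed per-sequence log-probabilities; the `beta` value here is illustrative, not taken from this run):

```python
import math

def dpo_stats(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Implicit DPO rewards and loss from sequence log-probabilities.

    `pi_*` are log-probs under the trained policy, `ref_*` under the
    frozen reference model. Rewards are beta-scaled log-ratios.
    """
    reward_chosen = beta * (pi_chosen - ref_chosen)      # Rewards/chosen
    reward_rejected = beta * (pi_rejected - ref_rejected)  # Rewards/rejected
    margin = reward_chosen - reward_rejected             # Rewards/margins
    # DPO loss: -log sigmoid(margin); equals log(2) when the margin is zero
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return reward_chosen, reward_rejected, margin, loss
```

A zero margin gives a loss of log 2 ≈ 0.693; a positive margin (chosen preferred over rejected, as in the ~0.67 accuracy above) pushes the DPO loss below that baseline. The much larger "Positive Losses" metric is an additional term beyond vanilla DPO and is not modeled in this sketch.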

## Model description

This is a PEFT (QLoRA) adapter on top of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full), trained with direct preference optimization. The evaluation metrics above report the standard DPO loss alongside a separate positive-example loss term, consistent with the "dpop" (DPO-positive) objective suggested by the model name.

## Intended uses & limitations

More information needed

## Training and evaluation data

The adapter was trained and evaluated on the generation/GPT4 preference dataset; no further details about the data are provided in this card.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 16
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
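
The cosine schedule with a 0.1 warmup ratio can be sketched as follows: a simplified stand-in for `transformers.get_cosine_schedule_with_warmup`, with step counts chosen for illustration rather than taken from this run.

```python
import math

def lr_at(step, total_steps, base_lr=5e-6, warmup_ratio=0.1):
    """Linear warmup over the first 10% of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to the base learning rate
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

The learning rate peaks at 5e-06 exactly when warmup ends and reaches zero at the final step; note also that the total batch sizes above are simply the per-device sizes times the 8 devices (2 × 8 = 16 for training, 4 × 8 = 32 for evaluation).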

### Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:---------------:|:----------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5432        | 0.28  | 100  | 1.5490          | 8.3683          | 0.6723     | -0.0507        | -0.1015          | 0.5992             | 0.0508          | 0.2567              | -0.1414             | 0.1757              | -269.3354      | -290.2917    | -2.6677         | -2.7099       |
| 0.4843        | 0.56  | 200  | 3.6354          | 28.9322         | 0.6415     | -0.2537        | -0.4297          | 0.6349             | 0.1759          | 0.7364              | -0.3533             | 0.4858              | -302.1486      | -310.5943    | -2.5589         | -2.6000       |
| 0.2828        | 0.85  | 300  | 6.8046          | 61.7689         | 0.6346     | -0.6003        | -0.8503          | 0.6508             | 0.2500          | 1.0085              | -0.4868             | 0.6679              | -344.2117      | -345.2526    | -2.5349         | -2.5759       |
| 0.3355        | 1.13  | 400  | 11.4158         | 108.7399        | 0.6572     | -1.0761        | -1.4209          | 0.6548             | 0.3447          | 1.4626              | -0.7661             | 0.9968              | -401.2702      | -392.8341    | -2.3773         | -2.4155       |
| 0.3438        | 1.41  | 500  | 10.6413         | 101.3525        | 0.6381     | -1.0007        | -1.3406          | 0.6865             | 0.3399          | 1.3353              | -0.6338             | 0.8805              | -393.2457      | -385.2938    | -2.4471         | -2.4907       |
| 0.2144        | 1.69  | 600  | 8.5896          | 79.7998         | 0.6267     | -0.7817        | -1.2135          | 0.6865             | 0.4318          | 1.5951              | -0.6661             | 1.0047              | -380.5318      | -363.3914    | -2.3029         | -2.3438       |
| 0.3314        | 1.97  | 700  | 11.1651         | 107.2969        | 0.6525     | -1.0595        | -1.5150          | 0.6627             | 0.4555          | 1.7776              | -0.8450             | 1.1660              | -410.6869      | -391.1705    | -2.3025         | -2.3432       |
| 0.1352        | 2.25  | 800  | 13.3571         | 130.9070        | 0.6700     | -1.2986        | -1.8184          | 0.6627             | 0.5198          | 2.0225              | -0.9603             | 1.3296              | -441.0237      | -415.0786    | -2.2901         | -2.3320       |
| 0.2348        | 2.54  | 900  | 14.7241         | 145.9081        | 0.6904     | -1.4488        | -2.0053          | 0.6706             | 0.5564          | 2.1801              | -1.0958             | 1.4586              | -459.7108      | -430.1044    | -2.2661         | -2.3085       |
| 0.1369        | 2.82  | 1000 | 14.5955         | 143.9389        | 0.6869     | -1.4291        | -2.0251          | 0.6627             | 0.5959          | 2.2953              | -1.1073             | 1.5052              | -461.6887      | -428.1342    | -2.2738         | -2.3165       |


### Framework versions

- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2