---
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
- generation/UF
library_name: peft
license: apache-2.0
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-dpop-qlora-uf-ours-5e-7
  results: []
---


# zephyr-dpop-qlora-uf-ours-5e-7

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the generation/UF dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9543
- Positive Losses: 2.5736
- Dpo Losses: 0.6658
- Rewards/chosen: 0.0602
- Rewards/rejected: -0.0038
- Rewards/accuracies: 0.6300
- Rewards/margins: 0.0640
- Rewards/margins Max: 0.3473
- Rewards/margins Min: -0.1824
- Rewards/margins Std: 0.1766
- Logps/rejected: -258.9606
- Logps/chosen: -278.5768
- Logits/rejected: -2.6741
- Logits/chosen: -2.7121
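
The reward metrics above can be sanity-checked against each other. The sketch below assumes the usual DPO-style convention in which the margin is the per-example difference between the chosen and rejected rewards, so the mean margin equals the difference of the two mean rewards. That convention is an assumption here, since the card does not define the metrics.

```python
# Evaluation values copied from the list above.
rewards_chosen = 0.0602
rewards_rejected = -0.0038
reported_margin = 0.0640

# Assumption: margin = chosen reward - rejected reward, averaged over examples,
# so the mean margin is the difference of the mean rewards.
margin = rewards_chosen - rewards_rejected

print(f"recomputed margin: {margin:.4f}")
print(f"reported margin:   {reported_margin:.4f}")
```

Under that assumption the recomputed value agrees with the reported Rewards/margins up to rounding.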

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
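
The totals in this list follow from the per-device settings, and the learning-rate schedule can be sketched in plain Python. The snippet below is illustrative only: `lr_at` is a hypothetical helper, not the Trainer's code, and it assumes linear warmup followed by cosine decay to zero. `ESTIMATED_TOTAL_STEPS` is a rough estimate extrapolated from the results table (step 1000 at epoch 2.82), not a value stated in this card.

```python
import math

# Values from the hyperparameter list above.
per_device_train_batch_size = 4
per_device_eval_batch_size = 8
num_devices = 2
gradient_accumulation_steps = 2
peak_lr = 5e-07
warmup_ratio = 0.1

# Effective batch sizes: per-device size x devices (x accumulation for training).
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices


def lr_at(step, total_steps, peak=peak_lr, ratio=warmup_ratio):
    """Hypothetical helper: linear warmup, then cosine decay to zero (assumed shape)."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


# Estimate only: 1000 steps at epoch 2.82 suggests roughly 1064 steps for 3 epochs.
ESTIMATED_TOTAL_STEPS = 1064

print(total_train_batch_size, total_eval_batch_size)
for step in (0, 107, 500, 1000, ESTIMATED_TOTAL_STEPS):
    print(step, lr_at(step, ESTIMATED_TOTAL_STEPS))
```

With these settings the training batch works out to 4 × 2 × 2 = 16 and the evaluation batch to 8 × 2 = 16, matching the totals listed above.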

### Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:---------------:|:----------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6884        | 0.28  | 100  | 0.6931          | 0.0111          | 0.6918     | 0.0136         | 0.0108           | 0.6080             | 0.0028          | 0.0179              | -0.0107             | 0.0094              | -257.4949      | -283.2318    | -2.7651         | -2.8042       |
| 0.6627        | 0.56  | 200  | 0.6995          | 0.1223          | 0.6858     | 0.0465         | 0.0311           | 0.5960             | 0.0153          | 0.0899              | -0.0496             | 0.0465              | -255.4640      | -279.9481    | -2.7485         | -2.7871       |
| 0.6293        | 0.85  | 300  | 0.7193          | 0.3552          | 0.6803     | 0.0675         | 0.0398           | 0.5960             | 0.0278          | 0.1601              | -0.0863             | 0.0826              | -254.6033      | -277.8385    | -2.7306         | -2.7684       |
| 0.6236        | 1.13  | 400  | 0.7519          | 0.6894          | 0.6756     | 0.0800         | 0.0412           | 0.6090             | 0.0388          | 0.2182              | -0.1140             | 0.1113              | -254.4585      | -276.5968    | -2.7119         | -2.7494       |
| 0.6009        | 1.41  | 500  | 0.8434          | 1.5495          | 0.6718     | 0.0639         | 0.0154           | 0.6090             | 0.0484          | 0.2709              | -0.1440             | 0.1389              | -257.0343      | -278.2061    | -2.6920         | -2.7295       |
| 0.6136        | 1.69  | 600  | 0.8727          | 1.8302          | 0.6691     | 0.0687         | 0.0134           | 0.6130             | 0.0553          | 0.3049              | -0.1595             | 0.1553              | -257.2360      | -277.7244    | -2.6827         | -2.7203       |
| 0.5918        | 1.97  | 700  | 0.8998          | 2.0811          | 0.6677     | 0.0671         | 0.0081           | 0.6220             | 0.0591          | 0.3231              | -0.1685             | 0.1641              | -257.7734      | -277.8808    | -2.6797         | -2.7172       |
| 0.5636        | 2.25  | 800  | 0.9371          | 2.4201          | 0.6667     | 0.0611         | -0.0007          | 0.6260             | 0.0618          | 0.3370              | -0.1777             | 0.1716              | -258.6473      | -278.4820    | -2.6734         | -2.7116       |
| 0.5736        | 2.54  | 900  | 0.9591          | 2.6268          | 0.6659     | 0.0578         | -0.0060          | 0.6320             | 0.0639          | 0.3467              | -0.1823             | 0.1764              | -259.1817      | -278.8090    | -2.6726         | -2.7107       |
| 0.5825        | 2.82  | 1000 | 0.9543          | 2.5810          | 0.6658     | 0.0598         | -0.0042          | 0.6290             | 0.0640          | 0.3475              | -0.1826             | 0.1767              | -259.0028      | -278.6134    | -2.6749         | -2.7127       |
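
To compare checkpoints, a few columns of the table can be copied into a short script. This is a minimal sketch: the tuples below are transcribed by hand from the table above, and which metric matters most for checkpoint selection is left to the reader.

```python
# (step, validation_loss, dpo_loss, reward_accuracy) copied from the table above.
rows = [
    (100, 0.6931, 0.6918, 0.6080),
    (200, 0.6995, 0.6858, 0.5960),
    (300, 0.7193, 0.6803, 0.5960),
    (400, 0.7519, 0.6756, 0.6090),
    (500, 0.8434, 0.6718, 0.6090),
    (600, 0.8727, 0.6691, 0.6130),
    (700, 0.8998, 0.6677, 0.6220),
    (800, 0.9371, 0.6667, 0.6260),
    (900, 0.9591, 0.6659, 0.6320),
    (1000, 0.9543, 0.6658, 0.6290),
]

best_val = min(rows, key=lambda r: r[1])  # lowest validation loss
best_dpo = min(rows, key=lambda r: r[2])  # lowest DPO loss
best_acc = max(rows, key=lambda r: r[3])  # highest reward accuracy

print("lowest validation loss at step", best_val[0])
print("lowest DPO loss at step", best_dpo[0])
print("highest reward accuracy at step", best_acc[0])
```

In this table the validation loss is lowest at the first evaluation and rises afterwards, while the DPO loss keeps falling and reward accuracy peaks near the end, so the three criteria point to different checkpoints.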


### Framework versions

- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2