---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- alignment-handbook
- generated_from_trainer
- trl
- dpo
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: zephyr-7b-dpo-full
  results: []
---


# zephyr-7b-dpo-full

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6712
- Rewards/chosen: -2.0287
- Rewards/rejected: -3.3245
- Rewards/accuracies: 0.7639
- Rewards/margins: 1.2958
- Logps/rejected: -594.2247
- Logps/chosen: -486.9804
- Logits/rejected: 3.7376
- Logits/chosen: 2.4533
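
The reward metrics above use DPO-style names. As a minimal sketch, assuming the usual DPO definition in which the implicit reward is `beta` times the gap between policy and reference log-probabilities, the aggregates could be computed as below. The function name `dpo_reward_metrics`, the default `beta`, and any inputs passed to it are illustrative assumptions; the card does not report `beta` or per-example values. The final check only confirms that the reported margin equals the reported chosen reward minus the rejected reward.

```python
def dpo_reward_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Aggregate DPO-style reward metrics from per-example sequence log-probs.

    beta=0.1 is a hypothetical default, not a value taken from this training run.
    """
    # Implicit reward: beta * (policy log-prob - reference log-prob)
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    n = len(chosen)
    return {
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        # Mean gap between chosen and rejected rewards
        "rewards/margins": sum(c - r for c, r in zip(chosen, rejected)) / n,
        # Fraction of pairs where the chosen response gets the higher reward
        "rewards/accuracies": sum(c > r for c, r in zip(chosen, rejected)) / n,
    }


# Consistency check on the values reported in this card
reported_chosen, reported_rejected, reported_margin = -2.0287, -3.3245, 1.2958
assert abs((reported_chosen - reported_rejected) - reported_margin) < 1e-4
```

Read this way, a `Rewards/accuracies` of 0.7639 would mean the chosen response received the higher implicit reward in roughly 76% of evaluation pairs.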

## Model description

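This model was trained with Direct Preference Optimization (DPO) using TRL, starting from the `alignment-handbook/zephyr-7b-sft-full` checkpoint. More information needed.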

## Intended uses & limitations

More information needed

## Training and evaluation data

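Training and evaluation used the HuggingFaceH4/ultrafeedback_binarized dataset. More information is needed on the splits and preprocessing.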

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
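
The total batch sizes listed above follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and also illustrates a cosine schedule with linear warmup. The helper `lr_at` and the value of `TOTAL_STEPS` are assumptions made for illustration: the card does not state the exact number of optimizer steps, and the function is a plain-Python approximation rather than the trainer's own scheduler.

```python
import math

# Values copied from the hyperparameter list above
per_device_train_batch_size = 8
per_device_eval_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 2
peak_lr = 5e-07
warmup_ratio = 0.1

# Effective batch sizes
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)  # 8 * 4 * 2 = 64
total_eval_batch_size = per_device_eval_batch_size * num_devices  # 8 * 4 = 32


def lr_at(step, total_steps, peak=peak_lr, warmup=warmup_ratio):
    """Linear warmup to `peak`, then half-cosine decay to zero (illustrative)."""
    warmup_steps = math.ceil(total_steps * warmup)
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; not reported in this card

for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, TOTAL_STEPS))
```

Under these assumptions the learning rate peaks at 5e-07 once warmup ends (10% of the way through training) and decays toward zero by the final step.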

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.8957        | 0.1   | 100  | 0.9028          | -0.5210        | -0.6849          | 0.6905             | 0.1639          | -330.2668      | -336.2060    | -2.5107         | -2.5460       |
| 0.7658        | 0.21  | 200  | 0.7650          | -0.9414        | -1.6932          | 0.7460             | 0.7519          | -431.1015      | -378.2476    | 0.3347          | -0.1529       |
| 0.7079        | 0.31  | 300  | 0.7289          | -1.3837        | -2.4868          | 0.7560             | 1.1031          | -510.4591      | -422.4754    | 1.8370          | 0.8744        |
| 0.6806        | 0.42  | 400  | 0.7040          | -1.3285        | -2.4190          | 0.7698             | 1.0904          | -503.6740      | -416.9630    | 1.2713          | 0.0992        |
| 0.7129        | 0.52  | 500  | 0.6980          | -1.4621        | -2.5268          | 0.7440             | 1.0648          | -514.4609      | -430.3167    | 2.3343          | 1.4091        |
| 0.6636        | 0.63  | 600  | 0.6877          | -1.3328        | -2.5188          | 0.7500             | 1.1861          | -513.6627      | -417.3850    | 2.2082          | 0.7470        |
| 0.6217        | 0.73  | 700  | 0.6762          | -1.8908        | -3.1786          | 0.7698             | 1.2878          | -579.6354      | -473.1887    | 3.8163          | 2.5932        |
| 0.6418        | 0.84  | 800  | 0.6712          | -2.0993        | -3.4028          | 0.7679             | 1.3035          | -602.0607      | -494.0422    | 3.8655          | 2.6092        |
| 0.6678        | 0.94  | 900  | 0.6716          | -2.0307        | -3.3233          | 0.7639             | 1.2926          | -594.1103      | -487.1844    | 3.7332          | 2.4518        |
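
For readers who want to inspect the table programmatically, the sketch below copies a few columns from it, finds the evaluation step with the lowest validation loss, and checks that each reported margin matches chosen minus rejected up to rounding. The variable names are illustrative; the numbers are taken from the table above.

```python
# (step, validation_loss, rewards_chosen, rewards_rejected, rewards_margins)
rows = [
    (100, 0.9028, -0.5210, -0.6849, 0.1639),
    (200, 0.7650, -0.9414, -1.6932, 0.7519),
    (300, 0.7289, -1.3837, -2.4868, 1.1031),
    (400, 0.7040, -1.3285, -2.4190, 1.0904),
    (500, 0.6980, -1.4621, -2.5268, 1.0648),
    (600, 0.6877, -1.3328, -2.5188, 1.1861),
    (700, 0.6762, -1.8908, -3.1786, 1.2878),
    (800, 0.6712, -2.0993, -3.4028, 1.3035),
    (900, 0.6716, -2.0307, -3.3233, 1.2926),
]

# Evaluation step with the lowest validation loss
best_step, best_loss = min(((r[0], r[1]) for r in rows), key=lambda x: x[1])
print(best_step, best_loss)  # 800 0.6712

# Largest disagreement between the logged margin and chosen - rejected;
# small differences are expected because each column is rounded separately
max_gap = max(abs((c - rj) - m) for _, _, c, rj, m in rows)
assert max_gap < 2e-4
```

The lowest validation loss in the table occurs at step 800, with step 900 nearly identical, which suggests the evaluation loss had largely flattened by the end of the epoch.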


### Framework versions

- Transformers 4.36.2
- PyTorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.2