---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- trl
- dpo
- alignment-handbook
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full-magpi-reward-scale-1
  results: []
---

# zephyr-7b-dpo-full-magpi-reward-scale-1

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0007
- Rewards/chosen: -2.2108
- Rewards/rejected: -84.9647
- Rewards/accuracies: 1.0
- Rewards/margins: 82.7539
- Logps/rejected: -9137.2617
- Logps/chosen: -588.0638
- Logits/rejected: 4.7555
- Logits/chosen: -0.0442

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 55
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0158        | 0.1420 | 50   | 0.0104          | -1.4246        | -52.2276         | 0.9960             | 50.8030         | -5863.5493     | -509.4384    | -2.3895         | -2.8735       |
| 0.0111        | 0.2841 | 100  | 0.0024          | -2.2596        | -93.3357         | 1.0                | 91.0762         | -9974.3623     | -592.9393    | 0.1786          | -2.9450       |
| 0.0039        | 0.4261 | 150  | 0.0016          | -2.3351        | -100.4879        | 1.0                | 98.1528         | -10689.5820    | -600.4949    | 2.3549          | -1.8831       |
| 0.0022        | 0.5682 | 200  | 0.0012          | -2.2027        | -86.1756         | 1.0                | 83.9729         | -9258.3438     | -587.2476    | 2.5108          | -1.6726       |
| 0.0022        | 0.7102 | 250  | 0.0008          | -2.2903        | -83.1896         | 1.0                | 80.8993         | -8959.7471     | -596.0095    | 3.7585          | -1.0150       |
| 0.001         | 0.8523 | 300  | 0.0007          | -2.1936        | -83.9541         | 1.0                | 81.7606         | -9036.2012     | -586.3376    | 4.7089          | -0.1221       |
| 0.008         | 0.9943 | 350  | 0.0007          | -2.2108        | -84.9647         | 1.0                | 82.7539         | -9137.2617     | -588.0638    | 4.7555          | -0.0442       |

### Framework versions

- Transformers 4.44.0.dev0
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1
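
### Interpreting the reward metrics

The reward columns above follow TRL's DPO logging convention: each reward is the β-scaled log-probability ratio between this model and the SFT reference model, averaged over the evaluation set. The β value itself is not recorded on this card, so the formula below is a statement of the convention rather than of this run's exact settings:

```latex
% Implicit DPO reward for a completion y given prompt x, as logged by TRL
% (assumption: \beta, the DPO temperature, is not recorded on this card):
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)

% The margin is the gap between chosen and rejected rewards; at the final
% checkpoint: -2.2108 - (-84.9647) = 82.7539, matching the table above.
\text{Rewards/margins} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
```

Rewards/accuracies is the fraction of evaluation pairs whose chosen reward exceeds the rejected reward; the value of 1.0 means the final model ranks the chosen completion above the rejected one on every evaluation pair.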
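
### Reproducing the training setup

The hyperparameters above map directly onto TRL's `DPOConfig`/`DPOTrainer`. The sketch below is illustrative, not the original training script: the preference dataset, DPO β, precision flags, and any reward-scale setting implied by the model name are not recorded on this card, so the values marked as assumptions are placeholders. It assumes a recent TRL release that ships `DPOConfig`.

```python
# Hypothetical reproduction sketch; dataset and beta are assumptions, not
# values taken from this card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference dataset with "prompt"/"chosen"/"rejected" columns (assumption:
# the actual training data is listed as unknown above).
dataset = load_dataset("your/preference-dataset", split="train")

# Hyperparameters taken from the card. Per-device batch size 8 on 8 GPUs
# with 2 gradient-accumulation steps gives the total train batch size of 128.
config = DPOConfig(
    output_dir="zephyr-7b-dpo-full-magpi-reward-scale-1",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=55,
    beta=0.01,  # assumption: the DPO temperature is not recorded on this card
)

trainer = DPOTrainer(
    model=model,          # ref_model defaults to a frozen copy of the policy
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,  # renamed to processing_class in newer TRL releases
)
trainer.train()
```

Multi-GPU launch (8 devices here) would typically go through `accelerate launch` with a DeepSpeed or FSDP config, as in the alignment-handbook recipes.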
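
## Example usage

A minimal inference sketch using the `transformers` chat-aware text-generation pipeline. The repo id is a placeholder for wherever this checkpoint is hosted, and the generation settings are illustrative:

```python
import torch
from transformers import pipeline

# Placeholder repo id: substitute the actual location of this checkpoint.
pipe = pipeline(
    "text-generation",
    model="zephyr-7b-dpo-full-magpi-reward-scale-1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The pipeline applies the model's chat template to message lists.
messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
out = pipe(messages, max_new_tokens=256)

# With chat input, generated_text holds the conversation including the reply.
print(out[0]["generated_text"][-1]["content"])
```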