---
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
- generation/UF
library_name: peft
license: apache-2.0
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-dpop-qlora-uf-ours-5e-7
  results: []
---

# zephyr-dpop-qlora-uf-ours-5e-7

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the generation/UF dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9543
- Positive Losses: 2.5736
- Dpo Losses: 0.6658
- Rewards/chosen: 0.0602
- Rewards/rejected: -0.0038
- Rewards/accuracies: 0.6300
- Rewards/margins: 0.0640
- Rewards/margins Max: 0.3473
- Rewards/margins Min: -0.1824
- Rewards/margins Std: 0.1766
- Logps/rejected: -258.9606
- Logps/chosen: -278.5768
- Logits/rejected: -2.6741
- Logits/chosen: -2.7121

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3

### Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:---------------:|:----------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6884        | 0.28  | 100  | 0.6931          | 0.0111          | 0.6918     | 0.0136         | 0.0108           | 0.6080             | 0.0028          | 0.0179              | -0.0107             | 0.0094              | -257.4949      | -283.2318    | -2.7651         | -2.8042       |
| 0.6627        | 0.56  | 200  | 0.6995          | 0.1223          | 0.6858     | 0.0465         | 0.0311           | 0.5960             | 0.0153          | 0.0899              | -0.0496             | 0.0465              | -255.4640      | -279.9481    | -2.7485         | -2.7871       |
| 0.6293        | 0.85  | 300  | 0.7193          | 0.3552          | 0.6803     | 0.0675         | 0.0398           | 0.5960             | 0.0278          | 0.1601              | -0.0863             | 0.0826              | -254.6033      | -277.8385    | -2.7306         | -2.7684       |
| 0.6236        | 1.13  | 400  | 0.7519          | 0.6894          | 0.6756     | 0.0800         | 0.0412           | 0.6090             | 0.0388          | 0.2182              | -0.1140             | 0.1113              | -254.4585      | -276.5968    | -2.7119         | -2.7494       |
| 0.6009        | 1.41  | 500  | 0.8434          | 1.5495          | 0.6718     | 0.0639         | 0.0154           | 0.6090             | 0.0484          | 0.2709              | -0.1440             | 0.1389              | -257.0343      | -278.2061    | -2.6920         | -2.7295       |
| 0.6136        | 1.69  | 600  | 0.8727          | 1.8302          | 0.6691     | 0.0687         | 0.0134           | 0.6130             | 0.0553          | 0.3049              | -0.1595             | 0.1553              | -257.2360      | -277.7244    | -2.6827         | -2.7203       |
| 0.5918        | 1.97  | 700  | 0.8998          | 2.0811          | 0.6677     | 0.0671         | 0.0081           | 0.6220             | 0.0591          | 0.3231              | -0.1685             | 0.1641              | -257.7734      | -277.8808    | -2.6797         | -2.7172       |
| 0.5636        | 2.25  | 800  | 0.9371          | 2.4201          | 0.6667     | 0.0611         | -0.0007          | 0.6260             | 0.0618          | 0.3370              | -0.1777             | 0.1716              | -258.6473      | -278.4820    | -2.6734         | -2.7116       |
| 0.5736        | 2.54  | 900  | 0.9591          | 2.6268          | 0.6659     | 0.0578         | -0.0060          | 0.6320             | 0.0639          | 0.3467              | -0.1823             | 0.1764              | -259.1817      | -278.8090    | -2.6726         | -2.7107       |
| 0.5825        | 2.82  | 1000 | 0.9543          | 2.5810          | 0.6658     | 0.0598         | -0.0042          | 0.6290             | 0.0640          | 0.3475              | -0.1826             | 0.1767              | -259.0028      | -278.6134    | -2.6749         | -2.7127       |

### Framework versions

- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
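
Two of the figures reported in this card can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the training code. It assumes that `train_batch_size` and `eval_batch_size` are per-device values, and that the reported reward margin is the chosen reward minus the rejected reward. The helper name `effective_batch_size` is hypothetical.

```python
# Values copied from the hyperparameter list and evaluation results in this card.
train_batch_size = 4             # assumed to be per device
eval_batch_size = 8              # assumed to be per device
num_devices = 2
gradient_accumulation_steps = 2


def effective_batch_size(per_device, devices, accumulation_steps=1):
    """Hypothetical helper: examples consumed per optimizer (or eval) step."""
    return per_device * devices * accumulation_steps


total_train = effective_batch_size(
    train_batch_size, num_devices, gradient_accumulation_steps
)
# Gradient accumulation does not apply to evaluation.
total_eval = effective_batch_size(eval_batch_size, num_devices)

# Assumption: margin = chosen reward - rejected reward (means over the eval set).
rewards_chosen = 0.0602
rewards_rejected = -0.0038
margin = rewards_chosen - rewards_rejected

print(total_train, total_eval, round(margin, 4))
```

Under these assumptions the computed values agree with the reported `total_train_batch_size` of 16, `total_eval_batch_size` of 16, and Rewards/margins of 0.0640.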