---
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
- generation/UF
library_name: peft
license: apache-2.0
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-dpop-qlora-uf-ours-5e-6
  results: []
---

# zephyr-dpop-qlora-uf-ours-5e-6

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the generation/UF dataset.
It achieves the following results on the evaluation set:
- Loss: 5.1264
- Positive Losses: 43.1884
- Dpo Losses: 0.6101
- Rewards/chosen: -0.3903
- Rewards/rejected: -0.7274
- Rewards/accuracies: 0.6670
- Rewards/margins: 0.3370
- Rewards/margins Max: 1.4167
- Rewards/margins Min: -0.8378
- Rewards/margins Std: 0.7707
- Logps/rejected: -331.3143
- Logps/chosen: -323.6263
- Logits/rejected: -2.4808
- Logits/chosen: -2.5277

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
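The card does not include the training script, but the hyperparameters above map directly onto a TRL `DPOTrainer` run. The sketch below is a minimal reconstruction under stated assumptions: the `beta` value, the LoRA configuration, and the inline toy dataset are placeholders (the real run used the non-public generation/UF preference pairs), and the DPOP-style "Positive Losses" term reported in the metrics would require a custom loss on top of vanilla DPO.

```python
import torch
from datasets import Dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import DPOTrainer

model_id = "alignment-handbook/zephyr-7b-sft-full"

# QLoRA: load the frozen base model in 4-bit (the "qlora" in the run name).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Toy stand-in for the generation/UF preference pairs, which are not public.
train_dataset = Dataset.from_dict({
    "prompt": ["What does DPO optimize?"],
    "chosen": ["It directly optimizes a preference objective over chosen/rejected pairs."],
    "rejected": ["No idea."],
})

# Hyperparameters copied from the list above; 2 GPUs x batch 4 x grad-accum 2
# reproduces the total train batch size of 16. The Adam betas and epsilon are
# the TrainingArguments defaults.
training_args = TrainingArguments(
    output_dir="zephyr-dpop-qlora-uf-ours-5e-6",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

# The LoRA rank/alpha/dropout are assumptions; the card does not record them.
peft_config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT adapter, TRL reuses the frozen base as the reference
    args=training_args,
    beta=0.1,        # assumed; the card does not state beta
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```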
### Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:---------------:|:----------:|:--------------:|:----------------:|:------------------:|:---------------:|:-------------------:|:-------------------:|:-------------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6302        | 0.28  | 100  | 0.8170          | 1.2658          | 0.6732     | 0.0877         | 0.0421           | 0.5920             | 0.0456          | 0.2717              | -0.1472             | 0.1389              | -254.3717      | -275.8216    | -2.6655         | -2.7015       |
| 0.5709        | 0.56  | 200  | 2.1527          | 14.1341         | 0.6518     | -0.0932        | -0.2071          | 0.6360             | 0.1139          | 0.6302              | -0.3647             | 0.3297              | -279.2877      | -293.9101    | -2.6591         | -2.6989       |
| 0.4758        | 0.85  | 300  | 2.2508          | 15.0103         | 0.6396     | -0.0829        | -0.2324          | 0.6590             | 0.1495          | 0.7147              | -0.4138             | 0.3813              | -281.8231      | -292.8875    | -2.6866         | -2.7294       |
| 0.4857        | 1.13  | 400  | 2.8413          | 20.4422         | 0.6295     | -0.1464        | -0.3473          | 0.6540             | 0.2010          | 0.9605              | -0.5524             | 0.5026              | -293.3139      | -299.2286    | -2.5810         | -2.6240       |
| 0.6015        | 1.41  | 500  | 2.4297          | 16.2472         | 0.6215     | -0.0798        | -0.3011          | 0.6660             | 0.2213          | 0.9834              | -0.5416             | 0.5125              | -288.6871      | -292.5703    | -2.5803         | -2.6246       |
| 0.4849        | 1.69  | 600  | 3.8077          | 30.0769         | 0.6153     | -0.2435        | -0.5155          | 0.6630             | 0.2721          | 1.1651              | -0.6779             | 0.6337              | -310.1338      | -308.9421    | -2.5659         | -2.6120       |
| 0.4012        | 1.97  | 700  | 4.4359          | 36.7814         | 0.6160     | -0.3161        | -0.6003          | 0.6660             | 0.2841          | 1.2285              | -0.7320             | 0.6759              | -318.6039      | -316.2043    | -2.5208         | -2.5672       |
| 0.3245        | 2.25  | 800  | 4.9873          | 41.8073         | 0.6123     | -0.3752        | -0.6988          | 0.6660             | 0.3236          | 1.3768              | -0.8214             | 0.7506              | -328.4567      | -322.1156    | -2.4952         | -2.5421       |
| 0.3018        | 2.54  | 900  | 5.0342          | 42.1224         | 0.6084     | -0.3810        | -0.7194          | 0.6680             | 0.3383          | 1.4141              | -0.8336             | 0.7645              | -330.5147      | -322.6951    | -2.4804         | -2.5276       |
| 0.4364        | 2.82  | 1000 | 5.0975          | 42.8746         | 0.6098     | -0.3872        | -0.7242          | 0.6680             | 0.3370          | 1.4157              | -0.8369             | 0.7695              | -331.0000      | -323.3101    | -2.4816         | -2.5285       |

### Framework versions

- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
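## How to use

Since this is a PEFT adapter rather than a full checkpoint, it is loaded on top of the base SFT model. A minimal sketch, assuming the adapter weights are available at the repo id or path below (replace `adapter_id` with wherever this checkpoint actually lives):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "zephyr-dpop-qlora-uf-ours-5e-6"  # assumed location of this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

# Zephyr models expect chat-formatted input, so route the prompt through the
# tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Calling `model.merge_and_unload()` after loading folds the adapter into the base weights, which is convenient if you want a standalone checkpoint for serving.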