---
license: apache-2.0
library_name: peft
tags:
- generated_from_trainer
base_model: mnoukhov/pythia410m-sft-tldr
model-index:
- name: pythia410m-dpo-tldr-lr1e-5
  results: []
---

# pythia410m-dpo-tldr-lr1e-5

This model is a fine-tuned version of [mnoukhov/pythia410m-sft-tldr](https://huggingface.co/mnoukhov/pythia410m-sft-tldr) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5595
- Rewards/chosen: -0.9059
- Rewards/rejected: -1.3735
- Rewards/accuracies: 0.7113
- Rewards/margins: 0.4677
- Logps/rejected: -88.3830
- Logps/chosen: -88.3830
- Logps/ref rejected: -63.5119
- Logps/ref chosen: -70.2656

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 1.0

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logps/ref rejected | Logps/ref chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:------------------:|:----------------:|
| 0.6295        | 0.2   | 291  | 0.5864          | -0.5101        | -0.8319          | 0.7039             | 0.3218          | -80.4685       | -80.4685     | -63.5119           | -70.2656         |
| 0.5926        | 0.4   | 582  | 0.5600          | -0.9009        | -1.3738          | 0.7120             | 0.4728          | -88.2839       | -88.2839     | -63.5119           | -70.2656         |
| 0.5761        | 0.6   | 873  | 0.5585          | -0.9509        | -1.4326          | 0.7110             | 0.4817          | -89.2846       | -89.2846     | -63.5119           | -70.2656         |
| 0.5678        | 0.8   | 1164 | 0.5595          | -0.9059        | -1.3735          | 0.7113             | 0.4677          | -88.3830       | -88.3830     | -63.5119           | -70.2656         |

### Framework versions

- PEFT 0.10.0
- Transformers 4.38.2
- Pytorch 2.1.2+cu121
- Datasets 2.17.0
- Tokenizers 0.15.2
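
The Rewards/* metrics above follow the usual DPO convention: each reward is the policy-vs-reference log-probability gap on a completion scaled by the DPO temperature β, Rewards/margins is the chosen-minus-rejected difference, and Rewards/accuracies is the fraction of pairs where the chosen reward exceeds the rejected one. The card does not report the β used, so the sketch below assumes β = 0.1 purely for illustration; it is not the training code.

```python
import torch
import torch.nn.functional as F

def dpo_batch_metrics(policy_chosen_logps, policy_rejected_logps,
                      ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit DPO rewards: beta-scaled log-prob gap between policy and reference.
    # beta=0.1 is an assumption; the value used for this model is not reported.
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected

    # DPO loss and the pairwise preference accuracy (Rewards/accuracies).
    loss = -F.logsigmoid(margins).mean()
    accuracies = (rewards_chosen > rewards_rejected).float().mean()

    return {
        "loss": loss,
        "rewards/chosen": rewards_chosen.mean(),
        "rewards/rejected": rewards_rejected.mean(),
        "rewards/margins": margins.mean(),
        "rewards/accuracies": accuracies,
    }
```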
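
Because this repository contains a PEFT adapter rather than full model weights, it is loaded on top of the base SFT model. A minimal loading sketch follows; the prompt text is only a placeholder (it should match the TL;DR formatting the SFT model expects) and the generation settings are assumptions, not values from this card.

```python
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Loads the base model referenced in adapter_config.json and attaches this adapter.
model = AutoPeftModelForCausalLM.from_pretrained("mnoukhov/pythia410m-dpo-tldr-lr1e-5")
tokenizer = AutoTokenizer.from_pretrained("mnoukhov/pythia410m-sft-tldr")

# Placeholder prompt; use the same formatting as the SFT model's training data.
prompt = "POST: ...\n\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```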