---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: AI-Sweden-Models/gpt-sw3-1.3b
model-index:
- name: gpt1B_3e_DPO_model
  results: []
---

# gpt1B_3e_DPO_model

This model is a fine-tuned version of [AI-Sweden-Models/gpt-sw3-1.3b](https://huggingface.co/AI-Sweden-Models/gpt-sw3-1.3b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0057
- Rewards/chosen: 0.0954
- Rewards/rejected: -6.8658
- Rewards/accuracies: 1.0
- Rewards/margins: 6.9612
- Logps/rejected: -290.2744
- Logps/chosen: -129.1244
- Logits/rejected: -2.8564
- Logits/chosen: -3.0705

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
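The `trl` and `dpo` tags indicate this adapter was trained with TRL's `DPOTrainer`. Below is a minimal sketch of a comparable TRL 0.7-era setup using the hyperparameters above (the TRL version is not recorded in this card, and newer TRL releases move `beta` and related arguments into `DPOConfig`). The toy preference dataset, the DPO `beta`, and the LoRA adapter settings are all assumptions; none of them are recorded here:

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "AI-Sweden-Models/gpt-sw3-1.3b"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder preference data: the dataset actually used is not recorded.
train_dataset = Dataset.from_dict({
    "prompt":   ["Vad är huvudstaden i Sverige?"],
    "chosen":   ["Stockholm är Sveriges huvudstad."],
    "rejected": ["Jag vet inte."],
})

# Hyperparameters from the list above.
args = TrainingArguments(
    output_dir="gpt1B_3e_DPO_model",
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,
    lr_scheduler_type="linear",
    num_train_epochs=3,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT adapter, the frozen base model serves as the reference
    args=args,
    beta=0.1,  # assumed; the beta used for this run is not recorded
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # adapter settings assumed (PEFT defaults)
)
trainer.train()
```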
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.423 | 0.1 | 25 | 0.2732 | 0.1382 | -1.1127 | 0.9900 | 1.2509 | -232.7433 | -128.6963 | -3.1835 | -3.3419 |
| 0.0751 | 0.2 | 50 | 0.0774 | 0.1041 | -2.9483 | 0.9933 | 3.0524 | -251.0999 | -129.0378 | -3.0438 | -3.2267 |
| 0.0333 | 0.3 | 75 | 0.0306 | 0.0559 | -4.4302 | 0.9967 | 4.4861 | -265.9183 | -129.5195 | -2.9445 | -3.1419 |
| 0.0148 | 0.4 | 100 | 0.0184 | 0.0605 | -5.1453 | 0.9967 | 5.2058 | -273.0695 | -129.4732 | -2.8915 | -3.0970 |
| 0.0117 | 0.5 | 125 | 0.0137 | 0.0536 | -5.6291 | 1.0 | 5.6827 | -277.9078 | -129.5426 | -2.8703 | -3.0794 |
| 0.0089 | 0.6 | 150 | 0.0111 | 0.0653 | -5.8777 | 1.0 | 5.9429 | -280.3931 | -129.4260 | -2.8768 | -3.0866 |
| 0.0072 | 0.7 | 175 | 0.0094 | 0.1032 | -5.9748 | 1.0 | 6.0780 | -281.3649 | -129.0464 | -2.8814 | -3.0918 |
| 0.0086 | 0.79 | 200 | 0.0079 | 0.1381 | -6.1527 | 1.0 | 6.2909 | -283.1440 | -128.6971 | -2.8828 | -3.0942 |
| 0.0055 | 0.89 | 225 | 0.0075 | 0.1177 | -6.3321 | 1.0 | 6.4498 | -284.9379 | -128.9017 | -2.8750 | -3.0877 |
| 0.0039 | 0.99 | 250 | 0.0072 | 0.1005 | -6.4757 | 1.0 | 6.5762 | -286.3734 | -129.0737 | -2.8678 | -3.0809 |
| 0.0028 | 1.09 | 275 | 0.0067 | 0.0874 | -6.6119 | 1.0 | 6.6992 | -287.7352 | -129.2047 | -2.8641 | -3.0772 |
| 0.0049 | 1.19 | 300 | 0.0063 | 0.0960 | -6.6507 | 1.0 | 6.7467 | -288.1233 | -129.1187 | -2.8637 | -3.0768 |
| 0.0032 | 1.29 | 325 | 0.0064 | 0.0981 | -6.6773 | 1.0 | 6.7753 | -288.3892 | -129.0980 | -2.8629 | -3.0763 |
| 0.0035 | 1.39 | 350 | 0.0061 | 0.0994 | -6.7027 | 1.0 | 6.8021 | -288.6437 | -129.0850 | -2.8638 | -3.0770 |
| 0.0027 | 1.49 | 375 | 0.0059 | 0.0970 | -6.7348 | 1.0 | 6.8318 | -288.9645 | -129.1081 | -2.8629 | -3.0763 |
| 0.0024 | 1.59 | 400 | 0.0059 | 0.1007 | -6.7581 | 1.0 | 6.8588 | -289.1978 | -129.0716 | -2.8616 | -3.0752 |
| 0.0035 | 1.69 | 425 | 0.0058 | 0.1019 | -6.7665 | 1.0 | 6.8684 | -289.2819 | -129.0595 | -2.8609 | -3.0746 |
| 0.0025 | 1.79 | 450 | 0.0059 | 0.1000 | -6.7839 | 1.0 | 6.8839 | -289.4559 | -129.0789 | -2.8602 | -3.0739 |
| 0.0024 | 1.89 | 475 | 0.0056 | 0.0982 | -6.8036 | 1.0 | 6.9018 | -289.6526 | -129.0969 | -2.8595 | -3.0732 |
| 0.0026 | 1.99 | 500 | 0.0057 | 0.0978 | -6.8226 | 1.0 | 6.9204 | -289.8423 | -129.1003 | -2.8586 | -3.0724 |
| 0.0033 | 2.09 | 525 | 0.0057 | 0.0952 | -6.8383 | 1.0 | 6.9335 | -289.9999 | -129.1269 | -2.8571 | -3.0711 |
| 0.003 | 2.19 | 550 | 0.0056 | 0.0966 | -6.8546 | 1.0 | 6.9513 | -290.1629 | -129.1121 | -2.8571 | -3.0712 |
| 0.0024 | 2.29 | 575 | 0.0057 | 0.0957 | -6.8546 | 1.0 | 6.9503 | -290.1624 | -129.1215 | -2.8571 | -3.0712 |
| 0.0038 | 2.38 | 600 | 0.0057 | 0.0959 | -6.8568 | 1.0 | 6.9527 | -290.1844 | -129.1196 | -2.8572 | -3.0712 |
| 0.0026 | 2.48 | 625 | 0.0056 | 0.0943 | -6.8630 | 1.0 | 6.9574 | -290.2470 | -129.1351 | -2.8571 | -3.0710 |
| 0.0031 | 2.58 | 650 | 0.0056 | 0.0937 | -6.8627 | 1.0 | 6.9564 | -290.2435 | -129.1417 | -2.8565 | -3.0704 |
| 0.0024 | 2.68 | 675 | 0.0057 | 0.0961 | -6.8653 | 1.0 | 6.9614 | -290.2693 | -129.1175 | -2.8568 | -3.0709 |
| 0.0022 | 2.78 | 700 | 0.0057 | 0.0960 | -6.8628 | 1.0 | 6.9588 | -290.2445 | -129.1185 | -2.8567 | -3.0707 |
| 0.002 | 2.88 | 725 | 0.0057 | 0.0944 | -6.8626 | 1.0 | 6.9570 | -290.2426 | -129.1347 | -2.8565 | -3.0705 |
| 0.0023 | 2.98 | 750 | 0.0057 | 0.0954 | -6.8658 | 1.0 | 6.9612 | -290.2744 | -129.1244 | -2.8564 | -3.0705 |

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2
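This repository stores a PEFT adapter rather than full model weights, so inference goes through `peft`. A minimal sketch, assuming the adapter is published under a hub id like the hypothetical `your-username/gpt1B_3e_DPO_model` (the prompt is illustrative):

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Hypothetical repo id; replace with the actual hub path of this adapter.
adapter_id = "your-username/gpt1B_3e_DPO_model"

# AutoPeftModelForCausalLM reads the base model from the adapter config
# (AI-Sweden-Models/gpt-sw3-1.3b) and applies the adapter weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id)
tokenizer = AutoTokenizer.from_pretrained("AI-Sweden-Models/gpt-sw3-1.3b")

prompt = "Träd är fina för att"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```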