# gpt1B_3e_DPO_model
This model is a version of [AI-Sweden-Models/gpt-sw3-1.3b](https://huggingface.co/AI-Sweden-Models/gpt-sw3-1.3b) fine-tuned with Direct Preference Optimization (DPO) on an unknown dataset. It achieves the following results on the evaluation set (the reward metrics are explained after the list):
- Loss: 0.0057
- Rewards/chosen: 0.0954
- Rewards/rejected: -6.8658
- Rewards/accuracies: 1.0
- Rewards/margins: 6.9612
- Logps/rejected: -290.2744
- Logps/chosen: -129.1244
- Logits/rejected: -2.8564
- Logits/chosen: -3.0705
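
For readers unfamiliar with the metric names: they follow the standard DPO formulation (Rafailov et al., 2023), in which the implicit reward of a completion y for a prompt x is the β-scaled log-probability ratio between the trained policy and a frozen reference model. β is the DPO temperature; its value for this run is not recorded in this card.

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)
$$

Here y_w and y_l are the chosen and rejected completions of a preference pair. Rewards/chosen and Rewards/rejected are these implicit rewards averaged over the evaluation set, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs for which the chosen completion outscores the rejected one; an accuracy of 1.0 means every evaluation pair is ranked correctly.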
## Model description
More information needed
## Intended uses & limitations
More information needed
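
Since PEFT is listed under the framework versions, this repository most likely holds a parameter-efficient adapter rather than a full set of model weights. Below is a minimal inference sketch under that assumption; the Swedish prompt and the generation settings are illustrative only.

```python
# Minimal inference sketch: load the base model, then attach this repo
# as a PEFT adapter. Assumes the repo contains an adapter, as suggested
# by the PEFT entry under "Framework versions".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "AI-Sweden-Models/gpt-sw3-1.3b"
adapter_id = "thorirhrafn/gpt1B_3e_DPO_model"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO adapter
model.eval()

inputs = tokenizer("Hej! Hur mår du idag?", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```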
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training; a sketch of how they might map onto a training script follows the list:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
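
TRL's `DPOTrainer` is a plausible way to reproduce this configuration, although TRL itself is not named in this card. The sketch below maps the listed hyperparameters onto such a setup; the preference dataset, the LoRA configuration, and the DPO β value are all undocumented here, so the toy dataset, `peft_config`, and `beta=0.1` are assumptions.

```python
# Hypothetical reconstruction of the training setup from the hyperparameters
# above. TRL is assumed (not named in this card); the dataset, LoRA config,
# and beta are placeholders for undocumented choices.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "AI-Sweden-Models/gpt-sw3-1.3b"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Toy preference data in the prompt/chosen/rejected format DPOTrainer expects.
preference_dataset = Dataset.from_dict({
    "prompt": ["Vad är Sveriges huvudstad?"],
    "chosen": [" Sveriges huvudstad är Stockholm."],
    "rejected": [" Jag vet inte."],
})

peft_config = LoraConfig(task_type="CAUSAL_LM", r=16)  # assumed; not in the card

args = TrainingArguments(
    output_dir="gpt1B_3e_DPO_model",
    learning_rate=1e-5,                # from the list above
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,     # total train batch size 8
    num_train_epochs=3,
    lr_scheduler_type="linear",
    seed=42,                           # Adam(0.9, 0.999, eps=1e-8) is the default optimizer
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,          # with a PEFT adapter, the frozen base model serves as reference
    args=args,
    beta=0.1,                # assumed; the DPO temperature is not recorded in the card
    train_dataset=preference_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```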
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.423 | 0.1 | 25 | 0.2732 | 0.1382 | -1.1127 | 0.9900 | 1.2509 | -232.7433 | -128.6963 | -3.1835 | -3.3419 |
| 0.0751 | 0.2 | 50 | 0.0774 | 0.1041 | -2.9483 | 0.9933 | 3.0524 | -251.0999 | -129.0378 | -3.0438 | -3.2267 |
| 0.0333 | 0.3 | 75 | 0.0306 | 0.0559 | -4.4302 | 0.9967 | 4.4861 | -265.9183 | -129.5195 | -2.9445 | -3.1419 |
| 0.0148 | 0.4 | 100 | 0.0184 | 0.0605 | -5.1453 | 0.9967 | 5.2058 | -273.0695 | -129.4732 | -2.8915 | -3.0970 |
| 0.0117 | 0.5 | 125 | 0.0137 | 0.0536 | -5.6291 | 1.0 | 5.6827 | -277.9078 | -129.5426 | -2.8703 | -3.0794 |
| 0.0089 | 0.6 | 150 | 0.0111 | 0.0653 | -5.8777 | 1.0 | 5.9429 | -280.3931 | -129.4260 | -2.8768 | -3.0866 |
| 0.0072 | 0.7 | 175 | 0.0094 | 0.1032 | -5.9748 | 1.0 | 6.0780 | -281.3649 | -129.0464 | -2.8814 | -3.0918 |
| 0.0086 | 0.79 | 200 | 0.0079 | 0.1381 | -6.1527 | 1.0 | 6.2909 | -283.1440 | -128.6971 | -2.8828 | -3.0942 |
| 0.0055 | 0.89 | 225 | 0.0075 | 0.1177 | -6.3321 | 1.0 | 6.4498 | -284.9379 | -128.9017 | -2.8750 | -3.0877 |
| 0.0039 | 0.99 | 250 | 0.0072 | 0.1005 | -6.4757 | 1.0 | 6.5762 | -286.3734 | -129.0737 | -2.8678 | -3.0809 |
| 0.0028 | 1.09 | 275 | 0.0067 | 0.0874 | -6.6119 | 1.0 | 6.6992 | -287.7352 | -129.2047 | -2.8641 | -3.0772 |
| 0.0049 | 1.19 | 300 | 0.0063 | 0.0960 | -6.6507 | 1.0 | 6.7467 | -288.1233 | -129.1187 | -2.8637 | -3.0768 |
| 0.0032 | 1.29 | 325 | 0.0064 | 0.0981 | -6.6773 | 1.0 | 6.7753 | -288.3892 | -129.0980 | -2.8629 | -3.0763 |
| 0.0035 | 1.39 | 350 | 0.0061 | 0.0994 | -6.7027 | 1.0 | 6.8021 | -288.6437 | -129.0850 | -2.8638 | -3.0770 |
| 0.0027 | 1.49 | 375 | 0.0059 | 0.0970 | -6.7348 | 1.0 | 6.8318 | -288.9645 | -129.1081 | -2.8629 | -3.0763 |
| 0.0024 | 1.59 | 400 | 0.0059 | 0.1007 | -6.7581 | 1.0 | 6.8588 | -289.1978 | -129.0716 | -2.8616 | -3.0752 |
| 0.0035 | 1.69 | 425 | 0.0058 | 0.1019 | -6.7665 | 1.0 | 6.8684 | -289.2819 | -129.0595 | -2.8609 | -3.0746 |
| 0.0025 | 1.79 | 450 | 0.0059 | 0.1000 | -6.7839 | 1.0 | 6.8839 | -289.4559 | -129.0789 | -2.8602 | -3.0739 |
| 0.0024 | 1.89 | 475 | 0.0056 | 0.0982 | -6.8036 | 1.0 | 6.9018 | -289.6526 | -129.0969 | -2.8595 | -3.0732 |
| 0.0026 | 1.99 | 500 | 0.0057 | 0.0978 | -6.8226 | 1.0 | 6.9204 | -289.8423 | -129.1003 | -2.8586 | -3.0724 |
| 0.0033 | 2.09 | 525 | 0.0057 | 0.0952 | -6.8383 | 1.0 | 6.9335 | -289.9999 | -129.1269 | -2.8571 | -3.0711 |
| 0.003 | 2.19 | 550 | 0.0056 | 0.0966 | -6.8546 | 1.0 | 6.9513 | -290.1629 | -129.1121 | -2.8571 | -3.0712 |
| 0.0024 | 2.29 | 575 | 0.0057 | 0.0957 | -6.8546 | 1.0 | 6.9503 | -290.1624 | -129.1215 | -2.8571 | -3.0712 |
| 0.0038 | 2.38 | 600 | 0.0057 | 0.0959 | -6.8568 | 1.0 | 6.9527 | -290.1844 | -129.1196 | -2.8572 | -3.0712 |
| 0.0026 | 2.48 | 625 | 0.0056 | 0.0943 | -6.8630 | 1.0 | 6.9574 | -290.2470 | -129.1351 | -2.8571 | -3.0710 |
| 0.0031 | 2.58 | 650 | 0.0056 | 0.0937 | -6.8627 | 1.0 | 6.9564 | -290.2435 | -129.1417 | -2.8565 | -3.0704 |
| 0.0024 | 2.68 | 675 | 0.0057 | 0.0961 | -6.8653 | 1.0 | 6.9614 | -290.2693 | -129.1175 | -2.8568 | -3.0709 |
| 0.0022 | 2.78 | 700 | 0.0057 | 0.0960 | -6.8628 | 1.0 | 6.9588 | -290.2445 | -129.1185 | -2.8567 | -3.0707 |
| 0.002 | 2.88 | 725 | 0.0057 | 0.0944 | -6.8626 | 1.0 | 6.9570 | -290.2426 | -129.1347 | -2.8565 | -3.0705 |
| 0.0023 | 2.98 | 750 | 0.0057 | 0.0954 | -6.8658 | 1.0 | 6.9612 | -290.2744 | -129.1244 | -2.8564 | -3.0705 |
### Framework versions
- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2