gpt1B_3e_DPO_model

This model is a DPO (Direct Preference Optimization) fine-tuned version of AI-Sweden-Models/gpt-sw3-1.3b, trained as a PEFT adapter on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0057
  • Rewards/chosen: 0.0954
  • Rewards/rejected: -6.8658
  • Rewards/accuracies: 1.0
  • Rewards/margins: 6.9612
  • Logps/rejected: -290.2744
  • Logps/chosen: -129.1244
  • Logits/rejected: -2.8564
  • Logits/chosen: -3.0705
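For context, the DPO metrics above are related by a simple identity: the reward margin is the chosen reward minus the rejected reward, and the per-pair DPO loss is the negative log-sigmoid of that margin (the reported loss is an average over all evaluation pairs, so it will not equal the single-pair value). A minimal sketch in pure Python, using the final evaluation rewards from the list above:

```python
import math

def dpo_pair_metrics(reward_chosen: float, reward_rejected: float):
    """Margin and per-pair DPO loss from implicit rewards.

    Each reward is beta * (policy log-prob - reference log-prob) for a
    response; the pairwise loss is -log(sigmoid(margin)).
    """
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    return margin, loss

# Final evaluation rewards reported above
margin, loss = dpo_pair_metrics(0.0954, -6.8658)
print(round(margin, 4))  # 6.9612, matching Rewards/margins
```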

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
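Two derived quantities follow directly from these settings: the effective batch size (train_batch_size × gradient_accumulation_steps = 1 × 8 = 8, matching total_train_batch_size), and the linear scheduler, which decays the learning rate from 1e-05 toward 0 over training. A minimal sketch, assuming the standard linear decay with no warmup:

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 1e-05) -> float:
    """Linearly decay the learning rate from base_lr to 0 over total_steps."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

# Effective batch size: per-device batch * gradient accumulation steps
effective_batch = 1 * 8
assert effective_batch == 8  # matches total_train_batch_size above

# Roughly 750 optimizer steps over 3 epochs at this effective batch size
print(linear_lr(0, 750))    # 1e-05 at the start
print(linear_lr(375, 750))  # 5e-06 halfway through
```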

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.423 | 0.1 | 25 | 0.2732 | 0.1382 | -1.1127 | 0.9900 | 1.2509 | -232.7433 | -128.6963 | -3.1835 | -3.3419 |
| 0.0751 | 0.2 | 50 | 0.0774 | 0.1041 | -2.9483 | 0.9933 | 3.0524 | -251.0999 | -129.0378 | -3.0438 | -3.2267 |
| 0.0333 | 0.3 | 75 | 0.0306 | 0.0559 | -4.4302 | 0.9967 | 4.4861 | -265.9183 | -129.5195 | -2.9445 | -3.1419 |
| 0.0148 | 0.4 | 100 | 0.0184 | 0.0605 | -5.1453 | 0.9967 | 5.2058 | -273.0695 | -129.4732 | -2.8915 | -3.0970 |
| 0.0117 | 0.5 | 125 | 0.0137 | 0.0536 | -5.6291 | 1.0 | 5.6827 | -277.9078 | -129.5426 | -2.8703 | -3.0794 |
| 0.0089 | 0.6 | 150 | 0.0111 | 0.0653 | -5.8777 | 1.0 | 5.9429 | -280.3931 | -129.4260 | -2.8768 | -3.0866 |
| 0.0072 | 0.7 | 175 | 0.0094 | 0.1032 | -5.9748 | 1.0 | 6.0780 | -281.3649 | -129.0464 | -2.8814 | -3.0918 |
| 0.0086 | 0.79 | 200 | 0.0079 | 0.1381 | -6.1527 | 1.0 | 6.2909 | -283.1440 | -128.6971 | -2.8828 | -3.0942 |
| 0.0055 | 0.89 | 225 | 0.0075 | 0.1177 | -6.3321 | 1.0 | 6.4498 | -284.9379 | -128.9017 | -2.8750 | -3.0877 |
| 0.0039 | 0.99 | 250 | 0.0072 | 0.1005 | -6.4757 | 1.0 | 6.5762 | -286.3734 | -129.0737 | -2.8678 | -3.0809 |
| 0.0028 | 1.09 | 275 | 0.0067 | 0.0874 | -6.6119 | 1.0 | 6.6992 | -287.7352 | -129.2047 | -2.8641 | -3.0772 |
| 0.0049 | 1.19 | 300 | 0.0063 | 0.0960 | -6.6507 | 1.0 | 6.7467 | -288.1233 | -129.1187 | -2.8637 | -3.0768 |
| 0.0032 | 1.29 | 325 | 0.0064 | 0.0981 | -6.6773 | 1.0 | 6.7753 | -288.3892 | -129.0980 | -2.8629 | -3.0763 |
| 0.0035 | 1.39 | 350 | 0.0061 | 0.0994 | -6.7027 | 1.0 | 6.8021 | -288.6437 | -129.0850 | -2.8638 | -3.0770 |
| 0.0027 | 1.49 | 375 | 0.0059 | 0.0970 | -6.7348 | 1.0 | 6.8318 | -288.9645 | -129.1081 | -2.8629 | -3.0763 |
| 0.0024 | 1.59 | 400 | 0.0059 | 0.1007 | -6.7581 | 1.0 | 6.8588 | -289.1978 | -129.0716 | -2.8616 | -3.0752 |
| 0.0035 | 1.69 | 425 | 0.0058 | 0.1019 | -6.7665 | 1.0 | 6.8684 | -289.2819 | -129.0595 | -2.8609 | -3.0746 |
| 0.0025 | 1.79 | 450 | 0.0059 | 0.1000 | -6.7839 | 1.0 | 6.8839 | -289.4559 | -129.0789 | -2.8602 | -3.0739 |
| 0.0024 | 1.89 | 475 | 0.0056 | 0.0982 | -6.8036 | 1.0 | 6.9018 | -289.6526 | -129.0969 | -2.8595 | -3.0732 |
| 0.0026 | 1.99 | 500 | 0.0057 | 0.0978 | -6.8226 | 1.0 | 6.9204 | -289.8423 | -129.1003 | -2.8586 | -3.0724 |
| 0.0033 | 2.09 | 525 | 0.0057 | 0.0952 | -6.8383 | 1.0 | 6.9335 | -289.9999 | -129.1269 | -2.8571 | -3.0711 |
| 0.003 | 2.19 | 550 | 0.0056 | 0.0966 | -6.8546 | 1.0 | 6.9513 | -290.1629 | -129.1121 | -2.8571 | -3.0712 |
| 0.0024 | 2.29 | 575 | 0.0057 | 0.0957 | -6.8546 | 1.0 | 6.9503 | -290.1624 | -129.1215 | -2.8571 | -3.0712 |
| 0.0038 | 2.38 | 600 | 0.0057 | 0.0959 | -6.8568 | 1.0 | 6.9527 | -290.1844 | -129.1196 | -2.8572 | -3.0712 |
| 0.0026 | 2.48 | 625 | 0.0056 | 0.0943 | -6.8630 | 1.0 | 6.9574 | -290.2470 | -129.1351 | -2.8571 | -3.0710 |
| 0.0031 | 2.58 | 650 | 0.0056 | 0.0937 | -6.8627 | 1.0 | 6.9564 | -290.2435 | -129.1417 | -2.8565 | -3.0704 |
| 0.0024 | 2.68 | 675 | 0.0057 | 0.0961 | -6.8653 | 1.0 | 6.9614 | -290.2693 | -129.1175 | -2.8568 | -3.0709 |
| 0.0022 | 2.78 | 700 | 0.0057 | 0.0960 | -6.8628 | 1.0 | 6.9588 | -290.2445 | -129.1185 | -2.8567 | -3.0707 |
| 0.002 | 2.88 | 725 | 0.0057 | 0.0944 | -6.8626 | 1.0 | 6.9570 | -290.2426 | -129.1347 | -2.8565 | -3.0705 |
| 0.0023 | 2.98 | 750 | 0.0057 | 0.0954 | -6.8658 | 1.0 | 6.9612 | -290.2744 | -129.1244 | -2.8564 | -3.0705 |
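The Rewards/accuracies column is the fraction of evaluation pairs in which the chosen response receives a higher implicit reward than the rejected one, i.e. the fraction of positive reward margins. A minimal sketch of that metric in pure Python (the example margins are hypothetical, not taken from this run):

```python
def reward_accuracy(margins):
    """Fraction of preference pairs with a positive reward margin
    (chosen reward > rejected reward)."""
    if not margins:
        raise ValueError("need at least one margin")
    return sum(m > 0 for m in margins) / len(margins)

# Hypothetical per-pair margins: three correct orderings, one inverted
print(reward_accuracy([6.9, 5.2, 0.3, -0.1]))  # 0.75
```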

Framework versions

  • PEFT 0.8.2
  • Transformers 4.38.1
  • Pytorch 2.2.0+cu118
  • Datasets 2.17.1
  • Tokenizers 0.15.2