mistralit2_1000_STEPS_rate_1e6_03_Beta_DPO

This model is a DPO fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on an unknown preference dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3501
  • Rewards/chosen: -4.6533
  • Rewards/rejected: -7.2695
  • Rewards/accuracies: 0.6044
  • Rewards/margins: 2.6162
  • Logps/rejected: -52.8039
  • Logps/chosen: -38.8969
  • Logits/rejected: -2.8818
  • Logits/chosen: -2.8827
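
The Rewards/* metrics follow the Direct Preference Optimization (DPO) convention, in which the implicit reward of a completion is the β-scaled log-probability ratio between the fine-tuned policy and a frozen reference model (here, the base Instruct model). The card does not state β or the trainer used; the recap below assumes the usual TRL-style reporting, and the "03_Beta" suffix in the model name suggests β = 0.3.

```latex
% Implicit DPO reward for a completion y given prompt x, with policy \pi_\theta
% and frozen reference model \pi_{\mathrm{ref}}:
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right]

% DPO loss over a preference pair (y_w = chosen, y_l = rejected):
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl( r_\theta(x, y_w) - r_\theta(x, y_l) \bigr)

% Reported metrics (TRL convention, assumed here):
%   Rewards/chosen     = mean of r_\theta(x, y_w)
%   Rewards/rejected   = mean of r_\theta(x, y_l)
%   Rewards/margins    = mean of r_\theta(x, y_w) - r_\theta(x, y_l)
%   Rewards/accuracies = fraction of pairs where r_\theta(x, y_w) > r_\theta(x, y_l)
%   Logps/chosen, Logps/rejected = mean log-probabilities under the policy
```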

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
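
The card does not say which training framework produced these numbers. Below is a minimal sketch of how the listed hyperparameters could map onto TRL's DPOTrainer, the common choice for DPO fine-tuning of Mistral-7B-Instruct-v0.2. The dataset name and β value are assumptions (β = 0.3 is inferred only from the "03_Beta" suffix in the model name); everything else comes from the hyperparameter list above.

```python
# Hypothetical reconstruction of the training setup with TRL's DPOTrainer.
# Dataset and beta are assumptions; the remaining values are from the card.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)
ref_model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

# Preference dataset with "prompt", "chosen", "rejected" columns (placeholder).
train_dataset = load_dataset("your/preference_dataset", split="train")

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_rate_1e6_03_Beta_DPO",
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective train batch size of 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    beta=0.3,            # assumed from the "03_Beta" suffix in the model name
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```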

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6609 | 0.1 | 50 | 0.7439 | -0.3799 | -0.6639 | 0.5363 | 0.2840 | -30.7855 | -24.6521 | -2.8212 | -2.8215 |
| 0.7223 | 0.2 | 100 | 1.2179 | -3.4197 | -4.5833 | 0.5670 | 1.1636 | -43.8500 | -34.7847 | -2.4935 | -2.4943 |
| 1.5151 | 0.29 | 150 | 1.3451 | -4.6461 | -5.3198 | 0.4923 | 0.6737 | -46.3050 | -38.8727 | -2.7810 | -2.7816 |
| 1.5249 | 0.39 | 200 | 1.5370 | -4.3700 | -4.3686 | 0.4659 | -0.0014 | -43.1345 | -37.9527 | -2.9607 | -2.9612 |
| 1.3975 | 0.49 | 250 | 1.2806 | -3.4083 | -3.9853 | 0.5319 | 0.5769 | -41.8567 | -34.7470 | -2.9314 | -2.9319 |
| 1.3304 | 0.59 | 300 | 1.3357 | -2.0104 | -2.3692 | 0.4945 | 0.3588 | -36.4698 | -30.0870 | -2.9631 | -2.9635 |
| 1.0439 | 0.68 | 350 | 1.2763 | -0.5270 | -0.8889 | 0.5077 | 0.3619 | -31.5354 | -25.1425 | -2.8440 | -2.8443 |
| 1.4598 | 0.78 | 400 | 1.2025 | -2.3552 | -3.1289 | 0.5560 | 0.7737 | -39.0019 | -31.2365 | -3.1671 | -3.1675 |
| 0.8046 | 0.88 | 450 | 1.2610 | -2.5219 | -3.3122 | 0.5538 | 0.7903 | -39.6132 | -31.7922 | -2.8903 | -2.8908 |
| 0.9395 | 0.98 | 500 | 1.1880 | -1.6006 | -2.5141 | 0.5451 | 0.9135 | -36.9527 | -28.7210 | -2.7295 | -2.7300 |
| 0.239 | 1.07 | 550 | 1.1556 | -2.0692 | -3.6279 | 0.5868 | 1.5587 | -40.6656 | -30.2832 | -2.8301 | -2.8308 |
| 0.1348 | 1.17 | 600 | 1.3248 | -3.6765 | -5.8923 | 0.5978 | 2.2158 | -48.2133 | -35.6409 | -2.8392 | -2.8400 |
| 0.328 | 1.27 | 650 | 1.2982 | -3.5842 | -5.5884 | 0.5868 | 2.0042 | -47.2005 | -35.3331 | -2.8786 | -2.8794 |
| 0.3605 | 1.37 | 700 | 1.2960 | -4.0655 | -6.4030 | 0.6000 | 2.3374 | -49.9156 | -36.9376 | -2.8812 | -2.8820 |
| 0.1389 | 1.46 | 750 | 1.3185 | -4.2670 | -6.7599 | 0.5956 | 2.4929 | -51.1054 | -37.6093 | -2.8897 | -2.8905 |
| 0.1871 | 1.56 | 800 | 1.3483 | -4.5542 | -7.1419 | 0.5978 | 2.5877 | -52.3788 | -38.5665 | -2.8779 | -2.8788 |
| 0.3556 | 1.66 | 850 | 1.3507 | -4.6209 | -7.2288 | 0.6000 | 2.6080 | -52.6684 | -38.7887 | -2.8809 | -2.8817 |
| 0.4099 | 1.76 | 900 | 1.3517 | -4.6482 | -7.2597 | 0.6022 | 2.6114 | -52.7713 | -38.8799 | -2.8817 | -2.8826 |
| 0.3996 | 1.86 | 950 | 1.3491 | -4.6540 | -7.2682 | 0.6044 | 2.6142 | -52.7997 | -38.8992 | -2.8818 | -2.8827 |
| 0.2013 | 1.95 | 1000 | 1.3501 | -4.6533 | -7.2695 | 0.6044 | 2.6162 | -52.8039 | -38.8969 | -2.8818 | -2.8827 |

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2
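
The versions above are those used for training; for inference, any reasonably recent Transformers release should work. A minimal usage sketch (the model id is taken from the repository name; the prompt and generation settings are arbitrary examples):

```python
# Minimal inference sketch: load the fine-tuned model and generate a reply
# using the Mistral-Instruct chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/mistralit2_1000_STEPS_rate_1e6_03_Beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```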