mistralit2_1000_STEPS_1e7_rate_03_beta_DPO

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6191
  • Rewards/chosen: -1.8431
  • Rewards/rejected: -2.7054
  • Rewards/accuracies: 0.6505
  • Rewards/margins: 0.8623
  • Logps/rejected: -37.5904
  • Logps/chosen: -29.5295
  • Logits/rejected: -2.8238
  • Logits/chosen: -2.8242
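
These metrics follow the standard DPO bookkeeping: the chosen/rejected rewards are the β-scaled log-probability ratios of the policy against the frozen reference model, the margin is their difference (here -1.8431 - (-2.7054) = 0.8623), and the accuracy is the fraction of evaluation pairs where the chosen reward exceeds the rejected one. A minimal sketch of the underlying quantities, assuming β = 0.3 as the "03_beta" in the model name suggests:

```latex
% Standard DPO formulation (beta = 0.3 assumed from "03_beta" in the model name)
r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left( r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}}) \right)
```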

Model description

More information needed

Intended uses & limitations

More information needed
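
Since no usage guidance is documented, the following is a minimal inference sketch rather than an official recipe. It assumes the published repository id, FP16 weights, enough GPU memory for a 7B model, and the standard Mistral-Instruct chat-template flow; adjust to your environment.

```python
# Minimal inference sketch (assumptions: repo id below, FP16 weights, a CUDA-capable device).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/mistralit2_1000_STEPS_1e7_rate_03_beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Mistral-Instruct models expect the chat template rather than raw prompting.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```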

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-07
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
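
The list above maps almost one-to-one onto a TRL DPOTrainer run. The sketch below is a reconstruction under stated assumptions, not the author's actual script: the training data is undocumented (tiny placeholder pairs are used), β = 0.3 is inferred from the model name, and the DPOTrainer keyword names follow the older TRL style and differ in newer releases.

```python
# Reconstruction sketch only. Assumptions: beta=0.3 (from "03_beta" in the model name),
# placeholder preference pairs (the real dataset is undocumented), and an older-TRL
# DPOTrainer signature (keyword names differ in newer TRL releases).
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder preference pairs in the prompt/chosen/rejected format DPO expects.
pairs = {
    "prompt": ["What does DPO fine-tuning do?"],
    "chosen": ["It optimizes a policy directly on preference pairs without a separate reward model."],
    "rejected": ["I am not sure."],
}
train_dataset = Dataset.from_dict(pairs)
eval_dataset = Dataset.from_dict(pairs)

# Hyperparameters mirror the list above; the Adam betas/epsilon are the TrainingArguments defaults.
training_args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e7_rate_03_beta_DPO",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # total train batch size 8
    learning_rate=1e-7,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=50,                   # matches the 50-step evaluation cadence in the results table
    logging_steps=50,
    report_to="none",
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=training_args,
    beta=0.3,                        # assumed from the model name
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```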

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6777 | 0.1  | 50   | 0.6740 | -0.1496 | -0.1942 | 0.5824 | 0.0446 | -29.2197 | -23.8845 | -2.8632 | -2.8635 |
| 0.6077 | 0.2  | 100  | 0.6364 | -1.2703 | -1.6253 | 0.5846 | 0.3550 | -33.9902 | -27.6202 | -2.8384 | -2.8387 |
| 0.4959 | 0.29 | 150  | 0.6488 | -2.0038 | -2.5512 | 0.5934 | 0.5473 | -37.0763 | -30.0653 | -2.8343 | -2.8347 |
| 0.553  | 0.39 | 200  | 0.5977 | -0.9571 | -1.3986 | 0.6374 | 0.4415 | -33.2344 | -26.5762 | -2.8518 | -2.8521 |
| 0.6334 | 0.49 | 250  | 0.5740 | -0.6757 | -1.1710 | 0.6440 | 0.4953 | -32.4758 | -25.6382 | -2.8479 | -2.8482 |
| 0.5613 | 0.59 | 300  | 0.5961 | -1.4901 | -2.1568 | 0.6374 | 0.6666 | -35.7616 | -28.3529 | -2.8436 | -2.8439 |
| 0.5182 | 0.68 | 350  | 0.6175 | -1.8099 | -2.5639 | 0.6418 | 0.7541 | -37.1189 | -29.4187 | -2.8403 | -2.8407 |
| 0.6292 | 0.78 | 400  | 0.6197 | -1.8949 | -2.6751 | 0.6418 | 0.7802 | -37.4896 | -29.7022 | -2.8352 | -2.8356 |
| 0.6529 | 0.88 | 450  | 0.5986 | -1.3908 | -2.0689 | 0.6527 | 0.6781 | -35.4687 | -28.0218 | -2.8394 | -2.8398 |
| 0.5042 | 0.98 | 500  | 0.5930 | -1.2223 | -1.8903 | 0.6637 | 0.6680 | -34.8735 | -27.4602 | -2.8391 | -2.8395 |
| 0.364  | 1.07 | 550  | 0.5917 | -1.3579 | -2.0905 | 0.6659 | 0.7327 | -35.5409 | -27.9120 | -2.8340 | -2.8344 |
| 0.346  | 1.17 | 600  | 0.6084 | -1.6411 | -2.4313 | 0.6527 | 0.7903 | -36.6769 | -28.8561 | -2.8286 | -2.8291 |
| 0.4524 | 1.27 | 650  | 0.6120 | -1.7303 | -2.5496 | 0.6484 | 0.8192 | -37.0710 | -29.1536 | -2.8265 | -2.8269 |
| 0.3422 | 1.37 | 700  | 0.6172 | -1.7895 | -2.6271 | 0.6505 | 0.8376 | -37.3293 | -29.3507 | -2.8252 | -2.8257 |
| 0.2776 | 1.46 | 750  | 0.6164 | -1.8100 | -2.6641 | 0.6462 | 0.8541 | -37.4528 | -29.4193 | -2.8245 | -2.8249 |
| 0.3599 | 1.56 | 800  | 0.6201 | -1.8360 | -2.6887 | 0.6484 | 0.8527 | -37.5348 | -29.5057 | -2.8241 | -2.8246 |
| 0.4059 | 1.66 | 850  | 0.6205 | -1.8421 | -2.6971 | 0.6440 | 0.8550 | -37.5629 | -29.5263 | -2.8241 | -2.8246 |
| 0.3417 | 1.76 | 900  | 0.6190 | -1.8389 | -2.6983 | 0.6505 | 0.8594 | -37.5666 | -29.5155 | -2.8239 | -2.8243 |
| 0.3409 | 1.86 | 950  | 0.6195 | -1.8423 | -2.7030 | 0.6484 | 0.8606 | -37.5823 | -29.5270 | -2.8237 | -2.8242 |
| 0.2802 | 1.95 | 1000 | 0.6191 | -1.8431 | -2.7054 | 0.6505 | 0.8623 | -37.5904 | -29.5295 | -2.8238 | -2.8242 |

Framework versions

  • Transformers 4.38.2
  • PyTorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2