TrevorJS/mtg-phi-1_5-dpo-qlora

This model is a fine-tuned version of microsoft/phi-1_5 (the training dataset is not specified in this card). It achieves the following results on the evaluation set:

  • Loss: 0.0001
  • Rewards/chosen: -7.5874
  • Rewards/rejected: -24.0497
  • Rewards/accuracies: 1.0
  • Rewards/margins: 16.4623
  • Logps/rejected: -274.3435
  • Logps/chosen: -143.2090
  • Logits/rejected: -1.8100
  • Logits/chosen: -1.4786
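In DPO, the reported reward for a completion is the scaled log-probability ratio between the policy and the frozen reference model, and the training loss is the negative log-sigmoid of the chosen-minus-rejected reward margin. A minimal pure-Python sketch (the helper name is illustrative, not library code) showing why the final margin of about 16.46 implies a near-zero evaluation loss:

```python
import math

def dpo_loss_from_rewards(reward_chosen: float, reward_rejected: float) -> float:
    """DPO loss: -log(sigmoid(margin)), computed stably via log1p(exp(-margin))."""
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))

# Final evaluation rewards from the list above.
loss = dpo_loss_from_rewards(-7.5874, -24.0497)  # margin ≈ 16.4623
print(loss)  # a margin this large drives the per-pair loss toward zero
```

Note that individual rewards can both be negative; only their margin enters the loss, which is why accuracy reaches 1.0 while the chosen reward itself drifts down.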

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1500
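The effective batch size of 16 is the per-device batch size of 4 multiplied by 4 gradient-accumulation steps, and the cosine scheduler warms the learning rate up linearly over the first 100 steps before decaying it over the remaining 1400. A small sketch of that schedule (mirroring the shape of `transformers`' `get_cosine_schedule_with_warmup`; the function here is illustrative, not the library's implementation):

```python
import math

BASE_LR, WARMUP_STEPS, TRAINING_STEPS = 5e-4, 100, 1500

def cosine_lr_with_warmup(step: int) -> float:
    """Linear warmup to BASE_LR, then cosine decay to 0 at TRAINING_STEPS."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: per-device train batch * gradient accumulation steps.
total_train_batch_size = 4 * 4  # = 16, matching the value above
```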

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0417 | 0.07 | 100 | 0.0418 | -0.3892 | -8.0118 | 0.9792 | 7.6226 | -113.9640 | -71.2264 | 1.8258 | 1.7898 |
| 0.0221 | 0.15 | 200 | 0.0303 | -2.5657 | -10.9212 | 0.9896 | 8.3555 | -143.0585 | -92.9920 | 1.9704 | 2.1047 |
| 0.0107 | 0.22 | 300 | 0.0131 | -1.7388 | -11.6047 | 0.9965 | 9.8659 | -149.8935 | -84.7232 | 1.0731 | 0.9750 |
| 0.0204 | 0.29 | 400 | 0.0108 | -2.0131 | -11.9647 | 0.9965 | 9.9516 | -153.4932 | -87.4658 | 1.3610 | 1.6740 |
| 0.0067 | 0.36 | 500 | 0.0080 | -5.9488 | -19.6561 | 0.9974 | 13.7073 | -230.4076 | -126.8228 | -0.4464 | -0.2114 |
| 0.0 | 0.44 | 600 | 0.0047 | -5.6456 | -20.2381 | 0.9983 | 14.5924 | -236.2268 | -123.7909 | -0.4142 | -0.0244 |
| 0.0003 | 0.51 | 700 | 0.0018 | -7.2250 | -21.3351 | 0.9991 | 14.1101 | -247.1974 | -139.5853 | -0.3510 | -0.0203 |
| 0.0005 | 0.58 | 800 | 0.0008 | -7.2263 | -21.2475 | 0.9991 | 14.0211 | -246.3209 | -139.5981 | -0.8673 | -0.7010 |
| 0.0 | 0.66 | 900 | 0.0009 | -10.2371 | -26.0402 | 0.9991 | 15.8031 | -294.2486 | -169.7062 | -1.9784 | -1.7799 |
| 0.0 | 0.73 | 1000 | 0.0008 | -5.9544 | -22.0767 | 0.9991 | 16.1223 | -254.6137 | -126.8789 | -1.0623 | -0.6039 |
| 0.0 | 0.8 | 1100 | 0.0007 | -7.3374 | -23.8700 | 0.9991 | 16.5327 | -272.5467 | -140.7083 | -1.5517 | -1.1710 |
| 0.0 | 0.87 | 1200 | 0.0007 | -7.6398 | -24.1605 | 0.9991 | 16.5207 | -275.4509 | -143.7327 | -1.8124 | -1.4901 |
| 0.0 | 0.95 | 1300 | 0.0001 | -7.5920 | -24.0476 | 1.0 | 16.4556 | -274.3220 | -143.2550 | -1.8115 | -1.4816 |
| 0.0001 | 1.02 | 1400 | 0.0001 | -7.5872 | -24.0480 | 1.0 | 16.4608 | -274.3262 | -143.2065 | -1.8102 | -1.4791 |
| 0.0 | 1.09 | 1500 | 0.0001 | -7.5874 | -24.0497 | 1.0 | 16.4623 | -274.3435 | -143.2090 | -1.8100 | -1.4786 |

Framework versions

  • Transformers 4.33.2
  • Pytorch 2.0.1+cu118
  • Datasets 2.14.5
  • Tokenizers 0.13.3
Model tree for TrevorJS/mtg-phi-1_5-dpo-qlora

  • Base model: microsoft/phi-1_5