
phi-2-ipo-ultrafeedback-lora

This model is a fine-tuned version of lole25/phi-2-sft-ultrachat-lora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 2156.2256
  • Rewards/chosen: -0.1105
  • Rewards/rejected: -0.1771
  • Rewards/accuracies: 0.6940
  • Rewards/margins: 0.0666
  • Logps/rejected: -249.1476
  • Logps/chosen: -271.2955
  • Logits/rejected: 0.7668
  • Logits/chosen: 0.6624
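The card itself does not show how to use the adapter. Below is a minimal loading sketch, not part of the original card, assuming the adapter applies on top of microsoft/phi-2 via standard PEFT/Transformers usage; the prompt, dtype, and generation settings are illustrative only.

```python
# Hedged sketch: load the LoRA adapter on top of the phi-2 base model.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "microsoft/phi-2"                          # assumed base model for this adapter
adapter_id = "lole25/phi-2-ipo-ultrafeedback-lora"   # this repository

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=dtype,
    trust_remote_code=True,  # may be required on older Transformers releases such as 4.36.x
)
model = PeftModel.from_pretrained(base, adapter_id).to(device)
model.eval()

prompt = "Explain what a LoRA adapter is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```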

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged reconstruction sketch using these values follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
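The actual training script is not included in the card. As an assumption-laden sketch, the hyperparameters above map onto TRL's DPOTrainer with loss_type="ipo" roughly as follows; the beta value, LoRA settings, precision, sequence lengths, and dataset preprocessing are guesses, and only the listed hyperparameters are taken from the card.

```python
# Hedged reconstruction of the training setup; not the author's actual script.
import torch
from datasets import load_dataset
from peft import LoraConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# Start from the SFT checkpoint named in the card (a LoRA adapter on microsoft/phi-2),
# merged into the base so that a fresh IPO adapter can be trained on top.
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
tokenizer.pad_token = tokenizer.eos_token
base = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", torch_dtype=torch.bfloat16, trust_remote_code=True
)
model = PeftModel.from_pretrained(base, "lole25/phi-2-sft-ultrachat-lora").merge_and_unload()

# Preference pairs; chosen/rejected are lists of chat messages, so keep only the
# final assistant turn as plain text (a simplification of chat-template formatting).
def to_preference_format(example):
    return {
        "prompt": example["prompt"],
        "chosen": example["chosen"][-1]["content"],
        "rejected": example["rejected"][-1]["content"],
    }

raw = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
train_dataset = raw.map(to_preference_format, remove_columns=raw.column_names)

# Hyperparameters copied from the list above; per-device values assume 4 GPUs with
# gradient_accumulation_steps=4, giving the reported total train batch size of 64.
training_args = TrainingArguments(
    output_dir="phi-2-ipo-ultrafeedback-lora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,  # assumption; precision is not stated in the card
    logging_steps=100,
    remove_unused_columns=False,
)

# LoRA settings are assumptions; the card does not list them.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = DPOTrainer(
    model=model,
    ref_model=None,         # with a peft_config, TRL uses the frozen base weights as reference
    args=training_args,
    beta=0.01,              # assumption, consistent with the ~2500 initial IPO loss
    loss_type="ipo",
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=1024,        # assumption
    max_prompt_length=512,  # assumption
)
trainer.train()
```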

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 2494.2439     | 0.21  | 100  | 2494.1194       | -0.0001        | -0.0010          | 0.5480             | 0.0009          | -231.5405      | -260.2577    | 0.9164          | 0.8142        |
| 2425.7957     | 0.42  | 200  | 2420.3296       | -0.0052        | -0.0154          | 0.6560             | 0.0101          | -232.9728      | -260.7673    | 0.9218          | 0.8183        |
| 2310.102      | 0.63  | 300  | 2309.9451       | -0.0300        | -0.0576          | 0.6680             | 0.0276          | -237.1959      | -263.2440    | 0.9088          | 0.8041        |
| 2159.0707     | 0.84  | 400  | 2236.2759       | -0.0634        | -0.1085          | 0.6840             | 0.0451          | -242.2857      | -266.5839    | 0.8637          | 0.7578        |
| 2176.8641     | 1.05  | 500  | 2197.5420       | -0.0903        | -0.1463          | 0.6980             | 0.0560          | -246.0634      | -269.2716    | 0.8180          | 0.7125        |
| 2066.3285     | 1.26  | 600  | 2177.3389       | -0.1014        | -0.1628          | 0.6960             | 0.0614          | -247.7128      | -270.3855    | 0.7927          | 0.6879        |
| 2119.5369     | 1.47  | 700  | 2166.3855       | -0.1054        | -0.1702          | 0.6960             | 0.0648          | -248.4533      | -270.7824    | 0.7771          | 0.6726        |
| 2096.7854     | 1.67  | 800  | 2159.7104       | -0.1091        | -0.1756          | 0.6960             | 0.0665          | -248.9965      | -271.1501    | 0.7684          | 0.6641        |
| 2094.5041     | 1.88  | 900  | 2158.6299       | -0.1103        | -0.1768          | 0.6980             | 0.0665          | -249.1140      | -271.2745    | 0.7690          | 0.6646        |
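For context on the scale of these loss values, a hedged note: assuming the adapter was trained with the IPO objective (as the model name suggests), the per-pair loss regresses the log-likelihood-ratio margin toward $\tfrac{1}{2\beta}$:

$$
h_\pi(x, y_w, y_l) = \log\frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)},
\qquad
\mathcal{L}_{\text{IPO}} = \mathbb{E}\!\left[\left(h_\pi(x, y_w, y_l) - \frac{1}{2\beta}\right)^{2}\right]
$$

At the start of training $h_\pi \approx 0$, so the loss begins near $(1/(2\beta))^2$. The initial validation loss of roughly 2494 would therefore be consistent with $\beta$ on the order of 0.01, since $(1/0.02)^2 = 2500$, though $\beta$ is not stated in the card.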

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2+cu118
  • Datasets 2.14.6
  • Tokenizers 0.15.2