zephyr-7b-dpo-qlora

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6668
  • Rewards/chosen: -0.2672
  • Rewards/rejected: -0.3491
  • Rewards/accuracies: 0.6137
  • Rewards/margins: 0.0819
  • Logps/rejected: -378.9569
  • Logps/chosen: -361.0521
  • Logits/rejected: -2.5949
  • Logits/chosen: -2.5884
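
The relationship between these numbers can be checked with a short sketch. It assumes the standard sigmoid DPO objective, where the per-example loss is -log σ(reward_chosen − reward_rejected) and the logged rewards already include the β scaling (as in TRL's DPOTrainer). The card does not state which trainer or loss variant was used, so that is an assumption, and the helper name `dpo_loss_from_margin` is purely illustrative.

```python
import math


def dpo_loss_from_margin(margin: float) -> float:
    """Return -log(sigmoid(margin)), written as log(1 + exp(-margin)).

    Assumes the sigmoid DPO loss; fine numerically for the small margins seen here.
    """
    return math.log1p(math.exp(-margin))


# Final evaluation metrics copied from the list above.
rewards_chosen = -0.2672
rewards_rejected = -0.3491

# Rewards/margins is the mean gap between chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected

print(f"margin: {margin:.4f}")
print(f"loss at zero margin (ln 2): {dpo_loss_from_margin(0.0):.4f}")
print(f"loss at the mean margin: {dpo_loss_from_margin(margin):.4f}")
```

Under that assumption, the computed margin reproduces the reported Rewards/margins of 0.0819. The reported loss of 0.6668 sits between the loss at the mean margin (about 0.653) and the zero-margin baseline of ln 2 (about 0.693). That ordering is expected: the loss is convex in the margin, so the average of per-example losses is at least the loss evaluated at the average margin.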

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
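
As a rough illustration of how these settings combine, the sketch below derives the number of processes implied by the batch-size figures and traces a learning-rate curve with linear warmup followed by half-cosine decay. Two things are assumptions rather than facts from this card: the total step count (estimated from the training log, where step 1200 corresponds to epoch 0.96) and the exact schedule formula, which follows the usual warmup-plus-cosine shape. The function name `cosine_lr` is illustrative.

```python
import math

# Values copied from the hyperparameter list above.
PEAK_LR = 5e-6
WARMUP_RATIO = 0.1
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
TOTAL_TRAIN_BATCH = 16

# Assumption: total optimizer steps, estimated from the log (step 1200 at epoch 0.96).
EST_TOTAL_STEPS = round(1200 / 0.96)

# total batch = per-device batch * accumulation steps * number of processes
implied_processes = TOTAL_TRAIN_BATCH // (PER_DEVICE_BATCH * GRAD_ACCUM_STEPS)


def cosine_lr(step: int, total_steps: int = EST_TOTAL_STEPS) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero (assumed shape)."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("implied processes:", implied_processes)
for step in (0, 100, 500, 1000, EST_TOTAL_STEPS):
    print(step, f"{cosine_lr(step):.2e}")
```

The batch-size figures are consistent with a single training process even though distributed_type is listed as multi-GPU; the card does not list a device count, so this is only what the arithmetic implies.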

Training results

Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen
0.6933 | 0.08 | 100 | 0.6930 | -0.0077 | -0.0080 | 0.5177 | 0.0004 | -344.8478 | -335.0984 | -2.4838 | -2.4768
0.6926 | 0.16 | 200 | 0.6923 | -0.0138 | -0.0155 | 0.5427 | 0.0017 | -345.5920 | -335.7114 | -2.4836 | -2.4766
0.6906 | 0.24 | 300 | 0.6917 | -0.0130 | -0.0161 | 0.5523 | 0.0031 | -345.6560 | -335.6324 | -2.4879 | -2.4809
0.6884 | 0.32 | 400 | 0.6898 | -0.0075 | -0.0146 | 0.5807 | 0.0071 | -345.4990 | -335.0794 | -2.4972 | -2.4901
0.6753 | 0.4 | 500 | 0.6856 | -0.1385 | -0.1579 | 0.5630 | 0.0194 | -359.8317 | -348.1783 | -2.4986 | -2.4916
0.6839 | 0.48 | 600 | 0.6815 | -0.3188 | -0.3556 | 0.5667 | 0.0368 | -379.6049 | -366.2155 | -2.5394 | -2.5333
0.6535 | 0.56 | 700 | 0.6770 | -0.4204 | -0.4741 | 0.5763 | 0.0537 | -391.4496 | -376.3719 | -2.5483 | -2.5425
0.6764 | 0.64 | 800 | 0.6724 | -0.2481 | -0.3087 | 0.5990 | 0.0606 | -374.9128 | -359.1413 | -2.5714 | -2.5651
0.6753 | 0.72 | 900 | 0.6704 | -0.4283 | -0.5062 | 0.5983 | 0.0780 | -394.6671 | -377.1592 | -2.5807 | -2.5750
0.6459 | 0.8 | 1000 | 0.6680 | -0.2406 | -0.3163 | 0.6127 | 0.0757 | -375.6733 | -358.3894 | -2.5924 | -2.5858
0.6541 | 0.88 | 1100 | 0.6670 | -0.2806 | -0.3625 | 0.6157 | 0.0820 | -380.2968 | -362.3882 | -2.5942 | -2.5878
0.6422 | 0.96 | 1200 | 0.6669 | -0.2657 | -0.3473 | 0.6157 | 0.0817 | -378.7738 | -360.8972 | -2.5963 | -2.5898
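
A few properties of this log can be verified directly. The sketch below copies six of the columns from the rows above and checks that each logged margin matches chosen minus rejected up to rounding, that validation loss falls at every evaluation, and which step has the lowest validation loss. The tuple layout and variable names are illustrative choices, not part of the original training code.

```python
# (step, validation_loss, rewards_chosen, rewards_rejected, accuracy, margin)
ROWS = [
    (100, 0.6930, -0.0077, -0.0080, 0.5177, 0.0004),
    (200, 0.6923, -0.0138, -0.0155, 0.5427, 0.0017),
    (300, 0.6917, -0.0130, -0.0161, 0.5523, 0.0031),
    (400, 0.6898, -0.0075, -0.0146, 0.5807, 0.0071),
    (500, 0.6856, -0.1385, -0.1579, 0.5630, 0.0194),
    (600, 0.6815, -0.3188, -0.3556, 0.5667, 0.0368),
    (700, 0.6770, -0.4204, -0.4741, 0.5763, 0.0537),
    (800, 0.6724, -0.2481, -0.3087, 0.5990, 0.0606),
    (900, 0.6704, -0.4283, -0.5062, 0.5983, 0.0780),
    (1000, 0.6680, -0.2406, -0.3163, 0.6127, 0.0757),
    (1100, 0.6670, -0.2806, -0.3625, 0.6157, 0.0820),
    (1200, 0.6669, -0.2657, -0.3473, 0.6157, 0.0817),
]

# Largest disagreement between the logged margin and chosen - rejected.
max_margin_gap = max(abs((c - r) - m) for _, _, c, r, _, m in ROWS)

# Does validation loss strictly decrease from one evaluation to the next?
val_losses = [row[1] for row in ROWS]
val_loss_decreasing = all(a > b for a, b in zip(val_losses, val_losses[1:]))

# Step with the lowest validation loss.
best_step = min(ROWS, key=lambda row: row[1])[0]

print(f"max margin gap: {max_margin_gap:.4f}")
print("validation loss strictly decreasing:", val_loss_decreasing)
print("best step by validation loss:", best_step)
```

The margin column agrees with chosen minus rejected to within the four-decimal rounding of the log, and validation loss decreases at every logged step, with the lowest value at step 1200.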

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.0