zephyr-7b-dpop-uf6k-qlora-5e-7-epoch3

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/UF6konly dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6877
  • Positive Losses: 0.0653
  • Dpo Losses: 0.6792
  • Rewards/chosen: 0.0804
  • Rewards/rejected: 0.0515
  • Rewards/accuracies: 0.6786
  • Rewards/margins: 0.0289
  • Rewards/margins Max: 0.0914
  • Rewards/margins Min: -0.0315
  • Rewards/margins Std: 0.0551
  • Logps/rejected: -254.0317
  • Logps/chosen: -277.1849
  • Logits/rejected: -2.8006
  • Logits/chosen: -2.8458
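
As a quick consistency check on the numbers above, the reported reward margin appears to be the chosen reward minus the rejected reward. The sketch below only redoes that arithmetic with the values listed. The helper name `reward_margin` is hypothetical and is not part of any training library.

```python
def reward_margin(chosen: float, rejected: float) -> float:
    """Margin between chosen and rejected rewards, rounded to the card's 4 decimals."""
    return round(chosen - rejected, 4)


# Values copied from the evaluation results listed above.
REWARDS_CHOSEN = 0.0804
REWARDS_REJECTED = 0.0515

margin = reward_margin(REWARDS_CHOSEN, REWARDS_REJECTED)
print(margin)  # 0.0289, matching the reported Rewards/margins
```

A positive margin means the chosen responses receive a higher implicit reward than the rejected ones on average, which fits the reported accuracy being above 0.5.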

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
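
The batch-size and scheduler entries above can be reproduced with a little arithmetic. The following is a minimal sketch, not the training code: it assumes a gradient accumulation factor of 1 (not listed in the card, but consistent with 2 × 8 = 16) and a linear warmup followed by a half-cosine decay to zero, which is the usual reading of `cosine` with a warmup ratio. The function names and the `total_steps` value in the usage lines are illustrative assumptions.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Global batch size; grad_accum_steps=1 is an assumption, not a value from the card."""
    return per_device * num_devices * grad_accum_steps


def lr_at_step(step: int, total_steps: int, peak_lr: float = 5e-07, warmup_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero (assumed schedule shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(2, 8))  # train: 16
print(effective_batch_size(4, 8))  # eval: 32

# total_steps=1000 is a placeholder; the card does not state the exact step count.
for s in (0, 50, 100, 550, 1000):
    print(s, lr_at_step(s, total_steps=1000))
```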

Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6932 | 0.3 | 100 | 0.6930 | 0.0052 | 0.6925 | 0.0096 | 0.0083 | 0.5794 | 0.0012 | 0.0066 | -0.0040 | 0.0047 | -258.3495 | -284.2642 | -2.8141 | -2.8594 |
| 0.6919 | 0.61 | 200 | 0.6915 | 0.0119 | 0.6904 | 0.0242 | 0.0187 | 0.6667 | 0.0055 | 0.0195 | -0.0083 | 0.0124 | -257.3141 | -282.7999 | -2.8133 | -2.8583 |
| 0.6903 | 0.91 | 300 | 0.6899 | 0.0165 | 0.6876 | 0.0395 | 0.0283 | 0.6667 | 0.0112 | 0.0379 | -0.0143 | 0.0232 | -256.3544 | -281.2695 | -2.8086 | -2.8537 |
| 0.6832 | 1.22 | 400 | 0.6892 | 0.0304 | 0.6847 | 0.0525 | 0.0351 | 0.7024 | 0.0174 | 0.0557 | -0.0196 | 0.0337 | -255.6741 | -279.9755 | -2.8057 | -2.8507 |
| 0.6776 | 1.52 | 500 | 0.6884 | 0.0444 | 0.6825 | 0.0647 | 0.0427 | 0.6905 | 0.0220 | 0.0710 | -0.0256 | 0.0433 | -254.9144 | -278.7508 | -2.8047 | -2.8495 |
| 0.677 | 1.82 | 600 | 0.6873 | 0.0459 | 0.6811 | 0.0769 | 0.0519 | 0.6825 | 0.0250 | 0.0803 | -0.0280 | 0.0484 | -253.9932 | -277.5360 | -2.8047 | -2.8494 |
| 0.6796 | 2.13 | 700 | 0.6872 | 0.0548 | 0.6800 | 0.0798 | 0.0526 | 0.6825 | 0.0272 | 0.0865 | -0.0298 | 0.0521 | -253.9202 | -277.2366 | -2.8026 | -2.8477 |
| 0.6778 | 2.43 | 800 | 0.6875 | 0.0604 | 0.6795 | 0.0800 | 0.0518 | 0.6825 | 0.0282 | 0.0897 | -0.0307 | 0.0540 | -254.0074 | -277.2222 | -2.8024 | -2.8474 |
| 0.6739 | 2.74 | 900 | 0.6878 | 0.0651 | 0.6793 | 0.0802 | 0.0515 | 0.6706 | 0.0287 | 0.0914 | -0.0317 | 0.0550 | -254.0345 | -277.2037 | -2.8028 | -2.8477 |
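
The Epoch and Step columns give a rough estimate of how many optimizer steps make up one epoch, and from that an approximate training-set size. This is a back-of-the-envelope sketch only: the epoch values are rounded to two decimals, and the estimate assumes the total train batch size of 16 with no gradient accumulation and no dropped batches. The function names are hypothetical.

```python
# (step, epoch) pairs copied from the first and last logged rows of the results.
LOGGED = [(100, 0.3), (900, 2.74)]
TOTAL_TRAIN_BATCH_SIZE = 16  # from the hyperparameter list


def steps_per_epoch(step: int, epoch: float) -> float:
    """Approximate optimizer steps in one epoch, inferred from a logged row."""
    return step / epoch


def approx_train_examples(step: int, epoch: float, batch_size: int = TOTAL_TRAIN_BATCH_SIZE) -> float:
    """Approximate number of training examples per epoch (assumes no grad accumulation)."""
    return steps_per_epoch(step, epoch) * batch_size


for step, epoch in LOGGED:
    print(step, round(steps_per_epoch(step, epoch)), round(approx_train_examples(step, epoch)))
```

Both rows point to roughly 330 steps per epoch, so the run of 3 epochs would end a little under 1,000 steps. Treat these as estimates, since the card does not state the dataset size.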

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • PyTorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
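
Since the card lists PEFT among the framework versions, the repository most likely holds adapter weights rather than a full checkpoint. Under that assumption, a minimal loading sketch looks like the following. The adapter repository id is taken from this model's page, the dtype choice is an assumption, and `load_adapter_model` is a hypothetical helper name. Library imports are kept inside the function so the snippet can be read and imported without downloading anything.

```python
BASE_MODEL_ID = "alignment-handbook/zephyr-7b-sft-full"
ADAPTER_ID = "just1nseo/zephyr-7b-dpop-uf6k-qlora-5e-7-epoch3"


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter (assumes the repo holds PEFT weights)."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # bfloat16 is an assumption made to reduce memory; adjust for your hardware.
    base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer


# Usage (downloads several GB of weights):
# model, tokenizer = load_adapter_model()
# inputs = tokenizer("Hello, how are you?", return_tensors="pt")
# output_ids = model.generate(**inputs, max_new_tokens=32)
# print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```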

Model tree for just1nseo/zephyr-7b-dpop-uf6k-qlora-5e-7-epoch3

This model is an adapter.