---
license: apache-2.0
library_name: peft
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
  - generation/GPT4
model-index:
  - name: zephyr-dpop-qlora-gpt4-5e-6-epoch3
    results: []
---

zephyr-dpop-qlora-gpt4-5e-6-epoch3

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/GPT4 dataset. It achieves the following results on the evaluation set (a short note after the list spells out how these quantities relate):

  • Loss: 14.3852
  • Positive Losses: 141.6597
  • Dpo Losses: 0.6849
  • Rewards/chosen: -1.4061
  • Rewards/rejected: -2.0012
  • Rewards/accuracies: 0.6667
  • Rewards/margins: 0.5951
  • Rewards/margins Max: 2.2885
  • Rewards/margins Min: -1.0995
  • Rewards/margins Std: 1.4978
  • Logps/rejected: -459.3069
  • Logps/chosen: -425.8328
  • Logits/rejected: -2.2783
  • Logits/chosen: -2.3207
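The "Dpo Losses" column above is the standard DPO objective; the "dpop" in the model name and the separate "Positive Losses" column suggest a DPO-Positive-style variant that additionally penalizes drops in the chosen completion's log-probability relative to the reference, though the exact weighting is not recorded in this card, so treat that reading as an assumption. As a quick sanity check, the reported margin is simply the difference of the two rewards:

$$
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right),
\qquad
\text{margin} = r_{\text{chosen}} - r_{\text{rejected}} = -1.4061 - (-2.0012) = 0.5951.
$$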

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged sketch of the corresponding trainer configuration follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
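As a rough illustration only, the sketch below maps these hyperparameters onto TRL's DPOTrainer with a QLoRA setup of the same era (PEFT 0.7.1, TRL circa early 2024). The LoRA rank and alpha, the DPO beta, the 4-bit settings, and the dataset split names are assumptions not recorded in this card; the actual run most likely used the alignment-handbook training scripts.

```python
# Hypothetical reconstruction of the training setup, NOT this repo's actual script.
# LoRA rank/alpha, beta, 4-bit settings, and split names are assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
bnb = BitsAndBytesConfig(load_in_4bit=True,                # "qlora" in the model name
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb)
tokenizer = AutoTokenizer.from_pretrained(base)

peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=16,
                         lora_dropout=0.05)                # rank/alpha assumed

args = TrainingArguments(
    output_dir="zephyr-dpop-qlora-gpt4-5e-6-epoch3",
    learning_rate=5e-6,
    per_device_train_batch_size=2,   # x 8 GPUs = total train batch size 16
    per_device_eval_batch_size=4,    # x 8 GPUs = total eval batch size 32
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
)

dataset = load_dataset("generation/GPT4")  # dataset id from the card metadata

trainer = DPOTrainer(
    model,
    ref_model=None,        # with a PEFT adapter, the frozen base model is the reference
    args=args,
    beta=0.1,              # assumption; beta is not recorded in this card
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```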

Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5432 | 0.28 | 100 | 1.5490 | 8.3683 | 0.6723 | -0.0507 | -0.1015 | 0.5992 | 0.0508 | 0.2567 | -0.1414 | 0.1757 | -269.3354 | -290.2917 | -2.6677 | -2.7099 |
| 0.4843 | 0.56 | 200 | 3.6354 | 28.9322 | 0.6415 | -0.2537 | -0.4297 | 0.6349 | 0.1759 | 0.7364 | -0.3533 | 0.4858 | -302.1486 | -310.5943 | -2.5589 | -2.6000 |
| 0.2828 | 0.85 | 300 | 6.8046 | 61.7689 | 0.6346 | -0.6003 | -0.8503 | 0.6508 | 0.2500 | 1.0085 | -0.4868 | 0.6679 | -344.2117 | -345.2526 | -2.5349 | -2.5759 |
| 0.3355 | 1.13 | 400 | 11.4158 | 108.7399 | 0.6572 | -1.0761 | -1.4209 | 0.6548 | 0.3447 | 1.4626 | -0.7661 | 0.9968 | -401.2702 | -392.8341 | -2.3773 | -2.4155 |
| 0.3438 | 1.41 | 500 | 10.6413 | 101.3525 | 0.6381 | -1.0007 | -1.3406 | 0.6865 | 0.3399 | 1.3353 | -0.6338 | 0.8805 | -393.2457 | -385.2938 | -2.4471 | -2.4907 |
| 0.2144 | 1.69 | 600 | 8.5896 | 79.7998 | 0.6267 | -0.7817 | -1.2135 | 0.6865 | 0.4318 | 1.5951 | -0.6661 | 1.0047 | -380.5318 | -363.3914 | -2.3029 | -2.3438 |
| 0.3314 | 1.97 | 700 | 11.1651 | 107.2969 | 0.6525 | -1.0595 | -1.5150 | 0.6627 | 0.4555 | 1.7776 | -0.8450 | 1.1660 | -410.6869 | -391.1705 | -2.3025 | -2.3432 |
| 0.1352 | 2.25 | 800 | 13.3571 | 130.9070 | 0.6700 | -1.2986 | -1.8184 | 0.6627 | 0.5198 | 2.0225 | -0.9603 | 1.3296 | -441.0237 | -415.0786 | -2.2901 | -2.3320 |
| 0.2348 | 2.54 | 900 | 14.7241 | 145.9081 | 0.6904 | -1.4488 | -2.0053 | 0.6706 | 0.5564 | 2.1801 | -1.0958 | 1.4586 | -459.7108 | -430.1044 | -2.2661 | -2.3085 |
| 0.1369 | 2.82 | 1000 | 14.5955 | 143.9389 | 0.6869 | -1.4291 | -2.0251 | 0.6627 | 0.5959 | 2.2953 | -1.1073 | 1.5052 | -461.6887 | -428.1342 | -2.2738 | -2.3165 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
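Because this repository contains only a PEFT adapter, it has to be loaded on top of the base model. A minimal sketch, assuming the adapter lives at just1nseo/zephyr-dpop-qlora-gpt4-5e-6-epoch3 (the id is inferred from this card's title and uploader, so verify it against the actual repo):

```python
# Load the LoRA adapter on top of the SFT base model and generate once.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "alignment-handbook/zephyr-7b-sft-full",
    torch_dtype=torch.bfloat16, device_map="auto",
)
model = PeftModel.from_pretrained(
    base, "just1nseo/zephyr-dpop-qlora-gpt4-5e-6-epoch3"  # adapter id assumed
)
tokenizer = AutoTokenizer.from_pretrained("alignment-handbook/zephyr-7b-sft-full")

# Zephyr models expect the chat template used during SFT.
messages = [{"role": "user", "content": "Explain LoRA in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt",
                                       add_generation_prompt=True).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```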