metadata
license: apache-2.0
library_name: peft
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
base_model: alignment-handbook/zephyr-7b-sft-full
datasets:
  - generation/UF
model-index:
  - name: zephyr-7b-dpop-ours-qlora-5e-6-epoch3
    results: []

zephyr-7b-dpop-ours-qlora-5e-6-epoch3

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/UF dataset. It achieves the following results on the evaluation set:

  • Loss: 5.2043
  • Positive Losses: 49.6926
  • Dpo Losses: 0.6309
  • Rewards/chosen: -0.4548
  • Rewards/rejected: -0.7461
  • Rewards/accuracies: 0.6706
  • Rewards/margins: 0.2913
  • Rewards/margins Max: 1.0735
  • Rewards/margins Min: -0.5345
  • Rewards/margins Std: 0.7255
  • Logps/rejected: -333.7965
  • Logps/chosen: -330.7057
  • Logits/rejected: -2.5363
  • Logits/chosen: -2.5869
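The relationship between these numbers can be checked with a few lines of Python. This is a minimal sketch, assuming that "Dpo Losses" is the standard sigmoid DPO loss computed on the implicit rewards (as in TRL's `DPOTrainer`); the custom trainer used for this run may differ. The names `sigmoid_dpo_loss` and `loss_at_mean_margin` are illustrative and do not come from the training code.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -0.4548
rewards_rejected = -0.7461
reported_margin = 0.2913
reported_dpo_loss = 0.6309

# The mean margin is the mean chosen reward minus the mean rejected reward.
margin = rewards_chosen - rewards_rejected


def sigmoid_dpo_loss(reward_margin: float) -> float:
    """Per-pair sigmoid DPO loss, -log(sigmoid(margin)), in a numerically stable form.

    Assumption: this is the standard DPO objective, not necessarily the exact
    loss used by the trainer that produced this card.
    """
    return max(-reward_margin, 0.0) + math.log1p(math.exp(-abs(reward_margin)))


# Loss that a single pair with exactly the mean margin would incur.
loss_at_mean_margin = sigmoid_dpo_loss(margin)

print(f"margin from rewards: {margin:.4f} (reported {reported_margin})")
print(f"loss at mean margin: {loss_at_mean_margin:.4f} (reported mean loss {reported_dpo_loss})")
```

Under that assumption, the reported mean loss should be at least the loss at the mean margin, because the loss is convex in the margin. A margin of zero corresponds to a loss of ln 2 ≈ 0.693, so a reported value of 0.6309 indicates only a modest preference for chosen over rejected responses on average.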

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
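The batch-size arithmetic and the learning-rate schedule implied by these settings can be sketched as follows. This is a plain-Python illustration, not the training code: the gradient accumulation value is an assumption (it is not listed, but 2 × 8 × 1 matches the reported total of 16), and the total step count is a rough estimate inferred from the results table, where step 1000 corresponds to epoch 2.82 of 3. The function `cosine_lr` mirrors the formula used by the cosine-with-warmup schedule in Transformers.

```python
import math

PEAK_LR = 5e-6          # learning_rate
WARMUP_RATIO = 0.1      # lr_scheduler_warmup_ratio
PER_DEVICE_BATCH = 2    # train_batch_size
NUM_DEVICES = 8         # num_devices
GRAD_ACCUM_STEPS = 1    # assumption: not listed in the card

# Effective batch size across all devices.
effective_batch = PER_DEVICE_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS

# Rough estimate only: step 1000 was logged at epoch 2.82 of 3.
ESTIMATED_TOTAL_STEPS = round(3 * 1000 / 2.82)


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch)
for s in (0, 100, 500, 1000, ESTIMATED_TOTAL_STEPS):
    print(f"step {s:>4}: lr = {cosine_lr(s, ESTIMATED_TOTAL_STEPS):.3e}")
```

With these settings the learning rate reaches its peak after roughly the first tenth of training and decays toward zero by the last step, so the final checkpoints in the results table were trained with a very small step size.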

Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6275 | 0.28 | 100 | 0.8540 | 1.6941 | 0.6742 | 0.0863 | 0.0420 | 0.6032 | 0.0443 | 0.2147 | -0.1031 | 0.1420 | -254.9810 | -276.5898 | -2.7114 | -2.7527 |
| 0.599 | 0.56 | 200 | 1.9207 | 12.5808 | 0.6560 | -0.0584 | -0.1681 | 0.6389 | 0.1097 | 0.4903 | -0.2555 | 0.3316 | -275.9966 | -291.0660 | -2.7386 | -2.7842 |
| 0.4901 | 0.85 | 300 | 2.8067 | 22.2141 | 0.6507 | -0.1851 | -0.3037 | 0.6389 | 0.1186 | 0.4724 | -0.2575 | 0.3257 | -289.5482 | -303.7292 | -2.7330 | -2.7855 |
| 0.4414 | 1.13 | 400 | 2.6622 | 20.9278 | 0.6386 | -0.1386 | -0.3238 | 0.6746 | 0.1852 | 0.6971 | -0.3749 | 0.4833 | -291.5616 | -299.0799 | -2.6703 | -2.7191 |
| 0.4651 | 1.41 | 500 | 2.6646 | 20.6090 | 0.6384 | -0.1329 | -0.3285 | 0.6627 | 0.1956 | 0.7628 | -0.3883 | 0.5195 | -292.0331 | -298.5117 | -2.6714 | -2.7217 |
| 0.5269 | 1.69 | 600 | 5.0162 | 46.1312 | 0.6337 | -0.4167 | -0.6475 | 0.6627 | 0.2307 | 0.8626 | -0.4616 | 0.5963 | -323.9284 | -326.8941 | -2.6026 | -2.6532 |
| 0.3513 | 1.97 | 700 | 4.8954 | 45.5933 | 0.6399 | -0.4107 | -0.6603 | 0.6627 | 0.2496 | 0.9744 | -0.5254 | 0.6826 | -325.2173 | -326.2958 | -2.5808 | -2.6317 |
| 0.2795 | 2.25 | 800 | 4.7693 | 43.9090 | 0.6266 | -0.3919 | -0.6839 | 0.6825 | 0.2920 | 1.0657 | -0.5266 | 0.7166 | -327.5706 | -324.4103 | -2.5545 | -2.6047 |
| 0.3544 | 2.54 | 900 | 5.3640 | 51.3363 | 0.6314 | -0.4735 | -0.7650 | 0.6706 | 0.2915 | 1.0782 | -0.5345 | 0.7289 | -335.6813 | -332.5704 | -2.5359 | -2.5863 |
| 0.545 | 2.82 | 1000 | 5.2224 | 49.9806 | 0.6312 | -0.4578 | -0.7482 | 0.6627 | 0.2904 | 1.0718 | -0.5332 | 0.7245 | -333.9995 | -330.9984 | -2.5367 | -2.5873 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
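
With the libraries listed above, loading the model for inference might look like the sketch below. It assumes the repository stores a LoRA adapter (as `library_name: peft` and the "qlora" in the model name suggest) that is applied on top of the base model named in the metadata. The function name `load_policy` and the `adapter_id` argument are hypothetical; pass the Hub repository ID or local path of this adapter, which this card does not state. `device_map="auto"` additionally requires the `accelerate` package.

```python
BASE_MODEL_ID = "alignment-handbook/zephyr-7b-sft-full"  # from the card metadata


def load_policy(adapter_id: str, device_map: str = "auto"):
    """Load the base model and attach the adapter stored at `adapter_id`.

    `adapter_id` is a placeholder for the Hub ID or local path of this adapter.
    """
    # Imported lazily so the module can be imported without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: bf16 inference on supported hardware
        device_map=device_map,
    )
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base, adapter_id)
    return model.eval(), tokenizer
```

Keeping the adapter separate from the base weights means only the small adapter files need to be downloaded in addition to the base model.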