---
license: apache-2.0
library_name: peft
tags:
  - alignment-handbook
  - generated_from_trainer
  - trl
  - dpo
datasets:
  - HuggingFaceH4/ultrafeedback_binarized_fixed
base_model: mistralai/Mistral-7B-v0.1
model-index:
  - name: zephyr-7b-dpo-lora
    results: []
---

# zephyr-7b-dpo-lora

This model is a fine-tuned version of [lewtun/zephyr-7b-sft-qlora](https://huggingface.co/lewtun/zephyr-7b-sft-qlora) on the HuggingFaceH4/ultrafeedback_binarized_fixed dataset. It achieves the following results on the evaluation set:

- Loss: 0.5133
- Rewards/chosen: -1.2447
- Rewards/rejected: -2.1118
- Rewards/accuracies: 0.7539
- Rewards/margins: 0.8671
- Logps/rejected: -457.0128
- Logps/chosen: -385.9082
- Logits/rejected: 1.2523
- Logits/chosen: 0.7989
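
These are the standard diagnostics TRL logs during DPO training. Interpreting them relies on TRL's usual definition of the implicit reward (an assumption here, since the card does not restate it): the β-scaled log-probability ratio of the trained policy against the frozen reference model.

```latex
% Hedged: the implicit DPO reward, assuming TRL's standard definition.
r(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
```

`Rewards/margins` is then the mean gap between chosen and rejected rewards (here -1.2447 - (-2.1118) = 0.8671), and `Rewards/accuracies` is the fraction of preference pairs whose chosen completion receives the higher implicit reward.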

## Model description

More information needed

## Intended uses & limitations

More information needed
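
Pending author guidance, here is a minimal inference sketch. The repo id `lewtun/zephyr-7b-dpo-qlora` and the presence of a chat template in the tokenizer are assumptions, not stated in this card.

```python
# Minimal inference sketch. Assumptions (not from this card): the adapter
# lives at "lewtun/zephyr-7b-dpo-qlora" and ships a chat-templated tokenizer.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "lewtun/zephyr-7b-dpo-qlora"  # assumed repo id, matching this card's title

# Loads the base model named in the adapter config and attaches the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "What is direct preference optimization?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```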

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 32
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
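
A rough illustration of how these settings might map onto TRL's `DPOTrainer` is sketched below. The TRL-0.7-era API, `beta`, the LoRA config, and the `train_prefs`/`test_prefs` split names are assumptions, not taken from this card, and preprocessing of the raw dataset into `(prompt, chosen, rejected)` strings is omitted.

```python
# Hedged sketch of the DPO setup implied by the hyperparameters above.
# Assumptions (not in this card): TRL 0.7-era DPOTrainer API, beta=0.1,
# the LoRA config, and the train_prefs/test_prefs split names.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "lewtun/zephyr-7b-sft-qlora"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized_fixed")

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,             # as listed above
    per_device_train_batch_size=4,  # 8 GPUs x 4 = total train batch size 32
    per_device_eval_batch_size=8,   # 8 GPUs x 8 = total eval batch size 64
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,                        # multi-GPU launch handled by e.g. accelerate
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # with a PEFT adapter, the frozen base acts as reference
    beta=0.1,              # assumption: beta is not listed in this card
    args=args,
    train_dataset=dataset["train_prefs"],
    eval_dataset=dataset["test_prefs"],
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # assumption: ranks not listed
)
trainer.train()
```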

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6918        | 0.05  | 100  | 0.6914          | 0.0059         | 0.0018           | 0.7109             | 0.0041          | -245.6464      | -260.8458    | -2.1364         | -2.2285       |
| 0.6619        | 0.1   | 200  | 0.6497          | -0.0263        | -0.1318          | 0.7070             | 0.1056          | -259.0110      | -264.0628    | -2.0537         | -2.1558       |
| 0.6077        | 0.16  | 300  | 0.6083          | -0.2610        | -0.5505          | 0.7188             | 0.2895          | -300.8820      | -287.5379    | -1.8505         | -1.9870       |
| 0.5813        | 0.21  | 400  | 0.5857          | -0.5019        | -0.9224          | 0.7344             | 0.4205          | -338.0691      | -311.6292    | -1.7834         | -1.9347       |
| 0.6033        | 0.26  | 500  | 0.5684          | -0.6480        | -1.1327          | 0.7578             | 0.4847          | -359.0957      | -326.2360    | -1.0646         | -1.2844       |
| 0.5338        | 0.31  | 600  | 0.5431          | -0.9068        | -1.6081          | 0.7539             | 0.7013          | -406.6367      | -352.1152    | -0.0058         | -0.3463       |
| 0.5235        | 0.37  | 700  | 0.5304          | -1.0331        | -1.8281          | 0.7461             | 0.7951          | -428.6434      | -364.7436    | 0.2246          | -0.1374       |
| 0.5241        | 0.42  | 800  | 0.5276          | -0.9760        | -1.7110          | 0.7578             | 0.7350          | -416.9325      | -359.0362    | 0.3361          | -0.0432       |
| 0.5332        | 0.47  | 900  | 0.5257          | -1.2407        | -2.0657          | 0.75               | 0.8250          | -452.3993      | -385.5118    | 0.8926          | 0.4681        |
| 0.531         | 0.52  | 1000 | 0.5232          | -1.1277        | -1.8553          | 0.7461             | 0.7276          | -431.3623      | -374.2120    | 0.2825          | -0.0766       |
| 0.4864        | 0.58  | 1100 | 0.5172          | -1.1670        | -1.9894          | 0.75               | 0.8224          | -444.7675      | -378.1358    | 1.1814          | 0.7409        |
| 0.5467        | 0.63  | 1200 | 0.5196          | -1.3633        | -2.1690          | 0.7383             | 0.8058          | -462.7306      | -397.7628    | 1.3020          | 0.8593        |
| 0.5125        | 0.68  | 1300 | 0.5179          | -1.2033        | -2.0041          | 0.7422             | 0.8009          | -446.2437      | -381.7657    | 1.1045          | 0.6639        |
| 0.4881        | 0.73  | 1400 | 0.5158          | -1.2792        | -2.1334          | 0.7539             | 0.8543          | -459.1728      | -389.3554    | 1.1891          | 0.7445        |
| 0.5273        | 0.78  | 1500 | 0.5135          | -1.2081        | -2.0746          | 0.7539             | 0.8664          | -453.2860      | -382.2505    | 1.2533          | 0.7973        |
| 0.5317        | 0.84  | 1600 | 0.5140          | -1.2815        | -2.1592          | 0.75               | 0.8777          | -461.7518      | -389.5859    | 1.2752          | 0.8202        |
| 0.5384        | 0.89  | 1700 | 0.5134          | -1.2549        | -2.1287          | 0.7539             | 0.8738          | -458.7038      | -386.9291    | 1.2938          | 0.8384        |
| 0.5619        | 0.94  | 1800 | 0.5135          | -1.2438        | -2.1108          | 0.7578             | 0.8670          | -456.9133      | -385.8195    | 1.2532          | 0.7986        |
| 0.5169        | 0.99  | 1900 | 0.5133          | -1.2447        | -2.1118          | 0.7539             | 0.8671          | -457.0128      | -385.9082    | 1.2523          | 0.7989        |

### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.0