zephyr-7b-dpo-full-magpi-low-margin-3-epochs

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0010
  • Rewards/chosen: -3.9690
  • Rewards/rejected: -58.1646
  • Rewards/accuracies: 0.9980
  • Rewards/margins: 54.1956
  • Logps/rejected: -6457.2515
  • Logps/chosen: -763.8833
  • Logits/rejected: 4.2613
  • Logits/chosen: 1.2520
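
Although the card does not spell it out, these columns follow the standard DPO convention: each "reward" is the β-scaled log-probability ratio between this model and the alignment-handbook/zephyr-7b-sft-full reference, and the margin is the chosen reward minus the rejected reward (β itself is not reported in the card). In standard DPO notation:

```latex
% Implicit DPO reward of a response y for prompt x
% (beta is a training hyperparameter; its value is not reported in this card)
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

% The reported margin is the difference of the two rewards, e.g. for the
% final evaluation above: -3.9690 - (-58.1646) = 54.1956
\text{margin} = r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}})

% The DPO objective minimized during training
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left( r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}}) \right)
```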

Model description

More information needed

Intended uses & limitations

More information needed
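
No usage guidance is provided. As a starting point, here is a minimal generation sketch; it assumes the Hub id sfulay/zephyr-7b-dpo-full-magpi-low-margin-3-epochs and that the checkpoint inherits the Zephyr chat template from its SFT base. The prompt and generation settings are illustrative only.

```python
# Minimal sketch: load the checkpoint and generate one chat response.
# Assumes the Hub id below and the Zephyr chat template from the SFT base.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="sfulay/zephyr-7b-dpo-full-magpi-low-margin-3-epochs",
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one sentence."},
]
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
out = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(out[0]["generated_text"])
```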

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 55
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 128
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
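
The training script itself is not included in the card. As a rough sketch, the hyperparameters above map onto TRL's DPOConfig/DPOTrainer as follows; the use of TRL, the β value, and the toy dataset are assumptions, and only the listed hyperparameter values come from the card. Note that the totals are consistent: 8 per-device samples × 8 GPUs × 2 accumulation steps = 128 for training, and 8 × 8 = 64 for evaluation.

```python
# Sketch only: maps the hyperparameters listed above onto TRL's DPOTrainer.
# Assumptions (not from the card): that TRL was used at all, the beta value,
# and the toy preference dataset standing in for the unknown training data.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder preference data; the real dataset is unknown.
toy_prefs = Dataset.from_dict({
    "prompt": ["Say hello."],
    "chosen": ["Hello! How can I help?"],
    "rejected": ["no."],
})

args = DPOConfig(
    output_dir="zephyr-7b-dpo-full-magpi-low-margin-3-epochs",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # x 8 devices x 2 accum steps = 128 total
    per_device_eval_batch_size=8,    # x 8 devices = 64 total
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=55,
    bf16=True,
    beta=0.1,  # assumption: the card does not report beta
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # TRL clones the policy as the frozen reference
    args=args,
    train_dataset=toy_prefs,
    tokenizer=tokenizer,  # newer TRL releases use processing_class= instead
)
trainer.train()
```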

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.009 | 0.4739 | 50 | 0.0026 | -0.6343 | -34.9234 | 0.9960 | 34.2890 | -4133.1240 | -430.4127 | -2.6406 | -2.9323 |
| 0.0158 | 0.9479 | 100 | 0.0003 | -1.2700 | -43.4290 | 1.0 | 42.1589 | -4983.6846 | -493.9866 | 3.0564 | -0.5346 |
| 0.0007 | 1.4218 | 150 | 0.0005 | -2.9913 | -52.6639 | 0.9980 | 49.6726 | -5907.1768 | -666.1114 | 4.1711 | 0.7924 |
| 0.0015 | 1.8957 | 200 | 0.0006 | -3.5933 | -54.2858 | 0.9980 | 50.6925 | -6069.3657 | -726.3069 | 4.0713 | 1.0487 |
| 0.0 | 2.3697 | 250 | 0.0009 | -3.9372 | -57.6402 | 0.9980 | 53.7030 | -6404.8037 | -760.6977 | 4.2488 | 1.2599 |
| 0.0001 | 2.8436 | 300 | 0.0010 | -3.9690 | -58.1646 | 0.9980 | 54.1956 | -6457.2515 | -763.8833 | 4.2613 | 1.2520 |

Framework versions

  • Transformers 4.44.0.dev0
  • PyTorch 2.1.2
  • Datasets 2.20.0
  • Tokenizers 0.19.1