---
license: apache-2.0
library_name: peft
tags:
  - alignment-handbook
  - generated_from_trainer
  - trl
  - dpo
datasets:
  - HuggingFaceH4/ultrafeedback_binarized_fixed
base_model: mistralai/Mistral-7B-v0.1
model-index:
  - name: zephyr-7b-dpo-lora
    results: []
---

# zephyr-7b-dpo-lora

This model is a fine-tuned version of [lewtun/zephyr-7b-sft-qlora](https://huggingface.co/lewtun/zephyr-7b-sft-qlora) on the HuggingFaceH4/ultrafeedback_binarized_fixed dataset. It achieves the following results on the evaluation set:

- Loss: 0.5133
- Rewards/chosen: -1.2447
- Rewards/rejected: -2.1118
- Rewards/accuracies: 0.7539
- Rewards/margins: 0.8671
- Logps/rejected: -457.0128
- Logps/chosen: -385.9082
- Logits/rejected: 1.2523
- Logits/chosen: 0.7989
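
These are the standard diagnostics TRL logs during DPO training. Interpreting them relies on TRL's usual definition of the implicit reward (an assumption here, since the card does not restate it): the β-scaled log-probability ratio of the trained policy against the frozen reference model.

```latex
% Hedged: the implicit DPO reward, assuming TRL's standard definition.
r(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
```

`Rewards/margins` is then the mean gap between chosen and rejected rewards (here -1.2447 - (-2.1118) = 0.8671), and `Rewards/accuracies` is the fraction of preference pairs whose chosen completion receives the higher implicit reward.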

## Model description

More information needed

## Intended uses & limitations

More information needed
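
Pending author guidance, here is a minimal inference sketch. The repo id `lewtun/zephyr-7b-dpo-qlora` and the presence of a chat template in the tokenizer are assumptions, not stated in this card.

```python
# Minimal inference sketch. Assumptions (not from this card): the adapter
# lives at "lewtun/zephyr-7b-dpo-qlora" and ships a chat-templated tokenizer.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "lewtun/zephyr-7b-dpo-qlora"  # assumed repo id, matching this card's title

# Loads the base model named in the adapter config and attaches the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "What is direct preference optimization?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```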

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 32
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
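
A rough illustration of how these settings might map onto TRL's `DPOTrainer` is sketched below. The TRL-0.7-era API, `beta`, the LoRA config, and the `train_prefs`/`test_prefs` split names are assumptions, not taken from this card, and preprocessing of the raw dataset into `(prompt, chosen, rejected)` strings is omitted.

```python
# Hedged sketch of the DPO setup implied by the hyperparameters above.
# Assumptions (not in this card): TRL 0.7-era DPOTrainer API, beta=0.1,
# the LoRA config, and the train_prefs/test_prefs split names.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "lewtun/zephyr-7b-sft-qlora"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized_fixed")

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,             # as listed above
    per_device_train_batch_size=4,  # 8 GPUs x 4 = total train batch size 32
    per_device_eval_batch_size=8,   # 8 GPUs x 8 = total eval batch size 64
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,                        # multi-GPU launch handled by e.g. accelerate
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # with a PEFT adapter, the frozen base acts as reference
    beta=0.1,              # assumption: beta is not listed in this card
    args=args,
    train_dataset=dataset["train_prefs"],
    eval_dataset=dataset["test_prefs"],
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # assumption: ranks not listed
)
trainer.train()
```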

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6918        | 0.05  | 100  | 0.6914          | 0.0059         | 0.0018           | 0.7109             | 0.0041          | -245.6464      | -260.8458    | -2.1364         | -2.2285       |
| 0.6619        | 0.1   | 200  | 0.6497          | -0.0263        | -0.1318          | 0.7070             | 0.1056          | -259.0110      | -264.0628    | -2.0537         | -2.1558       |
| 0.6077        | 0.16  | 300  | 0.6083          | -0.2610        | -0.5505          | 0.7188             | 0.2895          | -300.8820      | -287.5379    | -1.8505         | -1.9870       |
| 0.5813        | 0.21  | 400  | 0.5857          | -0.5019        | -0.9224          | 0.7344             | 0.4205          | -338.0691      | -311.6292    | -1.7834         | -1.9347       |
| 0.6033        | 0.26  | 500  | 0.5684          | -0.6480        | -1.1327          | 0.7578             | 0.4847          | -359.0957      | -326.2360    | -1.0646         | -1.2844       |
| 0.5338        | 0.31  | 600  | 0.5431          | -0.9068        | -1.6081          | 0.7539             | 0.7013          | -406.6367      | -352.1152    | -0.0058         | -0.3463       |
| 0.5235        | 0.37  | 700  | 0.5304          | -1.0331        | -1.8281          | 0.7461             | 0.7951          | -428.6434      | -364.7436    | 0.2246          | -0.1374       |
| 0.5241        | 0.42  | 800  | 0.5276          | -0.9760        | -1.7110          | 0.7578             | 0.7350          | -416.9325      | -359.0362    | 0.3361          | -0.0432       |
| 0.5332        | 0.47  | 900  | 0.5257          | -1.2407        | -2.0657          | 0.75               | 0.8250          | -452.3993      | -385.5118    | 0.8926          | 0.4681        |
| 0.531         | 0.52  | 1000 | 0.5232          | -1.1277        | -1.8553          | 0.7461             | 0.7276          | -431.3623      | -374.2120    | 0.2825          | -0.0766       |
| 0.4864        | 0.58  | 1100 | 0.5172          | -1.1670        | -1.9894          | 0.75               | 0.8224          | -444.7675      | -378.1358    | 1.1814          | 0.7409        |
| 0.5467        | 0.63  | 1200 | 0.5196          | -1.3633        | -2.1690          | 0.7383             | 0.8058          | -462.7306      | -397.7628    | 1.3020          | 0.8593        |
| 0.5125        | 0.68  | 1300 | 0.5179          | -1.2033        | -2.0041          | 0.7422             | 0.8009          | -446.2437      | -381.7657    | 1.1045          | 0.6639        |
| 0.4881        | 0.73  | 1400 | 0.5158          | -1.2792        | -2.1334          | 0.7539             | 0.8543          | -459.1728      | -389.3554    | 1.1891          | 0.7445        |
| 0.5273        | 0.78  | 1500 | 0.5135          | -1.2081        | -2.0746          | 0.7539             | 0.8664          | -453.2860      | -382.2505    | 1.2533          | 0.7973        |
| 0.5317        | 0.84  | 1600 | 0.5140          | -1.2815        | -2.1592          | 0.75               | 0.8777          | -461.7518      | -389.5859    | 1.2752          | 0.8202        |
| 0.5384        | 0.89  | 1700 | 0.5134          | -1.2549        | -2.1287          | 0.7539             | 0.8738          | -458.7038      | -386.9291    | 1.2938          | 0.8384        |
| 0.5619        | 0.94  | 1800 | 0.5135          | -1.2438        | -2.1108          | 0.7578             | 0.8670          | -456.9133      | -385.8195    | 1.2532          | 0.7986        |
| 0.5169        | 0.99  | 1900 | 0.5133          | -1.2447        | -2.1118          | 0.7539             | 0.8671          | -457.0128      | -385.9082    | 1.2523          | 0.7989        |

### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.0