---
base_model: princeton-nlp/Llama-3-Base-8B-SFT
tags:
  - alignment-handbook
  - generated_from_trainer
  - trl
  - dpo
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: Llama-3-Base-8B
    results: []
---

Llama-3-Base-8B

This model is a DPO fine-tuned version of princeton-nlp/Llama-3-Base-8B-SFT on the HuggingFaceH4/ultrafeedback_binarized preference dataset. It achieves the following results on the evaluation set (the reward metrics are explained in the note after the list):

  • Loss: 0.6285
  • Rewards/chosen: 0.5979
  • Rewards/rejected: 0.1801
  • Rewards/accuracies: 0.6620
  • Rewards/margins: 0.4178
  • Logps/rejected: -2212.5046
  • Logps/chosen: -2612.9824
  • Logits/rejected: -1.3033
  • Logits/chosen: -1.3358
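As a point of reference only: the reward metrics above follow the Direct Preference Optimization (DPO) convention. The sketch below shows the standard DPO objective, assuming the usual trl-style logging in which Rewards/chosen and Rewards/rejected are the β-scaled log-probability ratios of the policy against the SFT reference model; the β value used for this run is not reported in this card.

```latex
% Standard DPO objective (Rafailov et al., 2023).
% Assumption: Rewards/chosen = beta * (log pi_theta(y_w|x) - log pi_ref(y_w|x)),
% and likewise for Rewards/rejected with y_l; beta is not reported in this card.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\, \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[\log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)\right]
```

Under this reading, Rewards/margins is Rewards/chosen minus Rewards/rejected (0.5979 - 0.1801 = 0.4178 above), and Rewards/accuracies is the fraction of evaluation pairs where the chosen reward exceeds the rejected one.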

Model description

More information needed

Intended uses & limitations

More information needed
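No intended-use guidance was provided by the author. The sketch below only shows how a checkpoint like this would typically be loaded for inference with the transformers causal-LM API; the repo id fenguhao/Llama-3-Base-8B is an assumption inferred from the card title, and the prompt format is likewise an assumption.

```python
# Minimal inference sketch; not the author's documented usage.
# Assumptions: the repo id "fenguhao/Llama-3-Base-8B" (inferred from the card title)
# and plain-text prompting.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fenguhao/Llama-3-Base-8B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)

prompt = "Explain the difference between supervised fine-tuning and DPO."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```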

Training and evaluation data

More information needed
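The card names the dataset but describes nothing else about it. A minimal sketch of inspecting it with the datasets library follows; the train_prefs / test_prefs split names and the listed field names are assumptions based on how this dataset is commonly used for DPO, not details taken from this card.

```python
# Sketch of loading the preference dataset named in this card.
# Assumption: the "train_prefs"/"test_prefs" splits hold the binarized preference pairs.
from datasets import load_dataset

raw = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
train = raw["train_prefs"]
eval_split = raw["test_prefs"]

# Each example pairs a prompt with a preferred and a rejected response.
example = train[0]
print(example.keys())  # expect fields such as "prompt", "chosen", "rejected"
```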

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
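For readers who want to reproduce the setup, the list above maps roughly onto transformers TrainingArguments as sketched below. This is a hedged reconstruction, not the author's training script; the precision flag and the DPO-specific options (for example β) are not reported in this card.

```python
# Hedged reconstruction of the reported hyperparameters as transformers TrainingArguments.
# Not the author's script; DPO-specific settings (e.g. beta) are not given in the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Llama-3-Base-8B",
    learning_rate=1e-6,
    per_device_train_batch_size=2,   # train_batch_size: 2
    per_device_eval_batch_size=4,    # eval_batch_size: 4
    gradient_accumulation_steps=2,   # with 4 GPUs -> total train batch size 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumption; precision is not reported in the card
)
# In an alignment-handbook / trl setup, arguments like these are passed to trl's
# DPOTrainer (directly, or via its DPOConfig in newer trl versions) together with
# the SFT model, a frozen reference model, the tokenizer, and the preference dataset.
```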

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6694 | 0.03 | 100 | 0.6733 | 0.4668 | 0.3687 | 0.5500 | 0.0980 | -2193.6436 | -2626.0984 | -1.2047 | -1.2463 |
| 0.6496 | 0.05 | 200 | 0.6497 | 0.8935 | 0.6578 | 0.6040 | 0.2357 | -2164.7385 | -2583.4270 | -1.1621 | -1.2030 |
| 0.6358 | 0.08 | 300 | 0.6672 | 0.6703 | 0.4436 | 0.5900 | 0.2266 | -2186.1528 | -2605.7471 | -1.2202 | -1.2617 |
| 0.6783 | 0.1 | 400 | 0.7144 | 0.2834 | 0.0925 | 0.5680 | 0.1909 | -2221.2676 | -2644.4390 | -1.3598 | -1.4017 |
| 0.751 | 0.13 | 500 | 0.6889 | 1.3453 | 0.9758 | 0.6020 | 0.3696 | -2132.9402 | -2538.2405 | -1.4750 | -1.5419 |
| 0.6921 | 0.16 | 600 | 0.6644 | 0.8464 | 0.5451 | 0.6220 | 0.3014 | -2176.0090 | -2588.1318 | -1.2841 | -1.3381 |
| 0.6437 | 0.18 | 700 | 0.6724 | 0.8250 | 0.4796 | 0.6420 | 0.3454 | -2182.5566 | -2590.2764 | -1.4526 | -1.4817 |
| 0.8109 | 0.21 | 800 | 0.6655 | 1.1490 | 0.7473 | 0.6380 | 0.4017 | -2155.7832 | -2557.8708 | -1.5267 | -1.5761 |
| 0.6725 | 0.24 | 900 | 0.6836 | 1.4258 | 0.9989 | 0.6160 | 0.4269 | -2130.6240 | -2530.1914 | -1.4486 | -1.4910 |
| 0.7027 | 0.26 | 1000 | 0.6690 | 0.8152 | 0.4729 | 0.6260 | 0.3424 | -2183.2278 | -2591.2505 | -1.5095 | -1.5565 |
| 0.6421 | 0.29 | 1100 | 0.6513 | 0.5281 | 0.1941 | 0.6640 | 0.3340 | -2211.1040 | -2619.9661 | -1.5382 | -1.5785 |
| 0.6217 | 0.31 | 1200 | 0.6436 | 0.7372 | 0.3396 | 0.6460 | 0.3976 | -2196.5581 | -2599.0544 | -1.6345 | -1.6765 |
| 0.7365 | 0.34 | 1300 | 0.6400 | 0.9183 | 0.5227 | 0.6240 | 0.3956 | -2178.2437 | -2580.9446 | -1.5597 | -1.6009 |
| 0.7057 | 0.37 | 1400 | 0.6468 | 0.9514 | 0.5619 | 0.6140 | 0.3895 | -2174.3254 | -2577.6377 | -1.6716 | -1.7117 |
| 0.6396 | 0.39 | 1500 | 0.6498 | 0.9546 | 0.5405 | 0.6400 | 0.4141 | -2176.4675 | -2577.3193 | -1.6244 | -1.6600 |
| 0.5835 | 0.42 | 1600 | 0.6488 | 0.9504 | 0.5356 | 0.6480 | 0.4148 | -2176.9568 | -2577.7402 | -1.6255 | -1.6706 |
| 0.629 | 0.44 | 1700 | 0.6501 | 1.2484 | 0.8056 | 0.6100 | 0.4428 | -2149.9568 | -2547.9316 | -1.5737 | -1.6192 |
| 0.6495 | 0.47 | 1800 | 0.6440 | 1.2029 | 0.7629 | 0.6280 | 0.4400 | -2154.2307 | -2552.4846 | -1.4589 | -1.4973 |
| 0.6465 | 0.5 | 1900 | 0.6641 | 0.2111 | -0.0941 | 0.6280 | 0.3052 | -2239.9255 | -2651.6641 | -1.4961 | -1.5323 |
| 0.6866 | 0.52 | 2000 | 0.6480 | 0.5747 | 0.1977 | 0.6600 | 0.3770 | -2210.75 | -2615.3054 | -1.4509 | -1.4934 |
| 0.6441 | 0.55 | 2100 | 0.6358 | 0.8809 | 0.4502 | 0.6480 | 0.4307 | -2185.4985 | -2584.6841 | -1.4418 | -1.4842 |
| 0.6752 | 0.58 | 2200 | 0.6346 | 0.9311 | 0.5075 | 0.6560 | 0.4236 | -2179.7668 | -2579.6636 | -1.3193 | -1.3656 |
| 0.5646 | 0.6 | 2300 | 0.6396 | 0.6599 | 0.2912 | 0.6480 | 0.3686 | -2201.3948 | -2606.7883 | -1.2832 | -1.3116 |
| 0.6519 | 0.63 | 2400 | 0.6451 | 0.4237 | 0.0937 | 0.6400 | 0.3300 | -2221.1460 | -2630.4050 | -1.4460 | -1.4777 |
| 0.6292 | 0.65 | 2500 | 0.6313 | 0.8682 | 0.4231 | 0.6460 | 0.4452 | -2188.2095 | -2585.9512 | -1.4040 | -1.4397 |
| 0.5985 | 0.68 | 2600 | 0.6274 | 0.8396 | 0.3650 | 0.6640 | 0.4746 | -2194.0144 | -2588.8174 | -1.3580 | -1.3860 |
| 0.6323 | 0.71 | 2700 | 0.6328 | 0.6585 | 0.2012 | 0.6640 | 0.4573 | -2210.3958 | -2606.9260 | -1.2622 | -1.2938 |
| 0.6174 | 0.73 | 2800 | 0.6305 | 0.8505 | 0.3762 | 0.6580 | 0.4744 | -2192.8989 | -2587.7209 | -1.3312 | -1.3635 |
| 0.5972 | 0.76 | 2900 | 0.6310 | 0.6521 | 0.2290 | 0.6600 | 0.4231 | -2207.6130 | -2607.5659 | -1.3492 | -1.3840 |
| 0.6645 | 0.79 | 3000 | 0.6291 | 0.7035 | 0.2579 | 0.6520 | 0.4456 | -2204.7251 | -2602.4238 | -1.3330 | -1.3678 |
| 0.5786 | 0.81 | 3100 | 0.6310 | 0.5452 | 0.1222 | 0.6580 | 0.4230 | -2218.2944 | -2618.2534 | -1.3173 | -1.3498 |
| 0.604 | 0.84 | 3200 | 0.6375 | 0.3327 | -0.0527 | 0.6540 | 0.3854 | -2235.7852 | -2639.5032 | -1.3444 | -1.3760 |
| 0.6704 | 0.86 | 3300 | 0.6269 | 0.7327 | 0.2896 | 0.6540 | 0.4431 | -2201.5579 | -2599.5049 | -1.3241 | -1.3585 |
| 0.6365 | 0.89 | 3400 | 0.6271 | 0.6900 | 0.2577 | 0.6560 | 0.4323 | -2204.7437 | -2603.7739 | -1.3038 | -1.3371 |
| 0.6621 | 0.92 | 3500 | 0.6279 | 0.6303 | 0.2073 | 0.6580 | 0.4230 | -2209.7827 | -2609.7432 | -1.2991 | -1.3321 |
| 0.6597 | 0.94 | 3600 | 0.6294 | 0.5540 | 0.1441 | 0.6580 | 0.4099 | -2216.1082 | -2617.3774 | -1.3028 | -1.3348 |
| 0.671 | 0.97 | 3700 | 0.6285 | 0.5945 | 0.1774 | 0.6600 | 0.4171 | -2212.7783 | -2613.3303 | -1.3033 | -1.3358 |
| 0.6328 | 0.99 | 3800 | 0.6283 | 0.5985 | 0.1803 | 0.6580 | 0.4182 | -2212.4902 | -2612.9258 | -1.3032 | -1.3356 |

Framework versions

  • Transformers 4.36.2
  • PyTorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.2
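To check a local environment against the versions listed above, a small sketch (assuming each library exposes the standard __version__ attribute):

```python
# Quick environment check against the versions listed in this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.36.2",
    "torch": "2.1.2",
    "datasets": "2.14.6",
    "tokenizers": "0.15.2",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "OK" if installed[name].startswith(want) else f"got {installed[name]}"
    print(f"{name}: expected {want} -> {status}")
```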