chat_1000STEPS_1e6rate

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6684
  • Rewards/chosen: -0.3437
  • Rewards/rejected: -0.4414
  • Rewards/accuracies: 0.5055
  • Rewards/margins: 0.0978
  • Logps/rejected: -23.2056
  • Logps/chosen: -20.1814
  • Logits/rejected: -0.8363
  • Logits/chosen: -0.8361
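
For reference, rewards/margins is the average gap between the chosen and rejected rewards, i.e. -0.3437 - (-0.4414) ≈ 0.0978 here. The card ships no usage code; below is a minimal inference sketch, assuming the checkpoint is the FP16 safetensors repository tsavage68/chat_1000STEPS_1e6rate_01beta_DPO (≈6.74B parameters) on the Hugging Face Hub and that it keeps the Llama-2 chat prompt format of its base model.

```python
# Minimal usage sketch (not part of the original card): load the DPO-tuned
# checkpoint with transformers and generate a reply. The repository id and
# the [INST] prompt format are assumptions based on the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/chat_1000STEPS_1e6rate_01beta_DPO"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Llama-2-chat style prompt; adjust to your own template if needed.
prompt = "[INST] Summarize what DPO fine-tuning does. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```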

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
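
The card does not state which training script produced these values, but they map one-to-one onto transformers.TrainingArguments. A minimal configuration sketch under that assumption (DPO-specific settings such as the reference model and beta are not part of the list above and are therefore omitted):

```python
# Configuration sketch (assumed, not taken from the training code): the
# listed hyperparameters expressed as transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="chat_1000STEPS_1e6rate",  # hypothetical output path
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=2,   # 4 x 2 = total_train_batch_size of 8
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
)
```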

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6939 | 0.1 | 50 | 0.6917 | -0.0037 | -0.0069 | 0.4901 | 0.0032 | -18.8600 | -16.7813 | -0.5975 | -0.5973 |
| 0.6902 | 0.2 | 100 | 0.6919 | -0.1261 | -0.1323 | 0.4440 | 0.0063 | -20.1147 | -18.0054 | -0.6143 | -0.6142 |
| 0.6923 | 0.29 | 150 | 0.6796 | -0.0370 | -0.0721 | 0.4945 | 0.0351 | -19.5126 | -17.1150 | -0.6569 | -0.6568 |
| 0.6793 | 0.39 | 200 | 0.6803 | -0.0086 | -0.0473 | 0.4769 | 0.0387 | -19.2641 | -16.8305 | -0.6452 | -0.6450 |
| 0.6446 | 0.49 | 250 | 0.6790 | -0.0967 | -0.1427 | 0.4857 | 0.0460 | -20.2182 | -17.7115 | -0.6468 | -0.6466 |
| 0.6365 | 0.59 | 300 | 0.6809 | -0.1168 | -0.1650 | 0.4681 | 0.0482 | -20.4409 | -17.9127 | -0.6877 | -0.6874 |
| 0.6828 | 0.68 | 350 | 0.6765 | -0.1034 | -0.1632 | 0.4923 | 0.0599 | -20.4235 | -17.7782 | -0.6849 | -0.6847 |
| 0.6797 | 0.78 | 400 | 0.6788 | -0.0900 | -0.1511 | 0.4923 | 0.0611 | -20.3023 | -17.6445 | -0.6763 | -0.6762 |
| 0.6751 | 0.88 | 450 | 0.6772 | -0.0807 | -0.1445 | 0.4945 | 0.0638 | -20.2366 | -17.5521 | -0.6528 | -0.6526 |
| 0.6596 | 0.98 | 500 | 0.6744 | -0.1091 | -0.1779 | 0.5055 | 0.0688 | -20.5702 | -17.8358 | -0.6395 | -0.6393 |
| 0.4819 | 1.07 | 550 | 0.6714 | -0.2112 | -0.2907 | 0.5077 | 0.0795 | -21.6987 | -18.8566 | -0.7045 | -0.7043 |
| 0.4754 | 1.17 | 600 | 0.6699 | -0.2743 | -0.3603 | 0.5011 | 0.0860 | -22.3943 | -19.4880 | -0.7556 | -0.7554 |
| 0.4339 | 1.27 | 650 | 0.6694 | -0.2906 | -0.3826 | 0.5033 | 0.0920 | -22.6175 | -19.6505 | -0.8041 | -0.8039 |
| 0.4692 | 1.37 | 700 | 0.6673 | -0.3183 | -0.4163 | 0.5033 | 0.0980 | -22.9541 | -19.9276 | -0.8200 | -0.8199 |
| 0.4767 | 1.46 | 750 | 0.6681 | -0.3342 | -0.4320 | 0.5055 | 0.0978 | -23.1116 | -20.0865 | -0.8291 | -0.8289 |
| 0.4125 | 1.56 | 800 | 0.6684 | -0.3381 | -0.4355 | 0.5099 | 0.0974 | -23.1466 | -20.1256 | -0.8330 | -0.8328 |
| 0.4733 | 1.66 | 850 | 0.6681 | -0.3425 | -0.4407 | 0.5011 | 0.0983 | -23.1986 | -20.1691 | -0.8359 | -0.8357 |
| 0.4699 | 1.76 | 900 | 0.6683 | -0.3431 | -0.4412 | 0.5077 | 0.0981 | -23.2032 | -20.1758 | -0.8365 | -0.8363 |
| 0.4629 | 1.86 | 950 | 0.6682 | -0.3438 | -0.4421 | 0.5011 | 0.0984 | -23.2125 | -20.1823 | -0.8365 | -0.8363 |
| 0.4482 | 1.95 | 1000 | 0.6684 | -0.3437 | -0.4414 | 0.5055 | 0.0978 | -23.2056 | -20.1814 | -0.8363 | -0.8361 |

Framework versions

  • Transformers 4.37.2
  • PyTorch 2.0.0+cu117
  • Datasets 2.17.0
  • Tokenizers 0.15.2