openchat-3.6-8b-20240522_iter1

This model is a fine-tuned version of openchat/openchat-3.6-8b-20240522. The training dataset is not specified. It achieves the following results on the evaluation set:

Loss: 0.5244
Rewards/chosen: -1.3982
Rewards/rejected: -2.1254
Rewards/accuracies: 0.7200
Rewards/margins: 0.7272
Logps/rejected: -171.7692
Logps/chosen: -199.1642
Logits/rejected: -1.2657
Logits/chosen: -1.3423
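
The reported margin agrees with the difference between the chosen and rejected rewards. The following is a minimal sketch of that arithmetic, assuming the margin is defined as the chosen reward minus the rejected reward. That is the usual convention for preference-tuning metrics with these names, but the card itself does not state the definition.

```python
# Values copied from the evaluation results listed above.
rewards_chosen = -1.3982
rewards_rejected = -2.1254
reported_margin = 0.7272

# Assumption: margin = chosen reward - rejected reward.
computed_margin = rewards_chosen - rewards_rejected

print(f"computed margin: {computed_margin:.4f}")
print(f"matches reported value: {abs(computed_margin - reported_margin) < 1e-4}")
```

A positive margin means the evaluation pairs score the chosen response above the rejected one on average, even though both rewards are negative.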

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-07
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 16
total_train_batch_size: 64
total_eval_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
num_epochs: 2
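
The total batch sizes above follow from the per-device values, the device count, and the gradient accumulation steps. The sketch below reproduces that arithmetic and also illustrates a cosine schedule with linear warmup. The `lr_at` helper and the `total_steps` value of 1734 are hypothetical: the card does not report the total step count, and the schedule shape (linear warmup, then cosine decay to zero) is an assumption about what `lr_scheduler_type: cosine` means here.

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 2e-07
train_batch_size = 1
eval_batch_size = 1
num_devices = 4
gradient_accumulation_steps = 16
warmup_steps = 100

# Effective batch sizes across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)


def lr_at(step: int, total_steps: int) -> float:
    """Hypothetical helper: linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Assumption: roughly 1734 optimizer steps over 2 epochs (not stated in the card).
total_steps = 1734
for step in (0, 50, 100, 900, 1734):
    print(step, lr_at(step, total_steps))
```

Under these assumptions the learning rate reaches its peak of 2e-07 at step 100 and decays toward zero by the final step.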

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6926	0.1153	100	0.6938	-0.0143	-0.0002	0.4000	-0.0142	-150.5169	-185.3253	-1.4917	-1.5923
0.679	0.2307	200	0.6922	-0.1109	-0.1115	0.5600	0.0006	-151.6300	-186.2912	-1.4630	-1.5610
0.6486	0.3460	300	0.6787	-0.2780	-0.2833	0.6400	0.0052	-153.3482	-187.9626	-1.4348	-1.5306
0.6411	0.4614	400	0.6542	-0.3856	-0.5726	0.6800	0.1870	-156.2416	-189.0385	-1.3933	-1.4854
0.6012	0.5767	500	0.6362	-0.6283	-0.8095	0.6800	0.1812	-158.6099	-191.4649	-1.3534	-1.4404
0.618	0.6921	600	0.6056	-0.6784	-1.0395	0.7200	0.3611	-160.9102	-191.9662	-1.3254	-1.4087
0.5593	0.8074	700	0.5816	-0.7838	-1.2369	0.7200	0.4531	-162.8839	-193.0198	-1.3188	-1.4025
0.6186	0.9228	800	0.5684	-0.9097	-1.3887	0.7200	0.4790	-164.4020	-194.2788	-1.3118	-1.3925
0.435	1.0381	900	0.5445	-1.0726	-1.6299	0.6800	0.5573	-166.8143	-195.9084	-1.2884	-1.3688
0.3574	1.1535	1000	0.5431	-1.2392	-1.8217	0.7600	0.5825	-168.7325	-197.5744	-1.2871	-1.3622
0.3629	1.2688	1100	0.5291	-1.3493	-2.0023	0.7600	0.6530	-170.5380	-198.6750	-1.2698	-1.3464
0.372	1.3842	1200	0.5354	-1.4103	-2.0374	0.6800	0.6270	-170.8891	-199.2855	-1.2711	-1.3467
0.4256	1.4995	1300	0.5290	-1.3264	-2.0119	0.7200	0.6855	-170.6346	-198.4460	-1.2728	-1.3499
0.3428	1.6149	1400	0.5261	-1.3729	-2.0747	0.6800	0.7019	-171.2626	-198.9109	-1.2725	-1.3481
0.3868	1.7302	1500	0.5269	-1.3721	-2.1075	0.7200	0.7354	-171.5904	-198.9033	-1.2656	-1.3428
0.3909	1.8456	1600	0.5235	-1.3906	-2.1287	0.7200	0.7380	-171.8019	-199.0883	-1.2676	-1.3435
0.3738	1.9609	1700	0.5244	-1.3982	-2.1254	0.7200	0.7272	-171.7692	-199.1642	-1.2657	-1.3423
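
The table can be inspected programmatically, for example to find the evaluation step with the lowest validation loss. The sketch below hard-codes the Step, Epoch, and Validation Loss columns from the table above. The steps-per-epoch and examples-per-epoch figures are estimates derived from those columns and the total train batch size of 64, not values reported in the card.

```python
# (step, epoch, validation_loss) copied from the training results table.
rows = [
    (100, 0.1153, 0.6938), (200, 0.2307, 0.6922), (300, 0.3460, 0.6787),
    (400, 0.4614, 0.6542), (500, 0.5767, 0.6362), (600, 0.6921, 0.6056),
    (700, 0.8074, 0.5816), (800, 0.9228, 0.5684), (900, 1.0381, 0.5445),
    (1000, 1.1535, 0.5431), (1100, 1.2688, 0.5291), (1200, 1.3842, 0.5354),
    (1300, 1.4995, 0.5290), (1400, 1.6149, 0.5261), (1500, 1.7302, 0.5269),
    (1600, 1.8456, 0.5235), (1700, 1.9609, 0.5244),
]

# Evaluation step with the lowest validation loss.
best_step, best_epoch, best_loss = min(rows, key=lambda r: r[2])
print(f"lowest validation loss {best_loss} at step {best_step} (epoch {best_epoch})")

# Estimate: optimizer steps per epoch, from the last logged row.
last_step, last_epoch, _ = rows[-1]
steps_per_epoch = last_step / last_epoch
print(f"estimated steps per epoch: {steps_per_epoch:.0f}")

# Estimate: examples per epoch, assuming each step consumes a full batch of 64.
total_train_batch_size = 64
examples_per_epoch = round(steps_per_epoch) * total_train_batch_size
print(f"estimated examples per epoch: {examples_per_epoch}")
```

The lowest validation loss in the table occurs at step 1600, while the headline evaluation results match the final logged row at step 1700.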

Framework versions

Transformers 4.43.4
Pytorch 2.1.2+cu121
Datasets 2.14.6
Tokenizers 0.19.1

RyanYr/openchat-3.6-8b-20240522_iter1