openchat-3.6-8b-20240522_iter2

This model is a fine-tuned version of RyanYr/openchat-3.6-8b-20240522_iter1 on an unspecified dataset (the dataset name was not recorded). It achieves the following results on the evaluation set:

Loss: 0.5377
Rewards/chosen: -0.2459
Rewards/rejected: -0.7375
Rewards/accuracies: 0.7200
Rewards/margins: 0.4916
Logps/rejected: -139.4380
Logps/chosen: -132.3574
Logits/rejected: -1.3194
Logits/chosen: -1.3369

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-07
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 16
total_train_batch_size: 64
total_eval_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
num_epochs: 2
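The reported totals follow from the per-device settings. For training, 1 example per device x 4 devices x 16 accumulation steps gives 64 examples per optimizer step. For evaluation, 1 x 4 gives 4. The sketch below also reproduces the shape of a cosine schedule with 100 linear warmup steps, mirroring the formula of `transformers.get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`. The card does not list the total number of optimizer steps. The value 1754 used in the example is only an estimate: the Training results table puts step 1700 at epoch 1.9386, which is about 877 steps per epoch over 2 epochs.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 2e-07
train_batch_size = 1            # per device
eval_batch_size = 1             # per device
num_devices = 4
gradient_accumulation_steps = 16
warmup_steps = 100

# Effective batch sizes (no gradient accumulation during evaluation).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr_with_warmup(step, total_steps, base_lr=learning_rate, warmup=warmup_steps):
    """Linear warmup to base_lr, then half-cosine decay to 0 at total_steps."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# ESTIMATED total optimizer steps (derived from the results table, not reported).
ESTIMATED_TOTAL_STEPS = 1754
```

For instance, `cosine_lr_with_warmup(877, ESTIMATED_TOTAL_STEPS)` approximates the learning rate near the end of the first epoch.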

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.7344	0.1140	100	0.7121	0.3107	0.3572	0.4000	-0.0465	-128.4905	-126.7911	-1.4523	-1.4659
0.6553	0.2281	200	0.6966	0.2237	0.2513	0.5200	-0.0276	-129.5494	-127.6609	-1.4428	-1.4571
0.669	0.3421	300	0.6798	0.1031	0.0581	0.5200	0.0450	-131.4815	-128.8666	-1.4122	-1.4281
0.6402	0.4561	400	0.6595	0.0694	-0.0114	0.6400	0.0808	-132.1772	-129.2041	-1.4254	-1.4406
0.6716	0.5702	500	0.6351	0.1022	-0.0221	0.6400	0.1243	-132.2838	-128.8764	-1.4550	-1.4689
0.655	0.6842	600	0.6278	0.1039	-0.0286	0.6000	0.1325	-132.3487	-128.8587	-1.4625	-1.4766
0.5943	0.7982	700	0.6084	0.0643	-0.1073	0.6400	0.1716	-133.1360	-129.2548	-1.4485	-1.4622
0.6048	0.9123	800	0.6002	0.0902	-0.1175	0.6800	0.2077	-133.2379	-128.9962	-1.4607	-1.4735
0.4934	1.0263	900	0.5798	0.0298	-0.2745	0.7200	0.3043	-134.8078	-129.5996	-1.4349	-1.4491
0.4284	1.1403	1000	0.5724	-0.1252	-0.4897	0.6800	0.3645	-136.9601	-131.1501	-1.3824	-1.3981
0.4132	1.2544	1100	0.5563	-0.1930	-0.5928	0.7600	0.3998	-137.9906	-131.8278	-1.3545	-1.3715
0.3957	1.3684	1200	0.5543	-0.2162	-0.6427	0.7600	0.4264	-138.4894	-132.0604	-1.3412	-1.3583
0.4893	1.4824	1300	0.5476	-0.2078	-0.6782	0.7200	0.4704	-138.8445	-131.9757	-1.3340	-1.3521
0.4361	1.5965	1400	0.5413	-0.2007	-0.6908	0.7200	0.4901	-138.9703	-131.9046	-1.3316	-1.3490
0.4406	1.7105	1500	0.5477	-0.2466	-0.6913	0.7200	0.4448	-138.9762	-132.3638	-1.3242	-1.3421
0.3988	1.8245	1600	0.5449	-0.2388	-0.7225	0.7200	0.4838	-139.2881	-132.2855	-1.3254	-1.3431
0.4044	1.9386	1700	0.5377	-0.2459	-0.7375	0.7200	0.4916	-139.4380	-132.3574	-1.3194	-1.3369
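The final row of the table (step 1700) matches the evaluation results listed at the top of the card. The sketch below copies a few rows by hand and runs two checks. First, it confirms that Rewards/margins equals Rewards/chosen minus Rewards/rejected. Second, it estimates the number of optimizer steps per epoch from the Epoch and Step columns. The Epoch values are rounded to four decimals, so the estimate is approximate. The helper names are illustrative.

```python
import math

# (epoch, step, rewards/chosen, rewards/rejected, rewards/margins),
# copied from the Training results table above.
ROWS = [
    (0.1140, 100, 0.3107, 0.3572, -0.0465),
    (0.9123, 800, 0.0902, -0.1175, 0.2077),
    (1.0263, 900, 0.0298, -0.2745, 0.3043),
    (1.9386, 1700, -0.2459, -0.7375, 0.4916),
]


def margins_consistent(rows, tol=1e-4):
    """True if every reported margin equals chosen - rejected within tol."""
    return all(math.isclose(c - r, m, abs_tol=tol) for _, _, c, r, m in rows)


def estimated_steps_per_epoch(rows):
    """Average of step / epoch over the rows (approximate: epochs are rounded)."""
    ratios = [step / epoch for epoch, step, *_ in rows]
    return sum(ratios) / len(ratios)


steps_per_epoch = estimated_steps_per_epoch(ROWS)
estimated_total_steps = round(2 * steps_per_epoch)  # num_epochs: 2
```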

Framework versions

Transformers 4.43.4
Pytorch 2.1.2+cu121
Datasets 2.14.6
Tokenizers 0.19.1
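A small helper built on the standard library's `importlib.metadata` can compare a local environment with the one listed above and report which packages differ. The PyPI distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are assumptions about how these frameworks were installed, and the helper names are hypothetical. The `+cu121` suffix appears only on PyTorch's CUDA 12.1 builds. As a result, a CPU or other-CUDA install of 2.1.2 will still show up as a mismatch.

```python
from importlib import metadata

# Versions listed under "Framework versions" (distribution names assumed).
CARD_VERSIONS = {
    "transformers": "4.43.4",
    "torch": "2.1.2+cu121",
    "datasets": "2.14.6",
    "tokenizers": "0.19.1",
}


def installed_versions(packages):
    """Return {package: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def version_mismatches(installed, expected=CARD_VERSIONS):
    """Return {package: (expected, installed)} for every package that differs."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    report = version_mismatches(installed_versions(CARD_VERSIONS))
    for name, (want, have) in report.items():
        print(f"{name}: card lists {want}, found {have}")
```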

Collection including RyanYr/openchat-3.6-8b-20240522_iter2: Reward modeling