openchat-3.6-8b-20240522_iter3

This model is a fine-tuned version of RyanYr/openchat-3.6-8b-20240522_iter2. The training dataset is not specified. It achieves the following results on the evaluation set:

Loss: 0.5950
Rewards/chosen: -0.8463
Rewards/rejected: -1.5198
Rewards/accuracies: 0.7600
Rewards/margins: 0.6735
Logps/rejected: -149.1518
Logps/chosen: -144.0943
Logits/rejected: -1.2099
Logits/chosen: -1.2151
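
As a quick sanity check on the figures above, the reported margin is consistent with being the chosen reward minus the rejected reward. The sketch below only redoes that arithmetic. The variable names are illustrative, and the definition of the margin is an assumption inferred from the numbers, not something this card states.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -0.8463
rewards_rejected = -1.5198
reported_margin = 0.6735

# Assumption: the margin is the chosen reward minus the rejected reward.
computed_margin = rewards_chosen - rewards_rejected

print(f"computed margin: {computed_margin:.4f}")
print(f"matches reported: {abs(computed_margin - reported_margin) < 1e-4}")
```

A positive margin means the model scores the chosen responses above the rejected ones on average, even though both reward values are negative.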

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-07
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 16
total_train_batch_size: 64
total_eval_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
num_epochs: 2
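
The batch-size totals in this list follow from the per-device values, and the scheduler settings imply a warmup followed by a cosine decay. The sketch below reproduces the batch-size arithmetic and illustrates one common shape for such a schedule (linear warmup, then half-cosine decay to zero). The function `lr_at` and the value `assumed_total_steps` are hypothetical and used only for illustration. The exact schedule used in training is not stated in this card.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 1            # per-device training batch size
eval_batch_size = 1             # per-device evaluation batch size
num_devices = 4
gradient_accumulation_steps = 16
learning_rate = 2e-07
warmup_steps = 100

# Effective batch sizes: gradient accumulation applies to training only.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup, then cosine decay to zero (an assumed schedule shape)."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical total step count, chosen only to illustrate the curve.
assumed_total_steps = 1700
for step in (0, 50, 100, 900, 1700):
    print(step, lr_at(step, assumed_total_steps))
```

Under these assumptions the learning rate peaks at 2e-07 once warmup ends at step 100 and decays smoothly afterwards.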

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6665	0.1158	100	0.6985	-0.1138	-0.2545	0.6000	0.1407	-136.4984	-136.7689	-1.2600	-1.2619
0.6393	0.2316	200	0.6824	-0.1312	-0.2764	0.5600	0.1452	-136.7173	-136.9431	-1.3003	-1.3045
0.5871	0.3474	300	0.6834	-0.2655	-0.3888	0.6000	0.1233	-137.8412	-138.2859	-1.3107	-1.3155
0.6151	0.4633	400	0.6799	-0.4578	-0.5623	0.6000	0.1045	-139.5763	-140.2087	-1.3024	-1.3069
0.5577	0.5791	500	0.6544	-0.3815	-0.5536	0.6000	0.1722	-139.4899	-139.4459	-1.3045	-1.3100
0.6366	0.6949	600	0.6261	-0.1856	-0.4357	0.6400	0.2500	-138.3102	-137.4874	-1.3360	-1.3430
0.53	0.8107	700	0.6434	-0.4043	-0.6780	0.6400	0.2737	-140.7333	-139.6738	-1.2803	-1.2844
0.5761	0.9265	800	0.6186	-0.3762	-0.6989	0.6400	0.3227	-140.9429	-139.3935	-1.3125	-1.3198
0.4286	1.0423	900	0.6368	-0.8084	-1.1996	0.6800	0.3913	-145.9498	-143.7149	-1.2632	-1.2671
0.407	1.1582	1000	0.6345	-0.8524	-1.3574	0.7200	0.5049	-147.5273	-144.1555	-1.2234	-1.2269
0.4758	1.2740	1100	0.6022	-0.6198	-1.1935	0.6800	0.5738	-145.8886	-141.8288	-1.2307	-1.2366
0.4415	1.3898	1200	0.5959	-0.7170	-1.3440	0.7200	0.6270	-147.3939	-142.8015	-1.2248	-1.2305
0.4228	1.5056	1300	0.5890	-0.6584	-1.3069	0.7600	0.6485	-147.0226	-142.2149	-1.2222	-1.2276
0.4199	1.6214	1400	0.6033	-0.9116	-1.5633	0.7200	0.6517	-149.5865	-144.7474	-1.2084	-1.2133
0.4188	1.7372	1500	0.5948	-0.8277	-1.5036	0.7200	0.6759	-148.9892	-143.9083	-1.2126	-1.2175
0.4185	1.8531	1600	0.5908	-0.8393	-1.5404	0.7600	0.7011	-149.3580	-144.0246	-1.2042	-1.2096
0.3986	1.9689	1700	0.5950	-0.8463	-1.5198	0.7600	0.6735	-149.1518	-144.0943	-1.2099	-1.2151
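
Two things can be read off the table with a little arithmetic: roughly how many optimizer steps make up one epoch, and which logged step has the lowest validation loss. The sketch below is a rough estimate only. It assumes the Epoch column is the step count divided by the steps per epoch, and that each step consumes the total training batch size of 64 listed in the hyperparameters.

```python
# (step, validation loss) pairs copied from the table above.
validation_loss = {
    100: 0.6985, 200: 0.6824, 300: 0.6834, 400: 0.6799, 500: 0.6544,
    600: 0.6261, 700: 0.6434, 800: 0.6186, 900: 0.6368, 1000: 0.6345,
    1100: 0.6022, 1200: 0.5959, 1300: 0.5890, 1400: 0.6033, 1500: 0.5948,
    1600: 0.5908, 1700: 0.5950,
}

# Last logged row: epoch 1.9689 at step 1700.
last_epoch, last_step = 1.9689, 1700
total_train_batch_size = 64  # from the hyperparameter list

# Assumption: epoch = step / steps_per_epoch.
steps_per_epoch = last_step / last_epoch
examples_per_epoch = steps_per_epoch * total_train_batch_size

best_step = min(validation_loss, key=validation_loss.get)

print(f"approx. optimizer steps per epoch: {steps_per_epoch:.0f}")
print(f"approx. training examples per epoch: {examples_per_epoch:.0f}")
print(f"lowest validation loss: {validation_loss[best_step]:.4f} at step {best_step}")
```

The lowest logged validation loss occurs at step 1300 (0.5890), slightly below the final value at step 1700 (0.5950) that is reported at the top of this card.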

Framework versions

Transformers 4.43.4
PyTorch 2.1.2+cu121
Datasets 2.14.6
Tokenizers 0.19.1
