---
library_name: transformers
tags:
  - trl
  - cpo
  - generated_from_trainer
model-index:
  - name: OpenELM-1_1B-CPO
    results: []
---

# OpenELM-1_1B-CPO

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 2.1904
- Rewards/chosen: -3.6406
- Rewards/rejected: -4.4375
- Rewards/accuracies: 0.5918
- Rewards/margins: 0.8008
- Logps/rejected: -444.0
- Logps/chosen: -364.0
- Logits/rejected: -7.5312
- Logits/chosen: -8.875
- Nll Loss: 1.1719
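
These are the standard metrics logged by trl's `CPOTrainer`; the reading below is an assumption based on trl's implementation, not something stated in this card. The rewards are the policy's summed log-probabilities scaled by the CPO temperature $\beta$, and the margin is their difference:

$$
\begin{aligned}
\text{rewards/chosen} &= \beta \log \pi_\theta(y_w \mid x) \\
\text{rewards/rejected} &= \beta \log \pi_\theta(y_l \mid x) \\
\text{rewards/margins} &= \text{rewards/chosen} - \text{rewards/rejected}
\end{aligned}
$$

As a sanity check against the final eval numbers: $-3.6406 - (-4.4375) = 0.7969 \approx 0.8008$, with the gap down to bf16 rounding; the ratio of rewards to logps ($-3.6406 / -364.0 \approx 0.01$) suggests $\beta \approx 0.01$ for this run. `Rewards/accuracies` is the fraction of pairs whose chosen reward exceeds the rejected one, and `Loss` is the CPO preference term plus the NLL term reported separately as `Nll Loss`.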

## Model description

More information needed

## Intended uses & limitations

More information needed
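
In the absence of documented usage, here is a minimal, hypothetical loading sketch. The Hub repo id and the tokenizer behavior are assumptions, as is the need for `trust_remote_code=True` (OpenELM uses custom modeling code); verify against the actual repository.

```python
# Hypothetical usage sketch: the repo id and tokenizer behavior are
# assumptions, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "CharlesLi/OpenELM-1_1B-CPO"  # assumed Hub id for this checkpoint

# OpenELM's modeling code is custom, so remote code must be trusted.
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```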

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training; a sketch of how they map onto trl's `CPOConfig` appears after the list:

- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
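
The sketch below reconstructs this setup with a trl version contemporary to Transformers 4.44 (where `CPOTrainer` still takes `tokenizer=`). The base checkpoint, tokenizer, `beta`, and the toy dataset are assumptions not recorded in this card; this is not the original training script.

```python
# Reconstruction sketch only: the base checkpoint, tokenizer, beta, and the
# toy dataset below are assumptions; the card does not record them.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

# Assumed base model; OpenELM's code lives in the repo, hence trust_remote_code.
model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-1_1B", trust_remote_code=True
)
# OpenELM ships no tokenizer of its own; the Llama-2 tokenizer is the usual pairing.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Toy stand-in for the (unknown) preference dataset: CPOTrainer expects
# prompt / chosen / rejected columns.
train_dataset = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "chosen": [" Paris."],
    "rejected": [" The moon."],
})

args = CPOConfig(
    output_dir="OpenELM-1_1B-CPO",
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # train_batch_size above
    per_device_eval_batch_size=16,   # eval_batch_size above
    gradient_accumulation_steps=2,   # 4 GPUs x 8 x 2 = 64 effective
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.01,  # inferred from rewards ~ 0.01 x logps in the table; not recorded
)

trainer = CPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```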

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|
| 2.4271 | 0.1047 | 100 | 2.2959 | -3.3594 | -3.2812 | 0.4980 | -0.0850 | -328.0 | -336.0 | -12.125 | -12.3125 | 1.0859 |
| 2.2538 | 0.2093 | 200 | 2.1836 | -3.3906 | -3.4531 | 0.5234 | 0.0640 | -346.0 | -338.0 | -9.5 | -9.875 | 1.0938 |
| 2.1253 | 0.3140 | 300 | 2.1307 | -3.4531 | -3.5938 | 0.5176 | 0.1416 | -360.0 | -346.0 | -11.0 | -11.4375 | 1.1172 |
| 2.0609 | 0.4186 | 400 | 2.1359 | -3.3281 | -3.4375 | 0.5293 | 0.1187 | -344.0 | -332.0 | -10.625 | -11.125 | 1.0703 |
| 2.1905 | 0.5233 | 500 | 2.1286 | -3.375 | -3.5156 | 0.5254 | 0.1357 | -352.0 | -338.0 | -8.5 | -9.3125 | 1.0859 |
| 2.1304 | 0.6279 | 600 | 2.1410 | -3.6094 | -3.9688 | 0.5723 | 0.3672 | -398.0 | -360.0 | -9.625 | -10.625 | 1.1562 |
| 2.2554 | 0.7326 | 700 | 2.1848 | -3.7344 | -4.1562 | 0.5664 | 0.4258 | -416.0 | -374.0 | -8.5625 | -9.6875 | 1.2031 |
| 2.0796 | 0.8373 | 800 | 2.1224 | -3.4531 | -3.75 | 0.5469 | 0.2852 | -374.0 | -346.0 | -7.0312 | -7.8438 | 1.1172 |
| 2.1021 | 0.9419 | 900 | 2.1099 | -3.5 | -3.9062 | 0.5723 | 0.4062 | -390.0 | -350.0 | -5.2812 | -6.2812 | 1.1328 |
| 1.5182 | 1.0471 | 1000 | 2.1662 | -3.5 | -3.8594 | 0.5664 | 0.3633 | -386.0 | -350.0 | -9.375 | -10.625 | 1.125 |
| 1.4917 | 1.1518 | 1100 | 2.1588 | -3.5625 | -4.0 | 0.5703 | 0.4395 | -400.0 | -356.0 | -6.4688 | -7.875 | 1.1484 |
| 1.5219 | 1.2564 | 1200 | 2.1449 | -3.625 | -4.1875 | 0.5938 | 0.5586 | -420.0 | -364.0 | -6.6562 | -7.7812 | 1.1719 |
| 1.5292 | 1.3611 | 1300 | 2.1489 | -3.5312 | -4.0 | 0.5742 | 0.4785 | -402.0 | -354.0 | -7.75 | -8.875 | 1.1406 |
| 1.4257 | 1.4657 | 1400 | 2.1193 | -3.5781 | -4.0938 | 0.5801 | 0.5156 | -410.0 | -358.0 | -7.7188 | -9.25 | 1.1562 |
| 1.4366 | 1.5704 | 1500 | 2.0983 | -3.5938 | -4.1562 | 0.5898 | 0.5586 | -416.0 | -358.0 | -7.6875 | -8.9375 | 1.1562 |
| 1.5246 | 1.6750 | 1600 | 2.1191 | -3.5781 | -4.2188 | 0.5938 | 0.625 | -420.0 | -358.0 | -5.4688 | -6.9062 | 1.1562 |
| 1.4534 | 1.7797 | 1700 | 2.0829 | -3.4688 | -4.0312 | 0.5762 | 0.5625 | -404.0 | -348.0 | -9.0625 | -10.0625 | 1.1172 |
| 1.4551 | 1.8844 | 1800 | 2.1033 | -3.5625 | -4.1562 | 0.5898 | 0.6016 | -416.0 | -356.0 | -6.8438 | -8.1875 | 1.1484 |
| 1.4969 | 1.9890 | 1900 | 2.1046 | -3.5312 | -4.125 | 0.5762 | 0.5938 | -412.0 | -354.0 | -8.125 | -9.3125 | 1.1406 |
| 0.9984 | 2.0937 | 2000 | 2.1806 | -3.6406 | -4.2812 | 0.5781 | 0.6367 | -428.0 | -364.0 | -7.9375 | -9.1875 | 1.1719 |
| 0.9885 | 2.1983 | 2100 | 2.1927 | -3.6875 | -4.5 | 0.5801 | 0.7930 | -448.0 | -370.0 | -7.4062 | -8.6875 | 1.1875 |
| 0.9814 | 2.3030 | 2200 | 2.1867 | -3.625 | -4.3438 | 0.5742 | 0.7266 | -436.0 | -362.0 | -7.5 | -8.8125 | 1.1719 |
| 0.9844 | 2.4076 | 2300 | 2.1905 | -3.6875 | -4.5312 | 0.5996 | 0.8438 | -452.0 | -368.0 | -7.125 | -8.375 | 1.1875 |
| 0.9931 | 2.5123 | 2400 | 2.1843 | -3.6406 | -4.4375 | 0.5820 | 0.7930 | -442.0 | -364.0 | -7.375 | -8.6875 | 1.1719 |
| 0.9537 | 2.6170 | 2500 | 2.1907 | -3.6406 | -4.4688 | 0.5898 | 0.8125 | -446.0 | -364.0 | -7.5 | -8.8125 | 1.1719 |
| 0.9512 | 2.7216 | 2600 | 2.1918 | -3.6406 | -4.4375 | 0.5898 | 0.8086 | -446.0 | -364.0 | -7.5 | -8.8125 | 1.1719 |
| 0.9604 | 2.8263 | 2700 | 2.1906 | -3.6406 | -4.4375 | 0.5879 | 0.7969 | -442.0 | -364.0 | -7.5312 | -8.875 | 1.1719 |
| 1.0208 | 2.9309 | 2800 | 2.1904 | -3.6406 | -4.4375 | 0.5918 | 0.8008 | -444.0 | -364.0 | -7.5312 | -8.875 | 1.1719 |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.3.0
- Datasets 3.0.0
- Tokenizers 0.19.1
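
To approximate this environment, something like `pip install transformers==4.44.2 torch==2.3.0 datasets==3.0.0 tokenizers==0.19.1 trl` should work; the trl version is not recorded in this card, so it is left unpinned here (an assumption).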