Visualize in Weights & Biases

qwen2.5-0.5b-expo-L1EXPO

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0088
  • Logps: -98.4573
  • Logits: -1.9894
  • Objective: 0.0088
  • Dpo Loss: 0.6929
  • Regularize: 0.0088
  • Ranking Simple: 0.5180
  • Ranking Idealized: 0.6022
  • Ranking Idealized Expo: 0.5207
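
The Dpo Loss and Ranking metrics above come from preference training. Assuming the trainer reports the standard Direct Preference Optimization (DPO) objective (an assumption; the card does not spell out the loss), that objective is

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) \;=\; -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} \;-\; \beta \log \frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]
$$

Under that reading, a Dpo Loss near \(\log 2 \approx 0.693\) means the average implicit reward margin between chosen and rejected responses is close to zero, which is consistent with the Ranking Simple accuracy of roughly 0.52 reported above.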

Model description

More information needed

Intended uses & limitations

More information needed
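
Although intended uses are not yet documented, the checkpoint can presumably be loaded like any other Transformers causal language model. A minimal, unverified sketch follows; the model id is taken from this card, while the prompt and generation settings are illustrative assumptions:

```python
# Hypothetical usage sketch: load the checkpoint named in this card as a
# causal LM with Hugging Face Transformers. Prompt and generation settings
# are illustrative, not taken from the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L1EXPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Write a one-sentence news summary:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```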

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-07
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 48
  • total_eval_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
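
For reference, the hyperparameters above map onto a Transformers `TrainingArguments` configuration roughly as follows. This is a sketch only: `output_dir` and the exact optimizer variant are assumptions (the card only says "Adam"); the remaining values mirror the list above.

```python
# Sketch: the listed hyperparameters expressed as TrainingArguments
# (Transformers 4.42 argument names). output_dir is assumed.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L1EXPO",  # assumed
    learning_rate=1e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=12,  # 1 per device x 4 GPUs x 12 = 48 effective
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```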

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.006 | 0.0472 | 50 | 0.0060 | -98.6152 | -1.9958 | 0.0061 | 0.6930 | 0.0061 | 0.5180 | 0.6022 | 0.5207 |
| 0.0092 | 0.0945 | 100 | 0.0073 | -98.7889 | -1.9954 | 0.0073 | 0.6929 | 0.0073 | 0.5186 | 0.6022 | 0.5207 |
| 0.0142 | 0.1417 | 150 | 0.0092 | -98.6620 | -1.9986 | 0.0093 | 0.6930 | 0.0093 | 0.5186 | 0.6022 | 0.5207 |
| 0.0173 | 0.1890 | 200 | 0.0097 | -98.7946 | -1.9957 | 0.0098 | 0.6929 | 0.0098 | 0.5173 | 0.6022 | 0.5207 |
| 0.0245 | 0.2362 | 250 | 0.0121 | -98.6416 | -1.9951 | 0.0121 | 0.6929 | 0.0121 | 0.5186 | 0.6022 | 0.5207 |
| 0.0234 | 0.2834 | 300 | 0.0136 | -98.3321 | -1.9940 | 0.0140 | 0.6932 | 0.0140 | 0.5166 | 0.6022 | 0.5207 |
| 0.0262 | 0.3307 | 350 | 0.0178 | -98.3457 | -1.9947 | 0.0181 | 0.6926 | 0.0181 | 0.5200 | 0.6022 | 0.5207 |
| 0.0315 | 0.3779 | 400 | 0.0165 | -98.1128 | -1.9941 | 0.0164 | 0.6926 | 0.0164 | 0.5200 | 0.6022 | 0.5207 |
| 0.0294 | 0.4252 | 450 | 0.0145 | -98.3787 | -1.9950 | 0.0148 | 0.6924 | 0.0148 | 0.5186 | 0.6022 | 0.5207 |
| 0.032 | 0.4724 | 500 | 0.0139 | -98.6457 | -1.9920 | 0.0139 | 0.6925 | 0.0139 | 0.5193 | 0.6022 | 0.5207 |
| 0.0314 | 0.5196 | 550 | 0.0136 | -98.9689 | -1.9943 | 0.0135 | 0.6927 | 0.0135 | 0.5186 | 0.6022 | 0.5207 |
| 0.0311 | 0.5669 | 600 | 0.0142 | -98.1223 | -1.9968 | 0.0144 | 0.6925 | 0.0144 | 0.5186 | 0.6022 | 0.5207 |
| 0.0333 | 0.6141 | 650 | 0.0145 | -98.6917 | -1.9935 | 0.0146 | 0.6926 | 0.0146 | 0.5180 | 0.6022 | 0.5207 |
| 0.028 | 0.6614 | 700 | 0.0138 | -98.6777 | -1.9953 | 0.0140 | 0.6930 | 0.0140 | 0.5193 | 0.6022 | 0.5207 |
| 0.0319 | 0.7086 | 750 | 0.0147 | -98.7712 | -1.9952 | 0.0145 | 0.6926 | 0.0145 | 0.5180 | 0.6022 | 0.5207 |
| 0.0297 | 0.7558 | 800 | 0.0157 | -98.1348 | -1.9950 | 0.0163 | 0.6929 | 0.0163 | 0.5186 | 0.6022 | 0.5207 |
| 0.0286 | 0.8031 | 850 | 0.0124 | -98.5940 | -1.9954 | 0.0125 | 0.6928 | 0.0125 | 0.5173 | 0.6022 | 0.5207 |
| 0.0285 | 0.8503 | 900 | 0.0117 | -98.9422 | -1.9931 | 0.0118 | 0.6929 | 0.0118 | 0.5166 | 0.6022 | 0.5207 |
| 0.0248 | 0.8976 | 950 | 0.0156 | -98.6447 | -1.9902 | 0.0155 | 0.6932 | 0.0155 | 0.5173 | 0.6022 | 0.5207 |
| 0.0272 | 0.9448 | 1000 | 0.0126 | -98.1242 | -1.9906 | 0.0128 | 0.6931 | 0.0128 | 0.5180 | 0.6022 | 0.5207 |
| 0.0215 | 0.9920 | 1050 | 0.0133 | -98.3357 | -1.9911 | 0.0135 | 0.6927 | 0.0135 | 0.5180 | 0.6022 | 0.5207 |
| 0.0242 | 1.0393 | 1100 | 0.0128 | -98.5121 | -1.9881 | 0.0127 | 0.6927 | 0.0127 | 0.5180 | 0.6022 | 0.5207 |
| 0.0248 | 1.0865 | 1150 | 0.0121 | -98.3740 | -1.9900 | 0.0124 | 0.6929 | 0.0124 | 0.5180 | 0.6022 | 0.5207 |
| 0.0238 | 1.1338 | 1200 | 0.0131 | -98.6523 | -1.9881 | 0.0132 | 0.6931 | 0.0132 | 0.5186 | 0.6022 | 0.5207 |
| 0.0213 | 1.1810 | 1250 | 0.0116 | -98.3820 | -1.9892 | 0.0118 | 0.6929 | 0.0118 | 0.5186 | 0.6022 | 0.5207 |
| 0.0213 | 1.2282 | 1300 | 0.0101 | -98.3519 | -1.9901 | 0.0103 | 0.6930 | 0.0103 | 0.5180 | 0.6022 | 0.5207 |
| 0.0191 | 1.2755 | 1350 | 0.0105 | -98.1708 | -1.9895 | 0.0107 | 0.6929 | 0.0107 | 0.5186 | 0.6022 | 0.5207 |
| 0.0183 | 1.3227 | 1400 | 0.0098 | -98.2989 | -1.9896 | 0.0099 | 0.6928 | 0.0099 | 0.5180 | 0.6022 | 0.5207 |
| 0.0173 | 1.3700 | 1450 | 0.0120 | -98.4475 | -1.9888 | 0.0120 | 0.6929 | 0.0120 | 0.5193 | 0.6022 | 0.5207 |
| 0.0171 | 1.4172 | 1500 | 0.0093 | -98.4978 | -1.9892 | 0.0093 | 0.6929 | 0.0093 | 0.5186 | 0.6022 | 0.5207 |
| 0.0164 | 1.4645 | 1550 | 0.0100 | -98.4887 | -1.9898 | 0.0101 | 0.6928 | 0.0101 | 0.5180 | 0.6022 | 0.5207 |
| 0.0165 | 1.5117 | 1600 | 0.0097 | -98.4418 | -1.9892 | 0.0096 | 0.6929 | 0.0096 | 0.5186 | 0.6022 | 0.5207 |
| 0.0128 | 1.5589 | 1650 | 0.0100 | -98.3605 | -1.9889 | 0.0101 | 0.6927 | 0.0101 | 0.5180 | 0.6022 | 0.5207 |
| 0.0132 | 1.6062 | 1700 | 0.0090 | -98.4055 | -1.9891 | 0.0089 | 0.6928 | 0.0089 | 0.5180 | 0.6022 | 0.5207 |
| 0.0133 | 1.6534 | 1750 | 0.0094 | -98.4174 | -1.9885 | 0.0094 | 0.6928 | 0.0094 | 0.5180 | 0.6022 | 0.5207 |
| 0.0138 | 1.7007 | 1800 | 0.0096 | -98.3598 | -1.9886 | 0.0097 | 0.6928 | 0.0097 | 0.5180 | 0.6022 | 0.5207 |
| 0.0122 | 1.7479 | 1850 | 0.0090 | -98.4157 | -1.9888 | 0.0091 | 0.6929 | 0.0091 | 0.5180 | 0.6022 | 0.5207 |
| 0.0128 | 1.7951 | 1900 | 0.0089 | -98.4291 | -1.9891 | 0.0090 | 0.6929 | 0.0089 | 0.5180 | 0.6022 | 0.5207 |
| 0.0133 | 1.8424 | 1950 | 0.0089 | -98.4530 | -1.9892 | 0.0090 | 0.6929 | 0.0090 | 0.5180 | 0.6022 | 0.5207 |
| 0.012 | 1.8896 | 2000 | 0.0087 | -98.4584 | -1.9894 | 0.0088 | 0.6929 | 0.0088 | 0.5180 | 0.6022 | 0.5207 |
| 0.0119 | 1.9369 | 2050 | 0.0088 | -98.4571 | -1.9894 | 0.0088 | 0.6929 | 0.0088 | 0.5180 | 0.6022 | 0.5207 |
| 0.0116 | 1.9841 | 2100 | 0.0088 | -98.4573 | -1.9894 | 0.0088 | 0.6929 | 0.0088 | 0.5180 | 0.6022 | 0.5207 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1