
qwen2.5-0.5b-expo-L1EXPO-noES-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1381
  • Logps: -85.9802
  • Logits: -1.2306
  • Objective: 0.1370
  • Dpo Loss: 0.6974
  • Regularize: 0.1370
  • Ranking Simple: 0.5243
  • Ranking Idealized: 0.6025
  • Ranking Idealized Expo: 0.5233
  • Wo Beta: 15.6347
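
A note on the "Dpo Loss" entry above: this card does not spell out the training objective, but assuming it refers to the standard Direct Preference Optimization loss (Rafailov et al., 2023) computed on the pairwise data, the tracked quantity would be:

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Under that reading, values near log 2 ≈ 0.693, as reported here, would indicate that the implicit reward margin between chosen and rejected responses stays small, consistent with the "Ranking Simple" accuracy remaining close to 0.52.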

Model description

More information needed

Intended uses & limitations

More information needed
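
In the absence of detailed usage notes, here is a minimal sketch of loading the checkpoint for text generation with the standard transformers causal-LM classes. It assumes the repository ships a compatible tokenizer and config; the prompt is purely illustrative.

```python
# Minimal sketch: load the fine-tuned checkpoint and generate a completion.
# Assumes the repo provides a compatible tokenizer/config; the prompt below
# is illustrative only and not taken from the training data.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L1EXPO-noES-0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Write a short news headline about renewable energy.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```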

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch of how they map onto a TrainingArguments configuration follows the list:

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
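
The effective train batch size of 144 follows from 4 per device × 3 devices × 12 gradient-accumulation steps. As a rough illustration, the values above would map onto a transformers TrainingArguments object roughly as follows; the actual EXPO/DPO-style training script is not included in this card, so treat this as an approximation rather than the code that was run.

```python
# Approximate mapping of the listed hyperparameters onto TrainingArguments.
# The real training script (and its preference-optimization trainer) is not
# part of this card; output_dir below is a hypothetical placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L1EXPO-noES-0.1",  # hypothetical path
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# Effective train batch size: 4 (per device) * 3 (GPUs) * 12 (accumulation) = 144
```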

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.0351 | 0.1417 | 50 | 0.0221 | -91.2329 | -1.3917 | 0.0224 | 0.6927 | 0.0224 | 0.5212 | 0.6025 | 0.5233 | 16.2217 |
| 0.0877 | 0.2834 | 100 | 0.0433 | -88.6602 | -1.3863 | 0.0447 | 0.6922 | 0.0447 | 0.5238 | 0.6025 | 0.5233 | 16.1682 |
| 0.1323 | 0.4251 | 150 | 0.0768 | -90.4377 | -1.3054 | 0.0764 | 0.6956 | 0.0764 | 0.5223 | 0.6025 | 0.5233 | 16.0034 |
| 0.1427 | 0.5668 | 200 | 0.1032 | -88.3433 | -1.3124 | 0.1017 | 0.6959 | 0.1017 | 0.5223 | 0.6025 | 0.5233 | 15.9928 |
| 0.1451 | 0.7085 | 250 | 0.1178 | -88.0698 | -1.2854 | 0.1185 | 0.6950 | 0.1185 | 0.5274 | 0.6025 | 0.5233 | 15.7878 |
| 0.1305 | 0.8503 | 300 | 0.1247 | -86.3312 | -1.2863 | 0.1252 | 0.6961 | 0.1252 | 0.5280 | 0.6025 | 0.5233 | 15.7668 |
| 0.1407 | 0.9920 | 350 | 0.1314 | -86.4501 | -1.2757 | 0.1310 | 0.6976 | 0.1310 | 0.5223 | 0.6025 | 0.5233 | 15.6570 |
| 0.1245 | 1.1337 | 400 | 0.1399 | -86.2849 | -1.2418 | 0.1390 | 0.6980 | 0.1390 | 0.5259 | 0.6025 | 0.5233 | 15.6147 |
| 0.1163 | 1.2754 | 450 | 0.1421 | -85.4828 | -1.2307 | 0.1421 | 0.6985 | 0.1421 | 0.5274 | 0.6025 | 0.5233 | 15.6128 |
| 0.1071 | 1.4171 | 500 | 0.1382 | -87.2673 | -1.2270 | 0.1376 | 0.6980 | 0.1376 | 0.5285 | 0.6025 | 0.5233 | 15.6445 |
| 0.1045 | 1.5588 | 550 | 0.1428 | -87.0776 | -1.2327 | 0.1426 | 0.6977 | 0.1426 | 0.5254 | 0.6025 | 0.5233 | 15.5807 |
| 0.0866 | 1.7005 | 600 | 0.1424 | -85.1926 | -1.2196 | 0.1408 | 0.6965 | 0.1408 | 0.5269 | 0.6025 | 0.5233 | 15.6603 |
| 0.0847 | 1.8422 | 650 | 0.1380 | -86.1129 | -1.2229 | 0.1356 | 0.6974 | 0.1356 | 0.5243 | 0.6025 | 0.5233 | 15.6660 |
| 0.0710 | 1.9839 | 700 | 0.1420 | -85.2496 | -1.2208 | 0.1405 | 0.6980 | 0.1405 | 0.5254 | 0.6025 | 0.5233 | 15.6109 |
| 0.0546 | 2.1256 | 750 | 0.1423 | -85.4691 | -1.2233 | 0.1407 | 0.6980 | 0.1407 | 0.5259 | 0.6025 | 0.5233 | 15.6480 |
| 0.0531 | 2.2674 | 800 | 0.1386 | -86.1368 | -1.2206 | 0.1371 | 0.6981 | 0.1371 | 0.5243 | 0.6025 | 0.5233 | 15.6234 |
| 0.0444 | 2.4091 | 850 | 0.1395 | -86.0362 | -1.2271 | 0.1382 | 0.6980 | 0.1382 | 0.5238 | 0.6025 | 0.5233 | 15.6472 |
| 0.0438 | 2.5508 | 900 | 0.1387 | -85.8840 | -1.2296 | 0.1374 | 0.6975 | 0.1374 | 0.5238 | 0.6025 | 0.5233 | 15.6345 |
| 0.0384 | 2.6925 | 950 | 0.1380 | -85.9590 | -1.2285 | 0.1368 | 0.6975 | 0.1368 | 0.5238 | 0.6025 | 0.5233 | 15.6425 |
| 0.0375 | 2.8342 | 1000 | 0.1380 | -85.9976 | -1.2305 | 0.1369 | 0.6974 | 0.1369 | 0.5243 | 0.6025 | 0.5233 | 15.6355 |
| 0.0397 | 2.9759 | 1050 | 0.1381 | -85.9802 | -1.2306 | 0.1370 | 0.6974 | 0.1370 | 0.5243 | 0.6025 | 0.5233 | 15.6347 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1