
qwen2.5-0.5b-expo-L1EXPO-noES-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1381
  • Logps: -85.9802
  • Logits: -1.2306
  • Objective: 0.1370
  • Dpo Loss: 0.6974
  • Regularize: 0.1370
  • Ranking Simple: 0.5243
  • Ranking Idealized: 0.6025
  • Ranking Idealized Expo: 0.5233
  • Wo Beta: 15.6347
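
A note on the "Dpo Loss" entry above: this card does not spell out the training objective, but assuming it refers to the standard Direct Preference Optimization loss (Rafailov et al., 2023) computed on the pairwise data, the tracked quantity would be:

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Under that reading, values near log 2 ≈ 0.693, as reported here, would indicate that the implicit reward margin between chosen and rejected responses stays small, consistent with the "Ranking Simple" accuracy remaining close to 0.52.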

Model description

More information needed

Intended uses & limitations

More information needed
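
In the absence of detailed usage notes, here is a minimal sketch of loading the checkpoint for text generation with the standard transformers causal-LM classes. It assumes the repository ships a compatible tokenizer and config; the prompt is purely illustrative.

```python
# Minimal sketch: load the fine-tuned checkpoint and generate a completion.
# Assumes the repo provides a compatible tokenizer/config; the prompt below
# is illustrative only and not taken from the training data.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L1EXPO-noES-0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Write a short news headline about renewable energy.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```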

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch of how they map onto a TrainingArguments configuration follows the list:

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
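
The effective train batch size of 144 follows from 4 per device × 3 devices × 12 gradient-accumulation steps. As a rough illustration, the values above would map onto a transformers TrainingArguments object roughly as follows; the actual EXPO/DPO-style training script is not included in this card, so treat this as an approximation rather than the code that was run.

```python
# Approximate mapping of the listed hyperparameters onto TrainingArguments.
# The real training script (and its preference-optimization trainer) is not
# part of this card; output_dir below is a hypothetical placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L1EXPO-noES-0.1",  # hypothetical path
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# Effective train batch size: 4 (per device) * 3 (GPUs) * 12 (accumulation) = 144
```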

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.0351 | 0.1417 | 50 | 0.0221 | -91.2329 | -1.3917 | 0.0224 | 0.6927 | 0.0224 | 0.5212 | 0.6025 | 0.5233 | 16.2217 |
| 0.0877 | 0.2834 | 100 | 0.0433 | -88.6602 | -1.3863 | 0.0447 | 0.6922 | 0.0447 | 0.5238 | 0.6025 | 0.5233 | 16.1682 |
| 0.1323 | 0.4251 | 150 | 0.0768 | -90.4377 | -1.3054 | 0.0764 | 0.6956 | 0.0764 | 0.5223 | 0.6025 | 0.5233 | 16.0034 |
| 0.1427 | 0.5668 | 200 | 0.1032 | -88.3433 | -1.3124 | 0.1017 | 0.6959 | 0.1017 | 0.5223 | 0.6025 | 0.5233 | 15.9928 |
| 0.1451 | 0.7085 | 250 | 0.1178 | -88.0698 | -1.2854 | 0.1185 | 0.6950 | 0.1185 | 0.5274 | 0.6025 | 0.5233 | 15.7878 |
| 0.1305 | 0.8503 | 300 | 0.1247 | -86.3312 | -1.2863 | 0.1252 | 0.6961 | 0.1252 | 0.5280 | 0.6025 | 0.5233 | 15.7668 |
| 0.1407 | 0.9920 | 350 | 0.1314 | -86.4501 | -1.2757 | 0.1310 | 0.6976 | 0.1310 | 0.5223 | 0.6025 | 0.5233 | 15.6570 |
| 0.1245 | 1.1337 | 400 | 0.1399 | -86.2849 | -1.2418 | 0.1390 | 0.6980 | 0.1390 | 0.5259 | 0.6025 | 0.5233 | 15.6147 |
| 0.1163 | 1.2754 | 450 | 0.1421 | -85.4828 | -1.2307 | 0.1421 | 0.6985 | 0.1421 | 0.5274 | 0.6025 | 0.5233 | 15.6128 |
| 0.1071 | 1.4171 | 500 | 0.1382 | -87.2673 | -1.2270 | 0.1376 | 0.6980 | 0.1376 | 0.5285 | 0.6025 | 0.5233 | 15.6445 |
| 0.1045 | 1.5588 | 550 | 0.1428 | -87.0776 | -1.2327 | 0.1426 | 0.6977 | 0.1426 | 0.5254 | 0.6025 | 0.5233 | 15.5807 |
| 0.0866 | 1.7005 | 600 | 0.1424 | -85.1926 | -1.2196 | 0.1408 | 0.6965 | 0.1408 | 0.5269 | 0.6025 | 0.5233 | 15.6603 |
| 0.0847 | 1.8422 | 650 | 0.1380 | -86.1129 | -1.2229 | 0.1356 | 0.6974 | 0.1356 | 0.5243 | 0.6025 | 0.5233 | 15.6660 |
| 0.0710 | 1.9839 | 700 | 0.1420 | -85.2496 | -1.2208 | 0.1405 | 0.6980 | 0.1405 | 0.5254 | 0.6025 | 0.5233 | 15.6109 |
| 0.0546 | 2.1256 | 750 | 0.1423 | -85.4691 | -1.2233 | 0.1407 | 0.6980 | 0.1407 | 0.5259 | 0.6025 | 0.5233 | 15.6480 |
| 0.0531 | 2.2674 | 800 | 0.1386 | -86.1368 | -1.2206 | 0.1371 | 0.6981 | 0.1371 | 0.5243 | 0.6025 | 0.5233 | 15.6234 |
| 0.0444 | 2.4091 | 850 | 0.1395 | -86.0362 | -1.2271 | 0.1382 | 0.6980 | 0.1382 | 0.5238 | 0.6025 | 0.5233 | 15.6472 |
| 0.0438 | 2.5508 | 900 | 0.1387 | -85.8840 | -1.2296 | 0.1374 | 0.6975 | 0.1374 | 0.5238 | 0.6025 | 0.5233 | 15.6345 |
| 0.0384 | 2.6925 | 950 | 0.1380 | -85.9590 | -1.2285 | 0.1368 | 0.6975 | 0.1368 | 0.5238 | 0.6025 | 0.5233 | 15.6425 |
| 0.0375 | 2.8342 | 1000 | 0.1380 | -85.9976 | -1.2305 | 0.1369 | 0.6974 | 0.1369 | 0.5243 | 0.6025 | 0.5233 | 15.6355 |
| 0.0397 | 2.9759 | 1050 | 0.1381 | -85.9802 | -1.2306 | 0.1370 | 0.6974 | 0.1370 | 0.5243 | 0.6025 | 0.5233 | 15.6347 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1