
qwen2.5-0.5b-expo-L2EXPO-ES-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set (the objective behind the "Dpo Loss" metric is sketched after the list):

  • Loss: 0.6458
  • Logps: -79.3784
  • Logits: -0.5665
  • Objective: 0.6297
  • Dpo Loss: 0.7206
  • Regularize: 0.6297
  • Ranking Simple: 0.5347
  • Ranking Idealized: 0.6030
  • Ranking Idealized Expo: 0.5223
  • Wo Beta: 14.3163
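The "Dpo Loss" column above is presumably the standard direct-preference-optimization objective. For reference, a minimal sketch in PyTorch; the function name and the value of `beta` are illustrative assumptions, since the card does not state them:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (policy log-ratio - ref log-ratio)).

    Inputs are per-example summed log-probabilities (tensors) of the chosen and
    rejected responses under the policy and the frozen reference model.
    beta=0.1 is an assumption; the card does not list the value used.
    """
    policy_ratio = policy_chosen_logps - policy_rejected_logps
    ref_ratio = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_ratio - ref_ratio)).mean()
```

Under this reading, "Ranking Simple" would be the fraction of evaluation pairs for which the policy assigns the chosen response a higher log-probability than the rejected one.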

Model description

More information needed

Intended uses & limitations

More information needed
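Pending more detail from the authors, the checkpoint can be loaded like any Hub causal LM; a minimal sketch, assuming the standard Qwen2 layout (the prompt is purely illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-ES-0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt; the card does not document an intended prompt format.
inputs = tokenizer("Summarize today's headlines:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```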

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent `TrainingArguments` configuration is sketched after the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
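For orientation, these settings map onto a `transformers.TrainingArguments` roughly as below. This is a sketch, not the authors' training script; `output_dir` is a placeholder, and the preference-training specifics (e.g. the EXPO/DPO objective) live outside this object:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-ES-0.1",  # placeholder
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # x 3 GPUs x 12 accumulation steps
    per_device_eval_batch_size=4,    #   = 144 effective train batch size
    gradient_accumulation_steps=12,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```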

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
|:-------------:|:------:|:----:|:---------------:|:--------:|:-------:|:---------:|:--------:|:----------:|:--------------:|:-----------------:|:----------------------:|:-------:|
| 0.4017        | 0.1417 | 50   | 0.4165          | -93.1726 | -1.5024 | 0.4149    | 0.6868   | 0.4149     | 0.5259         | 0.6030            | 0.5223                 | 16.4267 |
| 0.3777        | 0.2834 | 100  | 0.4360          | -92.8653 | -1.4775 | 0.4269    | 0.6818   | 0.4269     | 0.5316         | 0.6030            | 0.5223                 | 16.2439 |
| 0.4057        | 0.4251 | 150  | 0.4911          | -84.1774 | -1.2946 | 0.4805    | 0.6897   | 0.4805     | 0.5383         | 0.6030            | 0.5223                 | 15.6306 |
| 0.4475        | 0.5668 | 200  | 0.5660          | -89.7342 | -0.9897 | 0.5515    | 0.7103   | 0.5515     | 0.5316         | 0.6030            | 0.5223                 | 15.1280 |
| 0.455         | 0.7085 | 250  | 0.5978          | -78.1917 | -1.0033 | 0.5822    | 0.7171   | 0.5822     | 0.5311         | 0.6030            | 0.5223                 | 14.6763 |
| 0.4337        | 0.8503 | 300  | 0.5993          | -78.8918 | -0.6761 | 0.5779    | 0.7105   | 0.5779     | 0.5300         | 0.6030            | 0.5223                 | 14.9196 |
| 0.4039        | 0.9920 | 350  | 0.5978          | -75.1520 | -0.7968 | 0.5765    | 0.7078   | 0.5765     | 0.5290         | 0.6030            | 0.5223                 | 14.6531 |
| 0.3729        | 1.1337 | 400  | 0.6180          | -75.1433 | -0.5569 | 0.6000    | 0.7153   | 0.6000     | 0.5228         | 0.6030            | 0.5223                 | 14.6471 |
| 0.3454        | 1.2754 | 450  | 0.6316          | -76.2289 | -0.6214 | 0.6131    | 0.7165   | 0.6131     | 0.5336         | 0.6030            | 0.5223                 | 14.5034 |
| 0.3226        | 1.4171 | 500  | 0.6255          | -77.6040 | -0.5608 | 0.6084    | 0.7204   | 0.6084     | 0.5285         | 0.6030            | 0.5223                 | 14.4998 |
| 0.3133        | 1.5588 | 550  | 0.6282          | -78.6291 | -0.6736 | 0.6138    | 0.7139   | 0.6138     | 0.5336         | 0.6030            | 0.5223                 | 14.4069 |
| 0.2944        | 1.7005 | 600  | 0.6321          | -78.9179 | -0.5620 | 0.6139    | 0.7175   | 0.6139     | 0.5357         | 0.6030            | 0.5223                 | 14.6142 |
| 0.2915        | 1.8422 | 650  | 0.6321          | -77.4437 | -0.7021 | 0.6157    | 0.7138   | 0.6157     | 0.5367         | 0.6030            | 0.5223                 | 14.3858 |
| 0.2675        | 1.9839 | 700  | 0.6386          | -79.3600 | -0.5612 | 0.6233    | 0.7185   | 0.6233     | 0.5290         | 0.6030            | 0.5223                 | 14.3171 |
| 0.2415        | 2.1256 | 750  | 0.6405          | -80.0990 | -0.6174 | 0.6263    | 0.7177   | 0.6263     | 0.5347         | 0.6030            | 0.5223                 | 14.4302 |
| 0.2263        | 2.2674 | 800  | 0.6458          | -79.3784 | -0.5665 | 0.6297    | 0.7206   | 0.6297     | 0.5347         | 0.6030            | 0.5223                 | 14.3163 |
| 0.2148        | 2.4091 | 850  | 0.6436          | -79.0806 | -0.5793 | 0.6276    | 0.7192   | 0.6276     | 0.5362         | 0.6030            | 0.5223                 | 14.4263 |
| 0.1993        | 2.5508 | 900  | 0.6454          | -80.3815 | -0.5621 | 0.6302    | 0.7217   | 0.6302     | 0.5342         | 0.6030            | 0.5223                 | 14.4491 |
| 0.1887        | 2.6925 | 950  | 0.6443          | -79.1446 | -0.6216 | 0.6274    | 0.7204   | 0.6274     | 0.5336         | 0.6030            | 0.5223                 | 14.3186 |
| 0.1764        | 2.8342 | 1000 | 0.6399          | -79.7721 | -0.6087 | 0.6246    | 0.7200   | 0.6246     | 0.5336         | 0.6030            | 0.5223                 | 14.4502 |
| 0.163         | 2.9759 | 1050 | 0.6428          | -79.5818 | -0.6068 | 0.6266    | 0.7211   | 0.6266     | 0.5316         | 0.6030            | 0.5223                 | 14.3406 |
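
Two things worth noting from the log: the headline evaluation results above match the step-800 row (epoch 2.27), which has the lowest Wo Beta (14.3163), suggesting that checkpoint was selected; and although 5 epochs were configured, logging stops near epoch 3, consistent with the early-stopping ("ES") suffix in the model name. The step/epoch ratio (50 steps ≈ 0.1417 epoch) implies roughly 353 optimizer steps per epoch, i.e. about 353 × 144 ≈ 50.8k training pairs.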

Framework versions

  • Transformers 4.42.0
  • PyTorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1