
qwen2.5-0.5b-expo-DPO-ES-1000

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 4901.4473
  • Logps: -79.4462
  • Logits: -0.5595
  • Objective: 4906.0884
  • Dpo Loss: 2071.1946
  • Regularize: 4906.0884
  • Ranking Simple: 0.5362
  • Ranking Idealized: 0.5212
  • Ranking Idealized Expo: 0.5212
  • Wo Beta: 14.7030
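
The checkpoint can be loaded with the standard transformers causal-LM API. The snippet below is a minimal usage sketch: the repository id comes from this card, while the prompt and generation settings are illustrative placeholders, not values used in training or evaluation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id from this card.
model_id = "hZzy/qwen2.5-0.5b-expo-DPO-ES-1000"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt; the base model was SFT-tuned on news-style data before preference tuning.
prompt = "Summarize the main point of this article in one sentence:"
inputs = tokenizer(prompt, return_tensors="pt")

# Generation settings are placeholders; adjust to your use case.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```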

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
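
A minimal sketch of how the settings above could be expressed as transformers TrainingArguments is shown below. This is a reconstruction for illustration only: the original training script is not part of this card, and DPO-specific options (such as the beta coefficient or the trl DPOTrainer wiring) are unknown and therefore omitted.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above; not the original script.
training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-DPO-ES-1000",  # assumed output path
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,   # with 3 GPUs: 4 * 3 * 12 = 144 effective train batch size
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```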

Training results

| Training Loss | Epoch | Step | Dpo Loss | Logits | Logps | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize | Wo Beta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 171.6127 | 0.1417 | 50 | 308.1877 | -1.4575 | -90.7970 | 309.7356 | 308.1877 | 0.5212 | 0.5212 | 0.5254 | 308.1877 | 7.7047 |
| 582.3149 | 0.2834 | 100 | 708.8133 | -1.3877 | -88.8040 | 733.8202 | 708.8133 | 0.5212 | 0.5212 | 0.5285 | 708.8133 | 7.4775 |
| 1002.3808 | 0.4251 | 150 | 1245.7263 | -1.3138 | -83.6606 | 1283.0697 | 1245.7263 | 0.5212 | 0.5212 | 0.5311 | 1245.7263 | 7.3632 |
| 1199.7266 | 0.5668 | 200 | 1471.0287 | -1.2584 | -79.8249 | 1530.3123 | 1471.0287 | 0.5212 | 0.5212 | 0.5347 | 1471.0287 | 7.2330 |
| 1311.3106 | 0.7085 | 250 | 1842.3601 | -1.1799 | -78.5750 | 1873.1123 | 1842.3601 | 0.5212 | 0.5212 | 0.5347 | 1842.3601 | 7.2046 |
| 1216.5524 | 0.8503 | 300 | 1949.1084 | -1.0463 | -80.6875 | 2001.6104 | 1949.1084 | 0.5212 | 0.5212 | 0.5326 | 1949.1084 | 6.9438 |
| 1157.2415 | 0.9920 | 350 | 1956.4012 | -0.8782 | -79.7493 | 2064.3220 | 1956.4012 | 0.5212 | 0.5212 | 0.5440 | 1956.4012 | 7.0169 |
| 721.9005 | 1.1337 | 400 | 2228.8811 | -0.5703 | -80.2022 | 2276.4189 | 2228.8811 | 0.5212 | 0.5212 | 0.5404 | 2228.8811 | 7.2480 |
| 779.6797 | 1.2754 | 450 | 2016.3281 | -0.7091 | -78.4054 | 2069.4939 | 2016.3281 | 0.5212 | 0.5212 | 0.5367 | 2016.3281 | 6.8242 |
| 788.48 | 1.4171 | 500 | 2044.0745 | -0.6659 | -81.9827 | 2120.1182 | 2044.0745 | 0.5212 | 0.5212 | 0.5342 | 2044.0745 | 6.8667 |
| 684.4246 | 1.5588 | 550 | 2053.8372 | -0.6751 | -81.6376 | 2148.1580 | 2053.8372 | 0.5212 | 0.5212 | 0.5342 | 2053.8372 | 6.7901 |
| 708.5259 | 1.7005 | 600 | 2071.1946 | -0.5595 | -79.4462 | 2179.6001 | 2071.1946 | 0.5212 | 0.5212 | 0.5362 | 2071.1946 | 6.6511 |
| 690.9902 | 1.8422 | 650 | 2158.4885 | -0.5552 | -80.5108 | 2241.3740 | 2158.4885 | 0.5212 | 0.5212 | 0.5414 | 2158.4885 | 6.7740 |
| 617.6108 | 1.9839 | 700 | 2132.2517 | -0.5079 | -80.3825 | 2230.5115 | 2132.2517 | 0.5212 | 0.5212 | 0.5404 | 2132.2517 | 6.7954 |
| 343.0455 | 2.1256 | 750 | 2123.3604 | -0.5398 | -81.3539 | 2199.7175 | 2123.3604 | 0.5212 | 0.5212 | 0.5430 | 2123.3604 | 6.7578 |
| 311.7518 | 2.2674 | 800 | 2038.6656 | -0.5497 | -80.2739 | 2139.7871 | 2038.6656 | 0.5212 | 0.5212 | 0.5378 | 2038.6656 | 6.6768 |
| 315.5968 | 2.4091 | 850 | 2184.7112 | -0.5282 | -83.3843 | 2249.3201 | 2184.7112 | 0.5212 | 0.5212 | 0.5404 | 2184.7112 | 6.8072 |
| 6263.3387 | 2.5555 | 900 | 3261.5381 | -0.1035 | -90.6956 | 6365.9199 | 6410.7383 | 0.5212 | 0.5212 | 0.5248 | 6410.7383 | 14.3919 |
| 4964.9731 | 2.6972 | 950 | 3203.6243 | -0.1541 | -88.8726 | 6126.6899 | 6172.9790 | 0.5212 | 0.5212 | 0.5259 | 6172.9790 | 14.2868 |
| 4278.7487 | 2.8389 | 1000 | 3153.0500 | -0.0946 | -87.8890 | 6092.5342 | 6127.7734 | 0.5212 | 0.5212 | 0.5243 | 6127.7734 | 14.2488 |
| 3830.4475 | 2.9806 | 1050 | 3127.4956 | -0.1365 | -86.9566 | 6040.4917 | 6063.3457 | 0.5212 | 0.5212 | 0.5233 | 6063.3457 | 14.0310 |
| 3079.7456 | 3.1223 | 1100 | 3144.7844 | -0.1249 | -88.1664 | 6014.5977 | 6010.9189 | 0.5212 | 0.5212 | 0.5217 | 6010.9189 | 14.2338 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
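
To reproduce the environment, the interpreter check below compares installed library versions against those listed above. The package names are the standard PyPI ones; other versions may work but are not covered by this card.

```python
import datasets
import tokenizers
import torch
import transformers

# Versions reported on this card.
expected = {
    "transformers": "4.42.0",
    "torch": "2.3.0",       # card lists 2.3.0+cu121 (CUDA 12.1 build)
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}

installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}

for name, want in expected.items():
    have = installed[name]
    status = "OK" if have.startswith(want) else "MISMATCH"
    print(f"{name}: installed {have}, card lists {want} -> {status}")
```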
