qwen2.5-0.5b-expo-DPO-L2EXPO-W2-noES4-1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6798
  • Logps: -81.3660
  • Logits: -0.2689
  • Objective: 0.6594
  • Dpo Loss: 2.5881
  • Ranking Simple: 0.5290

Model description

More information needed

Intended uses & limitations

More information needed
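
Pending a fuller description, a minimal inference sketch is shown below. It assumes the model loads through the standard transformers causal-LM API, like its Qwen2.5-0.5B base; the prompt and generation settings are illustrative only, since the card does not document an intended prompt format.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-L2EXPO-W2-noES4-1"

# Download tokenizer and weights from the Hub (F32 safetensors, ~0.5B params).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt; adjust to your task.
prompt = "Summarize the following news story in two sentences:\n"
inputs = tokenizer(prompt, return_tensors="pt")

# Short greedy generation; tune max_new_tokens / sampling as needed.
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```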

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch mirroring them follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
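
As a rough illustration only, the documented hyperparameters map onto a transformers TrainingArguments object as sketched below. The actual training script and trainer class (the model name suggests a DPO-style L2EXPO objective) are not documented on this card, so everything beyond the listed values, including the output directory, is an assumption.

```python
from transformers import TrainingArguments

# Sketch of the documented hyperparameters only; output_dir is a placeholder
# and the trainer that consumed these arguments is not specified on the card.
training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-DPO-L2EXPO-W2-noES4-1",  # hypothetical
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,  # 4 per device x 12 steps x 3 GPUs = 144 effective
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```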

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Ranking Simple |
|---|---|---|---|---|---|---|---|---|
| 0.2167 | 0.1417 | 50 | 0.1866 | -89.7452 | -1.3915 | 0.1841 | 0.8977 | 0.5243 |
| 0.3971 | 0.2834 | 100 | 0.3324 | -90.0353 | -1.2435 | 0.3345 | 1.3259 | 0.5207 |
| 0.5144 | 0.4251 | 150 | 0.5155 | -78.4187 | -1.3789 | 0.5175 | 1.9820 | 0.5295 |
| 0.5194 | 0.5668 | 200 | 0.5709 | -73.3419 | -0.7470 | 0.5515 | 2.1252 | 0.5285 |
| 0.5608 | 0.7085 | 250 | 0.6097 | -78.0422 | -0.6609 | 0.5942 | 2.2497 | 0.5362 |
| 0.5689 | 0.8503 | 300 | 0.6153 | -80.6416 | -0.5345 | 0.6046 | 2.2745 | 0.5305 |
| 0.5556 | 0.9920 | 350 | 0.6604 | -80.0774 | -0.4720 | 0.6456 | 2.4892 | 0.5347 |
| 0.6911 | 1.1337 | 400 | 0.6987 | -80.2664 | -0.3889 | 0.6688 | 2.6312 | 0.5362 |
| 0.4109 | 1.2754 | 450 | 0.6748 | -80.4623 | -0.3379 | 0.6533 | 2.6157 | 0.5321 |
| 0.5646 | 1.4171 | 500 | 0.6765 | -80.2331 | -0.3752 | 0.6533 | 2.5394 | 0.5347 |
| 0.5393 | 1.5588 | 550 | 0.6834 | -80.9551 | -0.3117 | 0.6649 | 2.5727 | 0.5336 |
| 0.4349 | 1.7005 | 600 | 0.6761 | -81.5591 | -0.2322 | 0.6615 | 2.5397 | 0.5300 |
| 0.4677 | 1.8422 | 650 | 0.6817 | -80.5548 | -0.2289 | 0.6641 | 2.5860 | 0.5243 |
| 0.3266 | 1.9839 | 700 | 0.6802 | -80.5748 | -0.2811 | 0.6562 | 2.5943 | 0.5254 |
| 0.2719 | 2.1256 | 750 | 0.6823 | -81.2181 | -0.2618 | 0.6627 | 2.5971 | 0.5316 |
| 0.2695 | 2.2674 | 800 | 0.6833 | -81.1710 | -0.2659 | 0.6653 | 2.5873 | 0.5295 |
| 0.3121 | 2.4091 | 850 | 0.6796 | -81.3012 | -0.2552 | 0.6616 | 2.5816 | 0.5264 |
| 0.3154 | 2.5508 | 900 | 0.6804 | -81.3982 | -0.2835 | 0.6608 | 2.5860 | 0.5285 |
| 0.2493 | 2.6925 | 950 | 0.6800 | -81.4311 | -0.2696 | 0.6596 | 2.5904 | 0.5285 |
| 0.2171 | 2.8342 | 1000 | 0.6800 | -81.3825 | -0.2689 | 0.6595 | 2.5888 | 0.5290 |
| 0.2548 | 2.9759 | 1050 | 0.6798 | -81.3660 | -0.2689 | 0.6594 | 2.5881 | 0.5290 |
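
For reference, the "Dpo Loss" column tracks a pairwise preference loss. The standard DPO objective is reproduced below; the exact objective used in this run (the model name suggests a weighted L2EXPO variant trained on the pairwise dataset) is not documented on this card, so this is only the baseline formulation it builds on:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Here $y_w$ and $y_l$ are the preferred and rejected responses, $\pi_{\mathrm{ref}}$ is a frozen reference policy (typically the SFT checkpoint the fine-tune starts from), and $\beta$ sets how strongly the policy is kept close to the reference.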

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1
