qwen2.5-0.5b-expo-DPO-noES3-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7566
  • Logps: -117.9870
  • Logits: -1.9778
  • Objective: 0.7537
  • Dpo Loss: 0.7537
  • Regularize: 0.7537
  • Ranking Simple: 0.5595
  • Wo Beta: 9.1042
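
The card reports Objective, Dpo Loss, and Regularize as separate metrics without defining them. For reference, the standard DPO objective (Rafailov et al., 2023) that the Dpo Loss metric name points to is shown below; β is the KL-penalty strength, and the "-0.1" suffix in the model name plausibly denotes β = 0.1, though the card does not state this. Here y_w and y_l are the chosen and rejected responses for prompt x:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}\left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```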

Model description

More information needed

Intended uses & limitations

More information needed
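
Pending details from the authors, the model can be loaded with the standard transformers API. The prompt and sampling settings below are illustrative only; the card does not document an expected input format:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-noES3-0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt; adjust to your use case.
inputs = tokenizer("Summarize today's top news story:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```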

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of an equivalent training setup follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
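
The card does not include training code. Below is a minimal sketch of how these settings could map onto trl's DPOTrainer. Assumptions not confirmed by the card: that trl was used at all (the card only implies DPO through its metric names), that β = 0.1 (suggested by the model-name suffix), that the dataset exposes the prompt/chosen/rejected columns DPOTrainer expects, and the trl ≈0.9 API contemporary with the listed Transformers 4.42:

```python
# Minimal sketch, not the authors' script; see assumptions above.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "hZzy/qwen2.5-0.5b-sft-news-IFT"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Assumes prompt/chosen/rejected columns; not verified against the dataset.
ds = load_dataset("hZzy/train_pairwise_weighted")

args = DPOConfig(
    output_dir="qwen2.5-0.5b-expo-DPO-noES3-0.1",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # 4 x 3 GPUs x 12 accumulation steps = 144 total
    per_device_eval_batch_size=4,    # 4 x 3 GPUs = 12 total
    gradient_accumulation_steps=12,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,                        # assumption: the "-0.1" name suffix likely denotes beta
)

trainer = DPOTrainer(
    model,
    ref_model=None,                  # trl clones the policy as a frozen reference when None
    args=args,
    train_dataset=ds["train"],
    tokenizer=tokenizer,
)
trainer.train()
```

Adam betas (0.9, 0.999) and epsilon 1e-08 are the optimizer defaults, so they are not set explicitly.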

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Wo Beta |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------:|:---------:|:--------:|:----------:|:--------------:|:-------:|
| 0.6316 | 0.1417 | 50 | 0.6807 | -90.3282 | -1.5879 | 0.6825 | 0.6825 | 0.6825 | 0.5342 | 7.8619 |
| 0.5922 | 0.2834 | 100 | 0.6793 | -95.8152 | -1.7964 | 0.6819 | 0.6819 | 0.6819 | 0.5487 | 7.7077 |
| 0.5002 | 0.4251 | 150 | 0.6815 | -96.2024 | -1.5951 | 0.6749 | 0.6749 | 0.6749 | 0.5497 | 7.4380 |
| 0.4735 | 0.5668 | 200 | 0.6951 | -98.9176 | -1.7564 | 0.6911 | 0.6911 | 0.6911 | 0.5569 | 7.5241 |
| 0.4626 | 0.7085 | 250 | 0.6976 | -93.4775 | -1.7986 | 0.6945 | 0.6945 | 0.6945 | 0.5580 | 7.9027 |
| 0.4214 | 0.8503 | 300 | 0.6931 | -104.4337 | -2.0138 | 0.6865 | 0.6865 | 0.6865 | 0.5616 | 7.5814 |
| 0.3652 | 0.9920 | 350 | 0.7074 | -102.8306 | -1.9094 | 0.6984 | 0.6984 | 0.6984 | 0.5559 | 7.8344 |
| 0.2206 | 1.1337 | 400 | 0.7347 | -113.6048 | -2.0909 | 0.7296 | 0.7296 | 0.7296 | 0.5502 | 8.6751 |
| 0.2202 | 1.2754 | 450 | 0.7463 | -115.7782 | -1.9911 | 0.7433 | 0.7433 | 0.7433 | 0.5512 | 8.9123 |
| 0.2366 | 1.4171 | 500 | 0.7444 | -114.7710 | -2.0464 | 0.7387 | 0.7387 | 0.7387 | 0.5518 | 8.8630 |
| 0.1989 | 1.5588 | 550 | 0.7553 | -118.7775 | -2.0168 | 0.7519 | 0.7519 | 0.7519 | 0.5595 | 8.9846 |
| 0.1952 | 1.7005 | 600 | 0.7544 | -117.4880 | -1.9707 | 0.7513 | 0.7513 | 0.7513 | 0.5595 | 9.0297 |
| 0.2252 | 1.8422 | 650 | 0.7560 | -117.8008 | -1.9748 | 0.7529 | 0.7529 | 0.7529 | 0.5585 | 9.0926 |
| 0.199 | 1.9839 | 700 | 0.7566 | -117.9869 | -1.9778 | 0.7537 | 0.7537 | 0.7537 | 0.5595 | 9.1042 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1