dpo-model-lora

This model is a fine-tuned version of Qwen/Qwen2-0.5B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6534
  • Rewards/chosen: -0.7320
  • Rewards/rejected: -0.8303
  • Rewards/accuracies: 0.6172
  • Rewards/margins: 0.0983
  • Logps/rejected: -359.0921
  • Logps/chosen: -378.4928
  • Logits/rejected: -2.2715
  • Logits/chosen: -2.3471
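
The reward figures above are related by simple arithmetic: the margin is the chosen reward minus the rejected reward. The sketch below checks this against the reported values. It also prints a reference loss of ln 2, which is what a zero margin would give under the standard sigmoid DPO loss. That loss form is an assumption, since this card does not state which loss was used.

```python
import math

# Values copied from the evaluation results listed above.
rewards_chosen = -0.7320
rewards_rejected = -0.8303
reported_margin = 0.0983
reported_loss = 0.6534

# Margin is the gap between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected

# Assumption: sigmoid DPO loss, -log(sigmoid(margin)); a zero margin gives ln 2.
zero_margin_loss = math.log(2.0)

print(f"margin = {margin:.4f}")                      # margin = 0.0983
print(f"zero-margin loss = {zero_margin_loss:.4f}")  # zero-margin loss = 0.6931
print(f"reported loss = {reported_loss:.4f}")
```

Under that assumption, the reported loss of 0.6534 sits modestly below the zero-margin reference, which is consistent with the small positive margin and the accuracy of 0.6172.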

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1.0
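
The total batch sizes in the list above follow from the per-device sizes, the device count, and gradient accumulation. A minimal check, using only the values listed:

```python
# Values copied from the hyperparameter list above.
train_batch_size = 4             # per device
eval_batch_size = 8              # per device
num_devices = 8
gradient_accumulation_steps = 4

# Training: per-device batch x devices x accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation: no gradient accumulation, so per-device batch x devices.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 128
print(total_eval_batch_size)   # 64
```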

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|--------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.6884        | 0.1030 | 50   | 0.6879          | -0.0543        | -0.0734          | 0.6484             | 0.0191          | -351.5229      | -371.7161    | -2.2877         | -2.3628       |
| 0.6787        | 0.2060 | 100  | 0.6770          | -0.1811        | -0.2114          | 0.6016             | 0.0303          | -352.9030      | -372.9836    | -2.2815         | -2.3565       |
| 0.6721        | 0.3090 | 150  | 0.6721          | -0.2679        | -0.3094          | 0.6562             | 0.0415          | -353.8831      | -373.8524    | -2.2782         | -2.3536       |
| 0.6668        | 0.4119 | 200  | 0.6665          | -0.4037        | -0.4625          | 0.6016             | 0.0588          | -355.4139      | -375.2100    | -2.2758         | -2.3515       |
| 0.6597        | 0.5149 | 250  | 0.6612          | -0.4907        | -0.5505          | 0.6172             | 0.0598          | -356.2946      | -376.0805    | -2.2757         | -2.3510       |
| 0.6581        | 0.6179 | 300  | 0.6578          | -0.6137        | -0.6975          | 0.625              | 0.0838          | -357.7639      | -377.3098    | -2.2736         | -2.3491       |
| 0.6536        | 0.7209 | 350  | 0.6556          | -0.6458        | -0.7367          | 0.6328             | 0.0909          | -358.1565      | -377.6311    | -2.2732         | -2.3489       |
| 0.6486        | 0.8239 | 400  | 0.6556          | -0.7025        | -0.7958          | 0.6328             | 0.0933          | -358.7473      | -378.1981    | -2.2737         | -2.3493       |
| 0.649         | 0.9269 | 450  | 0.6556          | -0.7432        | -0.8327          | 0.6484             | 0.0896          | -359.1166      | -378.6048    | -2.2726         | -2.3482       |
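
The Epoch and Step columns give a rough idea of the run's size, even though the dataset is not documented. The sketch below estimates optimizer steps per epoch from the first and last logged rows, then multiplies by the total train batch size of 128. The resulting example count is only an estimate inferred from the log, not a figure stated anywhere in this card.

```python
# (epoch, step) pairs copied from the first and last rows of the results above.
logged = [(0.1030, 50), (0.9269, 450)]
total_train_batch_size = 128  # from the hyperparameter list

# Steps per epoch implied by each logged row.
steps_per_epoch_estimates = [step / epoch for epoch, step in logged]
steps_per_epoch = sum(steps_per_epoch_estimates) / len(steps_per_epoch_estimates)

# Rough number of training examples seen in one epoch (an estimate only).
approx_train_examples = steps_per_epoch * total_train_batch_size

print(f"steps per epoch: about {steps_per_epoch:.0f}")
print(f"training examples: about {approx_train_examples:.0f}")
```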

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.2
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1
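
Since PEFT appears in the framework versions, the weights are presumably a LoRA adapter that is applied on top of the base model. A minimal loading sketch, under stated assumptions: the adapter repository id `lewtun/dpo-model-lora` and the helper name `load_adapter` are illustrative and not confirmed by the card text, and the base model id is taken from the description at the top.

```python
BASE_MODEL_ID = "Qwen/Qwen2-0.5B-Instruct"  # base model named in the description
ADAPTER_ID = "lewtun/dpo-model-lora"        # assumption: adapter repository id


def load_adapter(base_model_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Hypothetical helper: load the base model and attach the adapter weights."""
    # Imports are deferred so that defining this helper does not require the libraries.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    # Wrap the base model with the adapter downloaded from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model


# Usage (downloads weights, so it is left commented out):
# tokenizer, model = load_adapter()
```

Keeping the adapter separate from the base model keeps the download small. If a standalone checkpoint is preferred, PEFT's `merge_and_unload()` can fold the adapter into the base weights.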
