IE_M2_1000steps_1e8rate_01beta_cSFTDPO

This model is a DPO fine-tuned version of tsavage68/IE_M2_1000steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6722
  • Rewards/chosen: -0.0007
  • Rewards/rejected: -0.0438
  • Rewards/accuracies: 0.4600
  • Rewards/margins: 0.0430
  • Logps/rejected: -41.4594
  • Logps/chosen: -42.2128
  • Logits/rejected: -2.9152
  • Logits/chosen: -2.8539
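
The Rewards/* entries are the implicit DPO rewards rather than the outputs of a learned reward model: each is β times the log-probability ratio between this policy and its SFT reference, with β = 0.1 inferred from the "01beta" suffix in the model name. A minimal sketch of how these numbers relate:

```python
# Hedged sketch of the implicit DPO reward behind the Rewards/* metrics.
# beta = 0.1 is an assumption read off the "01beta" model-name suffix.
beta = 0.1

def implicit_reward(policy_logp: float, ref_logp: float) -> float:
    """DPO implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)

# Rewards/margins is Rewards/chosen minus Rewards/rejected; for the final
# evaluation above: -0.0007 - (-0.0438) = 0.0431, i.e. 0.0430 after rounding
# of the unrounded values.
print(-0.0007 - (-0.0438))
```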

Model description

More information needed
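
Pending a fuller description, the checkpoint loads like any transformers causal LM. A minimal, untested usage sketch (the model id comes from this card's title; the prompt and generation settings are illustrative assumptions):

```python
# Usage sketch only: prompt and generation settings are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/IE_M2_1000steps_1e8rate_01beta_cSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The repository stores FP16 weights, so load in half precision.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

inputs = tokenizer("Extract the entities from: ...", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```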

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-08
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
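
For reference, these settings map onto TRL's DPOConfig roughly as follows. This is a hedged sketch, assuming a recent trl release where DPOConfig (a TrainingArguments subclass) carries the DPO beta; the actual training script, model paths, and dataset are not documented in this card:

```python
# Sketch reconstructing the listed hyperparameters as a TRL DPOConfig.
from trl import DPOConfig

config = DPOConfig(
    output_dir="IE_M2_1000steps_1e8rate_01beta_cSFTDPO",
    learning_rate=1e-8,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # 2 x 2 = total_train_batch_size of 4
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,                 # training_steps
    beta=0.1,                       # assumed from "01beta" in the model name
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the default optimizer.
)
# `config` would then be passed as `args=` to trl's DPOTrainer together with
# the SFT model, a reference copy, and the preference dataset.
```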

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6934        | 0.4   | 50   | 0.6935          | 0.0012         | 0.0018           | 0.2150             | -0.0007         | -41.0035       | -42.1940     | -2.9159         | -2.8546       |
| 0.6929        | 0.8   | 100  | 0.6915          | -0.0004        | -0.0037          | 0.2700             | 0.0033          | -41.0589       | -42.2096     | -2.9161         | -2.8547       |
| 0.6885        | 1.2   | 150  | 0.6866          | -0.0003        | -0.0137          | 0.3850             | 0.0134          | -41.1590       | -42.2088     | -2.9159         | -2.8546       |
| 0.6814        | 1.6   | 200  | 0.6823          | -0.0012        | -0.0232          | 0.4450             | 0.0220          | -41.2542       | -42.2174     | -2.9154         | -2.8541       |
| 0.6809        | 2.0   | 250  | 0.6785          | 0.0002         | -0.0296          | 0.4550             | 0.0299          | -41.3181       | -42.2031     | -2.9155         | -2.8541       |
| 0.6811        | 2.4   | 300  | 0.6754          | -0.0007        | -0.0371          | 0.4550             | 0.0364          | -41.3932       | -42.2125     | -2.9154         | -2.8541       |
| 0.6735        | 2.8   | 350  | 0.6737          | -0.0013        | -0.0412          | 0.4600             | 0.0399          | -41.4335       | -42.2181     | -2.9153         | -2.8539       |
| 0.6771        | 3.2   | 400  | 0.6722          | -0.0004        | -0.0434          | 0.4600             | 0.0430          | -41.4555       | -42.2092     | -2.9152         | -2.8540       |
| 0.6766        | 3.6   | 450  | 0.6723          | -0.0018        | -0.0446          | 0.4600             | 0.0428          | -41.4676       | -42.2235     | -2.9152         | -2.8539       |
| 0.6789        | 4.0   | 500  | 0.6712          | -0.0002        | -0.0454          | 0.4600             | 0.0451          | -41.4754       | -42.2077     | -2.9152         | -2.8539       |
| 0.6691        | 4.4   | 550  | 0.6713          | -0.0007        | -0.0456          | 0.4600             | 0.0450          | -41.4782       | -42.2121     | -2.9152         | -2.8539       |
| 0.6703        | 4.8   | 600  | 0.6720          | -0.0010        | -0.0444          | 0.4600             | 0.0434          | -41.4658       | -42.2152     | -2.9152         | -2.8538       |
| 0.6752        | 5.2   | 650  | 0.6717          | -0.0017        | -0.0458          | 0.4550             | 0.0441          | -41.4803       | -42.2227     | -2.9152         | -2.8539       |
| 0.6759        | 5.6   | 700  | 0.6712          | 0.0000         | -0.0452          | 0.4600             | 0.0452          | -41.4734       | -42.2054     | -2.9150         | -2.8537       |
| 0.6694        | 6.0   | 750  | 0.6725          | -0.0018        | -0.0443          | 0.4600             | 0.0425          | -41.4648       | -42.2237     | -2.9153         | -2.8539       |
| 0.6593        | 6.4   | 800  | 0.6721          | -0.0006        | -0.0439          | 0.4600             | 0.0433          | -41.4608       | -42.2114     | -2.9153         | -2.8540       |
| 0.6781        | 6.8   | 850  | 0.6722          | -0.0007        | -0.0437          | 0.4600             | 0.0430          | -41.4589       | -42.2125     | -2.9152         | -2.8539       |
| 0.6680        | 7.2   | 900  | 0.6722          | -0.0007        | -0.0438          | 0.4600             | 0.0430          | -41.4594       | -42.2128     | -2.9152         | -2.8539       |
| 0.6692        | 7.6   | 950  | 0.6722          | -0.0007        | -0.0438          | 0.4600             | 0.0430          | -41.4594       | -42.2128     | -2.9152         | -2.8539       |
| 0.6724        | 8.0   | 1000 | 0.6722          | -0.0007        | -0.0438          | 0.4600             | 0.0430          | -41.4594       | -42.2128     | -2.9152         | -2.8539       |

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.0.0+cu117
  • Datasets 3.0.0
  • Tokenizers 0.19.1