chat_1000STEPS_1e6rate

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6684
  • Rewards/chosen: -0.3437
  • Rewards/rejected: -0.4414
  • Rewards/accuracies: 0.5055
  • Rewards/margins: 0.0978
  • Logps/rejected: -23.2056
  • Logps/chosen: -20.1814
  • Logits/rejected: -0.8363
  • Logits/chosen: -0.8361
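
For reference, rewards/margins is the average gap between the chosen and rejected rewards, i.e. -0.3437 - (-0.4414) ≈ 0.0978 here. The card ships no usage code; below is a minimal inference sketch, assuming the checkpoint is the FP16 safetensors repository tsavage68/chat_1000STEPS_1e6rate_01beta_DPO (≈6.74B parameters) on the Hugging Face Hub and that it keeps the Llama-2 chat prompt format of its base model.

```python
# Minimal usage sketch (not part of the original card): load the DPO-tuned
# checkpoint with transformers and generate a reply. The repository id and
# the [INST] prompt format are assumptions based on the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/chat_1000STEPS_1e6rate_01beta_DPO"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Llama-2-chat style prompt; adjust to your own template if needed.
prompt = "[INST] Summarize what DPO fine-tuning does. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```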

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
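
The card does not state which training script produced these values, but they map one-to-one onto transformers.TrainingArguments. A minimal configuration sketch under that assumption (DPO-specific settings such as the reference model and beta are not part of the list above and are therefore omitted):

```python
# Configuration sketch (assumed, not taken from the training code): the
# listed hyperparameters expressed as transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="chat_1000STEPS_1e6rate",  # hypothetical output path
    learning_rate=1e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=2,   # 4 x 2 = total_train_batch_size of 8
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
)
```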

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6939 | 0.1 | 50 | 0.6917 | -0.0037 | -0.0069 | 0.4901 | 0.0032 | -18.8600 | -16.7813 | -0.5975 | -0.5973 |
| 0.6902 | 0.2 | 100 | 0.6919 | -0.1261 | -0.1323 | 0.4440 | 0.0063 | -20.1147 | -18.0054 | -0.6143 | -0.6142 |
| 0.6923 | 0.29 | 150 | 0.6796 | -0.0370 | -0.0721 | 0.4945 | 0.0351 | -19.5126 | -17.1150 | -0.6569 | -0.6568 |
| 0.6793 | 0.39 | 200 | 0.6803 | -0.0086 | -0.0473 | 0.4769 | 0.0387 | -19.2641 | -16.8305 | -0.6452 | -0.6450 |
| 0.6446 | 0.49 | 250 | 0.6790 | -0.0967 | -0.1427 | 0.4857 | 0.0460 | -20.2182 | -17.7115 | -0.6468 | -0.6466 |
| 0.6365 | 0.59 | 300 | 0.6809 | -0.1168 | -0.1650 | 0.4681 | 0.0482 | -20.4409 | -17.9127 | -0.6877 | -0.6874 |
| 0.6828 | 0.68 | 350 | 0.6765 | -0.1034 | -0.1632 | 0.4923 | 0.0599 | -20.4235 | -17.7782 | -0.6849 | -0.6847 |
| 0.6797 | 0.78 | 400 | 0.6788 | -0.0900 | -0.1511 | 0.4923 | 0.0611 | -20.3023 | -17.6445 | -0.6763 | -0.6762 |
| 0.6751 | 0.88 | 450 | 0.6772 | -0.0807 | -0.1445 | 0.4945 | 0.0638 | -20.2366 | -17.5521 | -0.6528 | -0.6526 |
| 0.6596 | 0.98 | 500 | 0.6744 | -0.1091 | -0.1779 | 0.5055 | 0.0688 | -20.5702 | -17.8358 | -0.6395 | -0.6393 |
| 0.4819 | 1.07 | 550 | 0.6714 | -0.2112 | -0.2907 | 0.5077 | 0.0795 | -21.6987 | -18.8566 | -0.7045 | -0.7043 |
| 0.4754 | 1.17 | 600 | 0.6699 | -0.2743 | -0.3603 | 0.5011 | 0.0860 | -22.3943 | -19.4880 | -0.7556 | -0.7554 |
| 0.4339 | 1.27 | 650 | 0.6694 | -0.2906 | -0.3826 | 0.5033 | 0.0920 | -22.6175 | -19.6505 | -0.8041 | -0.8039 |
| 0.4692 | 1.37 | 700 | 0.6673 | -0.3183 | -0.4163 | 0.5033 | 0.0980 | -22.9541 | -19.9276 | -0.8200 | -0.8199 |
| 0.4767 | 1.46 | 750 | 0.6681 | -0.3342 | -0.4320 | 0.5055 | 0.0978 | -23.1116 | -20.0865 | -0.8291 | -0.8289 |
| 0.4125 | 1.56 | 800 | 0.6684 | -0.3381 | -0.4355 | 0.5099 | 0.0974 | -23.1466 | -20.1256 | -0.8330 | -0.8328 |
| 0.4733 | 1.66 | 850 | 0.6681 | -0.3425 | -0.4407 | 0.5011 | 0.0983 | -23.1986 | -20.1691 | -0.8359 | -0.8357 |
| 0.4699 | 1.76 | 900 | 0.6683 | -0.3431 | -0.4412 | 0.5077 | 0.0981 | -23.2032 | -20.1758 | -0.8365 | -0.8363 |
| 0.4629 | 1.86 | 950 | 0.6682 | -0.3438 | -0.4421 | 0.5011 | 0.0984 | -23.2125 | -20.1823 | -0.8365 | -0.8363 |
| 0.4482 | 1.95 | 1000 | 0.6684 | -0.3437 | -0.4414 | 0.5055 | 0.0978 | -23.2056 | -20.1814 | -0.8363 | -0.8361 |

Framework versions

  • Transformers 4.37.2
  • PyTorch 2.0.0+cu117
  • Datasets 2.17.0
  • Tokenizers 0.15.2