llama-7b-SFT-qlora-eli5-wiki_DPO_ds_RM_contrast_1024_r_64_alpha_16

This model is a fine-tuned version of dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6234
  • Rewards/chosen: 0.0858
  • Rewards/rejected: -0.1898
  • Rewards/accuracies: 0.6574
  • Rewards/margins: 0.2756
  • Logps/rejected: -198.1188
  • Logps/chosen: -205.4868
  • Logits/rejected: 0.7931
  • Logits/chosen: 0.8315
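In DPO-style evaluation the reward columns above are related: Rewards/margins is the chosen reward minus the rejected reward, and Rewards/accuracies is the fraction of pairs with a positive margin. A minimal arithmetic check in pure Python, using the reported values:

```python
# Reported evaluation metrics (taken from the list above).
rewards_chosen = 0.0858
rewards_rejected = -0.1898

# The margin is simply the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 0.2756, matching the reported Rewards/margins
```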

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1
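The reported total_train_batch_size follows from the per-device batch size and gradient accumulation, and the `cosine` scheduler ramps the learning rate up linearly for the first 3% of steps before decaying it. A simplified sketch of both (pure Python, not the Transformers library code):

```python
import math

# Hyperparameters from the list above.
learning_rate = 2e-4
train_batch_size = 32
gradient_accumulation_steps = 4

# Effective batch size = per-device batch * accumulation steps
# (matches the reported total_train_batch_size of 128).
total_train_batch_size = train_batch_size * gradient_accumulation_steps

def cosine_lr_with_warmup(step, total_steps, warmup_ratio=0.03, base_lr=2e-4):
    """Cosine decay with linear warmup, a simplified sketch of the
    Transformers `cosine` scheduler with warmup_ratio=0.03."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(total_train_batch_size)  # 128
```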

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.6867        | 0.1   | 19   | 0.6390          | 0.0633         | -0.1318          | 0.6451             | 0.1951          | -197.8286      | -205.5991    | 0.7774          | 0.8133        |
| 0.6727        | 0.21  | 38   | 0.6384          | 0.0354         | -0.2285          | 0.6529             | 0.2639          | -198.3123      | -205.7386    | 0.8054          | 0.8432        |
| 0.6577        | 0.31  | 57   | 0.6391          | -0.0114        | -0.2258          | 0.6406             | 0.2145          | -198.2988      | -205.9725    | 0.7954          | 0.8346        |
| 0.6609        | 0.42  | 76   | 0.6344          | -0.3737        | -0.6175          | 0.6417             | 0.2438          | -200.2571      | -207.7841    | 0.7818          | 0.8194        |
| 0.6536        | 0.52  | 95   | 0.6285          | -0.1130        | -0.3816          | 0.6652             | 0.2687          | -199.0778      | -206.4805    | 0.7958          | 0.8350        |
| 0.654         | 0.62  | 114  | 0.6342          | 0.0007         | -0.2311          | 0.6484             | 0.2318          | -198.3250      | -205.9122    | 0.7917          | 0.8303        |
| 0.6435        | 0.73  | 133  | 0.6258          | 0.0462         | -0.2234          | 0.6562             | 0.2696          | -198.2865      | -205.6845    | 0.7949          | 0.8332        |
| 0.6508        | 0.83  | 152  | 0.6234          | 0.0858         | -0.1898          | 0.6574             | 0.2756          | -198.1188      | -205.4868    | 0.7931          | 0.8315        |
| 0.6361        | 0.94  | 171  | 0.6269          | 0.1007         | -0.1655          | 0.6618             | 0.2662          | -197.9971      | -205.4121    | 0.7975          | 0.8353        |
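The validation loss tracked above is the standard sigmoid DPO objective, -log σ(r_chosen − r_rejected), averaged over preference pairs, where each reward is a β-scaled log-probability ratio of the policy against the reference model. A minimal sketch of the per-pair objective (assuming the usual sigmoid DPO loss; not the training code itself):

```python
import math

def dpo_loss(reward_chosen, reward_rejected):
    """Per-pair DPO loss: -log(sigmoid(margin)), where each reward is a
    beta-scaled log-prob ratio of policy vs. reference model."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A zero margin gives the chance-level loss of log(2) ≈ 0.6931; the final
# eval loss of 0.6234 sits below that, consistent with a positive
# average margin.
print(round(dpo_loss(0.0, 0.0), 4))  # 0.6931
```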

Framework versions

  • Transformers 4.32.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.14.4
  • Tokenizers 0.13.3
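To reproduce this environment, the listed versions can be pinned with pip (a sketch; the CUDA 11.8 PyTorch build is installed from the official PyTorch wheel index):

```shell
pip install transformers==4.32.1 datasets==2.14.4 tokenizers==0.13.3
# PyTorch 2.0.1 with CUDA 11.8 comes from the PyTorch wheel index:
pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
```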