---
base_model: princeton-nlp/Llama-3-Base-8B-SFT
tags:
  - alignment-handbook
  - generated_from_trainer
  - trl
  - dpo
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: Llama-3-Base-8B
    results: []
---

Llama-3-Base-8B

This model is a DPO fine-tuned version of princeton-nlp/Llama-3-Base-8B-SFT on the HuggingFaceH4/ultrafeedback_binarized preference dataset. It achieves the following results on the evaluation set (the reward metrics are explained in the note after the list):

  • Loss: 0.6285
  • Rewards/chosen: 0.5979
  • Rewards/rejected: 0.1801
  • Rewards/accuracies: 0.6620
  • Rewards/margins: 0.4178
  • Logps/rejected: -2212.5046
  • Logps/chosen: -2612.9824
  • Logits/rejected: -1.3033
  • Logits/chosen: -1.3358
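As a point of reference only: the reward metrics above follow the Direct Preference Optimization (DPO) convention. The sketch below shows the standard DPO objective, assuming the usual trl-style logging in which Rewards/chosen and Rewards/rejected are the β-scaled log-probability ratios of the policy against the SFT reference model; the β value used for this run is not reported in this card.

```latex
% Standard DPO objective (Rafailov et al., 2023).
% Assumption: Rewards/chosen = beta * (log pi_theta(y_w|x) - log pi_ref(y_w|x)),
% and likewise for Rewards/rejected with y_l; beta is not reported in this card.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\, \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[\log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)\right]
```

Under this reading, Rewards/margins is Rewards/chosen minus Rewards/rejected (0.5979 - 0.1801 = 0.4178 above), and Rewards/accuracies is the fraction of evaluation pairs where the chosen reward exceeds the rejected one.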

Model description

More information needed

Intended uses & limitations

More information needed
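No intended-use guidance was provided by the author. The sketch below only shows how a checkpoint like this would typically be loaded for inference with the transformers causal-LM API; the repo id fenguhao/Llama-3-Base-8B is an assumption inferred from the card title, and the prompt format is likewise an assumption.

```python
# Minimal inference sketch; not the author's documented usage.
# Assumptions: the repo id "fenguhao/Llama-3-Base-8B" (inferred from the card title)
# and plain-text prompting.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fenguhao/Llama-3-Base-8B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)

prompt = "Explain the difference between supervised fine-tuning and DPO."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```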

Training and evaluation data

More information needed
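The card names the dataset but describes nothing else about it. A minimal sketch of inspecting it with the datasets library follows; the train_prefs / test_prefs split names and the listed field names are assumptions based on how this dataset is commonly used for DPO, not details taken from this card.

```python
# Sketch of loading the preference dataset named in this card.
# Assumption: the "train_prefs"/"test_prefs" splits hold the binarized preference pairs.
from datasets import load_dataset

raw = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
train = raw["train_prefs"]
eval_split = raw["test_prefs"]

# Each example pairs a prompt with a preferred and a rejected response.
example = train[0]
print(example.keys())  # expect fields such as "prompt", "chosen", "rejected"
```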

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
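For readers who want to reproduce the setup, the list above maps roughly onto transformers TrainingArguments as sketched below. This is a hedged reconstruction, not the author's training script; the precision flag and the DPO-specific options (for example β) are not reported in this card.

```python
# Hedged reconstruction of the reported hyperparameters as transformers TrainingArguments.
# Not the author's script; DPO-specific settings (e.g. beta) are not given in the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Llama-3-Base-8B",
    learning_rate=1e-6,
    per_device_train_batch_size=2,   # train_batch_size: 2
    per_device_eval_batch_size=4,    # eval_batch_size: 4
    gradient_accumulation_steps=2,   # with 4 GPUs -> total train batch size 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumption; precision is not reported in the card
)
# In an alignment-handbook / trl setup, arguments like these are passed to trl's
# DPOTrainer (directly, or via its DPOConfig in newer trl versions) together with
# the SFT model, a frozen reference model, the tokenizer, and the preference dataset.
```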

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6694 | 0.03 | 100 | 0.6733 | 0.4668 | 0.3687 | 0.5500 | 0.0980 | -2193.6436 | -2626.0984 | -1.2047 | -1.2463 |
| 0.6496 | 0.05 | 200 | 0.6497 | 0.8935 | 0.6578 | 0.6040 | 0.2357 | -2164.7385 | -2583.4270 | -1.1621 | -1.2030 |
| 0.6358 | 0.08 | 300 | 0.6672 | 0.6703 | 0.4436 | 0.5900 | 0.2266 | -2186.1528 | -2605.7471 | -1.2202 | -1.2617 |
| 0.6783 | 0.1 | 400 | 0.7144 | 0.2834 | 0.0925 | 0.5680 | 0.1909 | -2221.2676 | -2644.4390 | -1.3598 | -1.4017 |
| 0.751 | 0.13 | 500 | 0.6889 | 1.3453 | 0.9758 | 0.6020 | 0.3696 | -2132.9402 | -2538.2405 | -1.4750 | -1.5419 |
| 0.6921 | 0.16 | 600 | 0.6644 | 0.8464 | 0.5451 | 0.6220 | 0.3014 | -2176.0090 | -2588.1318 | -1.2841 | -1.3381 |
| 0.6437 | 0.18 | 700 | 0.6724 | 0.8250 | 0.4796 | 0.6420 | 0.3454 | -2182.5566 | -2590.2764 | -1.4526 | -1.4817 |
| 0.8109 | 0.21 | 800 | 0.6655 | 1.1490 | 0.7473 | 0.6380 | 0.4017 | -2155.7832 | -2557.8708 | -1.5267 | -1.5761 |
| 0.6725 | 0.24 | 900 | 0.6836 | 1.4258 | 0.9989 | 0.6160 | 0.4269 | -2130.6240 | -2530.1914 | -1.4486 | -1.4910 |
| 0.7027 | 0.26 | 1000 | 0.6690 | 0.8152 | 0.4729 | 0.6260 | 0.3424 | -2183.2278 | -2591.2505 | -1.5095 | -1.5565 |
| 0.6421 | 0.29 | 1100 | 0.6513 | 0.5281 | 0.1941 | 0.6640 | 0.3340 | -2211.1040 | -2619.9661 | -1.5382 | -1.5785 |
| 0.6217 | 0.31 | 1200 | 0.6436 | 0.7372 | 0.3396 | 0.6460 | 0.3976 | -2196.5581 | -2599.0544 | -1.6345 | -1.6765 |
| 0.7365 | 0.34 | 1300 | 0.6400 | 0.9183 | 0.5227 | 0.6240 | 0.3956 | -2178.2437 | -2580.9446 | -1.5597 | -1.6009 |
| 0.7057 | 0.37 | 1400 | 0.6468 | 0.9514 | 0.5619 | 0.6140 | 0.3895 | -2174.3254 | -2577.6377 | -1.6716 | -1.7117 |
| 0.6396 | 0.39 | 1500 | 0.6498 | 0.9546 | 0.5405 | 0.6400 | 0.4141 | -2176.4675 | -2577.3193 | -1.6244 | -1.6600 |
| 0.5835 | 0.42 | 1600 | 0.6488 | 0.9504 | 0.5356 | 0.6480 | 0.4148 | -2176.9568 | -2577.7402 | -1.6255 | -1.6706 |
| 0.629 | 0.44 | 1700 | 0.6501 | 1.2484 | 0.8056 | 0.6100 | 0.4428 | -2149.9568 | -2547.9316 | -1.5737 | -1.6192 |
| 0.6495 | 0.47 | 1800 | 0.6440 | 1.2029 | 0.7629 | 0.6280 | 0.4400 | -2154.2307 | -2552.4846 | -1.4589 | -1.4973 |
| 0.6465 | 0.5 | 1900 | 0.6641 | 0.2111 | -0.0941 | 0.6280 | 0.3052 | -2239.9255 | -2651.6641 | -1.4961 | -1.5323 |
| 0.6866 | 0.52 | 2000 | 0.6480 | 0.5747 | 0.1977 | 0.6600 | 0.3770 | -2210.75 | -2615.3054 | -1.4509 | -1.4934 |
| 0.6441 | 0.55 | 2100 | 0.6358 | 0.8809 | 0.4502 | 0.6480 | 0.4307 | -2185.4985 | -2584.6841 | -1.4418 | -1.4842 |
| 0.6752 | 0.58 | 2200 | 0.6346 | 0.9311 | 0.5075 | 0.6560 | 0.4236 | -2179.7668 | -2579.6636 | -1.3193 | -1.3656 |
| 0.5646 | 0.6 | 2300 | 0.6396 | 0.6599 | 0.2912 | 0.6480 | 0.3686 | -2201.3948 | -2606.7883 | -1.2832 | -1.3116 |
| 0.6519 | 0.63 | 2400 | 0.6451 | 0.4237 | 0.0937 | 0.6400 | 0.3300 | -2221.1460 | -2630.4050 | -1.4460 | -1.4777 |
| 0.6292 | 0.65 | 2500 | 0.6313 | 0.8682 | 0.4231 | 0.6460 | 0.4452 | -2188.2095 | -2585.9512 | -1.4040 | -1.4397 |
| 0.5985 | 0.68 | 2600 | 0.6274 | 0.8396 | 0.3650 | 0.6640 | 0.4746 | -2194.0144 | -2588.8174 | -1.3580 | -1.3860 |
| 0.6323 | 0.71 | 2700 | 0.6328 | 0.6585 | 0.2012 | 0.6640 | 0.4573 | -2210.3958 | -2606.9260 | -1.2622 | -1.2938 |
| 0.6174 | 0.73 | 2800 | 0.6305 | 0.8505 | 0.3762 | 0.6580 | 0.4744 | -2192.8989 | -2587.7209 | -1.3312 | -1.3635 |
| 0.5972 | 0.76 | 2900 | 0.6310 | 0.6521 | 0.2290 | 0.6600 | 0.4231 | -2207.6130 | -2607.5659 | -1.3492 | -1.3840 |
| 0.6645 | 0.79 | 3000 | 0.6291 | 0.7035 | 0.2579 | 0.6520 | 0.4456 | -2204.7251 | -2602.4238 | -1.3330 | -1.3678 |
| 0.5786 | 0.81 | 3100 | 0.6310 | 0.5452 | 0.1222 | 0.6580 | 0.4230 | -2218.2944 | -2618.2534 | -1.3173 | -1.3498 |
| 0.604 | 0.84 | 3200 | 0.6375 | 0.3327 | -0.0527 | 0.6540 | 0.3854 | -2235.7852 | -2639.5032 | -1.3444 | -1.3760 |
| 0.6704 | 0.86 | 3300 | 0.6269 | 0.7327 | 0.2896 | 0.6540 | 0.4431 | -2201.5579 | -2599.5049 | -1.3241 | -1.3585 |
| 0.6365 | 0.89 | 3400 | 0.6271 | 0.6900 | 0.2577 | 0.6560 | 0.4323 | -2204.7437 | -2603.7739 | -1.3038 | -1.3371 |
| 0.6621 | 0.92 | 3500 | 0.6279 | 0.6303 | 0.2073 | 0.6580 | 0.4230 | -2209.7827 | -2609.7432 | -1.2991 | -1.3321 |
| 0.6597 | 0.94 | 3600 | 0.6294 | 0.5540 | 0.1441 | 0.6580 | 0.4099 | -2216.1082 | -2617.3774 | -1.3028 | -1.3348 |
| 0.671 | 0.97 | 3700 | 0.6285 | 0.5945 | 0.1774 | 0.6600 | 0.4171 | -2212.7783 | -2613.3303 | -1.3033 | -1.3358 |
| 0.6328 | 0.99 | 3800 | 0.6283 | 0.5985 | 0.1803 | 0.6580 | 0.4182 | -2212.4902 | -2612.9258 | -1.3032 | -1.3356 |

Framework versions

  • Transformers 4.36.2
  • PyTorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.2
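To check a local environment against the versions listed above, a small sketch (assuming each library exposes the standard __version__ attribute):

```python
# Quick environment check against the versions listed in this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.36.2",
    "torch": "2.1.2",
    "datasets": "2.14.6",
    "tokenizers": "0.15.2",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "OK" if installed[name].startswith(want) else f"got {installed[name]}"
    print(f"{name}: expected {want} -> {status}")
```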