# Llama0-3-8b-ultra-p-0.05-lr1e-6-e3
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.5678
- Rewards/chosen: -3.1024
- Rewards/rejected: -5.3878
- Rewards/accuracies: 0.7656
- Rewards/margins: 2.2855
- Logps/rejected: -803.3770
- Logps/chosen: -566.8709
- Logits/rejected: -0.7286
- Logits/chosen: -0.3055
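Since the base model is Meta-Llama-3-8B-Instruct, the fine-tuned checkpoint can be loaded as a standard chat model. Below is a minimal inference sketch using the transformers API; the repository id is a placeholder (it assumes the checkpoint is published under a path matching the model name) and the chat template is the one shipped with the tokenizer.

```python
# Minimal inference sketch. The repo id below is a placeholder; substitute the
# actual Hub path where this checkpoint is hosted.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Llama0-3-8b-ultra-p-0.05-lr1e-6-e3"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build a prompt with the Llama 3 Instruct chat template.
messages = [{"role": "user", "content": "Summarize what preference tuning does in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```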
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
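
The reward, margin, and pairwise log-probability metrics logged here are characteristic of a DPO-style preference trainer such as TRL's `DPOTrainer`; the original training script is not included in this card. The sketch below only mirrors the hyperparameters listed above, and the dataset name, `beta`, and precision settings are placeholders or assumptions rather than values taken from this card.

```python
# Hedged sketch of a TRL DPO setup mirroring the listed hyperparameters.
# Dataset, beta, and precision are NOT specified on this card and are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Placeholder preference dataset with "prompt" / "chosen" / "rejected" columns.
train_dataset = load_dataset("some/preference-dataset", split="train")

config = DPOConfig(
    output_dir="Llama0-3-8b-ultra-p-0.05-lr1e-6-e3",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,   # 2 per device x 8 GPUs x 8 steps = 128 effective
    num_train_epochs=3.0,
    lr_scheduler_type="linear",
    seed=42,
    bf16=True,                       # assumption; precision is not stated on this card
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```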
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5867 | 0.2060 | 100 | 0.5790 | -0.4168 | -0.7897 | 0.6797 | 0.3729 | -343.5607 | -298.3142 | 0.0683 | 0.0050 |
| 0.5459 | 0.4119 | 200 | 0.5337 | -0.8336 | -1.5839 | 0.7422 | 0.7504 | -422.9892 | -339.9911 | 0.4007 | 0.2879 |
| 0.5201 | 0.6179 | 300 | 0.5116 | -0.7067 | -1.5136 | 0.7344 | 0.8069 | -415.9542 | -327.3016 | 0.3623 | 0.2661 |
| 0.5068 | 0.8239 | 400 | 0.5037 | -0.7404 | -1.6591 | 0.7891 | 0.9187 | -430.5064 | -330.6776 | 0.2848 | 0.2141 |
| 0.4270 | 1.0299 | 500 | 0.5057 | -1.4842 | -2.9740 | 0.7500 | 1.4898 | -561.9933 | -405.0575 | -0.1430 | -0.0848 |
| 0.3367 | 1.2358 | 600 | 0.5150 | -1.9307 | -3.6670 | 0.7500 | 1.7363 | -631.2911 | -449.7062 | -0.1170 | -0.0016 |
| 0.3360 | 1.4418 | 700 | 0.5013 | -1.6315 | -3.1525 | 0.7656 | 1.5211 | -579.8499 | -419.7817 | -0.0661 | 0.0619 |
| 0.3443 | 1.6478 | 800 | 0.4919 | -1.5274 | -2.9336 | 0.7656 | 1.4062 | -557.9580 | -409.3778 | -0.0808 | 0.0430 |
| 0.3387 | 1.8538 | 900 | 0.5136 | -1.8875 | -3.4761 | 0.7578 | 1.5886 | -612.2042 | -445.3885 | -0.0675 | 0.0881 |
| 0.2045 | 2.0597 | 1000 | 0.5396 | -2.6871 | -4.6850 | 0.7656 | 1.9979 | -733.0979 | -525.3492 | -0.3513 | -0.1306 |
| 0.1911 | 2.2657 | 1100 | 0.5562 | -3.0265 | -5.1837 | 0.7422 | 2.1572 | -782.9683 | -559.2891 | -0.6321 | -0.2757 |
| 0.1935 | 2.4717 | 1200 | 0.5518 | -2.8870 | -5.0043 | 0.7500 | 2.1173 | -765.0246 | -545.3388 | -0.6105 | -0.2462 |
| 0.1909 | 2.6777 | 1300 | 0.5623 | -3.0447 | -5.2451 | 0.7500 | 2.2004 | -789.1040 | -561.1038 | -0.6371 | -0.2728 |
| 0.1805 | 2.8836 | 1400 | 0.5746 | -3.2314 | -5.5860 | 0.7500 | 2.3546 | -823.1945 | -579.7725 | -0.7721 | -0.3436 |
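
As a quick sanity check on how the logged columns relate, `Rewards/margins` is the gap between the chosen and rejected rewards at each evaluation step; plugging in the final evaluation values from the top of this card reproduces the reported margin up to rounding.

```python
# Values taken from the final evaluation metrics listed at the top of this card.
rewards_chosen = -3.1024
rewards_rejected = -5.3878

margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # ~2.2854, matching the reported Rewards/margins of 2.2855
```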
### Framework versions
- Transformers 4.44.2
- PyTorch 2.4.1+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1