Llama-3.1-8B-Instruct-sft-5e-3-epoch-100-human-eval-final

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the meng-lab/Llama-3.1-8B-Instruct-humaneval dataset. It achieves the following results on the evaluation set:

  • Loss: 5.3754
  • Loss Layer 4 Head: 1.6774
  • Loss Layer 8 Head: 1.3806
  • Loss Layer 12 Head: 1.2795
  • Loss Layer 16 Head: 0.6378
  • Loss Layer 20 Head: 0.3110
  • Loss Layer 24 Head: 0.1844
  • Loss Layer 28 Head: 0.0864
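
The per-layer-head losses above appear to come from auxiliary heads attached at intermediate layers (layers 4 through 28). As a hedged illustration only, the sketch below shows how the checkpoint might be loaded with the standard transformers API. The repository id is taken from this card; the model has no library tag, so custom modeling code (e.g. `trust_remote_code=True` or a project-specific loader) may in fact be required, and the exact loading path is an assumption.

```python
# Hedged sketch: loading this checkpoint with the standard transformers API.
# Assumption: the repo loads as a plain causal LM; if the layer-4/8/.../28
# heads live in custom modeling code, a project-specific loader would be needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "meng-lab/llama_3.1_8b_instruct_paradec_humaneval"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,   # the checkpoint is stored in BF16
    device_map="auto",
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```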

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.005
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 100
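
As a rough sketch only, the hyperparameters above map onto the standard transformers Trainer configuration as shown below. This is not the training script actually used; `output_dir` and `bf16` are assumptions added for completeness.

```python
# Hedged sketch: the listed hyperparameters expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Llama-3.1-8B-Instruct-sft-5e-3-epoch-100-human-eval-final",  # assumption
    learning_rate=5e-3,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=32,
    num_train_epochs=100,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,  # assumption, consistent with the BF16 tensor type of the checkpoint
)

# Effective batch sizes with 4 GPUs (multi-GPU distributed training):
#   train: 1 per device x 32 accumulation steps x 4 devices = 128
#   eval:  2 per device x 4 devices = 8
```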

Training results

| Training Loss | Epoch | Step | Validation Loss | Loss Layer 4 Head | Loss Layer 8 Head | Loss Layer 12 Head | Loss Layer 16 Head | Loss Layer 20 Head | Loss Layer 24 Head | Loss Layer 28 Head |
|---|---|---|---|---|---|---|---|---|---|---|
| 7.7477 | 9.6823 | 200 | 7.6952 | 1.9941 | 1.7442 | 1.9609 | 1.0923 | 0.4414 | 0.2459 | 0.4381 |
| 5.8078 | 19.3646 | 400 | 6.4289 | 1.9090 | 1.5288 | 1.4099 | 0.9812 | 0.3976 | 0.2383 | 0.1448 |
| 4.8435 | 29.0469 | 600 | 5.9964 | 1.8480 | 1.5236 | 1.3836 | 0.6737 | 0.3976 | 0.2537 | 0.1092 |
| 4.6084 | 38.7292 | 800 | 6.0069 | 1.8460 | 1.7121 | 1.3111 | 0.6743 | 0.3436 | 0.2146 | 0.0977 |
| 4.0625 | 48.4115 | 1000 | 5.7159 | 1.8920 | 1.4329 | 1.3107 | 0.6548 | 0.3220 | 0.1980 | 0.0920 |
| 3.7565 | 58.0938 | 1200 | 5.4530 | 1.7095 | 1.3997 | 1.2900 | 0.6451 | 0.3159 | 0.1877 | 0.0897 |
| 3.5758 | 67.7761 | 1400 | 5.4088 | 1.6897 | 1.3862 | 1.2843 | 0.6413 | 0.3125 | 0.1860 | 0.0880 |
| 3.5369 | 77.4584 | 1600 | 5.3933 | 1.6839 | 1.3837 | 1.2815 | 0.6409 | 0.3124 | 0.1856 | 0.0870 |
| 3.5100 | 87.1407 | 1800 | 5.3780 | 1.6781 | 1.3809 | 1.2799 | 0.6378 | 0.3111 | 0.1843 | 0.0865 |
| 3.4762 | 96.8230 | 2000 | 5.3754 | 1.6774 | 1.3806 | 1.2795 | 0.6378 | 0.3110 | 0.1844 | 0.0864 |
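
For a quick look at convergence, the validation loss column can be plotted against the epoch column. The short matplotlib sketch below uses only the values reported in the table above.

```python
# Plot the validation-loss trajectory reported in the training results table.
import matplotlib.pyplot as plt

epochs = [9.6823, 19.3646, 29.0469, 38.7292, 48.4115,
          58.0938, 67.7761, 77.4584, 87.1407, 96.8230]
val_loss = [7.6952, 6.4289, 5.9964, 6.0069, 5.7159,
            5.4530, 5.4088, 5.3933, 5.3780, 5.3754]

plt.plot(epochs, val_loss, marker="o")
plt.xlabel("Epoch")
plt.ylabel("Validation loss")
plt.title("Validation loss vs. epoch")
plt.show()
```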

Framework versions

  • Transformers 4.43.2
  • PyTorch 2.4.1+cu121
  • Datasets 3.0.1
  • Tokenizers 0.19.1

Model size

  • 8.15B parameters (Safetensors, BF16)
