
Llama-3.1-8B-Instruct-sft-5e-3-epoch-100-humaneval-stage-3

This model is a fine-tuned version of /home/jovyan/workspace/PipeDec/checkpoint/Llama-3.1-8B-Instruct-sft-5e-3-epoch-100-humaneval-stage-2 on the meng-lab/Llama-3.1-8B-Instruct-humaneval dataset. It achieves the following results on the evaluation set:

  • Loss: 11.0822
  • Loss Three Hop Layer 8 Head: 3.3949
  • Loss Three Hop Layer 16 Head: 2.9406
  • Loss Three Hop Layer 24 Head: 2.6311
  • Loss Three Hop Layer 32 Head: 2.4800

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative Trainer configuration sketch follows the list):

  • learning_rate: 0.005
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 100
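
For readers reconstructing the setup, the list above maps onto a Hugging Face Trainer configuration roughly as follows. This is an illustrative sketch, not the original training script; the output_dir name is hypothetical, and any argument not listed above is a Trainer default.

```python
from transformers import TrainingArguments

# Illustrative reconstruction of the hyperparameters listed above.
# With 8 GPUs, the effective batch sizes work out to
# 1 x 8 x 16 = 128 (train) and 2 x 8 = 16 (eval), matching the card.
training_args = TrainingArguments(
    output_dir="llama-3.1-8b-instruct-sft-stage-3",  # hypothetical name
    learning_rate=5e-3,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=16,
    num_train_epochs=100,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 match the Trainer defaults,
    # so no explicit optimizer arguments are needed here.
)
```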

Training results

| Training Loss | Epoch   | Step | Validation Loss | Loss Three Hop Layer 8 Head | Loss Three Hop Layer 16 Head | Loss Three Hop Layer 24 Head | Loss Three Hop Layer 32 Head |
|:-------------:|:-------:|:----:|:---------------:|:---------------------------:|:----------------------------:|:----------------------------:|:----------------------------:|
| 17.1204       | 9.6677  | 200  | 17.4024         | 4.0165                      | 3.6938                       | 5.0117                       | 5.0980                       |
| 11.7831       | 19.3353 | 400  | 12.4640         | 3.8214                      | 3.1283                       | 2.8095                       | 3.0563                       |
| 11.1082       | 29.0030 | 600  | 12.3118         | 3.4779                      | 3.1955                       | 2.9590                       | 2.9857                       |
| 10.9205       | 38.6707 | 800  | 11.9277         | 3.7709                      | 3.0051                       | 2.8893                       | 2.6454                       |
| 10.1281       | 48.3384 | 1000 | 11.6923         | 3.4574                      | 2.9719                       | 2.8398                       | 2.7656                       |
| 9.4147        | 58.0060 | 1200 | 11.2543         | 3.4058                      | 2.9891                       | 2.6593                       | 2.5635                       |
| 8.9315        | 67.6737 | 1400 | 11.0952         | 3.3972                      | 2.9370                       | 2.6327                       | 2.4895                       |
| 8.9092        | 77.3414 | 1600 | 11.1042         | 3.4010                      | 2.9454                       | 2.6344                       | 2.4875                       |
| 8.8371        | 87.0091 | 1800 | 11.0849         | 3.3957                      | 2.9410                       | 2.6311                       | 2.4803                       |
| 8.8213        | 96.6767 | 2000 | 11.0822         | 3.3949                      | 2.9406                       | 2.6311                       | 2.4800                       |
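
The validation loss drops steeply over the first ~20 epochs and plateaus after roughly epoch 60. A minimal matplotlib sketch for reproducing the curve, with values copied from the table above:

```python
import matplotlib.pyplot as plt

# Validation loss by epoch, copied from the training results table.
epochs = [9.6677, 19.3353, 29.0030, 38.6707, 48.3384,
          58.0060, 67.6737, 77.3414, 87.0091, 96.6767]
val_loss = [17.4024, 12.4640, 12.3118, 11.9277, 11.6923,
            11.2543, 11.0952, 11.1042, 11.0849, 11.0822]

plt.plot(epochs, val_loss, marker="o")
plt.xlabel("Epoch")
plt.ylabel("Validation loss")
plt.title("Validation loss during stage-3 fine-tuning")
plt.show()
```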

Framework versions

  • Transformers 4.43.2
  • PyTorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.19.1
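
With the framework versions above, the base checkpoint can typically be loaded through transformers. A minimal sketch, assuming the Hub repo id meng-lab/llama_3.1_8b_instruct_paradec_humaneval_medusa; note that the auxiliary three-hop decoding heads likely require the project's own inference code, so plain AutoModelForCausalLM should be expected to load only the standard Llama weights:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; adjust if the checkpoint lives elsewhere.
model_id = "meng-lab/llama_3.1_8b_instruct_paradec_humaneval_medusa"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is typical for Llama 3.1
    device_map="auto",
)

# Quick generation check on a HumanEval-style prompt.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```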