CodeLlama-34b-Instruct-sft-5e-3-epoch-100-human-eval-final

This model is a fine-tuned version of meta-llama/CodeLlama-34b-Instruct-hf on the meng-lab/CodeLlama-34B-Instruct-humaneval dataset. It achieves the following results on the evaluation set:

  • Loss: 3.7616
  • Loss Layer 6 Head: 1.0709
  • Loss Layer 12 Head: 0.8047
  • Loss Layer 18 Head: 0.7212
  • Loss Layer 24 Head: 0.4396
  • Loss Layer 30 Head: 0.3042
  • Loss Layer 36 Head: 0.2040
  • Loss Layer 42 Head: 0.1346
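
For completeness, the snippet below sketches how this checkpoint could be loaded with the Transformers library. It assumes the Hub id shown for this model (meng-lab/codellama_34b_instruct_paradec_humaneval) and that the checkpoint loads as a standard causal LM; the auxiliary layer heads evaluated above may require the project's own code, and the prompt and generation settings are illustrative only.

```python
# Minimal loading/inference sketch (assumed usage, not an official example).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meng-lab/codellama_34b_instruct_paradec_humaneval"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    device_map="auto",           # requires `accelerate`; shards across GPUs
)

prompt = "[INST] Write a Python function that checks if a number is prime. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```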

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent configuration sketch follows the list):

  • learning_rate: 0.005
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 100
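
As a rough equivalence, and purely as a sketch (the original training script is not included in this card), these settings map onto a Transformers TrainingArguments object along the following lines:

```python
# Sketch of a TrainingArguments configuration matching the listed
# hyperparameters; the output directory name is hypothetical.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="codellama-34b-instruct-sft",  # hypothetical path
    learning_rate=5e-3,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=16,  # 1 per device x 8 GPUs x 16 = 128 effective
    num_train_epochs=100,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # checkpoint is stored in BF16
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```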

Training results

| Training Loss | Epoch   | Step | Validation Loss | Loss Layer 6 Head | Loss Layer 12 Head | Loss Layer 18 Head | Loss Layer 24 Head | Loss Layer 30 Head | Loss Layer 36 Head | Loss Layer 42 Head |
|---------------|---------|------|-----------------|-------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
| 3.7028        | 9.1168  | 200  | 4.6352          | 1.4035            | 0.8311             | 1.0429             | 0.4931             | 0.4457             | 0.2349             | 0.1599             |
| 2.736         | 18.2336 | 400  | 4.7219          | 1.2158            | 0.8316             | 0.7490             | 1.0869             | 0.3238             | 0.2666             | 0.1723             |
| 2.0128        | 27.3504 | 600  | 3.8953          | 1.1598            | 0.8030             | 0.7230             | 0.4451             | 0.3500             | 0.2027             | 0.1459             |
| 3.3605        | 36.4672 | 800  | 4.9203          | 1.1038            | 0.8175             | 1.6655             | 0.4410             | 0.3091             | 0.2055             | 0.1365             |
| 2.5177        | 45.5840 | 1000 | 4.2388          | 1.0907            | 0.8042             | 1.1115             | 0.4403             | 0.3038             | 0.2217             | 0.1412             |
| 2.0743        | 54.7009 | 1200 | 3.9221          | 1.0727            | 0.8050             | 0.8689             | 0.4418             | 0.3012             | 0.2044             | 0.1362             |
| 1.8844        | 63.8177 | 1400 | 3.8140          | 1.0723            | 0.8028             | 0.7729             | 0.4389             | 0.3045             | 0.2036             | 0.1350             |
| 1.8019        | 72.9345 | 1600 | 3.7777          | 1.0726            | 0.8038             | 0.7376             | 0.4401             | 0.3042             | 0.2032             | 0.1345             |
| 1.7339        | 82.0513 | 1800 | 3.7662          | 1.0703            | 0.8056             | 0.7246             | 0.4394             | 0.3041             | 0.2042             | 0.1347             |
| 1.6981        | 91.1681 | 2000 | 3.7616          | 1.0709            | 0.8047             | 0.7212             | 0.4396             | 0.3042             | 0.2040             | 0.1346             |
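
At the final checkpoint (step 2000), the head losses fall monotonically with layer depth, from 1.0709 at layer 6 to 0.1346 at layer 42. A minimal sketch for visualizing that trend, with the values copied from the last row of the table and matplotlib as an assumed dependency:

```python
# Plot the final per-layer head losses from the table above.
# Values are copied from the step-2000 row.
import matplotlib.pyplot as plt

layers = [6, 12, 18, 24, 30, 36, 42]
losses = [1.0709, 0.8047, 0.7212, 0.4396, 0.3042, 0.2040, 0.1346]

plt.plot(layers, losses, marker="o")
plt.xlabel("Layer of auxiliary head")
plt.ylabel("Evaluation loss")
plt.title("Per-layer head loss at step 2000")
plt.show()
```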

Framework versions

  • Transformers 4.43.2
  • PyTorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.19.1

Model size

34.2B parameters (Safetensors, BF16)