Llama-3.1-8B-Instruct-sft-5e-3-epoch-100-human-eval-final

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the meng-lab/Llama-3.1-8B-Instruct-humaneval dataset. It achieves the following results on the evaluation set:

  • Loss: 5.3754
  • Loss Layer 4 Head: 1.6774
  • Loss Layer 8 Head: 1.3806
  • Loss Layer 12 Head: 1.2795
  • Loss Layer 16 Head: 0.6378
  • Loss Layer 20 Head: 0.3110
  • Loss Layer 24 Head: 0.1844
  • Loss Layer 28 Head: 0.0864
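
The per-layer-head losses above appear to come from auxiliary heads attached at intermediate layers (layers 4 through 28). As a hedged illustration only, the sketch below shows how the checkpoint might be loaded with the standard transformers API. The repository id is taken from this card; the model has no library tag, so custom modeling code (e.g. `trust_remote_code=True` or a project-specific loader) may in fact be required, and the exact loading path is an assumption.

```python
# Hedged sketch: loading this checkpoint with the standard transformers API.
# Assumption: the repo loads as a plain causal LM; if the layer-4/8/.../28
# heads live in custom modeling code, a project-specific loader would be needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "meng-lab/llama_3.1_8b_instruct_paradec_humaneval"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,   # the checkpoint is stored in BF16
    device_map="auto",
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```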

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.005
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 100
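
As a rough sketch only, the hyperparameters above map onto the standard transformers Trainer configuration as shown below. This is not the training script actually used; `output_dir` and `bf16` are assumptions added for completeness.

```python
# Hedged sketch: the listed hyperparameters expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Llama-3.1-8B-Instruct-sft-5e-3-epoch-100-human-eval-final",  # assumption
    learning_rate=5e-3,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=32,
    num_train_epochs=100,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,  # assumption, consistent with the BF16 tensor type of the checkpoint
)

# Effective batch sizes with 4 GPUs (multi-GPU distributed training):
#   train: 1 per device x 32 accumulation steps x 4 devices = 128
#   eval:  2 per device x 4 devices = 8
```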

Training results

| Training Loss | Epoch | Step | Validation Loss | Loss Layer 4 Head | Loss Layer 8 Head | Loss Layer 12 Head | Loss Layer 16 Head | Loss Layer 20 Head | Loss Layer 24 Head | Loss Layer 28 Head |
|---|---|---|---|---|---|---|---|---|---|---|
| 7.7477 | 9.6823 | 200 | 7.6952 | 1.9941 | 1.7442 | 1.9609 | 1.0923 | 0.4414 | 0.2459 | 0.4381 |
| 5.8078 | 19.3646 | 400 | 6.4289 | 1.9090 | 1.5288 | 1.4099 | 0.9812 | 0.3976 | 0.2383 | 0.1448 |
| 4.8435 | 29.0469 | 600 | 5.9964 | 1.8480 | 1.5236 | 1.3836 | 0.6737 | 0.3976 | 0.2537 | 0.1092 |
| 4.6084 | 38.7292 | 800 | 6.0069 | 1.8460 | 1.7121 | 1.3111 | 0.6743 | 0.3436 | 0.2146 | 0.0977 |
| 4.0625 | 48.4115 | 1000 | 5.7159 | 1.8920 | 1.4329 | 1.3107 | 0.6548 | 0.3220 | 0.1980 | 0.0920 |
| 3.7565 | 58.0938 | 1200 | 5.4530 | 1.7095 | 1.3997 | 1.2900 | 0.6451 | 0.3159 | 0.1877 | 0.0897 |
| 3.5758 | 67.7761 | 1400 | 5.4088 | 1.6897 | 1.3862 | 1.2843 | 0.6413 | 0.3125 | 0.1860 | 0.0880 |
| 3.5369 | 77.4584 | 1600 | 5.3933 | 1.6839 | 1.3837 | 1.2815 | 0.6409 | 0.3124 | 0.1856 | 0.0870 |
| 3.5100 | 87.1407 | 1800 | 5.3780 | 1.6781 | 1.3809 | 1.2799 | 0.6378 | 0.3111 | 0.1843 | 0.0865 |
| 3.4762 | 96.8230 | 2000 | 5.3754 | 1.6774 | 1.3806 | 1.2795 | 0.6378 | 0.3110 | 0.1844 | 0.0864 |
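
For a quick look at convergence, the validation loss column can be plotted against the epoch column. The short matplotlib sketch below uses only the values reported in the table above.

```python
# Plot the validation-loss trajectory reported in the training results table.
import matplotlib.pyplot as plt

epochs = [9.6823, 19.3646, 29.0469, 38.7292, 48.4115,
          58.0938, 67.7761, 77.4584, 87.1407, 96.8230]
val_loss = [7.6952, 6.4289, 5.9964, 6.0069, 5.7159,
            5.4530, 5.4088, 5.3933, 5.3780, 5.3754]

plt.plot(epochs, val_loss, marker="o")
plt.xlabel("Epoch")
plt.ylabel("Validation loss")
plt.title("Validation loss vs. epoch")
plt.show()
```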

Framework versions

  • Transformers 4.43.2
  • PyTorch 2.4.1+cu121
  • Datasets 3.0.1
  • Tokenizers 0.19.1

Model size

  • 8.15B parameters (Safetensors, BF16)
