metadata

license: other
base_model: meta-llama/Meta-Llama-3-70B
tags:
  - llama-factory
  - full
  - generated_from_trainer
model-index:
  - name: C013_Meta-Llama-3-70B_pretrain_20240508_200642
    results: []

C013_Meta-Llama-3-70B_pretrain_20240508_200642

This model is a full-parameter fine-tune of meta-llama/Meta-Llama-3-70B, loaded from the local path /mnt/fl/models/llama3/Meta-Llama-3-70B, on the C013_data dataset. It achieves the following results on the evaluation set:

Loss: 0.7400
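The card does not say how this loss is defined. Assuming it is the mean per-token cross-entropy in nats, which is what transformers causal language models return, it converts to an evaluation perplexity as follows:

$$
\mathrm{PPL} = \exp\left(\mathcal{L}_{\text{eval}}\right) = \exp(0.7400) \approx 2.096
$$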

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-06
train_batch_size: 2
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 32
gradient_accumulation_steps: 2
total_train_batch_size: 128
total_eval_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: polynomial
lr_scheduler_warmup_ratio: 0.075
num_epochs: 4.0
mixed_precision_training: Native AMP
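The batch-size entries above are linked. The total train batch size is per-device batch × devices × gradient-accumulation steps (2 × 32 × 2 = 128). The total eval batch size is per-device batch × devices (4 × 32 = 128), because evaluation does not accumulate gradients. The sketch below checks that arithmetic and also reproduces the shape of a linear-warmup, polynomial-decay schedule. Several pieces are assumptions. The card never states the total number of optimizer steps, so `ASSUMED_TOTAL_STEPS = 134` is hypothetical, taken as about four epochs at the roughly 33.5 steps per epoch implied by the results table. `lr_end=1e-7` and `power=1.0` are assumed to be the defaults of transformers' `get_polynomial_decay_schedule_with_warmup`, and rounding warmup up with `ceil` follows how transformers derives warmup steps from `warmup_ratio`. The helper names are illustrative.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 2
PER_DEVICE_EVAL_BATCH = 4
NUM_DEVICES = 32
GRAD_ACCUM_STEPS = 2
BASE_LR = 3e-06
WARMUP_RATIO = 0.075

# HYPOTHETICAL: the card does not state the total number of optimizer steps.
ASSUMED_TOTAL_STEPS = 134


def effective_train_batch(per_device: int, devices: int, accum: int) -> int:
    """Sequences consumed per optimizer step across all devices."""
    return per_device * devices * accum


def effective_eval_batch(per_device: int, devices: int) -> int:
    """Evaluation does not accumulate gradients, so only devices multiply in."""
    return per_device * devices


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Warmup length derived from a ratio, rounded up."""
    return math.ceil(total_steps * ratio)


def polynomial_lr(step: int, total_steps: int, warmup: int, base_lr: float,
                  lr_end: float = 1e-7, power: float = 1.0) -> float:
    """Linear warmup to base_lr, then polynomial decay down to lr_end.

    ASSUMED: lr_end and power mirror the transformers helper's defaults;
    the card does not list them.
    """
    if step < warmup:
        return base_lr * step / max(1, warmup)
    if step > total_steps:
        return lr_end
    remaining = 1 - (step - warmup) / (total_steps - warmup)
    return (base_lr - lr_end) * remaining ** power + lr_end


if __name__ == "__main__":
    print(effective_train_batch(PER_DEVICE_TRAIN_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS))  # 128
    print(effective_eval_batch(PER_DEVICE_EVAL_BATCH, NUM_DEVICES))  # 128
    n_warmup = warmup_steps(ASSUMED_TOTAL_STEPS, WARMUP_RATIO)
    for step in (0, n_warmup, ASSUMED_TOTAL_STEPS // 2, ASSUMED_TOTAL_STEPS):
        print(step, polynomial_lr(step, ASSUMED_TOTAL_STEPS, n_warmup, BASE_LR))
```

With `power=1.0` the "polynomial" decay is a straight line from the peak learning rate down to `lr_end`. A larger power would keep the learning rate near 3e-06 for longer before it falls off.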

Training results

Training Loss	Epoch	Step	Validation Loss
0.8776	0.2090	7	0.7902
0.8473	0.4179	14	0.7703
0.8293	0.6269	21	0.7603
0.8173	0.8358	28	0.7481
0.7415	1.0448	35	0.7402
0.6794	1.2537	42	0.7419
0.6688	1.4627	49	0.7392
0.6498	1.6716	56	0.7367
0.6701	1.8806	63	0.7358
0.664	2.0896	70	0.7355
0.6447	2.2985	77	0.7361
0.6412	2.5075	84	0.7373
0.6458	2.7164	91	0.7383
0.6356	2.9254	98	0.7387
0.6398	3.1343	105	0.7387
0.6228	3.3433	112	0.7391
0.6139	3.5522	119	0.7395
0.591	3.7612	126	0.7398
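Reading the table: validation loss reaches its lowest value, 0.7355, at step 70 (epoch 2.0896) and then rises slowly, while training loss keeps falling. This pattern is often read as the start of overfitting. The ratio of step to epoch also implies roughly 33.5 optimizer steps per epoch. The sketch below recomputes these figures from the logged rows. The helper names are illustrative, and the rows are copied verbatim from the table.

```python
# (step, epoch, training loss, validation loss), copied from the table above.
ROWS = [
    (7, 0.2090, 0.8776, 0.7902),
    (14, 0.4179, 0.8473, 0.7703),
    (21, 0.6269, 0.8293, 0.7603),
    (28, 0.8358, 0.8173, 0.7481),
    (35, 1.0448, 0.7415, 0.7402),
    (42, 1.2537, 0.6794, 0.7419),
    (49, 1.4627, 0.6688, 0.7392),
    (56, 1.6716, 0.6498, 0.7367),
    (63, 1.8806, 0.6701, 0.7358),
    (70, 2.0896, 0.6640, 0.7355),
    (77, 2.2985, 0.6447, 0.7361),
    (84, 2.5075, 0.6412, 0.7373),
    (91, 2.7164, 0.6458, 0.7383),
    (98, 2.9254, 0.6356, 0.7387),
    (105, 3.1343, 0.6398, 0.7387),
    (112, 3.3433, 0.6228, 0.7391),
    (119, 3.5522, 0.6139, 0.7395),
    (126, 3.7612, 0.5910, 0.7398),
]


def best_row(rows):
    """Row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


def steps_per_epoch(rows):
    """Average of step / epoch over all logged rows."""
    return sum(step / epoch for step, epoch, _, _ in rows) / len(rows)


def generalization_gap(rows):
    """Validation loss minus training loss at each logged step."""
    return [(step, round(val - train, 4)) for step, _, train, val in rows]


if __name__ == "__main__":
    step, epoch, _, val = best_row(ROWS)
    print(f"lowest validation loss {val} at step {step} (epoch {epoch})")
    print(f"about {steps_per_epoch(ROWS):.1f} optimizer steps per epoch")
    print(generalization_gap(ROWS)[-1])
```

A widening gap between validation and training loss after step 70 would suggest that an earlier checkpoint, or fewer than four epochs, could generalize slightly better. This table alone does not settle that.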

Framework versions

Transformers 4.40.2
Pytorch 2.3.0
Datasets 2.19.1
Tokenizers 0.19.1
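To reproduce the training or evaluation environment, you can compare the installed packages against the versions listed above. This is a minimal sketch using the standard library's `importlib.metadata`. It assumes that "Pytorch" corresponds to the PyPI distribution `torch`. It also ignores local build suffixes such as `+cu121` when comparing. The function names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions", keyed by PyPI distribution name.
CARD_VERSIONS = {
    "transformers": "4.40.2",
    "torch": "2.3.0",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(expected, installed):
    """Return {name: (expected, installed)} for packages whose base version differs."""
    report = {}
    for name, want in expected.items():
        have = installed.get(name)
        # Drop a local build tag such as "+cu121" before comparing.
        if (have or "").split("+")[0] != want:
            report[name] = (want, have)
    return report


if __name__ == "__main__":
    report = mismatches(CARD_VERSIONS, installed_versions(CARD_VERSIONS))
    for name, (want, have) in report.items():
        print(f"{name}: card lists {want}, found {have}")
```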