metadata

license: other
base_model: meta-llama/Meta-Llama-3-8B
tags:
  - llama-factory
  - full
  - generated_from_trainer
model-index:
  - name: C016_random_sample_llama3-8b-base_pretrain_20240504_181744
    results: []

C016_random_sample_llama3-8b-base_pretrain_20240504_181744

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B (loaded from the local path /data/pro-align/progressalign/shared_storage/downloaded_models/llama3-8b-base) on the C016_random_sample_data dataset. It achieves the following results on the evaluation set:

Loss: 2.4196
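
A language-model loss is often easier to compare across runs when expressed as perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training with Transformers but is not stated in this card. The function name `perplexity` is illustrative.

```python
import math

# Evaluation loss reported in this card.
eval_loss = 2.4196


def perplexity(loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss_nats)


eval_perplexity = perplexity(eval_loss)
print(f"perplexity ~ {eval_perplexity:.2f}")
```

Under that assumption the evaluation perplexity is roughly 11.2.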

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1.5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 64
total_eval_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: polynomial
lr_scheduler_warmup_steps: 20
num_epochs: 4.0
mixed_precision_training: Native AMP
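
The totals and the learning-rate schedule above can be reproduced with a little arithmetic. The following is a minimal sketch, not the training code itself. It assumes no gradient accumulation (none is listed), a linear decay (`power=1.0`) down to `lr_end=1e-7` (the defaults of the polynomial schedule in Transformers; the card does not confirm they were used), and about 4108 total optimizer steps, an estimate taken from the step and epoch columns of the results table. The helper names are hypothetical.

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Samples consumed per step across all devices.

    grad_accum=1 is an assumption: the card lists no accumulation steps.
    """
    return per_device * num_devices * grad_accum


def polynomial_lr(
    step: int,
    base_lr: float = 1.5e-5,   # learning_rate from the card
    warmup_steps: int = 20,    # lr_scheduler_warmup_steps from the card
    total_steps: int = 4108,   # ASSUMPTION: estimated, not stated in the card
    lr_end: float = 1e-7,      # ASSUMPTION: library default
    power: float = 1.0,        # ASSUMPTION: library default (linear decay)
) -> float:
    """Learning rate at a given optimizer step: linear warmup, then polynomial decay."""
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    if step > total_steps:
        # Past the end of the schedule the rate stays at lr_end.
        return lr_end
    decay_steps = total_steps - warmup_steps
    remaining = 1 - (step - warmup_steps) / decay_steps
    return (base_lr - lr_end) * remaining ** power + lr_end


print(effective_batch_size(8, 8))    # total_train_batch_size
print(effective_batch_size(16, 8))   # total_eval_batch_size
for s in (0, 10, 20, 2000, 4108):
    print(s, polynomial_lr(s))
```

With these assumptions the rate peaks at 1.5e-05 at step 20 and decays to 1e-07 by the final step.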

Training results

Training Loss	Epoch	Step	Validation Loss
2.5472	0.1947	200	2.5262
2.4431	0.3895	400	2.4733
2.4163	0.5842	600	2.4443
2.4462	0.7790	800	2.4281
2.4353	0.9737	1000	2.4196
2.2111	1.1685	1200	2.4290
2.2503	1.3632	1400	2.4281
2.2580	1.5579	1600	2.4271
2.2540	1.7527	1800	2.4266
2.2508	1.9474	2000	2.4266
2.2112	2.1422	2200	2.4287
2.2063	2.3369	2400	2.4293
2.2544	2.5316	2600	2.4291
2.2024	2.7264	2800	2.4289
2.2074	2.9211	3000	2.4288
2.2268	3.1159	3200	2.4297
2.1556	3.3106	3400	2.4294
2.1953	3.5054	3600	2.4296
2.2002	3.7001	3800	2.4294
2.2437	3.8948	4000	2.4291
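
The table can be read programmatically to locate the best checkpoint and to estimate the run's size. The sketch below copies the rows verbatim and picks the lowest validation loss. The steps-per-epoch figure is derived from the last row, and the samples-per-epoch figure additionally assumes the total training batch size of 64 with no gradient accumulation. Variable names are illustrative.

```python
# (training_loss, epoch, step, validation_loss) rows copied from the table above.
ROWS = [
    (2.5472, 0.1947, 200, 2.5262),
    (2.4431, 0.3895, 400, 2.4733),
    (2.4163, 0.5842, 600, 2.4443),
    (2.4462, 0.7790, 800, 2.4281),
    (2.4353, 0.9737, 1000, 2.4196),
    (2.2111, 1.1685, 1200, 2.4290),
    (2.2503, 1.3632, 1400, 2.4281),
    (2.2580, 1.5579, 1600, 2.4271),
    (2.2540, 1.7527, 1800, 2.4266),
    (2.2508, 1.9474, 2000, 2.4266),
    (2.2112, 2.1422, 2200, 2.4287),
    (2.2063, 2.3369, 2400, 2.4293),
    (2.2544, 2.5316, 2600, 2.4291),
    (2.2024, 2.7264, 2800, 2.4289),
    (2.2074, 2.9211, 3000, 2.4288),
    (2.2268, 3.1159, 3200, 2.4297),
    (2.1556, 3.3106, 3400, 2.4294),
    (2.1953, 3.5054, 3600, 2.4296),
    (2.2002, 3.7001, 3800, 2.4294),
    (2.2437, 3.8948, 4000, 2.4291),
]

# Row with the lowest validation loss.
best = min(ROWS, key=lambda row: row[3])
final = ROWS[-1]

# Optimizer steps per epoch, estimated from the last logged row.
steps_per_epoch = final[2] / final[1]

# ASSUMPTION: 64 samples per step (total_train_batch_size, no accumulation).
samples_per_epoch = steps_per_epoch * 64

print(f"best validation loss {best[3]} at step {best[2]} (epoch {best[1]})")
print(f"final validation loss {final[3]} at step {final[2]}")
print(f"about {steps_per_epoch:.0f} steps per epoch")
```

The lowest validation loss, 2.4196, occurs at step 1000, just before the end of the first epoch, and matches the headline evaluation loss. After that point the training loss drops to about 2.2 while the validation loss stays between 2.4266 and 2.4297, so the later epochs do not improve on the held-out set. An epoch is about 1,027 optimizer steps.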

Framework versions

Transformers 4.40.1
PyTorch 2.3.0
Datasets 2.19.0
Tokenizers 0.19.1
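
To check a local environment against these versions, something like the following may help. It is a sketch that uses only the standard library. It assumes the PyTorch entry corresponds to the `torch` distribution on PyPI, and it ignores local build tags such as `+cu121`. The names `EXPECTED`, `installed_version`, and `find_mismatches` are hypothetical.

```python
from importlib import metadata

# Versions listed in this card, keyed by PyPI distribution name.
# ASSUMPTION: the PyTorch entry corresponds to the "torch" distribution.
EXPECTED = {
    "transformers": "4.40.1",
    "torch": "2.3.0",
    "datasets": "2.19.0",
    "tokenizers": "0.19.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to (expected, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Drop local build tags such as "+cu121" before comparing.
        base = found.split("+")[0] if found else None
        if base != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: card lists {wanted}, found {found}")
```

The `lookup` parameter exists only so the comparison can be exercised without the packages installed.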