---
license: other
base_model: meta-llama/Meta-Llama-3-70B
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: C013_Meta-Llama-3-70B_pretrain_20240508_200642
  results: []
---

# C013_Meta-Llama-3-70B_pretrain_20240508_200642

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) on the C013_data dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7400

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_ratio: 0.075
- num_epochs: 4.0
- mixed_precision_training: Native AMP

The total train batch size follows from the values above: train_batch_size × gradient_accumulation_steps × num_devices = 2 × 2 × 32 = 128.

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.8776        | 0.2090 | 7    | 0.7902          |
| 0.8473        | 0.4179 | 14   | 0.7703          |
| 0.8293        | 0.6269 | 21   | 0.7603          |
| 0.8173        | 0.8358 | 28   | 0.7481          |
| 0.7415        | 1.0448 | 35   | 0.7402          |
| 0.6794        | 1.2537 | 42   | 0.7419          |
| 0.6688        | 1.4627 | 49   | 0.7392          |
| 0.6498        | 1.6716 | 56   | 0.7367          |
| 0.6701        | 1.8806 | 63   | 0.7358          |
| 0.664         | 2.0896 | 70   | 0.7355          |
| 0.6447        | 2.2985 | 77   | 0.7361          |
| 0.6412        | 2.5075 | 84   | 0.7373          |
| 0.6458        | 2.7164 | 91   | 0.7383          |
| 0.6356        | 2.9254 | 98   | 0.7387          |
| 0.6398        | 3.1343 | 105  | 0.7387          |
| 0.6228        | 3.3433 | 112  | 0.7391          |
| 0.6139        | 3.5522 | 119  | 0.7395          |
| 0.591         | 3.7612 | 126  | 0.7398          |

### Framework versions

- Transformers 4.40.2
- PyTorch 2.3.0
- Datasets 2.19.1
- Tokenizers 0.19.1
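
## How to use

The checkpoint can be loaded with the Transformers version listed above. The snippet below is a minimal sketch: `model_id` is a placeholder (this card does not state where the weights are published), and `device_map="auto"` assumes `accelerate` is installed so the 70B weights can be sharded across the available GPUs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: substitute the Hub repo id or local path where this
# checkpoint is actually published.
model_id = "C013_Meta-Llama-3-70B_pretrain_20240508_200642"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision load; Llama 3 weights are bf16
    device_map="auto",           # shard the 70B weights across available GPUs
)

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that this is a base (pretraining-style) fine-tune rather than an instruction-tuned model, so plain text completion prompts like the one above are the appropriate usage pattern.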