yakazimir/llama_3_2_3B_tulu3

This model is a fine-tuned version of meta-llama/Llama-3.2-3B on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6898
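
For reference, a minimal usage sketch with the transformers library follows; the repository id matches the title above, while the prompt, dtype, and generation settings are illustrative assumptions rather than values prescribed by this card.

```python
# Hedged sketch: load the fine-tuned checkpoint and generate a short completion.
# The repo id matches the card title; dtype, device_map, and generation settings
# are illustrative assumptions, not recommendations from this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yakazimir/llama_3_2_3B_tulu3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype; adjust to your hardware
    device_map="auto",           # requires the accelerate package
)

prompt = "Explain gradient accumulation in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```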

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a corresponding TrainingArguments sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
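
A minimal sketch of how these settings map onto transformers TrainingArguments is shown below; output_dir and the bf16 flag are assumptions not stated in the list, and the per-device batch sizes reflect the 8-device, 4-step gradient-accumulation setup (4 × 8 × 4 = 128 total train batch size).

```python
# Hedged sketch: the hyperparameters above expressed as transformers
# TrainingArguments. output_dir and bf16 are assumptions; the other values
# mirror the list (4 per-device batch x 8 devices x 4 accumulation steps = 128).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_3_2_3B_tulu3",  # assumed name, not stated in the card
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    bf16=True,  # assumed; mixed-precision setting is not stated in the card
)
```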

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| 0.7972        | 0.0275 | 200  | 0.8336          |
| 0.7579        | 0.0551 | 400  | 0.7996          |
| 0.8037        | 0.0826 | 600  | 0.7918          |
| 0.7333        | 0.1101 | 800  | 0.7879          |
| 0.7871        | 0.1376 | 1000 | 0.7818          |
| 0.8135        | 0.1652 | 1200 | 0.7736          |
| 0.7612        | 0.1927 | 1400 | 0.7699          |
| 0.7421        | 0.2202 | 1600 | 0.7643          |
| 0.7451        | 0.2478 | 1800 | 0.7595          |
| 0.7388        | 0.2753 | 2000 | 0.7556          |
| 0.7707        | 0.3028 | 2200 | 0.7523          |
| 0.7063        | 0.3303 | 2400 | 0.7481          |
| 0.8091        | 0.3579 | 2600 | 0.7440          |
| 0.764         | 0.3854 | 2800 | 0.7407          |
| 0.714         | 0.4129 | 3000 | 0.7370          |
| 0.6745        | 0.4405 | 3200 | 0.7339          |
| 0.6771        | 0.4680 | 3400 | 0.7295          |
| 0.7419        | 0.4955 | 3600 | 0.7257          |
| 0.71          | 0.5230 | 3800 | 0.7223          |
| 0.6362        | 0.5506 | 4000 | 0.7189          |
| 0.7616        | 0.5781 | 4200 | 0.7159          |
| 0.676         | 0.6056 | 4400 | 0.7126          |
| 0.6732        | 0.6332 | 4600 | 0.7094          |
| 0.7017        | 0.6607 | 4800 | 0.7067          |
| 0.6796        | 0.6882 | 5000 | 0.7038          |
| 0.7065        | 0.7157 | 5200 | 0.7012          |
| 0.6318        | 0.7433 | 5400 | 0.6987          |
| 0.639         | 0.7708 | 5600 | 0.6965          |
| 0.7078        | 0.7983 | 5800 | 0.6949          |
| 0.7029        | 0.8258 | 6000 | 0.6933          |
| 0.6977        | 0.8534 | 6200 | 0.6921          |
| 0.6803        | 0.8809 | 6400 | 0.6911          |
| 0.703         | 0.9084 | 6600 | 0.6905          |
| 0.6819        | 0.9360 | 6800 | 0.6901          |
| 0.6327        | 0.9635 | 7000 | 0.6899          |
| 0.6685        | 0.9910 | 7200 | 0.6899          |
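
As a quick sanity check, if the reported validation loss is a mean token-level cross-entropy (an assumption; the card does not say), the final value of 0.6898 corresponds to a perplexity of roughly exp(0.6898) ≈ 1.99:

```python
# Hedged sketch: convert the final validation loss from the table to perplexity,
# assuming the loss is a mean token-level cross-entropy (not stated in the card).
import math

final_val_loss = 0.6898
print(math.exp(final_val_loss))  # ≈ 1.99
```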

Framework versions

  • Transformers 4.48.3
  • Pytorch 2.2.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.21.0