SmolLM2-1.7B-TemporalQuestions

This model is a fine-tuned version of HuggingFaceTB/SmolLM2-1.7B on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0287
  • F1: 0.9837
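
The card does not yet include a usage snippet. Below is a minimal loading sketch, assuming the checkpoint loads with the same AutoModelForCausalLM class as the base SmolLM2-1.7B; the prompt format is a guess, since the expected input layout for temporal questions is not documented here:

```python
# Minimal loading sketch; the prompt below is hypothetical, since the card
# does not document the expected input format for temporal questions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hugosousa/SmolLM2-1.7B-TemporalQuestions"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Question: Did the meeting happen before the announcement? Answer:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```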

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch mapping them to a Transformers TrainingArguments configuration follows the list:

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 128
  • total_train_batch_size: 2048
  • total_eval_batch_size: 16
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 30
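
For reference, here is a hypothetical reconstruction of this configuration as Transformers TrainingArguments; the actual training script is not published with this card. Note that the total train batch size of 2048 is the product of 4 per device × 4 devices × 128 gradient-accumulation steps:

```python
# Hypothetical reconstruction from the values listed above; not the
# author's actual training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="SmolLM2-1.7B-TemporalQuestions",  # assumed output path
    learning_rate=1e-3,
    per_device_train_batch_size=4,   # 4 x 4 GPUs x 128 accumulation steps = 2048 total
    per_device_eval_batch_size=4,    # 4 x 4 GPUs = 16 total
    gradient_accumulation_steps=128,
    num_train_epochs=30,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.05,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,  # assumption: the published weights are stored in BF16
)
```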

Training results

| Training Loss | Epoch   | Step | Validation Loss | F1     |
|:-------------:|:-------:|:----:|:---------------:|:------:|
| 0.2142        | 1.0     | 112  | 0.1175          | 0.9066 |
| 0.1218        | 2.0     | 224  | 0.0498          | 0.9606 |
| 0.0219        | 3.0     | 336  | 0.0363          | 0.9755 |
| 0.1146        | 4.0     | 448  | 0.0350          | 0.9749 |
| 0.0062        | 5.0     | 560  | 0.0297          | 0.9775 |
| 0.0790        | 6.0     | 672  | 0.0361          | 0.9767 |
| 0.0261        | 7.0     | 784  | 0.0297          | 0.9818 |
| 0.0225        | 8.0     | 896  | 0.0290          | 0.9804 |
| 0.0004        | 9.0     | 1008 | 0.0308          | 0.9816 |
| 0.0105        | 10.0    | 1120 | 0.0333          | 0.9829 |
| 0.0014        | 11.0    | 1232 | 0.0287          | 0.9837 |
| 0.0035        | 12.0    | 1344 | 0.0301          | 0.9848 |
| 0.0035        | 13.0    | 1456 | 0.0290          | 0.9835 |
| 0.0785        | 14.0    | 1568 | 0.0291          | 0.9846 |
| 0.0014        | 15.0    | 1680 | 0.0303          | 0.9840 |
| 0.0003        | 16.0    | 1792 | 0.0356          | 0.9837 |
| 0.0002        | 17.0    | 1904 | 0.0347          | 0.9862 |
| 0.0003        | 18.0    | 2016 | 0.0338          | 0.9865 |
| 0.0002        | 19.0    | 2128 | 0.0335          | 0.9872 |
| 0.0001        | 20.0    | 2240 | 0.0352          | 0.9878 |
| 0.0001        | 21.0    | 2352 | 0.0362          | 0.9878 |
| 0.0001        | 22.0    | 2464 | 0.0378          | 0.9875 |
| 0.0001        | 23.0    | 2576 | 0.0386          | 0.9874 |
| 0.0001        | 24.0    | 2688 | 0.0391          | 0.9873 |
| 0.0001        | 25.0    | 2800 | 0.0394          | 0.9874 |
| 0.0001        | 26.0    | 2912 | 0.0396          | 0.9874 |
| 0.0001        | 27.0    | 3024 | 0.0397          | 0.9874 |
| 0.0001        | 28.0    | 3136 | 0.0397          | 0.9874 |
| 0.0001        | 29.0    | 3248 | 0.0398          | 0.9874 |
| 0.0036        | 29.7385 | 3330 | 0.0398          | 0.9874 |
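
The headline results above (validation loss 0.0287, F1 0.9837) correspond to the epoch 11 checkpoint, which has the lowest validation loss of the run; the highest F1 (0.9878) appears at epochs 20–21. The card does not say how F1 was computed. As an illustration only, a typical Trainer metric hook for a binary-label setup looks like the sketch below; the label extraction is hypothetical:

```python
# Illustrative only: the actual evaluation setup for this model is not documented.
import numpy as np
from sklearn.metrics import f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"f1": f1_score(labels, predictions, average="binary")}
```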

Framework versions

  • Transformers 4.47.1
  • PyTorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.21.0