opt-babylm2-rewritten-clean-spacy-earlystop-bpe_seed-42_1e-3

This model was trained from scratch on the kanishka/babylm2-rewritten-clean-spacy dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6880
  • Accuracy: 0.4787
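
The loss above can be converted to a perplexity if it is the mean per-token cross-entropy in nats. That is an assumption here (the card does not state the unit), although it is what the Hugging Face causal language modeling training scripts usually report. A minimal sketch:

```python
import math

# Evaluation loss reported on this card; ASSUMED to be the mean
# per-token cross-entropy measured in nats.
eval_loss = 2.6880

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the evaluation perplexity comes out at roughly 14.7.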

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 32000
  • num_epochs: 20.0
  • mixed_precision_training: Native AMP
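
Two quantities follow from the list above: the effective batch size and the learning rate at any optimizer step. The plain-Python sketch below reproduces both. It assumes a single training device (consistent with 32 × 8 = 256), a linear schedule that warms up from zero and then decays linearly to zero, and 45140 total optimizer steps (the last step listed in the training results). `linear_lr` is a hypothetical helper written for illustration, not a library function.

```python
LEARNING_RATE = 1e-3
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 8
NUM_DEVICES = 1          # ASSUMPTION: not stated on the card
WARMUP_STEPS = 32_000
TOTAL_STEPS = 45_140     # ASSUMPTION: last step in the training results

# Sequences consumed per optimizer update.
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def linear_lr(step: int) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: scale linearly from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay: scale linearly from the peak down to 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / (TOTAL_STEPS - WARMUP_STEPS)


print("total_train_batch_size:", total_train_batch_size)
for step in (0, 16_000, 32_000, 38_570, 45_140):
    print(step, f"{linear_lr(step):.2e}")
```

If these assumptions hold, the learning rate is still rising for the first 32000 of the 45140 steps and only decays during the remainder of training.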

Training results

Training Loss   Epoch     Step    Validation Loss   Accuracy
4.0959          0.9999    2257    3.8126            0.3613
3.4463          1.9999    4514    3.2972            0.4099
3.1228          2.9998    6771    3.0851            0.4315
2.9166          3.9998    9028    2.9807            0.4418
2.8402          4.9997    11285   2.9249            0.4476
2.7832          5.9997    13542   2.8851            0.4521
2.7377          6.9996    15799   2.8602            0.4546
2.7101          8.0       18057   2.8389            0.4572
2.684           8.9999    20314   2.8260            0.4586
2.6654          9.9999    22571   2.8155            0.4596
2.6466          10.9998   24828   2.8077            0.4604
2.6474          11.9998   27085   2.8025            0.4615
2.6366          12.9997   29342   2.7983            0.4619
2.625           13.9997   31599   2.7928            0.4626
2.6109          14.9996   33856   2.7690            0.4654
2.5658          16.0      36114   2.7445            0.4686
2.5185          16.9999   38371   2.7228            0.4717
2.4637          17.9999   40628   2.7043            0.4747
2.3969          18.9998   42885   2.6895            0.4774
2.3245          19.9989   45140   2.6880            0.4787
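
The training results can be checked programmatically, for example to confirm which evaluation point has the lowest validation loss. The sketch below copies the rows from the table above into a list of tuples. The tuple layout and variable names are illustrative choices, not part of any training script.

```python
# Rows copied from the training results:
# (training_loss, epoch, step, validation_loss, accuracy)
ROWS = [
    (4.0959, 0.9999, 2257, 3.8126, 0.3613),
    (3.4463, 1.9999, 4514, 3.2972, 0.4099),
    (3.1228, 2.9998, 6771, 3.0851, 0.4315),
    (2.9166, 3.9998, 9028, 2.9807, 0.4418),
    (2.8402, 4.9997, 11285, 2.9249, 0.4476),
    (2.7832, 5.9997, 13542, 2.8851, 0.4521),
    (2.7377, 6.9996, 15799, 2.8602, 0.4546),
    (2.7101, 8.0, 18057, 2.8389, 0.4572),
    (2.684, 8.9999, 20314, 2.8260, 0.4586),
    (2.6654, 9.9999, 22571, 2.8155, 0.4596),
    (2.6466, 10.9998, 24828, 2.8077, 0.4604),
    (2.6474, 11.9998, 27085, 2.8025, 0.4615),
    (2.6366, 12.9997, 29342, 2.7983, 0.4619),
    (2.625, 13.9997, 31599, 2.7928, 0.4626),
    (2.6109, 14.9996, 33856, 2.7690, 0.4654),
    (2.5658, 16.0, 36114, 2.7445, 0.4686),
    (2.5185, 16.9999, 38371, 2.7228, 0.4717),
    (2.4637, 17.9999, 40628, 2.7043, 0.4747),
    (2.3969, 18.9998, 42885, 2.6895, 0.4774),
    (2.3245, 19.9989, 45140, 2.6880, 0.4787),
]

# Evaluation point with the lowest validation loss.
best = min(ROWS, key=lambda row: row[3])

# True if validation loss strictly decreases from each row to the next.
monotonic = all(prev[3] > curr[3] for prev, curr in zip(ROWS, ROWS[1:]))

print(f"best step: {best[2]}, validation loss: {best[3]}, accuracy: {best[4]}")
print(f"validation loss strictly decreasing: {monotonic}")
```

The lowest validation loss is at the final evaluation (step 45140), which matches the loss and accuracy reported at the top of this card.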

Framework versions

  • Transformers 4.45.1
  • Pytorch 2.4.1+cu121
  • Datasets 3.0.1
  • Tokenizers 0.20.0
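
With the framework versions above installed, the checkpoint can presumably be loaded through the generic `transformers` auto classes. The sketch below assumes that the repository ID is `kanishka/opt-babylm2-rewritten-clean-spacy-earlystop-bpe_seed-42_1e-3` and that the checkpoint is a causal language model (the "opt" prefix suggests an OPT-style architecture, but this card does not confirm it). `load_model` and `generate` are hypothetical helper names.

```python
# ASSUMPTION: repository ID on the Hugging Face Hub.
MODEL_ID = "kanishka/opt-babylm2-rewritten-clean-spacy-earlystop-bpe_seed-42_1e-3"


def load_model(model_id: str = MODEL_ID):
    """Download the tokenizer and model weights from the Hugging Face Hub."""
    # Imported lazily so that defining this helper does not require
    # transformers to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # ASSUMPTION: the checkpoint is compatible with AutoModelForCausalLM.
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()  # disable dropout for inference
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 20) -> str:
    """Continue `prompt` with up to `max_new_tokens` newly generated tokens."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The child played with")` downloads the weights on first use and returns the prompt followed by the model's continuation.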

Model size

  • Parameters: 98.1M
  • Tensor type: F32
  • Weights format: Safetensors

Dataset used to train kanishka/opt-babylm2-rewritten-clean-spacy-earlystop-bpe_seed-42_1e-3

  • kanishka/babylm2-rewritten-clean-spacy

Evaluation results

  • Accuracy on kanishka/babylm2-rewritten-clean-spacy: 0.479 (self-reported)