metadata
library_name: transformers
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: >-
      opt-babylm2-rewritten-clean-spacy_random-removal-num-adj-earlystop-bpe_seed-42_1e-3
    results: []

opt-babylm2-rewritten-clean-spacy_random-removal-num-adj-earlystop-bpe_seed-42_1e-3

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6849
  • Accuracy: 0.4784
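
For intuition, the evaluation loss can be converted to a perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language models trained with the Transformers `Trainer`. If the loss was computed differently, the conversion does not apply.

```python
import math

eval_loss = 2.6849  # evaluation loss reported above

# Assumption: the loss is a mean per-token cross-entropy measured in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```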

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 32000
  • num_epochs: 20.0
  • mixed_precision_training: Native AMP
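
The relationship between these values can be sketched in plain Python. This is an illustration, not the training code: it assumes the run ended at step 44820 (the last step in the results table) and that the linear schedule warms up from 0 to the peak learning rate and then decays linearly to 0. The function name `linear_lr` is hypothetical.

```python
# Values taken from the hyperparameter list above.
train_batch_size = 32
gradient_accumulation_steps = 8
total_train_batch_size = 256
peak_lr = 1e-3
warmup_steps = 32_000

# Assumption: the schedule spans 44820 steps, the last step in the results table.
total_steps = 44_820

# Number of devices implied by the reported total batch size.
num_devices = total_train_batch_size // (train_batch_size * gradient_accumulation_steps)


def linear_lr(step: int) -> float:
    """Linear warmup to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


print("implied devices:", num_devices)
for step in (2_241, 16_000, 32_000, 38_097, 44_820):
    print(step, f"{linear_lr(step):.6f}")
```

Under these assumptions, warmup covers roughly 71% of the run (32000 of 44820 steps), so the learning rate only starts to decay in the last six epochs or so.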

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|---------------|---------|-------|-----------------|----------|
| 4.0965        | 0.9997  | 2241  | 3.8077          | 0.3617   |
| 3.438         | 1.9997  | 4482  | 3.2954          | 0.4103   |
| 3.124         | 2.9997  | 6723  | 3.0855          | 0.4310   |
| 2.964         | 3.9997  | 8964  | 2.9821          | 0.4414   |
| 2.8431        | 4.9997  | 11205 | 2.9217          | 0.4475   |
| 2.7861        | 5.9997  | 13446 | 2.8828          | 0.4518   |
| 2.7474        | 6.9997  | 15687 | 2.8555          | 0.4548   |
| 2.7083        | 7.9997  | 17928 | 2.8363          | 0.4568   |
| 2.6908        | 8.9997  | 20169 | 2.8231          | 0.4584   |
| 2.6704        | 9.9997  | 22410 | 2.8133          | 0.4595   |
| 2.6536        | 10.9997 | 24651 | 2.8057          | 0.4603   |
| 2.6385        | 11.9997 | 26892 | 2.7991          | 0.4613   |
| 2.6417        | 12.9997 | 29133 | 2.7933          | 0.4621   |
| 2.6348        | 13.9997 | 31374 | 2.7883          | 0.4627   |
| 2.6156        | 14.9997 | 33615 | 2.7718          | 0.4648   |
| 2.5708        | 15.9997 | 35856 | 2.7449          | 0.4683   |
| 2.5219        | 16.9997 | 38097 | 2.7226          | 0.4711   |
| 2.4669        | 17.9997 | 40338 | 2.7019          | 0.4743   |
| 2.4023        | 18.9997 | 42579 | 2.6891          | 0.4769   |
| 2.3317        | 19.9997 | 44820 | 2.6849          | 0.4784   |

Framework versions

  • Transformers 4.48.0
  • Pytorch 2.5.1
  • Datasets 3.2.0
  • Tokenizers 0.21.0