wikipedia

This model is a fine-tuned version of on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.7920
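The card does not name the training objective. If the loss above is a mean per-token cross-entropy in nats, which is the convention of Transformers' language-modeling heads, it can be read as a perplexity of exp(loss). Here is a minimal sketch under that assumption:

```python
import math

def perplexity(mean_ce_loss: float) -> float:
    """Perplexity from a mean per-token cross-entropy measured in nats.

    Assumption: the card's "Loss" is such an average. The card does not state this.
    """
    return math.exp(mean_ce_loss)

eval_loss = 3.7920  # evaluation loss reported on this card
print(f"perplexity ≈ {perplexity(eval_loss):.1f}")  # perplexity ≈ 44.3
```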

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100000
  • training_steps: 400000
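The sketch below shows how the scheduler settings combine. It assumes the "linear" scheduler follows the formula of Transformers' `get_linear_schedule_with_warmup`: the learning rate ramps from 0 to the peak over the warmup steps, then decays linearly to 0 at the final training step. The constant names are illustrative, not taken from the training script.

```python
# Values copied from the hyperparameter list; names are illustrative.
PEAK_LR = 5e-05
WARMUP_STEPS = 100_000
TRAINING_STEPS = 400_000
TRAIN_BATCH_SIZE = 16   # per device
GRAD_ACCUM_STEPS = 2

def linear_warmup_decay_lr(step: int) -> float:
    """Learning rate at optimizer step `step` (assumed linear warmup + linear decay)."""
    if step < WARMUP_STEPS:
        # Warmup: grow linearly from 0 to PEAK_LR.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Decay: shrink linearly from PEAK_LR to 0 at TRAINING_STEPS, never below 0.
    remaining = max(0, TRAINING_STEPS - step)
    return PEAK_LR * remaining / max(1, TRAINING_STEPS - WARMUP_STEPS)

# With gradient accumulation, one optimizer step sees
# TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS examples per device. On a single
# device this equals the listed total_train_batch_size of 32.
examples_per_step = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

for step in (2_000, 70_000, 100_000, 250_000, 400_000):
    print(step, linear_warmup_decay_lr(step))
```

Under this schedule, step 70,000 (the last step in the results table) still falls inside the 100,000-step warmup. There the learning rate is 5e-05 × 0.7 = 3.5e-05.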

Training results

Training Loss   Epoch     Step    Validation Loss
8.0346          1.0160    2000    7.2841
7.2752          2.0320    4000    7.2837
7.2619          3.0480    6000    7.2576
7.2048          4.0640    8000    7.1034
6.9764          5.0800    10000   6.7982
6.4942          6.0960    12000   6.2032
5.9863          7.1120    14000   5.8090
5.6312          8.1280    16000   5.5046
5.3378          9.1440    18000   5.2551
5.0847          10.1600   20000   5.0341
4.8615          11.1760   22000   4.8711
4.6664          12.1920   24000   4.6536
4.4960          13.2080   26000   4.5433
4.3435          14.2240   28000   4.3970
4.2129          15.2400   30000   4.2968
4.1048          16.2560   32000   4.2358
4.0190          17.2720   34000   4.1681
3.9376          18.2880   36000   4.1167
3.8758          19.3040   38000   4.0411
3.8237          20.3200   40000   4.0252
3.7820          21.3360   42000   4.0171
3.7474          22.3520   44000   3.9434
3.7152          23.3680   46000   3.9434
3.6944          24.3840   48000   3.9306
3.6642          25.4001   50000   3.9128
3.6556          26.4161   52000   3.8504
3.6360          27.4321   54000   3.8669
3.6241          28.4481   56000   3.8512
3.6109          29.4641   58000   3.8412
3.5963          30.4801   60000   3.8352
3.5972          31.4961   62000   3.8418
3.5854          32.5121   64000   3.7850
3.5805          33.5281   66000   3.8185
3.5712          34.5441   68000   3.8206
3.5718          35.5601   70000   3.7920
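The evaluation loss at the top of the card (3.7920) matches the validation loss of the last logged row (step 70000). It is not the lowest value in the table. The short sketch below parses the rows and compares the two. The card does not say whether the final or the best checkpoint was saved.

```python
# Rows from the results table: training loss, epoch, step, validation loss.
RESULTS = """\
8.0346 1.0160 2000 7.2841
7.2752 2.0320 4000 7.2837
7.2619 3.0480 6000 7.2576
7.2048 4.0640 8000 7.1034
6.9764 5.0800 10000 6.7982
6.4942 6.0960 12000 6.2032
5.9863 7.1120 14000 5.8090
5.6312 8.1280 16000 5.5046
5.3378 9.1440 18000 5.2551
5.0847 10.1600 20000 5.0341
4.8615 11.1760 22000 4.8711
4.6664 12.1920 24000 4.6536
4.4960 13.2080 26000 4.5433
4.3435 14.2240 28000 4.3970
4.2129 15.2400 30000 4.2968
4.1048 16.2560 32000 4.2358
4.0190 17.2720 34000 4.1681
3.9376 18.2880 36000 4.1167
3.8758 19.3040 38000 4.0411
3.8237 20.3200 40000 4.0252
3.7820 21.3360 42000 4.0171
3.7474 22.3520 44000 3.9434
3.7152 23.3680 46000 3.9434
3.6944 24.3840 48000 3.9306
3.6642 25.4001 50000 3.9128
3.6556 26.4161 52000 3.8504
3.6360 27.4321 54000 3.8669
3.6241 28.4481 56000 3.8512
3.6109 29.4641 58000 3.8412
3.5963 30.4801 60000 3.8352
3.5972 31.4961 62000 3.8418
3.5854 32.5121 64000 3.7850
3.5805 33.5281 66000 3.8185
3.5712 34.5441 68000 3.8206
3.5718 35.5601 70000 3.7920
"""

# Each row becomes (train_loss, epoch, step, val_loss).
rows = []
for line in RESULTS.strip().splitlines():
    train_loss, epoch, step, val_loss = line.split()
    rows.append((float(train_loss), float(epoch), int(step), float(val_loss)))

best = min(rows, key=lambda r: r[3])  # row with the lowest validation loss
last = rows[-1]                       # last logged row
print(f"lowest validation loss: {best[3]:.4f} at step {best[2]}")
print(f"last logged validation loss: {last[3]:.4f} at step {last[2]}")
print(f"validation minus training loss at last step: {last[3] - last[0]:.4f}")
```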

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
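To compare a local environment with the versions listed above, here is a small sketch that uses the standard library's `importlib.metadata`. The distribution names are assumptions about how the packages were installed. In particular, it assumes `torch` for Pytorch.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed on the card, keyed by assumed pip distribution name.
CARD_VERSIONS = {
    "transformers": "4.45.2",
    "torch": "2.5.1+cu124",
    "datasets": "3.0.1",
    "tokenizers": "0.20.1",
}

def compare_versions() -> dict:
    """Map each package to (version on the card, installed version or None)."""
    report = {}
    for pkg, expected in CARD_VERSIONS.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            installed = None  # package not installed in this environment
        report[pkg] = (expected, installed)
    return report

for pkg, (expected, installed) in compare_versions().items():
    status = "ok" if installed == expected else "differs"
    print(f"{pkg}: card {expected}, installed {installed} ({status})")
```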

Model files

  • Format: Safetensors
  • Model size: 10.7M params
  • Tensor type: F32

Hugging Face repository: fpadovani/german_wikipedia_42
