metadata
license: bigcode-openrail-m
base_model: bigcode/starcoderbase-7b
tags:
  - generated_from_trainer
model-index:
  - name: starcoderbase7b_2048_context_length_lr_0.0005
    results: []

starcoderbase7b_2048_context_length_lr_0.0005

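This model is a fine-tuned version of bigcode/starcoderbase-7b; the training dataset is not specified. After the final training step (step 2000, epoch 1.0), it achieves the following result on the evaluation set:

  • Loss: 2.0501

This is well above the lowest validation loss logged during training (0.5402 at step 25). Validation loss jumped to 11.4657 at step 575 and had only partially recovered by the end of the run.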
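To make the loss figure easier to interpret, it can be converted to perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card. The function name `perplexity` is illustrative.

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Assumption: the evaluation loss reported above is a mean per-token
# cross-entropy in nats.
final_eval_loss = 2.0501

print(f"final perplexity: {perplexity(final_eval_loss):.2f}")
```

A loss of 0 corresponds to a perplexity of 1, so lower is better on both scales.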

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 30
  • training_steps: 2000
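The totals in the list above follow from the per-device settings, and the learning rate at any step follows from the warmup and scheduler settings. The sketch below reproduces both. It assumes linear warmup followed by a single half-cosine decay to zero, which is the common shape for a cosine schedule with warmup. The exact scheduler implementation used for this run is not shown in the card, and the function names are illustrative.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Number of examples contributing to one optimizer (or evaluation) step."""
    return per_device * num_devices * grad_accum


def cosine_lr(step: int, base_lr: float = 5e-4,
              warmup_steps: int = 30, total_steps: int = 2000) -> float:
    """Learning rate at `step`, assuming linear warmup then half-cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


train_total = effective_batch_size(per_device=2, num_devices=4, grad_accum=2)
eval_total = effective_batch_size(per_device=2, num_devices=4)
sequences_seen = train_total * 2000  # training examples consumed over the run

print(train_total, eval_total, sequences_seen)
for step in (0, 30, 575, 1015, 2000):
    print(step, f"{cosine_lr(step):.6f}")
```

Under these assumptions the learning rate peaks at 0.0005 at step 30, when warmup ends, and is still close to that peak in the first few hundred steps.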

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.6244 | 0.0125 | 25 | 0.5402 |
| 1.0172 | 0.025 | 50 | 1.4486 |
| 0.9991 | 0.0375 | 75 | 1.0535 |
| 0.715 | 0.05 | 100 | 1.6262 |
| 0.6957 | 0.0625 | 125 | 0.6796 |
| 0.5182 | 0.075 | 150 | 0.6086 |
| 0.497 | 0.0875 | 175 | 0.5938 |
| 0.4611 | 0.1 | 200 | 0.6104 |
| 0.4046 | 0.1125 | 225 | 0.5857 |
| 0.3753 | 0.125 | 250 | 0.6633 |
| 0.3517 | 0.1375 | 275 | 0.6479 |
| 0.2758 | 0.15 | 300 | 0.5788 |
| 0.2928 | 0.1625 | 325 | 0.6429 |
| 0.2669 | 0.175 | 350 | 0.5874 |
| 0.2608 | 0.1875 | 375 | 0.5497 |
| 0.2049 | 0.2 | 400 | 0.6268 |
| 0.2006 | 0.2125 | 425 | 0.6265 |
| 0.197 | 0.225 | 450 | 0.6236 |
| 0.177 | 0.2375 | 475 | 0.6124 |
| 0.1774 | 0.25 | 500 | 0.6231 |
| 0.1509 | 0.2625 | 525 | 0.5864 |
| 0.1389 | 0.275 | 550 | 0.6161 |
| 0.8679 | 0.2875 | 575 | 11.4657 |
| 6.5575 | 0.3 | 600 | 6.4917 |
| 6.0031 | 0.3125 | 625 | 5.5229 |
| 5.1391 | 0.325 | 650 | 5.2191 |
| 4.4917 | 0.3375 | 675 | 4.6562 |
| 3.9199 | 0.35 | 700 | 4.2153 |
| 3.855 | 0.3625 | 725 | 4.0902 |
| 3.5441 | 0.375 | 750 | 4.0601 |
| 3.3835 | 0.3875 | 775 | 3.8844 |
| 3.1663 | 0.4 | 800 | 3.8223 |
| 2.9285 | 0.4125 | 825 | 3.4541 |
| 3.0088 | 0.425 | 850 | 3.5302 |
| 2.9083 | 0.4375 | 875 | 3.3347 |
| 2.8438 | 0.45 | 900 | 3.3962 |
| 2.663 | 0.4625 | 925 | 3.0955 |
| 2.5084 | 0.475 | 950 | 3.0454 |
| 2.5818 | 0.4875 | 975 | 3.0131 |
| 2.4068 | 0.5 | 1000 | 3.0179 |
| 2.3994 | 0.5125 | 1025 | 2.8273 |
| 2.1942 | 0.525 | 1050 | 2.7333 |
| 2.1041 | 0.5375 | 1075 | 2.6163 |
| 2.0861 | 0.55 | 1100 | 2.6006 |
| 1.9868 | 0.5625 | 1125 | 2.5482 |
| 1.9496 | 0.575 | 1150 | 2.6079 |
| 1.8099 | 0.5875 | 1175 | 2.3777 |
| 1.6454 | 0.6 | 1200 | 2.2547 |
| 1.6484 | 0.6125 | 1225 | 2.3254 |
| 1.5729 | 0.625 | 1250 | 2.2835 |
| 1.5635 | 0.6375 | 1275 | 2.2167 |
| 1.3961 | 0.65 | 1300 | 2.2751 |
| 1.3495 | 0.6625 | 1325 | 2.1755 |
| 1.3524 | 0.675 | 1350 | 2.1377 |
| 1.3116 | 0.6875 | 1375 | 2.1407 |
| 1.282 | 0.7 | 1400 | 2.0955 |
| 1.114 | 0.7125 | 1425 | 2.0334 |
| 1.0985 | 0.725 | 1450 | 2.0133 |
| 1.1216 | 0.7375 | 1475 | 2.0139 |
| 1.0544 | 0.75 | 1500 | 2.0464 |
| 1.0221 | 0.7625 | 1525 | 1.9984 |
| 0.9368 | 0.775 | 1550 | 2.0069 |
| 0.8973 | 0.7875 | 1575 | 1.9595 |
| 0.9332 | 0.8 | 1600 | 1.9372 |
| 0.9227 | 0.8125 | 1625 | 1.9910 |
| 0.8507 | 0.825 | 1650 | 2.0251 |
| 0.8242 | 0.8375 | 1675 | 1.9892 |
| 0.7571 | 0.85 | 1700 | 2.0327 |
| 0.7519 | 0.8625 | 1725 | 1.9949 |
| 0.7209 | 0.875 | 1750 | 2.0050 |
| 0.7315 | 0.8875 | 1775 | 2.0076 |
| 0.77 | 0.9 | 1800 | 2.0315 |
| 0.7719 | 0.9125 | 1825 | 2.0241 |
| 0.681 | 0.925 | 1850 | 2.0440 |
| 0.7371 | 0.9375 | 1875 | 2.0380 |
| 0.6823 | 0.95 | 1900 | 2.0392 |
| 0.6891 | 0.9625 | 1925 | 2.0563 |
| 0.7266 | 0.975 | 1950 | 2.0511 |
| 0.6888 | 0.9875 | 1975 | 2.0501 |
| 0.6663 | 1.0 | 2000 | 2.0501 |
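The log above contains a sharp jump in validation loss partway through training. A minimal sketch of how such a log could be scanned for the best evaluation step and the first divergence is shown below. The `(step, validation_loss)` pairs are a subset copied from the table. The helper names and the threshold `factor=5.0` are illustrative assumptions, not part of the original training setup.

```python
from typing import List, Optional, Tuple

# (step, validation_loss) pairs copied from the training log; a subset for brevity.
LOG: List[Tuple[int, float]] = [
    (25, 0.5402),
    (375, 0.5497),
    (550, 0.6161),
    (575, 11.4657),
    (600, 6.4917),
    (1600, 1.9372),
    (2000, 2.0501),
]


def best_checkpoint(log: List[Tuple[int, float]]) -> Tuple[int, float]:
    """Return the (step, loss) pair with the lowest validation loss."""
    return min(log, key=lambda row: row[1])


def first_divergence(log: List[Tuple[int, float]], factor: float = 5.0) -> Optional[int]:
    """Return the first step whose loss exceeds `factor` times the best loss seen so far."""
    best_so_far = float("inf")
    for step, loss in log:
        if loss > factor * best_so_far:
            return step
        best_so_far = min(best_so_far, loss)
    return None


print("best:", best_checkpoint(LOG))
print("first divergence at step:", first_divergence(LOG))
```

A check like this only flags where the validation loss departs from its earlier range; it does not identify the cause of the jump.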

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0a0+07cecf4168.nv24.05
  • Datasets 2.20.0
  • Tokenizers 0.19.1
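When trying to reproduce the run, it may help to compare the local environment against the versions listed above. The sketch below uses the standard library's `importlib.metadata`. The helper `version_mismatches` is hypothetical, and the mapping assumes the usual PyPI distribution names (`torch` for PyTorch). The PyTorch version string looks like a vendor container build, so an exact match may not be available from PyPI.

```python
from importlib import metadata
from typing import Callable, Dict

# Versions as listed in this card; keys are assumed PyPI distribution names.
EXPECTED: Dict[str, str] = {
    "transformers": "4.44.0",
    "torch": "2.4.0a0+07cecf4168.nv24.05",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(
    expected: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, str]:
    """Return a message for each package whose installed version differs."""
    mismatches: Dict[str, str] = {}
    for package, wanted in expected.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            found = "not installed"
        if found != wanted:
            mismatches[package] = f"expected {wanted}, found {found}"
    return mismatches


if __name__ == "__main__":
    for package, message in version_mismatches(EXPECTED).items():
        print(f"{package}: {message}")
```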