train_2

This model was trained from scratch on an unspecified dataset (the dataset name was not recorded). Its per-epoch training and validation losses are reported in the results table at the end of this card.

Model description

More information needed

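Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training: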

learning_rate: 1e-05
train_batch_size: 128
eval_batch_size: 128
seed: 42
optimizer: ADAMW_APEX_FUSED (OptimizerNames.ADAMW_APEX_FUSED) with betas=(0.826646043090655,0.991636944120939), epsilon=3.4341677539323e-07, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 5000
num_epochs: 200
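The linear schedule with 5000 warmup steps can be sketched as follows. This is a minimal pure-Python sketch that assumes the schedule has the usual linear-warmup-then-linear-decay shape (as in Hugging Face's `get_linear_schedule_with_warmup`). It also assumes the scheduler's horizon is the full 200 epochs at 18731 steps per epoch, a figure read from the results table. The function name `linear_warmup_lr` is hypothetical.

```python
def linear_warmup_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to peak_lr over the warmup steps.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr (end of warmup) to 0 (at total_steps).
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


PEAK_LR = 1e-05          # learning_rate
WARMUP_STEPS = 5000      # lr_scheduler_warmup_steps
STEPS_PER_EPOCH = 18731  # read from the results table (step 18731 at epoch 1.0)
NUM_EPOCHS = 200         # num_epochs
# ASSUMPTION: the scheduler is configured for all 200 epochs.
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS

for step in (0, 2500, WARMUP_STEPS, STEPS_PER_EPOCH, 3 * STEPS_PER_EPOCH, TOTAL_STEPS):
    print(f"step {step:>8}: lr = {linear_warmup_lr(step, PEAK_LR, WARMUP_STEPS, TOTAL_STEPS):.3e}")
```

With these numbers, warmup ends at step 5000, just over a quarter of the way through the first epoch. The rate then decays slowly over the remaining 3,741,200 steps, so it stays close to its peak during the three epochs shown in the results.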

Training results

Training Loss	Epoch	Step	Validation Loss
0.0237	1.0	18731	0.1124
0.0216	2.0	37462	0.1128
0.0201	3.0	56193	0.1153
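The table above can be checked with a short script. The script confirms a constant number of optimizer steps per epoch and picks the epoch with the lowest validation loss. It also derives an upper bound on training examples per epoch. That bound assumes a single device and no gradient accumulation, which the card does not state. The variable names are hypothetical.

```python
import csv
import io

# Results copied from the table above (tab-separated, header row first).
RESULTS_TSV = (
    "Training Loss\tEpoch\tStep\tValidation Loss\n"
    "0.0237\t1.0\t18731\t0.1124\n"
    "0.0216\t2.0\t37462\t0.1128\n"
    "0.0201\t3.0\t56193\t0.1153\n"
)
TRAIN_BATCH_SIZE = 128  # train_batch_size from the hyperparameter list

# Parse every column as a float.
rows = [
    {key: float(value) for key, value in row.items()}
    for row in csv.DictReader(io.StringIO(RESULTS_TSV), delimiter="\t")
]

# Optimizer steps taken in each epoch: difference between consecutive logged steps.
logged_steps = [int(row["Step"]) for row in rows]
steps_per_epoch = {b - a for a, b in zip([0] + logged_steps, logged_steps)}

# Epoch whose validation loss is lowest.
best = min(rows, key=lambda row: row["Validation Loss"])

# Upper bound on examples per epoch, ASSUMING one device and no gradient
# accumulation (the last batch of an epoch may be partial).
max_examples_per_epoch = max(steps_per_epoch) * TRAIN_BATCH_SIZE

print("steps per epoch:", steps_per_epoch)
print("best epoch:", best["Epoch"], "validation loss:", best["Validation Loss"])
print("examples per epoch <=", max_examples_per_epoch)
```

Over these three epochs, training loss falls while validation loss rises slightly, so epoch 1 has the lowest validation loss.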