1b611539-5973-4f1c-bd53-715cc60672e4

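This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on an unspecified dataset (the card lists it as "None"). On the evaluation set, its validation loss falls from 1.0355 at step 1 to 0.5291 at step 300, the last logged evaluation.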

Model description

More information needed

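Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training: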

learning_rate: 0.000204
train_batch_size: 4
eval_batch_size: 4
seed: 40
gradient_accumulation_steps: 2
total_train_batch_size: 8 (train_batch_size 4 × gradient_accumulation_steps 2)
optimizer: AdamW from bitsandbytes (OptimizerNames.ADAMW_BNB) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 50
training_steps: 313
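
The learning-rate settings above can be sketched as a function of the step index. This sketch assumes the run used the common cosine-with-warmup rule (the formula behind transformers' get_cosine_schedule_with_warmup with its default num_cycles=0.5): a linear ramp from 0 to 0.000204 over the first 50 steps, then a half-cosine decay that reaches 0 at step 313. The helper name cosine_lr_with_warmup is hypothetical, and the trainer's actual scheduler may differ in small details.

```python
import math

# Values copied from the hyperparameter list above.
PEAK_LR = 0.000204
WARMUP_STEPS = 50
TOTAL_STEPS = 313


def cosine_lr_with_warmup(step: int,
                          peak_lr: float = PEAK_LR,
                          warmup_steps: int = WARMUP_STEPS,
                          total_steps: int = TOTAL_STEPS,
                          num_cycles: float = 0.5) -> float:
    """Learning rate at `step`, assuming linear warmup followed by cosine decay to 0."""
    if step < warmup_steps:
        # Linear warmup: 0 at step 0, reaching peak_lr at step == warmup_steps.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # num_cycles=0.5 gives a single half-cosine from peak_lr down to 0.
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress)))


if __name__ == "__main__":
    for step in (0, 25, 50, 100, 150, 200, 250, 300, 313):
        print(f"step {step:>3}: lr = {cosine_lr_with_warmup(step):.3e}")
```

Running the module prints the rate at a few representative steps, peaking at 0.000204 at step 50 and decaying to 0 at the final step.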

Training results

Training Loss  Epoch   Step  Validation Loss
No log         0.0032  1     1.0355
0.4726         0.16    50    0.5734
0.4415         0.32    100   0.5618
0.4474         0.48    150   0.5592
0.488          0.64    200   0.5428
0.4632         0.8     250   0.5315
0.4823         0.96    300   0.5291
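
The logged rows also pin down a few derived quantities. The sketch below is a hypothetical helper, not part of any training library, that recomputes them from the table. Each optimizer step advances the Epoch column by 0.0032, so one epoch is about 312.5 steps. With a total batch of 8 (4 × 2 accumulation steps, which suggests a single device), that is roughly 2,500 training examples, assuming each example fills one batch slot (no packing). The 313 configured steps therefore cover just over one epoch. The helper also reports the best validation loss and the relative drop after warmup.

```python
from fractions import Fraction

# Rows copied from the training results table: (training loss or None, epoch, step, validation loss).
# Epochs are kept as strings so Fraction can parse them exactly.
ROWS = [
    (None, "0.0032", 1, 1.0355),
    (0.4726, "0.16", 50, 0.5734),
    (0.4415, "0.32", 100, 0.5618),
    (0.4474, "0.48", 150, 0.5592),
    (0.488, "0.64", 200, 0.5428),
    (0.4632, "0.8", 250, 0.5315),
    (0.4823, "0.96", 300, 0.5291),
]
TOTAL_TRAIN_BATCH_SIZE = 8  # 4 per device x 2 gradient accumulation steps
TRAINING_STEPS = 313


def steps_per_epoch(rows):
    """Infer optimizer steps per epoch from the Step and Epoch columns."""
    estimates = {Fraction(step) / Fraction(epoch) for _, epoch, step, _ in rows}
    if len(estimates) != 1:
        raise ValueError(f"rows disagree on steps per epoch: {estimates}")
    return estimates.pop()


def summarize(rows=ROWS):
    spe = steps_per_epoch(rows)
    val_at = {step: val for _, _, step, val in rows}
    best_step = min(val_at, key=val_at.get)
    last_step = max(val_at)
    return {
        "steps_per_epoch": float(spe),
        # Hypothetical estimate: assumes one example per batch slot on a single device.
        "approx_examples_per_epoch": float(spe * TOTAL_TRAIN_BATCH_SIZE),
        "epochs_covered": float(Fraction(TRAINING_STEPS) / spe),
        "best_step": best_step,
        "best_val_loss": val_at[best_step],
        # Relative improvement from the first post-warmup evaluation (step 50) to the last one.
        "relative_val_drop_after_warmup": (val_at[50] - val_at[last_step]) / val_at[50],
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

Exact fractions are used so the seven rows can be checked for agreement on steps per epoch without a floating-point tolerance.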