# deepseek-Instruct-8B
This model is a fine-tuned version of [deepseek-ai/deepseek-llm-7b-base](https://huggingface.co/deepseek-ai/deepseek-llm-7b-base) on an unspecified dataset. It achieves the following result on the evaluation set:
- Loss: 0.2329
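Since the reported metric is mean token-level cross-entropy loss (in nats), the corresponding perplexity can be recovered as exp(loss), e.g.:

```python
import math

eval_loss = 0.2329
# Perplexity is the exponential of the mean cross-entropy loss.
print(math.exp(eval_loss))  # ≈ 1.26
```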
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: paged_adamw_8bit (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 4
- mixed_precision_training: Native AMP
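
As a minimal sketch of how these settings map onto the `transformers` Trainer API (the training script itself is not published, so `output_dir` and any dataset or LoRA setup are placeholders, not the author's code):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="deepseek-Instruct-8B",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,      # effective train batch size: 4 * 4 = 16
    seed=42,
    optim="paged_adamw_8bit",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=4,
    fp16=True,                          # "Native AMP" mixed precision
)
```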
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 1.9256 | 0.1144 | 50 | 1.8577 |
| 1.4949 | 0.2288 | 100 | 0.9622 |
| 0.5315 | 0.3432 | 150 | 0.3328 |
| 0.3079 | 0.4577 | 200 | 0.3011 |
| 0.2974 | 0.5721 | 250 | 0.2960 |
| 0.2921 | 0.6865 | 300 | 0.2903 |
| 0.2869 | 0.8009 | 350 | 0.2832 |
| 0.2757 | 0.9153 | 400 | 0.2731 |
| 0.2676 | 1.0297 | 450 | 0.2644 |
| 0.2594 | 1.1442 | 500 | 0.2590 |
| 0.2546 | 1.2586 | 550 | 0.2535 |
| 0.2497 | 1.3730 | 600 | 0.2505 |
| 0.2477 | 1.4874 | 650 | 0.2489 |
| 0.2462 | 1.6018 | 700 | 0.2463 |
| 0.2438 | 1.7162 | 750 | 0.2452 |
| 0.2439 | 1.8307 | 800 | 0.2436 |
| 0.2434 | 1.9451 | 850 | 0.2426 |
| 0.2414 | 2.0595 | 900 | 0.2415 |
| 0.2408 | 2.1739 | 950 | 0.2406 |
| 0.2374 | 2.2883 | 1000 | 0.2396 |
| 0.2388 | 2.4027 | 1050 | 0.2385 |
| 0.2357 | 2.5172 | 1100 | 0.2378 |
| 0.2358 | 2.6316 | 1150 | 0.2377 |
| 0.2360 | 2.7460 | 1200 | 0.2371 |
| 0.2352 | 2.8604 | 1250 | 0.2361 |
| 0.2342 | 2.9748 | 1300 | 0.2357 |
| 0.2337 | 3.0892 | 1350 | 0.2352 |
| 0.2337 | 3.2037 | 1400 | 0.2346 |
| 0.2335 | 3.3181 | 1450 | 0.2343 |
| 0.2327 | 3.4325 | 1500 | 0.2337 |
| 0.2314 | 3.5469 | 1550 | 0.2337 |
| 0.2322 | 3.6613 | 1600 | 0.2334 |
| 0.2318 | 3.7757 | 1650 | 0.2330 |
| 0.2292 | 3.8902 | 1700 | 0.2329 |
### Framework versions
- PEFT 0.14.0
- Transformers 4.50.3
- PyTorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
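
Since the PEFT version above suggests this repository hosts a parameter-efficient adapter rather than full weights, here is a minimal loading sketch, assuming a LoRA-style adapter on top of the base model (the adapter type is not stated in this card, and the prompt is only an illustration):

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Loads the base model and applies the adapter in one call.
model = AutoPeftModelForCausalLM.from_pretrained(
    "towhid2000bd/deepseek-Instruct-8B",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-llm-7b-base")

prompt = "Write a short explanation of gradient descent."  # example prompt, not from the card
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```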