tfa_output_2025_m02_d02_t23h_28m_54s

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4602
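
No usage example is given on the card. Since the base model is gpt2 (a 124M-parameter causal language model), a minimal loading-and-generation sketch with the standard transformers API would look roughly as follows, assuming the repository id brando/tfa_output_2025_m02_d02_t23h_28m_54s:

```python
# Minimal sketch: load the fine-tuned checkpoint and generate text.
# The repo id below is an assumption based on the model page; adjust as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "brando/tfa_output_2025_m02_d02_t23h_28m_54s"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```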

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: OptimizerNames.PAGED_ADAMW with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 100
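
A minimal sketch of how these settings map onto transformers.TrainingArguments is given below; the output directory is an assumption, and the PAGED_ADAMW optimizer is selected via the optim string (which typically requires bitsandbytes):

```python
# Hedged reproduction sketch of the hyperparameters listed above.
# Names marked "assumed" are not from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tfa_output_2025_m02_d02_t23h_28m_54s",  # assumed output directory
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,   # effective total train batch size: 2 * 4 = 8
    optim="paged_adamw_32bit",       # assumed string for OptimizerNames.PAGED_ADAMW
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=100,
)
```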

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log | 0 | 0 | 3.1242 |
| 3.7239 | 0.5714 | 1 | 3.1242 |
| 4.5819 | 1.5714 | 2 | 3.1012 |
| 4.5655 | 2.5714 | 3 | 3.0775 |
| 4.5318 | 3.5714 | 4 | 3.0655 |
| 4.5864 | 4.5714 | 5 | 3.0469 |
| 4.4551 | 5.5714 | 6 | 3.0325 |
| 4.4845 | 6.5714 | 7 | 3.0158 |
| 4.5483 | 7.5714 | 8 | 3.0042 |
| 4.4145 | 8.5714 | 9 | 2.9926 |
| 4.4484 | 9.5714 | 10 | 2.9827 |
| 4.3074 | 10.5714 | 11 | 2.9709 |
| 4.3609 | 11.5714 | 12 | 2.9587 |
| 4.3821 | 12.5714 | 13 | 2.9485 |
| 4.386 | 13.5714 | 14 | 2.9399 |
| 4.3846 | 14.5714 | 15 | 2.9299 |
| 4.3531 | 15.5714 | 16 | 2.9202 |
| 4.3193 | 16.5714 | 17 | 2.9091 |
| 4.2898 | 17.5714 | 18 | 2.9001 |
| 4.3685 | 18.5714 | 19 | 2.8888 |
| 4.232 | 19.5714 | 20 | 2.8802 |
| 4.2805 | 20.5714 | 21 | 2.8718 |
| 4.275 | 21.5714 | 22 | 2.8589 |
| 4.2062 | 22.5714 | 23 | 2.8513 |
| 4.1492 | 23.5714 | 24 | 2.8427 |
| 4.1998 | 24.5714 | 25 | 2.8323 |
| 4.1638 | 25.5714 | 26 | 2.8219 |
| 4.1229 | 26.5714 | 27 | 2.8149 |
| 4.2027 | 27.5714 | 28 | 2.8057 |
| 4.1399 | 28.5714 | 29 | 2.7971 |
| 4.1457 | 29.5714 | 30 | 2.7907 |
| 4.1507 | 30.5714 | 31 | 2.7815 |
| 4.0924 | 31.5714 | 32 | 2.7740 |
| 4.1176 | 32.5714 | 33 | 2.7660 |
| 4.1109 | 33.5714 | 34 | 2.7583 |
| 3.9774 | 34.5714 | 35 | 2.7497 |
| 4.0628 | 35.5714 | 36 | 2.7429 |
| 4.0824 | 36.5714 | 37 | 2.7344 |
| 4.0686 | 37.5714 | 38 | 2.7263 |
| 4.0403 | 38.5714 | 39 | 2.7191 |
| 4.0444 | 39.5714 | 40 | 2.7140 |
| 3.9816 | 40.5714 | 41 | 2.7064 |
| 3.9371 | 41.5714 | 42 | 2.6999 |
| 3.9101 | 42.5714 | 43 | 2.6939 |
| 3.9853 | 43.5714 | 44 | 2.6860 |
| 3.9293 | 44.5714 | 45 | 2.6800 |
| 3.8705 | 45.5714 | 46 | 2.6748 |
| 3.9374 | 46.5714 | 47 | 2.6683 |
| 3.8989 | 47.5714 | 48 | 2.6611 |
| 3.9209 | 48.5714 | 49 | 2.6557 |
| 3.8378 | 49.5714 | 50 | 2.6503 |
| 3.9311 | 50.5714 | 51 | 2.6434 |
| 3.8503 | 51.5714 | 52 | 2.6379 |
| 3.7551 | 52.5714 | 53 | 2.6334 |
| 3.757 | 53.5714 | 54 | 2.6291 |
| 3.8337 | 54.5714 | 55 | 2.6228 |
| 3.8533 | 55.5714 | 56 | 2.6176 |
| 3.7737 | 56.5714 | 57 | 2.6125 |
| 3.7589 | 57.5714 | 58 | 2.6064 |
| 3.7929 | 58.5714 | 59 | 2.6018 |
| 3.7802 | 59.5714 | 60 | 2.5972 |
| 3.824 | 60.5714 | 61 | 2.5932 |
| 3.7761 | 61.5714 | 62 | 2.5883 |
| 3.7067 | 62.5714 | 63 | 2.5848 |
| 3.7647 | 63.5714 | 64 | 2.5791 |
| 3.6702 | 64.5714 | 65 | 2.5760 |
| 3.7744 | 65.5714 | 66 | 2.5721 |
| 3.7251 | 66.5714 | 67 | 2.5674 |
| 3.6592 | 67.5714 | 68 | 2.5618 |
| 3.8159 | 68.5714 | 69 | 2.5583 |
| 3.6529 | 69.5714 | 70 | 2.5554 |
| 3.6874 | 70.5714 | 71 | 2.5510 |
| 3.6516 | 71.5714 | 72 | 2.5466 |
| 3.5826 | 72.5714 | 73 | 2.5438 |
| 3.6663 | 73.5714 | 74 | 2.5397 |
| 3.6507 | 74.5714 | 75 | 2.5351 |
| 3.591 | 75.5714 | 76 | 2.5343 |
| 3.6226 | 76.5714 | 77 | 2.5294 |
| 3.5843 | 77.5714 | 78 | 2.5260 |
| 3.6361 | 78.5714 | 79 | 2.5216 |
| 3.5118 | 79.5714 | 80 | 2.5197 |
| 3.6315 | 80.5714 | 81 | 2.5154 |
| 3.5687 | 81.5714 | 82 | 2.5112 |
| 3.5679 | 82.5714 | 83 | 2.5103 |
| 3.4985 | 83.5714 | 84 | 2.5059 |
| 3.5778 | 84.5714 | 85 | 2.5034 |
| 3.5422 | 85.5714 | 86 | 2.5003 |
| 3.6483 | 86.5714 | 87 | 2.4969 |
| 3.5949 | 87.5714 | 88 | 2.4933 |
| 3.5475 | 88.5714 | 89 | 2.4904 |
| 3.5944 | 89.5714 | 90 | 2.4861 |
| 3.5698 | 90.5714 | 91 | 2.4841 |
| 3.5287 | 91.5714 | 92 | 2.4832 |
| 3.5029 | 92.5714 | 93 | 2.4792 |
| 3.4956 | 93.5714 | 94 | 2.4758 |
| 3.5941 | 94.5714 | 95 | 2.4739 |
| 3.4637 | 95.5714 | 96 | 2.4710 |
| 3.5336 | 96.5714 | 97 | 2.4683 |
| 3.4492 | 97.5714 | 98 | 2.4661 |
| 3.4548 | 98.5714 | 99 | 2.4624 |
| 3.5259 | 99.5714 | 100 | 2.4602 |
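
For reference, the final validation loss of 2.4602 corresponds to a perplexity of roughly 11.7, assuming the reported loss is the usual mean token-level cross-entropy in nats:

```python
import math

final_val_loss = 2.4602                # final validation loss from the table above
perplexity = math.exp(final_val_loss)  # assumes mean cross-entropy in nats
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 11.71
```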

Framework versions

  • Transformers 4.48.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0

Model size

  • 124M parameters (tensor type F32, Safetensors format)