longformer_pos

This model is a fine-tuned version of severinsimmler/xlm-roberta-longformer-base-16384 on an unspecified dataset. It achieves the following results on the evaluation set (a hypothetical usage sketch follows the metrics):

  • Loss: 0.6453
  • Precision: 0.5508
  • Recall: 0.5803
  • F1: 0.5651
  • Accuracy: 0.8941

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding Trainer configuration follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
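
A minimal sketch of how these hyperparameters map onto transformers.TrainingArguments. The token-classification head, `train_ds`, `eval_ds`, and `num_labels` are assumptions, since the card does not document the task or dataset; the evaluation and logging intervals are inferred from the results table below.

```python
# A minimal sketch of the reported setup with the Hugging Face Trainer.
# Assumptions (not documented in this card): the task is token
# classification, and `train_ds`, `eval_ds`, and `num_labels` come from
# the unspecified training data.
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base = "severinsimmler/xlm-roberta-longformer-base-16384"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForTokenClassification.from_pretrained(base, num_labels=num_labels)

args = TrainingArguments(
    output_dir="longformer_pos",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    adam_beta1=0.9,                # Adam betas and epsilon from the card
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="steps",   # assumed: the table reports metrics every 50 steps
    eval_steps=50,
    logging_steps=500,             # assumed: training loss appears every 500 steps
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
)
trainer.train()
```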

Training results

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:--:|:--------:|
| No log | 1.35 | 50 | 0.7424 | 0.0 | 0.0 | 0.0 | 0.7648 |
| No log | 2.7 | 100 | 0.4849 | 0.0415 | 0.0388 | 0.0401 | 0.8160 |
| No log | 4.05 | 150 | 0.3986 | 0.0902 | 0.1163 | 0.1016 | 0.8418 |
| No log | 5.41 | 200 | 0.3393 | 0.1827 | 0.1880 | 0.1853 | 0.8675 |
| No log | 6.76 | 250 | 0.3370 | 0.275 | 0.2132 | 0.2402 | 0.8788 |
| No log | 8.11 | 300 | 0.2937 | 0.3605 | 0.5310 | 0.4295 | 0.8864 |
| No log | 9.46 | 350 | 0.2793 | 0.4088 | 0.4302 | 0.4193 | 0.8997 |
| No log | 10.81 | 400 | 0.2500 | 0.4457 | 0.5969 | 0.5104 | 0.9066 |
| No log | 12.16 | 450 | 0.2894 | 0.5031 | 0.6221 | 0.5563 | 0.9107 |
| 0.3689 | 13.51 | 500 | 0.3678 | 0.5269 | 0.5116 | 0.5192 | 0.9036 |
| 0.3689 | 14.86 | 550 | 0.3156 | 0.5216 | 0.6085 | 0.5617 | 0.9100 |
| 0.3689 | 16.22 | 600 | 0.3824 | 0.5551 | 0.5756 | 0.5652 | 0.9115 |
| 0.3689 | 17.57 | 650 | 0.3347 | 0.4276 | 0.4981 | 0.4602 | 0.9075 |
| 0.3689 | 18.92 | 700 | 0.3705 | 0.4610 | 0.6880 | 0.5521 | 0.8920 |
| 0.3689 | 20.27 | 750 | 0.3276 | 0.5447 | 0.6492 | 0.5924 | 0.9100 |
| 0.3689 | 21.62 | 800 | 0.4603 | 0.5650 | 0.5562 | 0.5605 | 0.9107 |
| 0.3689 | 22.97 | 850 | 0.3142 | 0.5677 | 0.6260 | 0.5954 | 0.9177 |
| 0.3689 | 24.32 | 900 | 0.3887 | 0.5747 | 0.6260 | 0.5993 | 0.9164 |
| 0.3689 | 25.68 | 950 | 0.5906 | 0.4670 | 0.6860 | 0.5557 | 0.8789 |
| 0.0798 | 27.03 | 1000 | 0.5407 | 0.6218 | 0.5736 | 0.5968 | 0.8989 |
| 0.0798 | 28.38 | 1050 | 0.4645 | 0.5044 | 0.5504 | 0.5264 | 0.9051 |
| 0.0798 | 29.73 | 1100 | 0.3217 | 0.5107 | 0.6027 | 0.5529 | 0.9104 |
| 0.0798 | 31.08 | 1150 | 0.4471 | 0.5523 | 0.6647 | 0.6033 | 0.9055 |
| 0.0798 | 32.43 | 1200 | 0.4611 | 0.5029 | 0.6725 | 0.5755 | 0.8980 |
| 0.0798 | 33.78 | 1250 | 0.4495 | 0.5783 | 0.6085 | 0.5930 | 0.9155 |
| 0.0798 | 35.14 | 1300 | 0.5293 | 0.5727 | 0.6105 | 0.5910 | 0.9128 |
| 0.0798 | 36.49 | 1350 | 0.4453 | 0.5652 | 0.5795 | 0.5722 | 0.9100 |
| 0.0798 | 37.84 | 1400 | 0.3912 | 0.5988 | 0.5988 | 0.5988 | 0.9162 |
| 0.0798 | 39.19 | 1450 | 0.3862 | 0.5917 | 0.6066 | 0.5990 | 0.9182 |
| 0.0393 | 40.54 | 1500 | 0.4303 | 0.5337 | 0.6744 | 0.5959 | 0.9137 |
| 0.0393 | 41.89 | 1550 | 0.3846 | 0.5129 | 0.6550 | 0.5753 | 0.9119 |
| 0.0393 | 43.24 | 1600 | 0.5571 | 0.5735 | 0.6047 | 0.5887 | 0.9124 |
| 0.0393 | 44.59 | 1650 | 0.4528 | 0.5719 | 0.6395 | 0.6038 | 0.9182 |
| 0.0393 | 45.95 | 1700 | 0.5202 | 0.6037 | 0.6260 | 0.6147 | 0.9130 |
| 0.0393 | 47.3 | 1750 | 0.5163 | 0.5743 | 0.5019 | 0.5357 | 0.8990 |
| 0.0393 | 48.65 | 1800 | 0.3528 | 0.5771 | 0.6531 | 0.6127 | 0.9157 |
| 0.0393 | 50.0 | 1850 | 0.4441 | 0.5654 | 0.6531 | 0.6061 | 0.9155 |
| 0.0393 | 51.35 | 1900 | 0.4517 | 0.6262 | 0.6105 | 0.6183 | 0.9151 |
| 0.0393 | 52.7 | 1950 | 0.4142 | 0.5812 | 0.6105 | 0.5955 | 0.9142 |
| 0.0315 | 54.05 | 2000 | 0.4539 | 0.5694 | 0.6357 | 0.6007 | 0.9180 |
| 0.0315 | 55.41 | 2050 | 0.4912 | 0.4107 | 0.5795 | 0.4807 | 0.9097 |
| 0.0315 | 56.76 | 2100 | 0.4442 | 0.5514 | 0.5194 | 0.5349 | 0.9190 |
| 0.0315 | 58.11 | 2150 | 0.4871 | 0.5414 | 0.6337 | 0.5839 | 0.9074 |
| 0.0315 | 59.46 | 2200 | 0.6469 | 0.5937 | 0.5465 | 0.5691 | 0.9072 |
| 0.0315 | 60.81 | 2250 | 0.4975 | 0.6346 | 0.6395 | 0.6371 | 0.9167 |
| 0.0315 | 62.16 | 2300 | 0.4800 | 0.6060 | 0.6260 | 0.6158 | 0.9151 |
| 0.0315 | 63.51 | 2350 | 0.5273 | 0.6047 | 0.5988 | 0.6018 | 0.9137 |
| 0.0315 | 64.86 | 2400 | 0.4613 | 0.5794 | 0.6221 | 0.6 | 0.9145 |
| 0.0315 | 66.22 | 2450 | 0.4839 | 0.5996 | 0.6298 | 0.6144 | 0.9189 |
| 0.0287 | 67.57 | 2500 | 0.4725 | 0.4970 | 0.6415 | 0.5601 | 0.9020 |
| 0.0287 | 68.92 | 2550 | 0.5888 | 0.6614 | 0.5717 | 0.6133 | 0.8999 |
| 0.0287 | 70.27 | 2600 | 0.4525 | 0.6021 | 0.5601 | 0.5803 | 0.9086 |
| 0.0287 | 71.62 | 2650 | 0.4416 | 0.5743 | 0.6066 | 0.5900 | 0.9157 |
| 0.0287 | 72.97 | 2700 | 0.4290 | 0.5084 | 0.6473 | 0.5695 | 0.8974 |
| 0.0287 | 74.32 | 2750 | 0.5249 | 0.5778 | 0.5543 | 0.5658 | 0.9103 |
| 0.0287 | 75.68 | 2800 | 0.5481 | 0.6149 | 0.5601 | 0.5862 | 0.9042 |
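
The per-step Precision/Recall/F1/Accuracy columns above are the overall scores that seqeval reports for token classification. A sketch of a compute_metrics function that would produce them; the exact metric code is an assumption, and `label_list` is a hypothetical placeholder for the undocumented tag set.

```python
# A minimal sketch of how the table's metrics are commonly computed for
# token classification, using seqeval via the `evaluate` library. This is
# an assumed implementation, not one documented in the card.
import numpy as np
import evaluate

seqeval = evaluate.load("seqeval")
label_list = ["O", "B-TAG", "I-TAG"]  # hypothetical placeholder labels

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    # Ignore special tokens, which are labelled -100 by convention.
    true_predictions = [
        [label_list[p] for p, l in zip(pred, lab) if l != -100]
        for pred, lab in zip(predictions, labels)
    ]
    true_labels = [
        [label_list[l] for p, l in zip(pred, lab) if l != -100]
        for pred, lab in zip(predictions, labels)
    ]
    results = seqeval.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }
```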

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.1.2
  • Datasets 2.1.0
  • Tokenizers 0.15.2