---
library_name: transformers
language:
- de
base_model:
- deepset/gbert-base
pipeline_tag: token-classification
---

# Model Card for Model ID
We fine-tuned the base model, deepset/gbert-base, for 71 epochs on the Ca dataset. Epoch 62 achieved the best macro-averaged F1 score on the evaluation set.
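## Metrics

Metrics were computed entity-wise with seqeval via the Hugging Face `evaluate` library. `AVGf1` is the unweighted (macro) mean of the five per-entity F1 scores; `f1`, `precision` and `recall` are the overall scores pooled across all entity types (micro average); `accuracy` is token-level accuracy.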
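The card does not include its evaluation code. To illustrate what "entity-wise" scoring means, here is a dependency-free sketch that approximates seqeval's default (lenient, conlleval-style) mode: tags are decoded into `(type, start, end)` spans, and a predicted entity counts as correct only if both its type and its exact span match a gold entity. This is not seqeval itself (its strict mode and some edge cases differ), and the function names `extract_entities`, `prf` and `entity_scores` are hypothetical. In practice you would call `evaluate.load("seqeval").compute(predictions=..., references=...)`.

```python
from collections import Counter


def extract_entities(tags):
    """Collect (type, start, end) spans from one sentence of IOB2 tags.

    Lenient, conlleval-style decoding: an I-X tag that does not continue an
    open X entity starts a new entity instead of being discarded.
    """
    spans = []
    current_type, start = None, 0
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes the last span
        prefix, _, etype = tag.partition("-")
        continues = prefix == "I" and etype == current_type
        if current_type is not None and not continues:
            spans.append((current_type, start, i))
            current_type = None
        if prefix in ("B", "I") and not continues:
            current_type, start = etype, i
    return set(spans)


def prf(tp, n_pred, n_true):
    """Precision, recall and F1 from counts, using 0.0 for empty denominators."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_true if n_true else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def entity_scores(y_true, y_pred):
    """Entity-level scores: a prediction counts only if type and span match exactly."""
    tp, n_pred, n_true = Counter(), Counter(), Counter()
    for gold_tags, pred_tags in zip(y_true, y_pred):
        gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
        n_true.update(etype for etype, _, _ in gold)
        n_pred.update(etype for etype, _, _ in pred)
        tp.update(etype for etype, _, _ in gold & pred)
    types = sorted(set(n_true) | set(n_pred))
    per_type = {t: prf(tp[t], n_pred[t], n_true[t]) for t in types}
    # Micro average: pool the counts of all entity types, then compute P/R/F1.
    micro = prf(sum(tp.values()), sum(n_pred.values()), sum(n_true.values()))
    return per_type, micro


y_true = [["B-DRUG", "I-DRUG", "O", "B-DIAGNOSIS"]]
y_pred = [["B-DRUG", "I-DRUG", "O", "B-THERAPY"]]
per_type, micro = entity_scores(y_true, y_pred)
print(per_type["DRUG"], micro)  # (1.0, 1.0, 1.0) (0.5, 0.5, 0.5)
```

The mislabelled last token costs twice: it is a missed DIAGNOSIS (lower recall) and a spurious THERAPY (lower precision), which is why entity-level scores are usually well below token-level accuracy.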
Per-entity scores on the evaluation (`eval_`) and test (`test_`) sets:

| Entity | Eval precision | Eval recall | Eval F1 | Test precision | Test recall | Test F1 |
|---|---|---|---|---|---|---|
| DIAGNOSIS | 0.760222310440651 | 0.815935236472092 | 0.7870941224825319 | 0.7192062897791089 | 0.6885304659498208 | 0.703534151254349 |
| DIAGNOSTIC | 0.7433046993431026 | 0.8362706083001705 | 0.7870518994114499 | 0.7573726541554959 | 0.786908077994429 | 0.7718579234972678 |
| DRUG | 0.8951747088186356 | 0.945518453427065 | 0.9196581196581196 | 0.878016960208741 | 0.9282758620689655 | 0.9024472008045592 |
| MEDICAL_FINDING | 0.7438613384689456 | 0.7980371900826446 | 0.7699975080986794 | 0.6848203939745076 | 0.7770738704279225 | 0.7280362842264404 |
| THERAPY | 0.64 | 0.7276573241671074 | 0.6810195496164316 | 0.6100861008610086 | 0.6723904202440126 | 0.639724849527085 |

Overall scores:

| Metric | Eval | Test |
|---|---|---|
| AVGf1 | 0.7889642398534424 | 0.7491200818619402 |
| f1 | 0.7744305184135064 | 0.7327920332701502 |
| precision | 0.7437801708132195 | 0.7048546859693045 |
| recall | 0.8077155722830835 | 0.7630354091792847 |
| accuracy | 0.9332097564796261 | 0.9229989726085077 |
| loss | 0.5050501823425293 | 0.6381183862686157 |
| runtime (s) | 50.3125 | 58.5022 |
| samples_per_second | 162.624 | 162.199 |
| steps_per_second | 20.333 | 20.29 |
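The card reports two F1 summaries, and a quick check against the reported numbers shows how they relate: `AVGf1` matches the unweighted mean of the five per-entity F1 scores, while `f1` matches the harmonic mean of the overall precision and recall. The sketch below is a reconstruction from the published values, not the authors' evaluation code; `macro_f1`, `micro_f1` and `REPORTED` are hypothetical names.

```python
from statistics import mean

# Values copied from the reported metrics (per-entity F1 in the order
# DIAGNOSIS, DIAGNOSTIC, DRUG, MEDICAL_FINDING, THERAPY).
REPORTED = {
    "eval": {
        "entity_f1": [0.7870941224825319, 0.7870518994114499, 0.9196581196581196,
                      0.7699975080986794, 0.6810195496164316],
        "precision": 0.7437801708132195,
        "recall": 0.8077155722830835,
    },
    "test": {
        "entity_f1": [0.703534151254349, 0.7718579234972678, 0.9024472008045592,
                      0.7280362842264404, 0.639724849527085],
        "precision": 0.7048546859693045,
        "recall": 0.7630354091792847,
    },
}


def macro_f1(entity_f1):
    """Unweighted mean: each entity type counts equally, however rare it is."""
    return mean(entity_f1)


def micro_f1(precision, recall):
    """Harmonic mean of the pooled (all-entity) precision and recall."""
    return 2 * precision * recall / (precision + recall)


for split, m in REPORTED.items():
    print(f"{split}: AVGf1={macro_f1(m['entity_f1']):.4f} "
          f"f1={micro_f1(m['precision'], m['recall']):.4f}")
# eval: AVGf1=0.7890 f1=0.7744
# test: AVGf1=0.7491 f1=0.7328
```

Because the macro average gives every entity type the same weight regardless of how often it occurs, it can diverge from the micro average, which is dominated by the most frequent types.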