BERTić-COMtext-SR-legal-NER-ijekavica

BERTić-COMtext-SR-legal-NER-ijekavica is a variant of the BERTić model, fine-tuned on the task of named entity recognition in Serbian legal texts written in the Ijekavian pronunciation. The model was fine-tuned for 20 epochs on the Ijekavian variant of the COMtext.SR.legal dataset.
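As a rough illustration of how the model might be loaded for inference, the sketch below uses the Hugging Face `transformers` token-classification pipeline. It assumes the repository ID `ICEF-NLP/bcms-bertic-comtext-sr-legal-ner-ijekavica` given on the model page, and that `transformers` and a backend such as PyTorch are installed. The helper names `load_tagger` and `to_pairs` and the example sentence are hypothetical, not part of the original project.

```python
# Assumed repository ID, taken from the model page.
MODEL_ID = "ICEF-NLP/bcms-bertic-comtext-sr-legal-ner-ijekavica"


def load_tagger(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the helpers below can be used without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def to_pairs(results):
    """Reduce aggregated pipeline output to (text, entity type) pairs."""
    return [(r["word"], r["entity_group"]) for r in results]


# Example use (requires network access to fetch the model):
# tagger = load_tagger()
# print(to_pairs(tagger("Osnovni sud u Sarajevu donio je presudu.")))
```

Note that the benchmark below was run on gold-tokenized input, whereas the pipeline tokenizes raw text itself, so scores obtained this way are not directly comparable.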

Benchmarking

This model was evaluated on the task of named entity recognition in Serbian legal texts. The model uses a newly developed named entity schema consisting of 21 entity types, tailored for the domain of Serbian legal texts, and encoded according to the IOB2 standard. The full entity list is available on the COMtext.SR GitHub repository.
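To make the IOB2 encoding concrete, here is a minimal sketch that turns token-level entity spans into IOB2 tags: the first token of an entity gets `B-TYPE`, every following token of the same entity gets `I-TYPE`, and all other tokens get `O`. The function name `spans_to_iob2`, the example sentence, and its annotation are made up for illustration and are not taken from the COMtext.SR.legal corpus.

```python
def spans_to_iob2(num_tokens, spans):
    """Convert entity spans to IOB2 tags.

    spans: list of (start, end, label) token indices, with end exclusive.
    """
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags


# Hypothetical sentence with one two-token PER entity.
tokens = ["Marko", "Marković", "je", "podnio", "tužbu", "."]
spans = [(0, 2, "PER")]
tags = spans_to_iob2(len(tokens), spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```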

This model was compared with SrBERTa, a model trained specifically on Serbian legal texts and likewise fine-tuned for 20 epochs for named entity recognition on the Ijekavian variant of the COMtext.SR.legal corpus. Token-level accuracy and F1 (macro-averaged and per-class) were used as evaluation metrics, with gold-tokenized text taken as input.

Two evaluation settings for both models were considered:

  • Default - only the entity type portion of the NE tag is considered, effectively ignoring the "B-" and "I-" prefixes
  • Strict - the entire NE tag is considered

For the strict setting, per-class results are given separately for each B-CLASS and I-CLASS tag. In addition, macro-averaged F1 scores are presented in two variants - one where the O (outside) class is ignored, and another where it is treated equally to other named entity classes.
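The two settings and the two macro-F1 variants can be sketched in a few lines of plain Python. This is an illustrative reimplementation, not the project's evaluation script: the function names are hypothetical, and it assumes that macro F1 is averaged over the classes that occur in the gold tags, which the description above does not specify.

```python
def strip_prefix(tag):
    """Drop a leading 'B-' or 'I-' so that only the entity type remains."""
    return tag[2:] if tag[:2] in ("B-", "I-") else tag


def evaluate(gold, pred, strict=False, include_o=True):
    """Return (token-level accuracy, macro F1) for two equal-length tag lists."""
    if not strict:
        # Default setting: compare entity types only.
        gold = [strip_prefix(t) for t in gold]
        pred = [strip_prefix(t) for t in pred]
    pairs = list(zip(gold, pred))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)

    # Assumption: average over classes present in the gold tags.
    classes = sorted(set(gold))
    if not include_o:
        classes = [c for c in classes if c != "O"]

    f1_scores = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in pairs)
        fp = sum(g != c and p == c for g, p in pairs)
        fn = sum(g == c and p != c for g, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / len(f1_scores) if f1_scores else 0.0
    return accuracy, macro_f1


# Toy example: the prediction gets the entity type right but the prefix wrong.
gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "B-PER", "O", "B-LOC", "O"]

default_scores = evaluate(gold, pred, strict=False)
strict_scores = evaluate(gold, pred, strict=True)
strict_no_o = evaluate(gold, pred, strict=True, include_o=False)
```

In this toy case the default setting counts the second token as correct, while the strict setting penalizes both `B-PER` (one false positive) and `I-PER` (one false negative), which is why strict scores are never higher than default ones for accuracy.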

BERTić-COMtext-SR-legal-NER-ijekavica and SrBERTa were fine-tuned and evaluated on the COMtext.SR.legal.ijekavica corpus using 10-fold cross-validation.
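For readers unfamiliar with the procedure, the following is a minimal sketch of how 10-fold cross-validation partitions a dataset: the items are split into ten disjoint folds, and each fold serves once as the test set while the other nine are used for fine-tuning. The unit being split (sentences or documents), the shuffling, and the seed are assumptions here; the actual splits are defined by the code in the COMtext.SR repository.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Toy run over 25 items; each item lands in exactly one test fold.
splits = list(k_fold_indices(25, k=10))
for fold_no, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_no}: {len(train)} train / {len(test)} test")
```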

The code and data to run these experiments are available on the COMtext.SR GitHub repository.

Results

| Metrics | BERTić-COMtext-SR-legal-NER-ijekavica (default) | BERTić-COMtext-SR-legal-NER-ijekavica (strict) | SrBERTa (default) | SrBERTa (strict) |
| --- | --- | --- | --- | --- |
| Accuracy | 0.9839 | 0.9828 | 0.9688 | 0.9672 |
| Macro F1 (with O) | 0.8563 | 0.8474 | 0.7479 | 0.7225 |
| Macro F1 (without O) | 0.8403 | 0.8396 | 0.7328 | 0.7128 |
| Per-class F1 | | | | |
| PER | 0.9856 | 0.9780 / 0.9765 | 0.8720 | 0.8177 / 0.9068 |
| LOC | 0.8933 | 0.9003 / 0.8134 | 0.6670 | 0.6666 / 0.5995 |
| ADR | 0.9253 | 0.9132 / 0.9161 | 0.8554 | 0.7806 / 0.8393 |
| COURT | 0.9427 | 0.9515 / 0.9340 | 0.8488 | 0.8417 / 0.8524 |
| INST | 0.8044 | 0.8152 / 0.8261 | 0.6793 | 0.6376 / 0.6420 |
| COM | 0.7225 | 0.7326 / 0.6782 | 0.4815 | 0.3632 / 0.4767 |
| OTHORG | 0.4670 | 0.3436 / 0.6080 | 0.2557 | 0.0609 / 0.3664 |
| LAW | 0.9523 | 0.9463 / 0.9511 | 0.9147 | 0.8868 / 0.9128 |
| REF | 0.8125 | 0.7602 / 0.7939 | 0.7564 | 0.6246 / 0.7485 |
| IDPER | 1.0000 | 1.0000 / N/A | 1.0000 | 1.0000 / N/A |
| IDCOM | 0.9722 | 0.9722 / N/A | 0.9667 | 0.9667 / N/A |
| IDTAX | 1.0000 | 1.0000 / N/A | 0.9815 | 0.9815 / N/A |
| NUMACC | 1.0000 | 1.0000 / N/A | 0.6667 | 0.6667 / N/A |
| NUMDOC | 0.8148 | 0.8148 / N/A | 0.3333 | 0.3333 / N/A |
| NUMCAR | 0.6222 | 0.5397 / 0.5000 | 0.4545 | 0.5000 / 0.0000 |
| NUMPLOT | 0.7088 | 0.7088 / N/A | 0.5479 | 0.5479 / N/A |
| IDOTH | 0.5949 | 0.5949 / N/A | 0.4776 | 0.4776 / N/A |
| CONTACT | 0.8000 | 0.8000 / N/A | 0.0000 | 0.0000 / N/A |
| DATE | 0.9664 | 0.9378 / 0.9615 | 0.9547 | 0.9104 / 0.9480 |
| MONEY | 0.9741 | 0.9613 / 0.9715 | 0.8825 | 0.8854 / 0.8851 |
| MISC | 0.4183 | 0.4213 / 0.3874 | 0.1814 | 0.1492 / 0.1694 |
| O | 0.9942 | 0.9942 | 0.9872 | 0.9872 |

Model size: 110M params (Safetensors, tensor type F32)