srbNLI: Serbian Natural Language Inference Model

Model Overview

srbNLI is a Natural Language Inference (NLI) model for Serbian, created by fine-tuning DeBERTa-v3-large on srbSciFact, an automatically translated version of the SciFact dataset. It is trained to recognize relationships between claims and evidence in Serbian text, with applications in scientific claim verification and potential expansion to broader claim verification tasks.

Key Details

  • Model Type: Transformer-based (DeBERTa-v3-large)
  • Language: Serbian
  • Task: Natural Language Inference (NLI), Textual Entailment, Claim Verification
  • Dataset: srbSciFact (automatically translated SciFact dataset)
  • Fine-tuning: Serbian NLI data with support, contradiction, and neutral labels
  • Metrics: Accuracy, Precision, Recall, F1-score
  • Model Size: 435M parameters (F32, Safetensors)
  • Repository: MilosKosRad/ScientificNLIsrb
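
A minimal inference sketch with the Hugging Face transformers library is shown below. The claim/evidence input order and the label names are assumptions and should be verified against the model's config.json (id2label):

```python
# Minimal inference sketch. The claim/evidence input order and the label
# names are assumptions; check the model's config.json (id2label) to confirm.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "MilosKosRad/ScientificNLIsrb"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

claim = "Vakcinacija smanjuje rizik od hospitalizacije."                      # claim
evidence = "Studija je pokazala pad broja hospitalizacija kod vakcinisanih."  # evidence

inputs = tokenizer(claim, evidence, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])  # e.g. "support" / "neutral" / "contradiction"
```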

Motivation

This model addresses the lack of NLI datasets and models for Serbian, a low-resource language. It provides a tool for textual entailment and claim verification, especially for scientific claims, with broader potential for misinformation detection and automated fact-checking.

Training

  • Base Model: DeBERTa-v3-large (additional models were fine-tuned for comparison; see Results Comparison below)
  • Training Data: srbSciFact, the automatically translated SciFact dataset
  • Fine-tuning: Conducted on a single NVIDIA A100 GPU (40 GB) on a DGX system
  • Hyperparameters: Learning rate, batch size, weight decay, and number of epochs were tuned, with early stopping; a fine-tuning sketch follows this list
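
The exact hyperparameter values are not listed here, so the following is only a sketch of how such a run could be set up with the transformers Trainer. All numeric values are placeholders, and the two-example dataset merely stands in for the srbSciFact splits:

```python
# Fine-tuning sketch with early stopping. Hyperparameter values are
# placeholders, NOT the configuration actually used for srbNLI, and the
# tiny in-memory dataset only stands in for the srbSciFact splits.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification, AutoTokenizer,
    EarlyStoppingCallback, Trainer, TrainingArguments,
)

model_name = "microsoft/deberta-v3-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Stand-in data; label mapping 0=support, 1=neutral, 2=contradiction is assumed.
raw = Dataset.from_dict({
    "claim": ["Vakcinacija smanjuje rizik od hospitalizacije.",
              "Antibiotici leče virusne infekcije."],
    "evidence": ["Studija je pokazala pad broja hospitalizacija kod vakcinisanih.",
                 "Antibiotici ne deluju na viruse."],
    "label": [0, 2],
})

def tokenize(batch):
    return tokenizer(batch["claim"], batch["evidence"], truncation=True)

ds = raw.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="srbnli-deberta",
    learning_rate=1e-5,               # placeholder
    per_device_train_batch_size=8,    # placeholder
    weight_decay=0.01,                # placeholder
    num_train_epochs=10,              # placeholder; early stopping may end sooner
    eval_strategy="epoch",            # "evaluation_strategy" in older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds,
    eval_dataset=ds,                  # reuse of the stand-in split is for illustration only
    tokenizer=tokenizer,              # enables padding via the default data collator
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```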

Evaluation

The model was evaluated using standard NLI metrics (accuracy, precision, recall, and F1-score). It was also compared against the GPT-4o model to assess generalization capabilities.
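
As an illustration, these metrics can be computed with scikit-learn. This sketch assumes predictions and gold labels are available as integer class labels; the macro averaging scheme is an assumption, not necessarily the one used in the reported results:

```python
# Illustrative metric computation with scikit-learn. y_true / y_pred are toy
# integer labels over {support, neutral, contradiction}; macro averaging is
# an assumption about how the reported P/R/F1 were aggregated.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 2, 0, 2]   # gold labels (toy example)
y_pred = [0, 1, 1, 0, 2]   # model predictions (toy example)

acc = accuracy_score(y_true, y_pred)
p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"Acc={acc:.2f}  P={p:.2f}  R={r:.2f}  F1={f1:.2f}")
```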

Use Cases

  • Claim Verification: Scientific claims and general domain claims in Serbian
  • Misinformation Detection: Identifying contradictions or support between claims and evidence
  • Cross-lingual Applications: Potential for cross-lingual claim verification with multilingual models

Future Work

  • Improving accuracy with human-corrected translations and Serbian-specific datasets
  • Expanding to general-domain claim verification
  • Enhancing multilingual NLI capabilities

Results Comparison

The table below compares the fine-tuned models (DeBERTa-v3-large, RoBERTa-large, BERTić, and several multilingual baselines) and GPT-4o on the srbSciFact dataset, reporting the key metrics: Accuracy (Acc), Precision (P), Recall (R), and F1-score (F1). The models were evaluated on their ability to classify relationships between claims and evidence in Serbian text.

| Model | Accuracy (Acc) | Precision (P) | Recall (R) | F1-score (F1) |
|---|---|---|---|---|
| DeBERTa-v3-large | 0.70 | 0.86 | 0.82 | 0.84 |
| RoBERTa-large | 0.57 | 0.63 | 0.76 | 0.69 |
| BERTić (Serbian) | 0.56 | 0.56 | 0.37 | 0.44 |
| GPT-4o (English) | 0.66 | 0.70 | 0.77 | 0.78 |
| mDeBERTa-base | 0.63 | 0.92 | 0.75 | 0.83 |
| XLM-RoBERTa-large | 0.64 | 0.89 | 0.77 | 0.83 |
| mBERT-cased | 0.48 | 0.76 | 0.50 | 0.60 |
| mBERT-uncased | 0.57 | 0.45 | 0.61 | 0.52 |

Observations

  • DeBERTa-v3-large performed the best overall, with an accuracy of 0.70 and an F1-score of 0.84.
  • RoBERTa-large and BERTić showed lower performance, with BERTić's recall (0.37) particularly weak, suggesting challenges in handling complex linguistic inference in Serbian.
  • GPT-4o performed better when prompted in English than in Serbian, but the fine-tuned DeBERTa-v3-large model still achieved the highest F1-score overall.
  • mDeBERTa-base and XLM-RoBERTa-large exhibited strong cross-lingual performance, both reaching an F1-score of 0.83.

These results demonstrate the potential of adapting advanced transformer models to Serbian, while highlighting areas for future improvement, such as refining translations and expanding domain-specific data.
