Disease mention recognizer for Spanish clinical texts 🦠🔬

This model derives from the participation of the SINAI team in the DISease TExt Mining Shared Task (DISTEMIST). The DISTEMIST-entities subtrack required automatically finding disease mentions in clinical cases. Given the length of the clinical texts in the dataset, we opted for a sentence-level NER approach based on fine-tuning a RoBERTa model pre-trained on Spanish biomedical corpora.
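The sentence-level setup can be illustrated with a minimal sketch: split the document into sentences, tag each sentence independently, and shift the predicted spans back to document-level character offsets. Everything named here is an assumption made for illustration and is not taken from the system description: the regex splitter is deliberately naive (it would break on abbreviations such as "Dr."), and `toy_tagger` is a hypothetical lexicon lookup standing in for the fine-tuned RoBERTa model.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def split_sentences(text: str) -> List[Tuple[int, str]]:
    """Naive splitter (illustrative only): returns (document_offset, sentence) pairs."""
    sentences = []
    for match in re.finditer(r"[^.!?\n]+[.!?]?", text):
        raw = match.group()
        stripped = raw.strip()
        if stripped:
            # Skip leading whitespace so the offset points at the first real character.
            offset = match.start() + (len(raw) - len(raw.lstrip()))
            sentences.append((offset, stripped))
    return sentences


def tag_document(text: str, tag_sentence: Callable[[str], List[Span]]) -> List[Span]:
    """Tag each sentence separately and map spans back to document offsets."""
    document_spans = []
    for offset, sentence in split_sentences(text):
        for start, end in tag_sentence(sentence):
            document_spans.append((offset + start, offset + end))
    return document_spans


def toy_tagger(sentence: str) -> List[Span]:
    """Hypothetical stand-in for the NER model: exact lexicon matching."""
    lexicon = ["diabetes mellitus", "fiebre"]
    spans = []
    for term in lexicon:
        for m in re.finditer(re.escape(term), sentence):
            spans.append((m.start(), m.end()))
    return sorted(spans)


text = "Paciente con diabetes mellitus. Refiere fiebre alta."
for start, end in tag_document(text, toy_tagger):
    print(start, end, text[start:end])
```

In practice, `toy_tagger` would be replaced by a call to the fine-tuned token classifier. The offset bookkeeping stays the same as long as the tagger returns spans relative to the sentence it received.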

Evaluation and results

Applying a model pre-trained on biomedical text to EHRs can be considered a cross-domain experiment. The encouraging results our biomedical system achieves on the NER task point to transfer potential between the biomedical and clinical domains. The table below summarizes the micro-averaged scores this model obtained in the official evaluation. Team standings are available here.

Precision	Recall	F1-score
0.7520	0.7259	0.7387
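As a quick consistency check on the table, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the two reported values. The function name `f1_from_pr` is introduced here for illustration only and is not part of the official evaluation tooling.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the table above.
f1 = f1_from_pr(0.7520, 0.7259)
print(round(f1, 4))  # 0.7387
```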

System description paper and citation

The system description paper is published in the proceedings of the 10th BioASQ Workshop, which will be held as a Lab at CLEF 2022 on September 5-8, 2022:

@inproceedings{ChizhikovaEtAl:CLEF2022,
title = {SINAI at CLEF 2022: Leveraging biomedical transformers to detect and normalize disease mentions},
author = {Mariia Chizhikova and Jaime Collado-Montañéz and Pilar López-Úbeda and Manuel C. Díaz-Galiano and L. Alfonso Ureña-López and M. Teresa Martín-Valdivia},
pages = {265--273},
url = {http://ceur-ws.org/Vol-XXX/#paper-17},
crossref = {CLEF2022}}