---
license: apache-2.0
language:
- es
base_model:
- PlanTL-GOB-ES/roberta-base-biomedical-clinical-es
tags:
- medical
---

# **ClinLinker-KB-GP**

## Model Description

ClinLinker-KB-GP is a model for medical entity linking (MEL) in Spanish, optimized for tasks in the clinical domain. It is based on bi-encoder models enriched with knowledge from medical knowledge graphs such as UMLS and SNOMED-CT, and it uses contrastive learning to improve the embedding space and the retrieval of relevant concepts for medical entities mentioned in clinical text.

The "GP" in ClinLinker-KB-GP stands for **Grand Parents**. During training, hierarchical relationships from the knowledge graph were used: **parent** and **grandparent** terms were included as positive candidates. This strategy improves embedding quality by bringing in terms that are conceptually close at different levels of the knowledge graph, which helps the linking process.
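
As a rough illustration of the parent and grandparent strategy described above, the sketch below collects the ancestors of a concept that lie at most two is-a steps away in a toy hierarchy. The `PARENTS` mapping, the concept names, and the `ancestors_up_to` helper are all hypothetical; the actual training data is built from UMLS and SNOMED-CT, and the authors' pipeline may differ.

```python
from typing import Dict, List, Set

# Toy is-a hierarchy: child concept -> list of direct parents.
# These identifiers are made up for illustration; they are not SNOMED-CT codes.
PARENTS: Dict[str, List[str]] = {
    "viral_pneumonia": ["pneumonia"],
    "pneumonia": ["lung_disease"],
    "lung_disease": ["respiratory_disease"],
}


def ancestors_up_to(concept: str, max_hops: int = 2) -> Set[str]:
    """Collect ancestors reachable in at most `max_hops` is-a steps."""
    found: Set[str] = set()
    frontier = {concept}
    for _ in range(max_hops):
        # Move one level up the hierarchy from every concept in the frontier.
        frontier = {p for c in frontier for p in PARENTS.get(c, [])}
        found |= frontier
    return found


# Parent (1 hop) and grandparent (2 hops) become positive candidates.
positives = ancestors_up_to("viral_pneumonia")
print(sorted(positives))  # ['lung_disease', 'pneumonia']
```

With `max_hops=1` only the direct parent is returned, which corresponds to using parents alone as positives.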

## Intended Use

- **Domain:** Clinical natural language processing (NLP) for medical entity linking in Spanish.
- **Primary Tasks:** Recognizing and normalizing medical entities such as diseases, symptoms, and procedures in clinical texts, and linking them to their corresponding standardized concepts in SNOMED-CT.
- **Corpora Evaluated:** ClinLinker-KB-GP was tested on several Spanish medical corpora, including DisTEMIST (diseases), MedProcNER (procedures), and SympTEMIST (symptoms). It reached top-200 accuracy values of 0.969 on SympTEMIST, 0.943 on MedProcNER, and 0.912 on DisTEMIST.
- **Target Users:** Researchers, healthcare practitioners, and developers working with Spanish medical data on entity recognition and normalization tasks.

## Performance

ClinLinker-KB-GP achieved the following key results:

- **Top-200 Accuracy:**
  - DisTEMIST: 91.2%
  - MedProcNER: 94.3%
  - SympTEMIST: 96.9%
- **Top-25 Accuracy:**
  - Up to 86.4% accuracy in retrieving the correct concept among the top-25 candidates for disease and procedure normalization tasks.
- **Cross-Encoder Integration:** ClinLinker-KB-GP is particularly effective when paired with a cross-encoder that reranks the candidate concepts, which improves accuracy in zero-shot and few-shot learning scenarios.
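
For readers unfamiliar with the metric, top-k accuracy is the fraction of mentions whose gold concept appears among the first k retrieved candidates. The sketch below computes it on made-up data; the `top_k_accuracy` helper and the placeholder codes are illustrative assumptions and are unrelated to the evaluation scripts behind the figures above.

```python
from typing import List, Sequence


def top_k_accuracy(ranked: Sequence[List[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of mentions whose gold code appears in the first k candidates."""
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(1 for cands, code in zip(ranked, gold) if code in cands[:k])
    return hits / len(gold)


# Made-up candidate lists for three mentions, best candidate first.
# "C0".."C9" are placeholder codes, not real SNOMED-CT identifiers.
ranked_candidates = [
    ["C1", "C7", "C3"],
    ["C4", "C2", "C9"],
    ["C5", "C6", "C8"],
]
gold_codes = ["C1", "C9", "C0"]

print(top_k_accuracy(ranked_candidates, gold_codes, k=1))  # 1 of 3 mentions hit
print(top_k_accuracy(ranked_candidates, gold_codes, k=3))  # 2 of 3 mentions hit
```

A larger k can only keep or raise the score, which is why top-200 figures sit above top-25 figures and why a reranker over the retrieved candidates is useful.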

## Technical Details

- **Architecture:** The model is a bi-encoder trained with contrastive learning to generate embeddings for clinical terms, using the relational structure of medical concepts extracted from the UMLS and SNOMED-CT knowledge bases.
- **Training Strategy:** ClinLinker-KB-GP was trained with a hierarchical relationship structure, incorporating "parent" and "grandparent" nodes from medical knowledge graphs to improve embedding quality. Training also uses hard negative mining to optimize candidate retrieval.
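
To make the two ideas above concrete, here is a minimal pure-Python sketch of a contrastive objective combined with hard negative selection. It uses a generic InfoNCE loss as a stand-in; the model card does not state which contrastive loss was used, so this should not be read as the authors' training code. The vectors, names, and the `temperature` value are invented for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def hardest_negatives(anchor: Sequence[float],
                      negatives: Dict[str, Sequence[float]], n: int) -> List[str]:
    """Names of the n negatives most similar to the anchor (the 'hard' ones)."""
    by_similarity = sorted(negatives, key=lambda name: cosine(anchor, negatives[name]),
                           reverse=True)
    return by_similarity[:n]


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], temperature: float = 0.1) -> float:
    """InfoNCE loss for one anchor: negative log-softmax of the positive's similarity."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    peak = max(logits)  # subtract the max before exponentiating for numerical stability
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


# Toy 2-D embeddings (invented): a mention, one positive, three negatives.
mention = [1.0, 0.0]
parent_term = [0.9, 0.1]
negative_pool = {
    "near_miss": [0.7, 0.7],
    "unrelated": [0.0, 1.0],
    "opposite": [-1.0, 0.0],
}

hard = hardest_negatives(mention, negative_pool, n=2)
loss_hard = info_nce(mention, parent_term, [negative_pool[name] for name in hard])
loss_easy = info_nce(mention, parent_term, [negative_pool["opposite"]])
print(hard, loss_hard > loss_easy)
```

Hard negatives yield a larger loss than easy ones, so they contribute more training signal, which is the motivation for mining them.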

## Usage

Our pre-trained model can be used in several ways:

- With the provided **FaissEncoder** class, which performs efficient entity linking with FAISS-based search.
- By training your own bi-encoder model for medical entity linking with our framework, available on GitHub: [https://github.com/ICB-UMA/ClinLinker-KB](https://github.com/ICB-UMA/ClinLinker-KB)
- By loading the model directly with Hugging Face's `AutoModel` and `AutoTokenizer` for flexible integration in custom pipelines:

```python
from transformers import AutoModel, AutoTokenizer

# Load the bi-encoder weights and the matching tokenizer from the Hugging Face Hub.
model = AutoModel.from_pretrained("ICB-UMA/ClinLinker-KB-GP")
tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/ClinLinker-KB-GP")
```
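
Once the model is loaded, a mention and a set of candidate terms can be embedded and compared by cosine similarity. The sketch below is one possible way to do this and rests on assumptions: taking the hidden state at the first token position as the embedding is a common choice for bi-encoders but is not confirmed by this card, `max_length=64` is arbitrary, and the `embed` and `rank_by_cosine` helpers are hypothetical names. The repository's own code remains the reference for the intended pooling and search.

```python
import math
from typing import List, Sequence, Tuple


def embed(texts: List[str], model_name: str = "ICB-UMA/ClinLinker-KB-GP") -> List[List[float]]:
    """Encode texts with the model; first-token pooling is an assumption."""
    # Heavy dependencies are imported lazily so the ranking helper works without them.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    batch = tokenizer(texts, padding=True, truncation=True, max_length=64,
                      return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, tokens, hidden_size)
    return hidden[:, 0, :].tolist()  # vector at the first token position


def rank_by_cosine(query: Sequence[float], candidates: Sequence[Sequence[float]],
                   names: Sequence[str]) -> List[Tuple[str, float]]:
    """Return (name, cosine similarity) pairs, most similar candidate first."""
    def cosine(u: Sequence[float], v: Sequence[float]) -> float:
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    scored = [(name, cosine(query, vec)) for name, vec in zip(names, candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors stand in for real embeddings so this part runs without the model.
toy_ranking = rank_by_cosine([1.0, 0.2], [[0.0, 1.0], [0.9, 0.3]], ["far", "close"])
print([name for name, _ in toy_ranking])  # ['close', 'far']

# With the real model (downloads the weights on first use):
#   vectors = embed(["neumonía vírica", "neumonía", "fractura de fémur"])
#   print(rank_by_cosine(vectors[0], vectors[1:], ["neumonía", "fractura de fémur"]))
```

For a full SNOMED-CT gazetteer, an exhaustive Python loop like this would be slow, which is where a FAISS index becomes the practical choice.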

## Limitations

- **Language Restriction:** ClinLinker-KB-GP is currently optimized for Spanish clinical corpora.
- **Expert Supervision:** Although the model shows high accuracy in entity linking tasks, it is designed to assist semi-automated systems and requires expert supervision for final validation.

## Citation

If you use ClinLinker-KB-GP in your research, please cite the following:

```bibtex
@misc{gallego2024clinlinker,
  title={ClinLinker: Medical Entity Linking of Clinical Concept Mentions in Spanish},
  author={Fernando Gallego and Guillermo López-García and Luis Gasco-Sánchez and Martin Krallinger and Francisco J. Veredas},
  year={2024},
  eprint={2404.06367},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```