ClinLinker-KB-GP

Model Description

ClinLinker-KB-GP is a bi-encoder model for medical entity linking (MEL) in Spanish, optimized for the clinical domain. It is enriched with knowledge from medical knowledge graphs such as UMLS and SNOMED-CT, and it is trained with contrastive learning to improve the embedding space and the retrieval of relevant concepts for medical entities mentioned in clinical text.

The "GP" in ClinLinker-KB-GP stands for "grandparents". Training used hierarchical relationships from the knowledge graph: the parent and grandparent terms of each concept were included as positive candidates. This exposes the model to terms that are conceptually close at different levels of the hierarchy, which improves embedding quality and, in turn, the linking process.
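
As an illustration of the idea, the sketch below collects the parents and grandparents of a concept from a toy is-a graph. The graph, the concept identifiers, and the function names are hypothetical; how the actual training pairs were built is not described here, so this is only one plausible reading.

```python
# Toy is-a graph: child concept -> list of direct parents.
# Identifiers are made up for illustration; they are not real SNOMED-CT codes.
PARENTS = {
    "C_viral_pneumonia": ["C_pneumonia"],
    "C_pneumonia": ["C_lung_disease"],
    "C_lung_disease": ["C_respiratory_disease"],
}


def ancestors_up_to(concept, parents, max_depth=2):
    """Collect ancestors up to max_depth levels (1 = parents, 2 = grandparents)."""
    found = []
    frontier = [concept]
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            for parent in parents.get(node, []):
                if parent not in found:
                    found.append(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
    return found


def positive_concepts(concept, parents):
    """Return the concept itself plus its parents and grandparents."""
    return [concept] + ancestors_up_to(concept, parents, max_depth=2)
```

Under this reading, the terms attached to every concept returned by `positive_concepts` would count as positives for a mention of the original concept, while more distant ancestors would not.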

Intended Use

  • Domain: Clinical Natural Language Processing (NLP) for medical entity linking in Spanish.
  • Primary Tasks: Recognizing and normalizing medical entities such as diseases, symptoms, and procedures from clinical texts and linking them to their corresponding standardized terminologies in SNOMED-CT.
  • Evaluation Corpora: ClinLinker-KB-GP was tested on several Spanish medical corpora, including DisTEMIST (diseases), MedProcNER (procedures), and SympTEMIST (symptoms). It reached top-200 accuracy of 0.969 on SympTEMIST, 0.943 on MedProcNER, and 0.912 on DisTEMIST.
  • Target Users: Researchers, healthcare practitioners, and developers working with Spanish medical data for entity recognition and normalization tasks.

Performance

ClinLinker-KB-GP achieved the following key results:

  • Top-200 Accuracy:
    • DisTEMIST: 91.2%
    • MedProcNER: 94.3%
    • SympTEMIST: 96.9%
  • Top-25 Accuracy:
    • The model achieves up to 86.4% accuracy in retrieving the correct concept in the top-25 candidates for disease and procedure normalization tasks.
  • Cross-Encoder Integration: ClinLinker-KB-GP is particularly effective when used with a cross-encoder for reranking candidate concepts, leading to improved accuracy in zero-shot and few-shot learning scenarios.
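
For readers unfamiliar with the metric, top-k accuracy is the fraction of mentions whose gold concept appears among the first k retrieved candidates. The sketch below shows that computation on already-ranked candidate lists; the function name and the data layout are assumptions made for illustration, not the evaluation script behind the figures above.

```python
def top_k_accuracy(ranked_candidates, gold_codes, k):
    """Fraction of mentions whose gold code is among the first k candidates.

    ranked_candidates: one list of candidate codes per mention, best first.
    gold_codes: the correct code for each mention, in the same order.
    """
    if len(ranked_candidates) != len(gold_codes):
        raise ValueError("one candidate list is needed per gold code")
    if not gold_codes:
        return 0.0
    hits = sum(
        gold in candidates[:k]
        for candidates, gold in zip(ranked_candidates, gold_codes)
    )
    return hits / len(gold_codes)
```

A large k such as 200 measures how often the bi-encoder keeps the correct concept in the candidate pool at all, which is the relevant quantity when a cross-encoder reranks that pool afterwards.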

Technical Details

  • Architecture: The model is a bi-encoder with contrastive learning, designed to generate embeddings for clinical terms, using the relational structure of medical concepts extracted from the UMLS and SNOMED-CT knowledge bases.
  • Training Strategy: ClinLinker-KB-GP was trained with a hierarchical relationship structure, incorporating "parent" and "grandparent" nodes from medical knowledge graphs to enhance the embeddings’ quality. The training process also utilizes hard negative mining techniques to optimize candidate retrieval.
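
Hard negative mining generally means selecting, for each mention, the incorrect candidates that the current model scores closest to it, so that training focuses on the most confusable concepts. The sketch below illustrates this with plain cosine similarity over small vectors. The function names, the data layout, and the choice of cosine similarity are assumptions; the exact mining procedure used for ClinLinker-KB-GP may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_hard_negatives(mention_vec, candidates, gold_code, n=2):
    """Return the n non-gold codes whose vectors are most similar to the mention.

    candidates: list of (code, vector) pairs from the concept dictionary.
    """
    scored = [
        (cosine(mention_vec, vec), code)
        for code, vec in candidates
        if code != gold_code
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [code for _, code in scored[:n]]
```

In a contrastive setup, the codes returned here would serve as negatives for the mention, alongside its positives.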

Usage

The model and its training framework can be used in several ways:

  • By using the provided FaissEncoder class to perform efficient entity linking with FAISS-based search.

  • By training their own Bi-encoder model for medical entity linking using our framework available on GitHub:
    https://github.com/ICB-UMA/ClinLinker-KB

  • Alternatively, users can load the model directly with Hugging Face’s AutoModel and AutoTokenizer for flexible integration in custom pipelines:

    from transformers import AutoModel, AutoTokenizer
    
    model = AutoModel.from_pretrained("ICB-UMA/ClinLinker-KB-GP")
    tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/ClinLinker-KB-GP")
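
Once the model is loaded, a typical next step is to embed a mention and a list of candidate terms and rank the candidates by similarity. The following is a minimal sketch under stated assumptions: it uses the first-token ([CLS]) vector as the embedding and cosine similarity for ranking, and the helper names `embed` and `rank_candidates` are hypothetical. The repository's FaissEncoder class may pool and search differently.

```python
def embed(texts, model_name="ICB-UMA/ClinLinker-KB-GP"):
    """Return one L2-normalised embedding (list of floats) per input string."""
    # Imported lazily so rank_candidates below works without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    # Assumption: the first-token ([CLS]) vector represents the whole term.
    cls = torch.nn.functional.normalize(hidden[:, 0], dim=-1)
    return cls.tolist()


def rank_candidates(mention_vec, candidate_vecs, candidate_ids, k=5):
    """Rank candidates by dot product (cosine similarity for unit vectors)."""
    scores = [
        sum(a * b for a, b in zip(mention_vec, vec)) for vec in candidate_vecs
    ]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(candidate_ids[i], scores[i]) for i in order[:k]]
```

For example, `rank_candidates(embed(["infarto de miocardio"])[0], embed(terms), codes)` would return the best-scoring `(code, score)` pairs, where `terms` and `codes` are parallel lists taken from the target terminology. For a dictionary the size of SNOMED-CT, the candidate embeddings would normally be computed once and stored in a FAISS index rather than compared in a Python loop.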
    

Limitations

  • Language Restriction: ClinLinker-KB-GP is currently optimized for Spanish clinical corpora.
  • Expert Supervision: Although the model shows high accuracy in entity linking tasks, it is designed to support semi-automated systems, and its output requires expert supervision for final validation.

Citation

If you use ClinLinker-KB-GP in your research, please cite the following:

@misc{gallego2024clinlinker,
     title={ClinLinker: Medical Entity Linking of Clinical Concept Mentions in Spanish}, 
     author={Fernando Gallego and Guillermo López-García and Luis Gasco-Sánchez and Martin Krallinger and Francisco J. Veredas},
     year={2024},
     eprint={2404.06367},
     archivePrefix={arXiv},
     primaryClass={cs.CL}
}