---
tags:
- spacy
- arxiv:2408.06930
- medical
language:
- nl
license: cc-by-sa-4.0
model-index:
- name: Echocardiogram_SpanCategorizer_tricuspid_regurgitation
  results:
  - task:
      type: token-classification
    dataset:
      type: test
      name: "internal test set"
    metrics:
    - name: "Weighted f1"
      type: f1
      value: 0.905
      verified: false
    - name: "Weighted precision"
      type: precision
      value: 0.930
      verified: false
    - name: "Weighted recall"
      type: recall
      value: 0.881
      verified: false
pipeline_tag: token-classification
metrics:
- f1
- precision
- recall
---

# Description

This model is a spaCy SpanCategorizer trained from scratch on Dutch echocardiogram reports sourced from Electronic Health Records. The publication associated with the span classification task is available at https://arxiv.org/abs/2408.06930. The config file used to train the model is available at https://github.com/umcu/echolabeler.

# Minimum working example

Install the packaged pipeline (in a notebook, prefix the command with `!`):

```bash
pip install https://huggingface.co/baukearends/Echocardiogram-SpanCategorizer-tricuspid-regurgitation/resolve/main/nl_Echocardiogram_SpanCategorizer_tricuspid_regurgitation-any-py3-none-any.whl
```

```python
import spacy

nlp = spacy.load("nl_Echocardiogram_SpanCategorizer_tricuspid_regurgitation")
```

```python
# Example report. English translation: "No clear WMA visible on this echo, good
# systolic LV function, but LVH, diastolic dysfunction grade 1A to 2. Mild aortic
# stenosis and moderate regurgitation. Mild TI."
prediction = nlp("Op dit echo geen duidelijke WMA te zien, goede systolische L.V. functie, wel L.V.H., diastolische dysfunctie graad 1A tot 2. Geringe aortastenose en - matige -insufficientie. Geringe T.I.")

# Predicted spans are stored under the span key "sc"; their scores are kept
# in the span group's "scores" attribute, aligned with the spans.
for span, score in zip(prediction.spans['sc'], prediction.spans['sc'].attrs['scores']):
    print(f"Span: {span}, label: {span.label_}, score: {float(score):.3f}")
```

# Label Scheme
The pipeline has 1 component with 4 labels.

| Component | Labels |
| --- | --- |
| **`spancat`** | `tricuspid_valve_native_regurgitation_not_present`, `tricuspid_valve_native_regurgitation_mild`, `tricuspid_valve_native_regurgitation_moderate`, `tricuspid_valve_native_regurgitation_severe` |
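The four labels form an ordinal scale from `not_present` to `severe`. If a single report-level grade is needed, one option is to keep the most severe label among the predicted spans. The sketch below assumes that heuristic; `SEVERITY` and `most_severe_label` are hypothetical names, not part of the released pipeline, and this aggregation is not necessarily the document-level method described in the paper.

```python
# Hypothetical ordinal mapping; assumes the four labels in the table above
# represent increasing regurgitation severity.
SEVERITY = {
    "tricuspid_valve_native_regurgitation_not_present": 0,
    "tricuspid_valve_native_regurgitation_mild": 1,
    "tricuspid_valve_native_regurgitation_moderate": 2,
    "tricuspid_valve_native_regurgitation_severe": 3,
}


def most_severe_label(predictions, threshold=0.5):
    """Return the most severe label among (label, score) pairs.

    Pairs with a score below `threshold` are ignored; ties in severity are
    broken by the higher score. Returns None if no pair qualifies.
    """
    kept = [(label, score) for label, score in predictions if score >= threshold]
    if not kept:
        return None
    label, _ = max(kept, key=lambda pair: (SEVERITY[pair[0]], pair[1]))
    return label


# Made-up scores for illustration only (not real model output).
example = [
    ("tricuspid_valve_native_regurgitation_mild", 0.91),
    ("tricuspid_valve_native_regurgitation_moderate", 0.32),
]
print(most_severe_label(example))  # tricuspid_valve_native_regurgitation_mild
```

To apply it to model output, the pairs can be built from a parsed document, for example `[(span.label_, float(score)) for span, score in zip(doc.spans["sc"], doc.spans["sc"].attrs["scores"])]`.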
# Intended use

The model was developed for span classification in Dutch clinical text. Because it is a domain-specific model trained on medical data, it is intended for medical NLP tasks in Dutch.

# Data

The model was trained on approximately 4,000 manually annotated echocardiogram reports from the University Medical Centre Utrecht. The data were anonymized before training.

| Feature | Description |
| --- | --- |
| **Name** | `Echocardiogram_SpanCategorizer_tricuspid_regurgitation` |
| **Version** | `1.0.0` |
| **spaCy** | `>=3.7.4,<3.8.0` |
| **Default Pipeline** | `tok2vec`, `spancat` |
| **Components** | `tok2vec`, `spancat` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | n/a |
| **License** | `cc-by-sa-4.0` |
| **Author** | Bauke Arends |

# Contact

If you run into problems with this model, please open an issue on our GitHub repository: https://github.com/umcu/echolabeler/issues

# Usage

If you use the model in your work, please cite: https://doi.org/10.48550/arXiv.2408.06930

# References

Paper: Bauke Arends, Melle Vessies, Dirk van Osch, Arco Teske, Pim van der Harst, René van Es, Bram van Es (2024). Diagnosis extraction from unstructured Dutch echocardiogram reports using span- and document-level characteristic classification. arXiv. https://arxiv.org/abs/2408.06930