baukearends committed on
Commit
cd80e53
1 Parent(s): b10be2f

Update README.md

  1. README.md +70 -24
README.md CHANGED
@@ -1,45 +1,91 @@
1
  ---
2
  tags:
3
  - spacy
 
 
4
  language:
5
  - nl
6
  license: cc-by-sa-4.0
7
  model-index:
8
- - name: nl_Echocardiogram_SpanCategorizer_lv_syst_func
9
- results: []
10
  ---
11
- Package to classify spans for the presence and severity of left ventricular systolic dysfunction in Dutch echocardiogram reports.
12
 
13
- | Feature | Description |
14
- | --- | --- |
15
- | **Name** | `nl_Echocardiogram_SpanCategorizer_lv_syst_func` |
16
- | **Version** | `1.0.0` |
17
- | **spaCy** | `>=3.7.4,<3.8.0` |
18
- | **Default Pipeline** | `tok2vec`, `spancat` |
19
- | **Components** | `tok2vec`, `spancat` |
20
- | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
21
- | **Sources** | n/a |
22
- | **License** | `cc-ny-sa-4.0` |
23
- | **Author** | [Bauke Arends]() |
24
 
25
- ### Label Scheme
26
 
27
  <details>
28
 
29
- <summary>View label scheme (7 labels for 1 components)</summary>
30
 
31
  | Component | Labels |
32
  | --- | --- |
33
- | **`spancat`** | `lv_sys_func_normal`, `lv_sys_func_mild`, `lv_sys_func_moderate`, `lv_sys_func_severe`, `lv_sys_func_unchanged`, `lv_sys_func_unknown`, `lv_sys_func_improved` |
34
 
35
  </details>
36
 
37
- ### Accuracy
38
 
39
- | Type | Score |
40
  | --- | --- |
41
- | `SPANS_SC_F` | 76.99 |
42
- | `SPANS_SC_P` | 79.16 |
43
- | `SPANS_SC_R` | 74.93 |
44
- | `TOK2VEC_LOSS` | 1264.95 |
45
- | `SPANCAT_LOSS` | 154542.04 |
 
1
  ---
2
  tags:
3
  - spacy
4
+ - arxiv:2408.06930
5
+ - medical
6
  language:
7
  - nl
8
  license: cc-by-sa-4.0
9
  model-index:
10
+ - name: Echocardiogram_SpanCategorizer_lv_syst_func
11
+   results:
12
+   - task:
13
+       type: token-classification
14
+     dataset:
15
+       type: test
16
+       name: "internal test set"
17
+     metrics:
18
+     - name: "Weighted f1"
19
+       type: f1
20
+       value: 0.770
21
+       verified: false
22
+     - name: "Weighted precision"
23
+       type: precision
24
+       value: 0.792
25
+       verified: false
26
+     - name: "Weighted recall"
27
+       type: recall
28
+       value: 0.749
29
+       verified: false
30
+
31
+ pipeline_tag: token-classification
32
+ metrics:
33
+ - f1
34
+ - precision
35
+ - recall
36
  ---
 
37
 
38
+ # Description
39
+ This is a spaCy SpanCategorizer model, trained from scratch on Dutch echocardiogram reports sourced from electronic health records. It classifies spans for the presence and severity of left ventricular systolic dysfunction. The publication describing the span classification task is available at https://arxiv.org/abs/2408.06930, and the config file used to train the model is available at https://github.com/umcu/echolabeler.
40
+
41
+ # Minimum working example
42
+ ```python
43
+ # Jupyter/Colab cell syntax; in a terminal, run the same command without the leading `!`
+ !pip install https://huggingface.co/baukearends/Echocardiogram-SpanCategorizer-lv-syst-func/resolve/main/nl_Echocardiogram_SpanCategorizer_lv_syst_func-any-py3-none-any.whl
44
+ ```
45
+ ```python
46
+ import spacy
47
+ nlp = spacy.load("nl_Echocardiogram_SpanCategorizer_lv_syst_func")
48
+ ```
49
+ ```python
50
+ prediction = nlp("Op dit echo geen duidelijke WMA te zien, goede systolische L.V. functie, wel L.V.H., diastolische dysfunctie graad 1A tot 2. Geringe aortastenose en - matige -insufficientie. Geringe M.I.")
51
+ for span, score in zip(prediction.spans['sc'], prediction.spans['sc'].attrs['scores']):
52
+     print(f"Span: {span}, label: {span.label_}, score: {score[0]:.3f}")
53
+ ```
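
The loop above prints every predicted span. When several overlapping spans carry the same label, it can help to keep only the best-scoring one per label. The following is a minimal sketch of that post-processing step, not part of the published package: `SpanPrediction`, `best_per_label`, and the `min_score` cut-off are hypothetical names introduced here, and the example values are made up for illustration rather than produced by the model.

```python
from typing import Dict, Iterable, NamedTuple


class SpanPrediction(NamedTuple):
    """Hypothetical container for one predicted span."""
    text: str
    label: str
    score: float


def best_per_label(
    predictions: Iterable[SpanPrediction], min_score: float = 0.5
) -> Dict[str, SpanPrediction]:
    """Keep the highest-scoring span per label, skipping scores below min_score."""
    best: Dict[str, SpanPrediction] = {}
    for pred in predictions:
        if pred.score < min_score:
            continue
        current = best.get(pred.label)
        if current is None or pred.score > current.score:
            best[pred.label] = pred
    return best


# Made-up values for illustration only; these are NOT output of the model.
example = [
    SpanPrediction("goede systolische L.V. functie", "lv_sys_func_normal", 0.90),
    SpanPrediction("systolische L.V. functie", "lv_sys_func_normal", 0.60),
    SpanPrediction("L.V. functie", "lv_sys_func_mild", 0.20),
]

summary = best_per_label(example)
for label, pred in summary.items():
    print(f"{label}: {pred.text!r} ({pred.score:.2f})")
```

In practice, the `SpanPrediction` tuples would be built from `span.text`, `span.label_`, and the matching score while iterating over `prediction.spans['sc']` as in the loop above.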
54
 
55
+ # Label Scheme
56
 
57
  <details>
58
 
59
+ <summary>View label scheme (4 labels for 1 component)</summary>
60
 
61
  | Component | Labels |
62
  | --- | --- |
63
+ | **`spancat`** | `lv_sys_func_normal`, `lv_sys_func_mild`, `lv_sys_func_moderate`, `lv_sys_func_severe` |
64
 
65
  </details>
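
The four labels read naturally as an ordinal scale. The sketch below shows one way to exploit that, assuming the ordering normal < mild < moderate < severe (inferred from the label names) and a hypothetical "most severe label wins" rule for summarising a report. Neither the helper `most_severe` nor this aggregation rule is taken from the model or the associated paper.

```python
from typing import Iterable, Optional

# Assumed ordering, from least to most severe, based on the label names.
SEVERITY_ORDER = [
    "lv_sys_func_normal",
    "lv_sys_func_mild",
    "lv_sys_func_moderate",
    "lv_sys_func_severe",
]


def most_severe(labels: Iterable[str]) -> Optional[str]:
    """Return the most severe known label, or None if no known label is present."""
    known = [label for label in labels if label in SEVERITY_ORDER]
    if not known:
        return None
    return max(known, key=SEVERITY_ORDER.index)


# Labels as they might be collected from `span.label_` over one report.
print(most_severe(["lv_sys_func_normal", "lv_sys_func_moderate"]))
```

Other rules, such as taking the label of the highest-scoring span, are equally plausible; the right choice depends on the downstream task.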
66
 
 
67
 
68
+ # Intended use
69
+ The model was developed for span classification on Dutch clinical text. Because it is a domain-specific model trained on medical data, it is intended for Dutch medical NLP tasks.
70
+
71
+ # Data
72
+ The model was trained on approximately 4,000 manually annotated echocardiogram reports from the University Medical Centre Utrecht. The training data was anonymized before training started.
73
+
74
+ | Feature | Description |
75
  | --- | --- |
76
+ | **Name** | `Echocardiogram_SpanCategorizer_lv_syst_func` |
77
+ | **Version** | `1.0.0` |
78
+ | **spaCy** | `>=3.7.4,<3.8.0` |
79
+ | **Default Pipeline** | `tok2vec`, `spancat` |
80
+ | **Components** | `tok2vec`, `spancat` |
81
+ | **License** | `cc-by-sa-4.0` |
82
+ | **Author** | [Bauke Arends]() |
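
The table pins spaCy to `>=3.7.4,<3.8.0`. As a rough sketch, an installed version can be checked against that range before loading the model. The helpers `parse_version` and `is_supported` are hypothetical and deliberately simple: they compare only the leading `major.minor.patch` numbers and ignore pre-release suffixes.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a version string such as '3.7.4'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def is_supported(
    version: str,
    lower: Tuple[int, int, int] = (3, 7, 4),
    upper: Tuple[int, int, int] = (3, 8, 0),
) -> bool:
    """Return True if lower <= version < upper, mirroring '>=3.7.4,<3.8.0'."""
    return lower <= parse_version(version) < upper


print(is_supported("3.7.5"))
print(is_supported("3.8.0"))
```

With spaCy installed, the check would be called as `is_supported(spacy.__version__)`.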
83
+
84
+ # Contact
85
+ If you run into problems with this model, please open an issue in our GitHub repository: https://github.com/umcu/echolabeler/issues
86
+
87
+ # Usage
88
+ If you use the model in your work, please cite it using the following reference: https://doi.org/10.48550/arXiv.2408.06930
89
+
90
+ # References
91
+ Paper: Bauke Arends, Melle Vessies, Dirk van Osch, Arco Teske, Pim van der Harst, René van Es, Bram van Es (2024). Diagnosis extraction from unstructured Dutch echocardiogram reports using span- and document-level characteristic classification. arXiv. https://arxiv.org/abs/2408.06930