CLTL committed on
Commit
a4a4113
1 Parent(s): 81a66cc

Update README.md

Files changed (1)
  1. README.md +12 -6
README.md CHANGED
@@ -6,14 +6,20 @@ license: mit

  # MedRoBERTa.nl

- # Description
- This model is a RoBERTa-based model pre-trained from scratch on Dutch hospital notes sourced from Electronic Health Records. All code used for the creation of MedRoBERTa.nl can be found at https://github.com/cltl-students/verkijk_stella_rma_thesis_dutch_medical_language_model.
+ ## Description
+ This model is a RoBERTa-based model pre-trained from scratch on Dutch hospital notes sourced from Electronic Health Records. The model is not fine-tuned. All code used for the creation of MedRoBERTa.nl can be found at https://github.com/cltl-students/verkijk_stella_rma_thesis_dutch_medical_language_model.

- # Intended uses and limitations
- The model was trained on Dutch hospital notes from the Amsterdam Medical Centres. It is meant to be used on medical NLP tasks for Dutch.
+ ## Intended use
+ The model can be fine-tuned on any type of task. Since it is a domain-specific model trained on medical data, it is meant to be used on medical NLP tasks for Dutch.

- # Authors
+ ## Data
+ The model was trained on nearly 10 million hospital notes from the Amsterdam University Medical Centres. The training data was anonymized before starting the pre-training procedure.
+
+ ## Privacy
+ By anonymizing the training data, we made sure the model did not learn any representative associations linked to names. Apart from the training data, the model's vocabulary was also anonymized. This ensures that the model cannot predict any names in the generative fill-mask task.
+
+ ## Authors
  Stella Verkijk, Piek Vossen

- # Reference
+ ## Reference
  Paper: Verkijk, S. & Vossen, P. (2022) MedRoBERTa.nl: A Language Model for Dutch Electronic Health Records. Computational Linguistics in the Netherlands Journal, 11.
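
The README describes a pre-trained, not fine-tuned, model and mentions the fill-mask task. A minimal sketch of how such a checkpoint might be queried follows. It is not part of the commit and rests on several assumptions: that the model is published on the Hugging Face Hub as `CLTL/MedRoBERTa.nl`, that its mask token is the RoBERTa default `<mask>`, and that `transformers` and `torch` are installed. The helper names and the Dutch sentence are hypothetical, invented for illustration.

```python
"""Sketch: querying the pre-trained checkpoint through the fill-mask task."""

# Assumption: the checkpoint is available on the Hugging Face Hub under this ID.
MODEL_ID = "CLTL/MedRoBERTa.nl"


def mask_word(sentence: str, target: str, mask_token: str = "<mask>") -> str:
    """Replace the first occurrence of `target` with the mask token.

    Assumption: the tokenizer uses the RoBERTa default mask token "<mask>".
    """
    if target not in sentence:
        raise ValueError(f"{target!r} does not occur in the sentence")
    return sentence.replace(target, mask_token, 1)


def top_predictions(masked_sentence: str, k: int = 5):
    """Return the k most likely fillers as (token, score) pairs.

    Needs `transformers` and `torch`, plus network access the first time
    the checkpoint is downloaded.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    results = fill_mask(masked_sentence, top_k=k)
    return [(r["token_str"], r["score"]) for r in results]


def demo() -> None:
    """Mask one word in a hypothetical sentence and print the predictions."""
    masked = mask_word("De patiënt heeft pijn in de rug.", "rug")
    for token, score in top_predictions(masked):
        print(f"{token!r}\t{score:.3f}")
```

Calling `demo()` downloads the checkpoint and prints the candidate fillers with their scores. For a downstream task, the same model ID would typically be passed to a task-specific `AutoModelFor...` class and fine-tuned on labelled data.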