CLTL
/

MedRoBERTa.nl

Inference Endpoints

Model card Files Files and versions Community

CLTL commited on Jan 19, 2022

Commit

a4a4113

•

1 Parent(s): 81a66cc

Update README.md

Files changed (1) hide show

README.md +12 -6

README.md CHANGED Viewed

@@ -6,14 +6,20 @@ license: mit
 # MedRoBERTa.nl
-# Description
-This model is a RoBERTa-based model pre-trained from scratch on Dutch hospital notes sourced from Electronic Health Records. All code used for the creation of MedRoBERTa.nl can be found at https://github.com/cltl-students/verkijk_stella_rma_thesis_dutch_medical_language_model.
-# Intended uses and limitations
-The model was trained on Dutch hospital notes from the Amsterdam Medical Centres. It is meant to be used on medical NLP tasks for Dutch.
-# Authors
 Stella Verkijk, Piek Vossen
-# Reference
 Paper: Verkijk, S. & Vossen, P. (2022) MedRoBERTa.nl: A Language Model for Dutch Electroniz Health Records. Computational Linguistics in the Netherlands Journal, 11.

 # MedRoBERTa.nl
+## Description
+This model is a RoBERTa-based model pre-trained from scratch on Dutch hospital notes sourced from Electronic Health Records. The model is not fine-tuned. All code used for the creation of MedRoBERTa.nl can be found at https://github.com/cltl-students/verkijk_stella_rma_thesis_dutch_medical_language_model.
+## Intended use
+The model can be fine-tuned on any type of task. Since it is a domain-specific model trained on medical data, it is meant to be used on medical NLP tasks for Dutch.
+## Data
+The model was trained on nearly 10 million hospital notes from the Amsterdam University Medical Centres. The training data was anonymized before starting the pre-training procedure.
+## Privacy
+By anonymizing the training data we made sure the model did not learn any representative associations linked to names. Apart from the trianing data, the model's vocabulary was also anonymized. This ensures that the model can not predict any names in the generative fill-mask task.
+## Authors
 Stella Verkijk, Piek Vossen
+## Reference
 Paper: Verkijk, S. & Vossen, P. (2022) MedRoBERTa.nl: A Language Model for Dutch Electroniz Health Records. Computational Linguistics in the Netherlands Journal, 11.