ccasimiro committed
Commit 7d19669
Parent: f9ce933

Update README.md

Files changed (1):
  1. README.md +33 -15

README.md CHANGED
@@ -17,21 +17,6 @@ widget:
# Biomedical-clinical language model for Spanish
Biomedical-clinical pretrained language model for Spanish. For more details about the corpus, the pretraining and the evaluation, read the paper "_Carrino, C. P., Armengol-Estapé, J., Gutiérrez-Fandiño, A., Llop-Palao, J., Pàmies, M., Gonzalez-Agirre, A., & Villegas, M. (2021). Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario._"

-
- ## BibTeX citation
- If you use any of these resources (datasets or models) in your work, please cite our latest paper:
-
- ```bibtex
- @misc{carrino2021biomedical,
- title={Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario},
- author={Casimiro Pio Carrino and Jordi Armengol-Estapé and Asier Gutiérrez-Fandiño and Joan Llop-Palao and Marc Pàmies and Aitor Gonzalez-Agirre and Marta Villegas},
- year={2021},
- eprint={2109.03570},
- archivePrefix={arXiv},
- primaryClass={cs.CL}
- }
- ```
-
## Tokenization and model pretraining
This model is a [RoBERTa-based](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model trained on a
**biomedical-clinical** corpus in Spanish collected from several sources (see next section).
@@ -92,6 +77,39 @@ The model is ready-to-use only for masked language modelling to perform the Fill

However, the model is intended to be fine-tuned on downstream tasks such as Named Entity Recognition or Text Classification.

+ ## Cite
+ If you use our models, please cite our latest preprint:
+
+ ```bibtex
+ @misc{carrino2021biomedical,
+ title={Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario},
+ author={Casimiro Pio Carrino and Jordi Armengol-Estapé and Asier Gutiérrez-Fandiño and Joan Llop-Palao and Marc Pàmies and Aitor Gonzalez-Agirre and Marta Villegas},
+ year={2021},
+ eprint={2109.03570},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
+
+ If you use our Medical Crawler corpus, please cite the preprint:
+
+ ```bibtex
+ @misc{carrino2021spanish,
+ title={Spanish Biomedical Crawled Corpus: A Large, Diverse Dataset for Spanish Biomedical Language Models},
+ author={Casimiro Pio Carrino and Jordi Armengol-Estapé and Ona de Gibert Bonet and Asier Gutiérrez-Fandiño and Aitor Gonzalez-Agirre and Martin Krallinger and Marta Villegas},
+ year={2021},
+ eprint={2109.07765},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
+
+ ---
+

---

## How to use
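
The unchanged context in the second hunk notes that the model is ready to use only for the Fill Mask task and is otherwise meant to be fine-tuned. The card's own "How to use" section is truncated in this diff, so the following is only a minimal sketch of that usage with the Hugging Face `transformers` pipeline; the repository identifier is an assumed placeholder, not taken from this commit.

```python
# Minimal sketch of the Fill Mask usage the card describes, assuming the
# `transformers` library; the model id is a placeholder assumption.
from transformers import pipeline

model_id = "BSC-TeMU/roberta-base-biomedical-clinical-es"  # assumed id
unmasker = pipeline("fill-mask", model=model_id)

# RoBERTa checkpoints use <mask> as the mask token.
for pred in unmasker("El paciente presenta fiebre y dolor <mask>."):
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")

# For the fine-tuning path the card mentions (e.g. Named Entity Recognition),
# the same checkpoint would instead be loaded with a task head, for example:
#   from transformers import AutoModelForTokenClassification
#   model = AutoModelForTokenClassification.from_pretrained(model_id, num_labels=9)
```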