ccasimiro committed
Commit 7d19669
Parent: f9ce933

Update README.md

Files changed (1):
  1. README.md +33 -15

README.md CHANGED
@@ -17,21 +17,6 @@ widget:
# Biomedical-clinical language model for Spanish
Biomedical-clinical pretrained language model for Spanish. For more details about the corpus, the pretraining and the evaluation, read the paper "_Carrino, C. P., Armengol-Estapé, J., Gutiérrez-Fandiño, A., Llop-Palao, J., Pàmies, M., Gonzalez-Agirre, A., & Villegas, M. (2021). Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario._"

-
- ## BibTeX citation
- If you use any of these resources (datasets or models) in your work, please cite our latest paper:
-
- ```bibtex
- @misc{carrino2021biomedical,
- title={Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario},
- author={Casimiro Pio Carrino and Jordi Armengol-Estapé and Asier Gutiérrez-Fandiño and Joan Llop-Palao and Marc Pàmies and Aitor Gonzalez-Agirre and Marta Villegas},
- year={2021},
- eprint={2109.03570},
- archivePrefix={arXiv},
- primaryClass={cs.CL}
- }
- ```
-
## Tokenization and model pretraining
This model is a [RoBERTa-based](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model trained on a
**biomedical-clinical** corpus in Spanish collected from several sources (see next section).
@@ -92,6 +77,39 @@ The model is ready-to-use only for masked language modelling to perform the Fill

However, the model is intended to be fine-tuned on downstream tasks such as Named Entity Recognition or Text Classification.

+ ## Cite
+ If you use our models, please cite our latest preprint:
+
+ ```bibtex
+ @misc{carrino2021biomedical,
+ title={Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario},
+ author={Casimiro Pio Carrino and Jordi Armengol-Estapé and Asier Gutiérrez-Fandiño and Joan Llop-Palao and Marc Pàmies and Aitor Gonzalez-Agirre and Marta Villegas},
+ year={2021},
+ eprint={2109.03570},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
+
+ If you use our Medical Crawler corpus, please cite the preprint:
+
+ ```bibtex
+ @misc{carrino2021spanish,
+ title={Spanish Biomedical Crawled Corpus: A Large, Diverse Dataset for Spanish Biomedical Language Models},
+ author={Casimiro Pio Carrino and Jordi Armengol-Estapé and Ona de Gibert Bonet and Asier Gutiérrez-Fandiño and Aitor Gonzalez-Agirre and Martin Krallinger and Marta Villegas},
+ year={2021},
+ eprint={2109.07765},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
+
+ ---
+

---

## How to use
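
The unchanged context in the second hunk notes that the model is ready to use only for the Fill Mask task and is otherwise meant to be fine-tuned. The card's own "How to use" section is truncated in this diff, so the following is only a minimal sketch of that usage with the Hugging Face `transformers` pipeline; the repository identifier is an assumed placeholder, not taken from this commit.

```python
# Minimal sketch of the Fill Mask usage the card describes, assuming the
# `transformers` library; the model id is a placeholder assumption.
from transformers import pipeline

model_id = "BSC-TeMU/roberta-base-biomedical-clinical-es"  # assumed id
unmasker = pipeline("fill-mask", model=model_id)

# RoBERTa checkpoints use <mask> as the mask token.
for pred in unmasker("El paciente presenta fiebre y dolor <mask>."):
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")

# For the fine-tuning path the card mentions (e.g. Named Entity Recognition),
# the same checkpoint would instead be loaded with a task head, for example:
#   from transformers import AutoModelForTokenClassification
#   model = AutoModelForTokenClassification.from_pretrained(model_id, num_labels=9)
```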