Update README.md
README.md (CHANGED)
@@ -17,21 +17,6 @@ widget:
 # Biomedical-clinical language model for Spanish
 Biomedical-clinical pretrained language model for Spanish. For more details about the corpus, the pretraining and the evaluation, read the paper "_Carrino, C. P., Armengol-Estapé, J., Gutiérrez-Fandiño, A., Llop-Palao, J., Pàmies, M., Gonzalez-Agirre, A., & Villegas, M. (2021). Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario._"
 
-
-## BibTeX citation
-If you use any of these resources (datasets or models) in your work, please cite our latest paper:
-
-```bibtex
-@misc{carrino2021biomedical,
-  title={Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario},
-  author={Casimiro Pio Carrino and Jordi Armengol-Estapé and Asier Gutiérrez-Fandiño and Joan Llop-Palao and Marc Pàmies and Aitor Gonzalez-Agirre and Marta Villegas},
-  year={2021},
-  eprint={2109.03570},
-  archivePrefix={arXiv},
-  primaryClass={cs.CL}
-}
-```
-
 ## Tokenization and model pretraining
 This model is a [RoBERTa-based](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model trained on a
 **biomedical-clinical** corpus in Spanish collected from several sources (see next section).
@@ -92,6 +77,39 @@ The model is ready-to-use only for masked language modelling to perform the Fill
 
 However, it is intended to be fine-tuned on downstream tasks such as Named Entity Recognition or Text Classification.
 
+## Cite
+If you use our models, please cite our latest preprint:
+
+```bibtex
+
+@misc{carrino2021biomedical,
+  title={Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario},
+  author={Casimiro Pio Carrino and Jordi Armengol-Estapé and Asier Gutiérrez-Fandiño and Joan Llop-Palao and Marc Pàmies and Aitor Gonzalez-Agirre and Marta Villegas},
+  year={2021},
+  eprint={2109.03570},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL}
+}
+
+```
+
+If you use our Medical Crawler corpus, please cite the preprint:
+
+```bibtex
+
+@misc{carrino2021spanish,
+  title={Spanish Biomedical Crawled Corpus: A Large, Diverse Dataset for Spanish Biomedical Language Models},
+  author={Casimiro Pio Carrino and Jordi Armengol-Estapé and Ona de Gibert Bonet and Asier Gutiérrez-Fandiño and Aitor Gonzalez-Agirre and Martin Krallinger and Marta Villegas},
+  year={2021},
+  eprint={2109.07765},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL}
+}
+
+```
+
+---
+
 ---
 
 ## How to use
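
The "How to use" section is truncated in this view. As a minimal sketch of the Fill Mask usage the card describes, assuming the standard 🤗 Transformers `pipeline` API — the model ID is a placeholder, since the hub identifier does not appear in this diff, and the clinical sentence is purely illustrative:

```python
from transformers import pipeline

# Placeholder ID: substitute the model's actual Hugging Face Hub identifier,
# which is not shown in this diff.
MODEL_ID = "<organization>/<biomedical-clinical-model>"

# The pretrained checkpoint ships only a masked-language-modelling head,
# so fill-mask is the one task it supports without fine-tuning.
unmasker = pipeline("fill-mask", model=MODEL_ID)

# RoBERTa-style tokenizers use "<mask>" as the mask token.
for pred in unmasker("El único antecedente a reseñar era la <mask> arterial."):
    print(f"{pred['token_str']!r} (score={pred['score']:.3f})")
```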
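Likewise, for the fine-tuning route the card recommends, a hedged sketch of attaching a token-classification head for Named Entity Recognition with 🤗 Transformers; the label set and model ID below are illustrative assumptions, not part of this repository:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

MODEL_ID = "<organization>/<biomedical-clinical-model>"  # placeholder, as above

# Illustrative NER tag set; a real setup would derive this from the dataset.
labels = ["O", "B-DISEASE", "I-DISEASE", "B-DRUG", "I-DRUG"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Loading with a token-classification head discards the MLM head and adds a
# freshly initialised classifier on top of the pretrained encoder.
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_ID,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)
```

From there, training would proceed on labelled data, for example with the `Trainer` API.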