BSC-LT/roberta-base-biomedical-clinical-es

ccasimiro committed on Sep 17, 2021

Commit f9ce933 (1 parent: 52697e8)

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -50,7 +50,7 @@ The training corpus is composed of several biomedical corpora in Spanish, collec
   - keep the original document boundaries
 Then, the biomedical corpora are concatenated and further global deduplication among the biomedical corpora have been applied.
-Eventually, the clinical corpus is concatenated to the cleaned biomedical corpus resulting in a medium-size biomedical-clinical corpus for Spanish composed of about 963M tokens. The table below shows some basic statistics of the individual cleaned corpora:
+Eventually, the clinical corpus is concatenated to the cleaned biomedical corpus resulting in a medium-size biomedical-clinical corpus for Spanish composed of about 968M tokens. The table below shows some basic statistics of the individual cleaned corpora:
 | Name                                                                                    | No. tokens  | Description                                                                                                                                                                                                                                          |
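The context lines in this hunk describe the pipeline whose output size the commit corrects: the biomedical corpora are concatenated and globally deduplicated, and only then is the clinical corpus appended. The sketch below illustrates that ordering under stated assumptions. It uses exact-match deduplication on whitespace-normalized documents and whitespace tokenization, neither of which the README specifies. `global_dedup` and `count_tokens` are hypothetical helpers, and the toy documents are illustrative rather than drawn from the corpus.

```python
import hashlib
from typing import Iterable, List


def global_dedup(corpora: Iterable[List[str]]) -> List[str]:
    """Concatenate corpora and drop exact duplicate documents across all of them.

    Hypothetical helper: matching on whitespace-normalized text is an
    assumption; the README does not state the deduplication criterion.
    """
    seen = set()
    merged: List[str] = []
    for corpus in corpora:
        for doc in corpus:  # each document is kept whole, preserving its boundaries
            key = hashlib.sha1(" ".join(doc.split()).encode("utf-8")).hexdigest()
            if key not in seen:
                seen.add(key)
                merged.append(doc)
    return merged


def count_tokens(docs: Iterable[str]) -> int:
    # Whitespace tokens only (assumption); not the method behind the README's token counts.
    return sum(len(doc.split()) for doc in docs)


# Toy data: the second biomedical corpus repeats a document up to whitespace.
biomedical = [["a b c", "d e"], ["a  b c", "f"]]
clinical = ["g h"]

cleaned = global_dedup(biomedical)  # dedup applies among the biomedical corpora only
corpus = cleaned + clinical         # clinical corpus appended afterwards
print(len(corpus), count_tokens(corpus))
```

Because the clinical corpus is appended after deduplication, it is not checked against the biomedical documents. This matches the order described in the README text.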