ccasimiro committed on
Commit
f9ce933
1 Parent(s): 52697e8

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -50,7 +50,7 @@ The training corpus is composed of several biomedical corpora in Spanish, collec
  - keep the original document boundaries

  Then, the biomedical corpora are concatenated and further global deduplication among the biomedical corpora have been applied.
- Eventually, the clinical corpus is concatenated to the cleaned biomedical corpus resulting in a medium-size biomedical-clinical corpus for Spanish composed of about 963M tokens. The table below shows some basic statistics of the individual cleaned corpora:
+ Eventually, the clinical corpus is concatenated to the cleaned biomedical corpus resulting in a medium-size biomedical-clinical corpus for Spanish composed of about 968M tokens. The table below shows some basic statistics of the individual cleaned corpora:


  | Name | No. tokens | Description |