crodri committed on
Commit a85e194 · 1 Parent(s): 81380fa

Update README.md

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -138,7 +138,9 @@ It contains the following tasks and their related datasets:
 
  **[ViquiQuAD](https://doi.org/10.5281/zenodo.4562344)**: consisting of more than 15,000 questions outsourced from Catalan Wikipedia randomly chosen from a set of 596 articles that were originally written in Catalan.
 
- **[VilaQuAD](https://doi.org/10.5281/zenodo.4562337)**: contains 6282 pairs of questions and answers, outsourced from 2095 Catalan language articles from VilaWeb newswire text.
+ **[VilaQuAD](https://doi.org/10.5281/zenodo.4562337)**: contains 6,282 pairs of questions and answers, outsourced from 2095 Catalan language articles from VilaWeb newswire text.
+
+ **[CatalanQA]()**: an aggregation of 2 previous datasets (VilaQuAD and ViquiQuAD), 21,427 pairs of Q/A balanced by type of question, containing one question and one answer per context, although the contexts can repeat multiple times.
 
  **[XQuAD](https://doi.org/10.5281/zenodo.4526223)**: the Catalan translation of XQuAD, a multilingual collection of manual translations of 1,190 question-answer pairs from English Wikipedia used only as a _test set_
 
@@ -152,6 +154,7 @@ Here are the train/dev/test splits of the datasets:
  | TC (TeCla) | 137,775 | 110,203 | 13,786 | 13,786|
  | QA (ViquiQuAD) | 14,239 | 11,255 | 1,492 | 1,429 |
  | QA (VilaQuAD) | 6,282 | 3,882 | 1,200 | 1,200 |
+ | QA (CatalanQA) | 21,427 | 17,135 | 2,157 | 2,135 |
 
  ### Evaluation Results
 
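
The split table touched by this commit lists a total followed by train/dev/test counts for each dataset. As a quick sanity check when reviewing a change like this, the arithmetic can be verified with a short script. This is only a sketch: the numbers are copied from the table rows visible in the diff, and the names `SPLITS`, `split_gap`, and `check_splits` are hypothetical helpers, not part of the repository.

```python
# Split sizes copied from the table rows visible in this diff.
# Each tuple is (total, train, dev, test).
SPLITS = {
    "TC (TeCla)": (137_775, 110_203, 13_786, 13_786),
    "QA (ViquiQuAD)": (14_239, 11_255, 1_492, 1_429),
    "QA (VilaQuAD)": (6_282, 3_882, 1_200, 1_200),
    "QA (CatalanQA)": (21_427, 17_135, 2_157, 2_135),
}


def split_gap(total, train, dev, test):
    """Return total minus the sum of the three splits (0 means consistent)."""
    return total - (train + dev + test)


def check_splits(splits):
    """Map each dataset name to its gap between total and summed splits."""
    return {name: split_gap(*row) for name, row in splits.items()}


if __name__ == "__main__":
    for name, gap in check_splits(SPLITS).items():
        status = "ok" if gap == 0 else f"off by {gap}"
        print(f"{name}: {status}")
```

With the values as they appear in the diff, the new CatalanQA row is internally consistent (17,135 + 2,157 + 2,135 = 21,427), as are the TeCla and VilaQuAD rows. The pre-existing ViquiQuAD row differs from the sum of its splits by 63; the diff gives no explanation for that gap, so it may be worth confirming against the dataset itself.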