elmadany committed
Commit 0641745
Parent(s): 72c8287
commit from elmadany

README.md CHANGED
@@ -7,6 +7,7 @@ We find that results with ARBERT and MARBERT on QA are not competitive, a clear
 To rectify this, we further pre-train the stronger model, MARBERT, on the same MSA data as ARBERT, in addition to the AraNews dataset, but with a larger sequence length of 512 tokens for 40 epochs. We call this
 further pre-trained model **MARBERTv2**, noting it has **29B tokens**. MARBERTv2 achieves the best performance on all but one test set, where XLM-RLarge marginally outperforms us (only in F1).
 
+For more information, please visit our own GitHub [repo](https://github.com/UBC-NLP/marbert).
 
 
 