Merge branch 'main' of https://huggingface.co/flax-community/roberta-hindi into main
README.md CHANGED
@@ -50,7 +50,7 @@ You can use this model directly with a pipeline for masked language modeling:
 
 ## Training data
 
-The RoBERTa model was pretrained on the reunion of the following datasets:
+The RoBERTa Hindi model was pretrained on the reunion of the following datasets:
 - [OSCAR](https://huggingface.co/datasets/oscar) is a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
 - [mC4](https://huggingface.co/datasets/mc4) is a multilingual colossal, cleaned version of Common Crawl's web crawl corpus.
 - [IndicGLUE](https://indicnlp.ai4bharat.org/indic-glue/) is a natural language understanding benchmark.
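
The hunk header above points at the model card's "masked language modeling" usage section. As context, here is a minimal sketch of that usage with the 🤗 Transformers `pipeline` API, assuming the model id matches this repository (`flax-community/roberta-hindi`) and a RoBERTa-style `<mask>` token; the Hindi prompt is illustrative only:

```python
from transformers import pipeline

# Masked-LM pipeline; the model id is assumed from this repository's name.
fill_mask = pipeline("fill-mask", model="flax-community/roberta-hindi")

# RoBERTa-style tokenizers use "<mask>" as the mask token.
# Illustrative prompt: "We wish you a pleasant <mask>."
for prediction in fill_mask("हम आपके सुखद <mask> की कामना करते हैं"):
    print(prediction["token_str"], round(prediction["score"], 4))
```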
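The training-data list in the hunk names OSCAR and mC4, both hosted on the Hub as loadable datasets. Below is a sketch of pulling their Hindi subsets with 🤗 Datasets, streamed to avoid downloading the full corpora; the config names (`unshuffled_deduplicated_hi` for OSCAR, `hi` for mC4) follow each dataset's standard naming on the Hub and are assumptions, since the exact configurations used for pretraining aren't stated in this diff:

```python
from datasets import load_dataset

# Hindi portion of OSCAR (config name assumed from OSCAR's naming scheme).
oscar_hi = load_dataset("oscar", "unshuffled_deduplicated_hi",
                        split="train", streaming=True)

# Hindi portion of mC4 (mC4 configs are language codes).
mc4_hi = load_dataset("mc4", "hi", split="train", streaming=True)

# Peek at one document from each corpus.
print(next(iter(oscar_hi))["text"][:200])
print(next(iter(mc4_hi))["text"][:200])
```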