RoBERTa-small-bulgarian
The RoBERTa model was originally introduced in this paper. This is a smaller version of RoBERTa-base-bulgarian: it has only 6 hidden layers but reaches similar performance.
Intended uses
This model can be used for cloze tasks (masked language modeling) or finetuned on other tasks in Bulgarian.
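As a rough illustration of the cloze use case, the sketch below builds a masked sentence and queries the model through the `fill-mask` pipeline of the `transformers` library. The Hub identifier in `MODEL_ID` and the `<mask>` token string are assumptions (neither is stated in this card), and `build_cloze`, `predict_mask`, and `main` are hypothetical helper names introduced for this example.

```python
# Assumed Hub id for this model; it is not stated in the card, so adjust as needed.
MODEL_ID = "iarfmoose/roberta-small-bulgarian"


def build_cloze(text: str, target: str, mask_token: str = "<mask>") -> str:
    """Replace the first occurrence of `target` in `text` with the mask token.

    "<mask>" is the usual RoBERTa mask token; it is assumed here, so check
    tokenizer.mask_token for the checkpoint you actually load.
    """
    if target not in text:
        raise ValueError(f"{target!r} not found in text")
    return text.replace(target, mask_token, 1)


def predict_mask(masked_text: str, model_id: str = MODEL_ID, top_k: int = 5):
    """Return the top_k candidate fills for a sentence with one mask token."""
    # Imported lazily so build_cloze stays usable without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    return fill_mask(masked_text, top_k=top_k)


def main() -> None:
    # "Sofia is the capital of Bulgaria." with the word for "capital" masked out.
    sentence = build_cloze("София е столицата на България.", "столицата")
    for candidate in predict_mask(sentence):
        # Each candidate is a dict with "token_str", "score" and "sequence" keys.
        print(candidate["token_str"], round(candidate["score"], 3))
```

Calling `main()` downloads the checkpoint on first use and prints the candidate tokens with their scores. For finetuning on a downstream task, the same checkpoint would instead be loaded with a task-specific head.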
Limitations and bias
The training data is unfiltered text from the internet and may contain all sorts of biases.
Training data
This model was trained on the following data:
- bg_dedup from OSCAR
- Newscrawl 1 million sentences 2017 from Leipzig Corpora Collection
- Wikipedia 1 million sentences 2016 from Leipzig Corpora Collection
Training procedure
The model was pretrained using a masked language-modeling objective with dynamic masking as described here.
It was trained for 160k steps. The batch size was limited to 8 by the available GPU memory.
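The idea behind dynamic masking can be sketched in plain Python: instead of fixing the mask positions once during preprocessing, a new mask pattern is sampled every time a sequence is fed to the model. This is a minimal sketch, not the training code used for this model. It assumes the standard BERT/RoBERTa scheme (15% of tokens selected; of those, 80% replaced by the mask token, 10% by a random token, 10% left unchanged), and `MASK_ID`, `VOCAB_SIZE`, and `dynamic_mask` are hypothetical names and values.

```python
import random

MASK_ID = 4          # hypothetical id of the mask token
VOCAB_SIZE = 1000    # hypothetical vocabulary size
IGNORE_INDEX = -100  # label value conventionally skipped by the loss


def dynamic_mask(token_ids, rng, mask_prob=0.15):
    """Return (inputs, labels) with a freshly sampled mask pattern.

    Simplification: special tokens are not excluded from masking here.
    """
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue                      # position not selected for prediction
        labels[i] = tok                   # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID           # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: replace with a random token
        # remaining 10%: keep the original token in the input
    return inputs, labels
```

Calling `dynamic_mask(ids, rng)` again on the same `ids` in a later epoch draws a new pattern from `rng`, so over 160k steps the model sees each sequence under many different maskings. In the `transformers` library, `DataCollatorForLanguageModeling` plays this role by masking each batch as it is collated.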