Transformer language model for Croatian and Serbian

Trained for one epoch on a 0.7 GB dataset of Croatian and Serbian text. The dataset comes from the Leipzig Corpora Collection.

Dataset information

Model            | #params | Arch. | Training data
Andrija/SRoBERTa | 120M    | First | Leipzig Corpus (0.7 GB of text)

How to use in code

from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the tokenizer and masked-language model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("Andrija/SRoBERTa")
model = AutoModelForMaskedLM.from_pretrained("Andrija/SRoBERTa")
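A masked-LM like this one predicts the token behind a mask by producing a vector of logits over the vocabulary at the mask position; softmaxing those logits and ranking tokens by probability gives the fill-in suggestions. A minimal sketch of that ranking step, using toy vocabulary and logit values standing in for the model's real output (the real vocabulary and logits come from `tokenizer` and `model` above):

```python
import math

def top_k_predictions(logits, vocab, k=3):
    # Numerically stable softmax over the mask position's logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank vocabulary tokens by probability, highest first.
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Toy Croatian/Serbian words and made-up logits for illustration only.
vocab = ["grad", "pas", "kuća", "more"]
logits = [2.0, 0.5, 1.0, -1.0]
print(top_k_predictions(logits, vocab, k=2))
```

In practice you would take `model(**tokenizer(text, return_tensors="pt")).logits` at the mask token's position as the `logits` input and the tokenizer's vocabulary as `vocab`.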