RoBERTa Latin base model Version 2 (Uncased)

Prerequisites

transformers==4.19.2
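
A quick way to confirm that the local environment matches the pinned version is sketched below. It uses only the standard library; the helper names are illustrative, and whether other transformers releases also work with this model is not stated here.

```python
from importlib import metadata

PINNED_VERSION = "4.19.2"  # version listed under Prerequisites


def installed_transformers_version():
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, pinned=PINNED_VERSION):
    """True only when the installed version string equals the pinned one."""
    return installed == pinned


version = installed_transformers_version()
if version is None:
    print("transformers is not installed")
elif matches_pin(version):
    print(f"transformers {version} matches the pinned version")
else:
    print(f"transformers {version} differs from the pinned {PINNED_VERSION}")
```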

Model architecture

This model uses the RoBERTa base settings except for the vocabulary size.
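
To make the effect of the vocabulary size concrete, the sketch below compares the size of the token embedding matrix for this model with that of the original roberta-base checkpoint. It assumes the standard RoBERTa base hidden size of 768 and the original roberta-base vocabulary of 50,265 entries. Neither figure comes from this model card, and the names are illustrative.

```python
# Assumed values (not stated in this model card): the RoBERTa base hidden size
# and the vocabulary size of the original roberta-base checkpoint.
HIDDEN_SIZE = 768
ORIGINAL_VOCAB_SIZE = 50_265

# Vocabulary size given in this model card.
LATIN_VOCAB_SIZE = 50_000


def embedding_params(vocab_size: int, hidden_size: int = HIDDEN_SIZE) -> int:
    """Number of weights in a vocab_size x hidden_size token embedding matrix."""
    return vocab_size * hidden_size


latin_params = embedding_params(LATIN_VOCAB_SIZE)
original_params = embedding_params(ORIGINAL_VOCAB_SIZE)
difference = original_params - latin_params

print(f"Latin model token embeddings:  {latin_params:,}")
print(f"roberta-base token embeddings: {original_params:,}")
print(f"Difference:                    {difference:,}")
```

Under these assumptions the change in vocabulary size only affects the embedding (and tied output) weights; the encoder layers are unchanged.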

Tokenizer

A BPE tokenizer with a vocabulary size of 50,000.

Training Data

  • Subset of CC-100/la: Monolingual Datasets from Web Crawl Data

Usage

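from transformers import pipeline

# Build a fill-mask pipeline backed by this model; "<mask>" marks the token to predict.
unmasker = pipeline('fill-mask', model='ClassCat/roberta-base-latin-v2')
# Returns the candidate tokens for the masked position, each with a score.
unmasker("vita brevis, ars <mask>")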
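
The fill-mask pipeline returns a list of dictionaries with the keys `score`, `token`, `token_str` and `sequence`. The sketch below shows one way to reduce that list to the top few candidates. The helper `top_predictions` is hypothetical, and the sample entries are hand-written placeholders so that the snippet runs without downloading the model; they are not real predictions.

```python
from typing import Dict, List, Tuple


def top_predictions(results: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token, score) pairs from fill-mask output."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["token_str"].strip(), round(r["score"], 4)) for r in ranked[:k]]


# Hand-written stand-in for pipeline output. The tokens, ids and scores are
# made up for illustration and are NOT real predictions from the model.
sample_results = [
    {"score": 0.10, "token": 11, "token_str": " brevis", "sequence": "vita brevis, ars brevis"},
    {"score": 0.60, "token": 12, "token_str": " longa", "sequence": "vita brevis, ars longa"},
    {"score": 0.05, "token": 13, "token_str": " est", "sequence": "vita brevis, ars est"},
]

for token, score in top_predictions(sample_results, k=2):
    print(f"{token}\t{score}")

# With the real model (downloads the weights on first use):
# top_predictions(unmasker("vita brevis, ars <mask>"))
```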