Commit b5252eb by ClassCat · 1 Parent(s): a834e4e

Create README.md

Files changed (1)
  1. README.md +37 -0
README.md ADDED
@@ -0,0 +1,37 @@
+ ---
+ language: la
+ license: cc-by-sa-4.0
+ datasets:
+ - cc100
+ widget:
+ - text: vita brevis, ars <mask>.
+ - text: errare <mask> est.
+ - text: usus est magister <mask>.
+ ---
+
+ ## RoBERTa Latin base model Version 2 (Uncased)
+
+ ### Prerequisites
+
+ transformers==4.19.2
+
+ ### Model architecture
+
+ This model uses the RoBERTa base settings, except for the vocabulary size.
+
+ ### Tokenizer
+
+ The tokenizer is a BPE tokenizer with a vocabulary size of 50,000.
+
+ ### Training Data
+
+ * Subset of [CC-100/la](https://data.statmt.org/cc-100/): Monolingual Datasets from Web Crawl Data
+
+ ### Usage
+
+ ```python
+ from transformers import pipeline
+
+ unmasker = pipeline('fill-mask', model='ClassCat/roberta-base-latin-v2')
+ unmasker("vita brevis, ars <mask>")
+ ```
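
The usage snippet in the added README returns the raw pipeline output. As a minimal sketch of how that output could be inspected, the helper below assumes the standard `fill-mask` return format of transformers: a list of dicts that carry `score` and `token_str` keys. The names `format_predictions` and `demo` are hypothetical and are not part of this commit or of the transformers library.

```python
from typing import Dict, List


def format_predictions(predictions: List[Dict], top_k: int = 3) -> List[str]:
    """Return the top_k candidates as 'token (score)' strings, best first.

    `predictions` is assumed to be the list of dicts produced by a
    fill-mask pipeline for an input containing a single <mask>.
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    # BPE tokens often carry a leading space, so strip it for display.
    return [f'{p["token_str"].strip()} ({p["score"]:.3f})' for p in ranked[:top_k]]


def demo() -> None:
    """Run the model from the README end to end.

    Requires transformers to be installed and downloads the model
    weights on first use, so it is not called automatically.
    """
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="ClassCat/roberta-base-latin-v2")
    for line in format_predictions(unmasker("vita brevis, ars <mask>.")):
        print(line)
```

Calling `demo()` prints one candidate per line. Keeping the formatting logic separate from the model call means it can be checked without downloading the model.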