---
license: apache-2.0
metrics:
- perplexity
pipeline_tag: fill-mask
language:
- cu
- orv
- chu
tags:
- roberta-based
- old church slavonic
- old east slavic
- old russian
- middle russian
- early slavic
widget:
- text: >-
    моли непрестанно о всѣхъ [MASK], честную память твою присно въ пѣснехъ
    почитающихъ
  example_title: Example 1
- text: да испишеть имѧна ваша. [MASK] возмуть мѣсѧчное свое съли слебное
  example_title: Example 2
---
# BERTislav
Baseline fill-mask model based on ruBERT and fine-tuned on a 10M-word corpus of mixed Old Church Slavonic, (Later) Church Slavonic, Old East Slavic, Middle Russian, and Medieval Serbian texts.
# Overview
- **Model Name:** BERTislav
- **Task:** Fill-mask
- **Base Model:** [ai-forever/ruBert-base](https://huggingface.co/ai-forever/ruBert-base)
- **Languages:** orv (Old East Slavic, Middle Russian), cu (Old Church Slavonic, Church Slavonic)
- **Developed by:** [Nilo Pedrazzini](https://huggingface.co/npedrazzini)
# Input Format
A string (`str`) containing one or more `[MASK]` tokens.
# Output Format
The predicted tokens for each `[MASK]`, each with its confidence score.
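
As a rough illustration of that output: when the model is queried through the `fill-mask` pipeline of the `transformers` library, an input with a single mask yields a list of dictionaries with the keys `score`, `token`, `token_str`, and `sequence`. The sketch below ranks such a list and prints it. The helper name `summarize` and the placeholder candidates are hypothetical and are not output from BERTislav.

```python
from typing import Dict, List


def summarize(predictions: List[Dict]) -> List[str]:
    """Turn fill-mask output into 'token<TAB>score' lines, best candidate first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [f"{p['token_str']}\t{p['score']:.3f}" for p in ranked]


# Hand-written stand-in for pipeline output: the tokens and scores are
# placeholders chosen for illustration, not real model predictions.
example = [
    {"score": 0.12, "token": 101, "token_str": "token_a", "sequence": "..."},
    {"score": 0.61, "token": 102, "token_str": "token_b", "sequence": "..."},
]

print("\n".join(summarize(example)))
```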
# Examples
### Example 1:
COMING SOON
# Uses
The model can be used as a baseline for further fine-tuning on specific downstream tasks (e.g. linguistic annotation).
# Bias, Risks, and Limitations
The model should only be considered a baseline, and should **not** be evaluated on its own.
Whether it improves the performance of language models fine-tuned for specific tasks still needs to be tested.
# Training Details
The texts used as training data are from the following sources:
- [Fundamental Digital Library Russian Literature & Folklore](https://feb-web.ru/indexen.htm) (FEB-web)
- Puškinskij Dom's [*Библиотека литературы Древней Руси*](http://lib.pushkinskijdom.ru/Default.aspx?tabid=2070)
- [Cyrillomethodiana](https://histdict.uni-sofia.bg/)
- Parts of the Bdinski Sbornik, as digitized in [Obdurodon](http://bdinski.obdurodon.org/)
- [Tromsø Old Russian and Old Church Slavonic Treebank](https://torottreebank.github.io/) (TOROT)
**NB: The training texts were heavily normalized. For best results, normalize your own input in the same way.
Use the [provided normalization script](https://huggingface.co/npedrazzini/BERTislav/blob/main/normalize.py), customizing it as needed.**
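
The linked `normalize.py` is the authoritative reference for the normalization applied to the training data. The sketch below does **not** reproduce it. It only illustrates, under assumed rules (lowercasing and removing combining marks such as titla and accents), what a normalization pass might look like and how to keep the `[MASK]` token intact while applying one. The names `rough_normalize` and `_clean` are hypothetical.

```python
import unicodedata

MASK = "[MASK]"  # mask token as written in this card's widget examples


def _clean(segment: str) -> str:
    """Lowercase a text segment and drop combining marks (assumed rules)."""
    # Compose first, so letters such as "й" are single code points and
    # keep their breve when the remaining combining marks are removed.
    composed = unicodedata.normalize("NFC", segment)
    return "".join(
        ch for ch in composed if unicodedata.category(ch) not in ("Mn", "Me")
    ).lower()


def rough_normalize(text: str) -> str:
    """Normalize the text around each [MASK] without altering the mask itself."""
    return MASK.join(_clean(part) for part in text.split(MASK))
```

Splitting on the mask token before cleaning matters because lowercasing the whole string would turn `[MASK]` into `[mask]`, which the tokenizer would no longer treat as a mask.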
# Model Card Authors
Nilo Pedrazzini
# Model Card Contact
[email protected]
# How to use the model
COMING SOON
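
Until official instructions are published, the following is a minimal sketch of one way to query the model. It assumes the `transformers` library is installed and that the model is hosted under the repository name `npedrazzini/BERTislav`, as inferred from the normalization-script link on this card. The helper names `load_fill_mask`, `predict_mask`, and `main` are hypothetical.

```python
from typing import Dict, List

# Repository name inferred from the normalize.py link on this card (assumption).
MODEL_ID = "npedrazzini/BERTislav"


def load_fill_mask(model_id: str = MODEL_ID):
    """Build a fill-mask pipeline; the model is downloaded on first use."""
    # Imported lazily so the other helper can be used without transformers.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def predict_mask(fill_mask, text: str, top_k: int = 5) -> List[Dict]:
    """Return up to top_k candidates for the single [MASK] token in text."""
    if text.count("[MASK]") != 1:
        raise ValueError("expected exactly one [MASK] token")
    return fill_mask(text, top_k=top_k)


def main() -> None:
    fill_mask = load_fill_mask()
    # First widget example from this card's metadata.
    text = (
        "моли непрестанно о всѣхъ [MASK], честную память твою "
        "присно въ пѣснехъ почитающихъ"
    )
    for candidate in predict_mask(fill_mask, text):
        print(f"{candidate['token_str']}\t{candidate['score']:.3f}")
```

Calling `main()` downloads the model weights and prints one candidate per line. Input text should be normalized beforehand, as advised under Training Details.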