Logion: Machine Learning for Greek Philology
The most advanced Ancient Greek BERT model trained to date! Read the paper on arXiv by Charlie Cowen-Breen, Creston Brooks, Johannes Haubold, and Barbara Graziosi.
We train a WordPiece tokenizer (with a vocab size of 50,000) on a corpus of over 70 million words of premodern Greek. Using this tokenizer and the same corpus, we train a BERT model.
Further information on this project and code for error detection can be found on GitHub.
We're adding more models trained with cleaner data and different tokenizations - keep an eye out!
How to use
Requirements:
pip install transformers
Load the model and tokenizer directly from the Hugging Face Model Hub:
from transformers import BertTokenizer, BertForMaskedLM
tokenizer = BertTokenizer.from_pretrained("cabrooks/LOGION-50k_wordpiece")
model = BertForMaskedLM.from_pretrained("cabrooks/LOGION-50k_wordpiece")
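Once loaded, the model can be asked to fill in a masked word. The following is a minimal sketch rather than the authors' own pipeline: the helper names `mask_word` and `top_predictions` are hypothetical, and it assumes PyTorch is installed alongside transformers (`pip install torch`), since `BertForMaskedLM` is the PyTorch model class.

```python
def mask_word(words, index, mask_token="[MASK]"):
    """Return the words joined into a string, with words[index] replaced by the mask token."""
    masked = list(words)
    masked[index] = mask_token
    return " ".join(masked)


def top_predictions(words, index, model_name="cabrooks/LOGION-50k_wordpiece", k=5):
    """Return the k most probable (token, probability) pairs for the masked word.

    Hypothetical helper for illustration; downloads the model on first use.
    """
    # Imported lazily so that mask_word works without these packages installed.
    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertForMaskedLM.from_pretrained(model_name)
    model.eval()

    # Use the tokenizer's own mask token instead of assuming "[MASK]".
    text = mask_word(words, index, tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt")

    # Locate the mask token in the encoded input.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    positions = is_mask.nonzero(as_tuple=True)[0]
    if len(positions) != 1:
        raise ValueError("expected exactly one mask token in the input")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Convert the logits at the masked position into probabilities.
    probs = logits[0, positions[0]].softmax(dim=-1)
    top = torch.topk(probs, k)
    return [
        (tokenizer.convert_ids_to_tokens(int(i)), float(p))
        for p, i in zip(top.values, top.indices)
    ]
```

For example, `top_predictions("μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος".split(), 1)` masks the second word of the line and returns the model's five most probable replacements. Because the vocabulary is WordPiece, a returned token may be a word fragment rather than a full word.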
Cite
If you use this model in your research, please cite the paper:
@misc{logion-base,
title={Logion: Machine Learning for Greek Philology},
author={Cowen-Breen, C. and Brooks, C. and Haubold, J. and Graziosi, B.},
year={2023},
eprint={2305.01099},
archivePrefix={arXiv},
primaryClass={cs.CL}
}