---
title: README
emoji: 🏢
colorFrom: red
colorTo: blue
sdk: static
pinned: false
---
|
|
|
# LatinCy |
|
|
|
Synthetic trained spaCy pipelines for Latin NLP |
|
|
|
Developed by [Patrick J. Burns](https://diyclassics.github.io/), 2023. |
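
## Usage

The pipelines load like any other spaCy model. A minimal sketch, assuming the small LatinCy model `la_core_web_sm` has already been installed (e.g. from its wheel published on Hugging Face):

```python
import spacy

# Load the small LatinCy pipeline; the model package must be
# installed first (the name assumes the sm-sized release).
nlp = spacy.load("la_core_web_sm")

doc = nlp("Gallia est omnis divisa in partes tres.")
for token in doc:
    # Each token carries the annotations the pipeline predicts:
    # lemma, coarse POS tag, and morphological features.
    print(token.text, token.lemma_, token.pos_, token.morph)
```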
|
|
|
## Paper |
|
|
|
Details about training, datasets, and evaluation can be found in the following paper:
|
Burns, P.J. 2023. “LatinCy: Synthetic Trained Pipelines for Latin NLP.” https://arxiv.org/abs/2305.04365v1. |
|
|
|
### Citation |
|
```bibtex
@misc{burns_latincy_2023,
  title = {{LatinCy}: Synthetic Trained Pipelines for Latin {NLP}},
  author = {Burns, Patrick J.},
  url = {https://arxiv.org/abs/2305.04365v1},
  shorttitle = {{LatinCy}},
  abstract = {This paper introduces {LatinCy}, a set of trained general purpose Latin-language "core" pipelines for use with the {spaCy} natural language processing framework. The models are trained on a large amount of available Latin data, including all five of the Latin Universal Dependency treebanks, which have been preprocessed to be compatible with each other. The result is a set of general models for Latin with good performance on a number of natural language processing tasks (e.g. the top-performing model yields {POS} tagging, 97.41\% accuracy; lemmatization, 94.66\% accuracy; morphological tagging 92.76\% accuracy). The paper describes the model training, including its training data and parameterization, and presents the advantages to Latin-language researchers of having a {spaCy} model available for {NLP} work.},
  date = {2023-05-07},
  langid = {english},
}
```
|
|