diyclassics committed · Commit fc9231e · 1 Parent(s): feb57ec

Add paper
README.md CHANGED
@@ -12,3 +12,21 @@ pinned: false
 Synthetic trained spaCy pipelines for Latin NLP
 
 Developed by [Patrick J. Burns](https://diyclassics.github.io/), 2023.
+
+## Paper
+
+Details about training, datasets, etc. can be found in the following paper:
+Burns, P.J. 2023. “LatinCy: Synthetic Trained Pipelines for Latin NLP.” https://arxiv.org/abs/2305.04365v1.
+
+### Citation
+```
+@misc{burns_latincy_2023,
+    title = {{LatinCy}: Synthetic Trained Pipelines for Latin {NLP}},
+    author = {Burns, Patrick J.},
+    url = {https://arxiv.org/abs/2305.04365v1},
+    shorttitle = {{LatinCy}},
+    abstract = {This paper introduces {LatinCy}, a set of trained general purpose Latin-language "core" pipelines for use with the {spaCy} natural language processing framework. The models are trained on a large amount of available Latin data, including all five of the Latin Universal Dependency treebanks, which have been preprocessed to be compatible with each other. The result is a set of general models for Latin with good performance on a number of natural language processing tasks (e.g. the top-performing model yields {POS} tagging, 97.41\% accuracy; lemmatization, 94.66\% accuracy; morphological tagging 92.76\% accuracy). The paper describes the model training, including its training data and parameterization, and presents the advantages to Latin-language researchers of having a {spaCy} model available for {NLP} work.},
+    date = {2023-05-07},
+    langid = {english},
+}
+```
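
For readers landing on this commit, a minimal usage sketch of the pipelines the README describes may help. The package name `la_core_web_lg` follows the model naming in the LatinCy paper; the example sentence is illustrative, and the sketch assumes the package has already been installed into the environment.

```python
# Minimal sketch: loading a LatinCy pipeline with spaCy.
# Assumes a LatinCy package (here la_core_web_lg, the large model
# named in the paper) is installed alongside spaCy 3.x.
import spacy

nlp = spacy.load("la_core_web_lg")  # swap in a sm/md variant as needed
doc = nlp("Gallia est omnis divisa in partes tres.")

# The paper reports POS tagging, lemmatization, and morphological
# tagging; those annotations are exposed on each token.
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.morph)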