---
tags:
  - spacy
language:
  - la
license: mit
---

Code required to train large (`lg`) floret embeddings for Latin on LatinCy Assets data. Based on the spaCy project "Train floret vectors from Wikipedia and OSCAR".

| Feature | Description |
| --- | --- |
| Name | `la_vectors_floret_lg` |
| Version | `3.8.0` |
| spaCy | `>=3.8.3,<3.9.0` |
| Default Pipeline | |
| Components | |
| Vectors | -1 keys, 200000 unique vectors (300 dimensions) |
| Sources | UD_Latin-Perseus, UD_Latin-PROIEL, UD_Latin-ITTB, UD_Latin-LLCT, UD_Latin-UDante, Wikipedia, OSCAR, Corpus Thomisticum, The Latin Library, CLTK-Tesserae Latin, Patrologia Latina |
| License | MIT |
| Author | Patrick J. Burns |
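
The `-1 keys, 200000 unique vectors` entry reflects how floret vectors are stored: instead of one vector per word (key), a word's vector is composed from rows of a fixed-size table, selected by hashing the word's character n-grams. The sketch below illustrates that idea only. It is not floret's or spaCy's implementation: the SHA-256 stand-in hash, the n-gram range, the hash count, the averaging step, the toy table sizes, and all function names are assumptions made for illustration.

```python
import hashlib

# Toy sizes for illustration; the table above reports 200000 rows, 300 dimensions.
N_ROWS = 1000
N_DIMS = 8
HASH_COUNT = 2        # assumption: number of hashes applied to each n-gram
MIN_N, MAX_N = 3, 5   # assumption: character n-gram lengths


def ngrams(word, min_n=MIN_N, max_n=MAX_N):
    """Return the boundary-marked word plus its character n-grams."""
    marked = f"<{word}>"
    grams = [marked]
    for n in range(min_n, max_n + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


def rows_for(gram, n_rows=N_ROWS, hash_count=HASH_COUNT):
    """Map one n-gram to `hash_count` row indices (stand-in hash, not floret's)."""
    rows = []
    for seed in range(hash_count):
        digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
        rows.append(int.from_bytes(digest[:8], "big") % n_rows)
    return rows


def toy_table(n_rows=N_ROWS, n_dims=N_DIMS):
    """Build a deterministic placeholder table; real rows are learned in training."""
    return [[((r * 31 + d * 17) % 97) / 97.0 for d in range(n_dims)]
            for r in range(n_rows)]


def vector(word, table):
    """Average the table rows selected by every n-gram of the word."""
    selected = [r for gram in ngrams(word) for r in rows_for(gram, len(table))]
    dims = len(table[0])
    return [sum(table[r][d] for r in selected) / len(selected)
            for d in range(dims)]


table = toy_table()
print(len(vector("amicitia", table)))  # one value per table dimension
```

Because every string maps to some set of rows, a table like this can return a vector for any word form, including forms never seen in training, which is why no per-word key count applies.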