fastText
Serbian

FastText Sr

Trained on the Serbian language corpus - 9.5 billion words

The files include models in both Gensim format and the original Facebook fastText binary format.

```python
from gensim.models import FastText

# Load the model saved in Gensim format
model = FastText.load("TeslaFT")

examples = [
    ("dim", "zavesa"),
    ("staklo", "zavesa"),
    ("ormar", "zavesa"),
    ("prozor", "zavesa"),
    ("draperija", "zavesa")
]

for w1, w2 in examples:
    # Cosine similarity between the two word vectors
    print(model.wv.cosine_similarities(model.wv[w1], [model.wv[w2]])[0])
```

```
0.5305264
0.7095266
0.6041575
0.5771946
0.8870213
```
```python
from gensim.models.fasttext import load_facebook_model

# Load the model in the original (Facebook) fastText binary format
model = load_facebook_model("TeslaFT.bin")

examples = [
    ("dim", "zavesa"),
    ("staklo", "zavesa"),
    ("ormar", "zavesa"),
    ("prozor", "zavesa"),
    ("draperija", "zavesa")
]

for w1, w2 in examples:
    # Cosine similarity between the two word vectors
    print(model.wv.cosine_similarities(model.wv[w1], [model.wv[w2]])[0])
```

```
0.5305264
0.7095266
0.6041575
0.5771946
0.8870213
```
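The scores above are plain cosine similarities between word vectors. As a self-contained sketch of what `cosine_similarities` computes (using toy stand-in vectors, not the actual model's embeddings):

```python
import numpy as np

def cosine_similarity(v1, v2):
    # Cosine of the angle between two vectors: dot product over the norms
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# Hypothetical 3-dimensional stand-ins for word embeddings
v_prozor = np.array([1.0, 2.0, 0.0])
v_zavesa = np.array([2.0, 1.0, 1.0])

print(round(cosine_similarity(v_prozor, v_zavesa), 4))  # → 0.7303
```

A score near 1.0 (as with "draperija"/"zavesa" above) means the vectors point in nearly the same direction, i.e. the words occur in very similar contexts.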
Author
Mihailo Škorić
Computation
TESLA project


This research was supported by the Science Fund of the Republic of Serbia, #7276, Text Embeddings - Serbian Language Applications - TESLA

