metadata

license: apache-2.0
language:
  - hu
library_name: sentence-transformers
tags:
  - sentence-similarity
widget:
  - source_sentence: Szép napunk van.
    sentences:
      - Jó az idő.
      - Szép az autó.
      - Elutazok egy napra.
    example_title: Példa

Hungarian Experimental Sentence-BERT

The pre-trained hubert-base-cc[https://huggingface.co/SZTAKI-HLT/hubert-base-cc] model was fine-tuned on the Hunglish 2.0[http://mokk.bme.hu/resources/hunglishcorpus/] parallel corpus to mimic the bert-base-nli-stsb-mean-tokens[https://huggingface.co/sentence-transformers/bert-base-nli-stsb-mean-tokens] model provided by UKPLab. Sentence embeddings are obtained by applying mean pooling to the huBERT output. The data was split into training (98%) and validation (2%) sets. At the end of training, the mean squared error on the validation set was 0.106. Our code was based on the Sentence-Transformers[https://www.sbert.net] library.

The model was trained for 2 epochs on a single GTX 1080 Ti GPU with a batch size of 32. Training took approximately 15 hours.
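Two computations named above are mean pooling of the huBERT token outputs and a mean squared error against the teacher model. The following is a minimal NumPy sketch of both. It assumes the usual masked mean pooling, as in the Sentence-Transformers Pooling module, where padding tokens are ignored and the token count is clamped to avoid division by zero. It also assumes that "mimic" means regressing the Hungarian model's sentence embedding onto the teacher's embedding of the corresponding sentence. The names mean_pool and distillation_mse are hypothetical, and the card does not state whether the reported 0.106 uses exactly this normalisation.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average the token vectors of each sentence, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim) encoder output, e.g. huBERT's last layer
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)   # sum of real-token vectors
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # number of real tokens, never zero
    return summed / counts                           # (batch, dim)


def distillation_mse(student: np.ndarray, teacher: np.ndarray) -> float:
    """Mean squared error between student and teacher sentence embeddings.

    Assumption: both arrays have shape (batch, dim) and row i of each
    corresponds to the same sentence pair from the parallel corpus.
    """
    return float(np.mean((student - teacher) ** 2))
```

The padding mask matters because sentences in a batch are padded to the same length; without it, padding vectors would be averaged into shorter sentences' embeddings.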

Limitations

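The maximum input length is 128 tokens (max_seq_length = 128). Longer inputs are truncated, so text beyond that point does not affect the embedding.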
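To find inputs that will be cut off, one option is to count tokens before encoding. The sketch below assumes the model is loaded as a sentence-transformers SentenceTransformer, which exposes tokenizer and max_seq_length attributes. The helper name exceeds_max_seq_length is hypothetical. The count includes special tokens such as [CLS] and [SEP], because the tokenizer adds them by default.

```python
def exceeds_max_seq_length(model, text: str) -> bool:
    """Return True if `text` tokenizes to more tokens than the model will read.

    Assumes `model` provides a Hugging Face-style `tokenizer` (callable on a
    string, returning a mapping with "input_ids") and an integer
    `max_seq_length`, as a sentence-transformers SentenceTransformer does.
    """
    n_tokens = len(model.tokenizer(text)["input_ids"])  # includes special tokens
    return n_tokens > model.max_seq_length
```

Inputs flagged this way could be split into shorter passages before encoding, if the tail of the text matters for the task.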

Usage
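A minimal usage sketch follows. It assumes the model is loaded with the sentence-transformers library and ranks candidate sentences by cosine similarity to a source sentence, as the widget in the metadata does. MODEL_ID is a hypothetical placeholder because the repository ID is not given here. load_model and rank_by_similarity are illustrative helper names, not part of any library. SentenceTransformer.encode returns a NumPy array by default.

```python
import numpy as np

# Hypothetical placeholder: replace with this model's actual Hub repository ID.
MODEL_ID = "<namespace>/<this-model>"


def load_model(model_id: str = MODEL_ID):
    """Load the model with sentence-transformers (imported lazily, only when called)."""
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_id)


def rank_by_similarity(model, source: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Return (candidate, cosine similarity) pairs, most similar to `source` first."""
    vectors = np.asarray(model.encode([source] + candidates), dtype=float)
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)  # unit length
    scores = vectors[1:] @ vectors[0]  # cosine similarity of each candidate to the source
    order = np.argsort(-scores)        # indices by descending similarity
    return [(candidates[i], float(scores[i])) for i in order]
```

For example, rank_by_similarity(load_model(), "Szép napunk van.", ["Jó az idő.", "Szép az autó.", "Elutazok egy napra."]) scores the widget sentences ("It's a lovely day." against "The weather is good.", "The car is nice." and "I'm going away for a day."). The actual scores depend on the model weights.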

Citation

If you use this model, please cite the following paper:

@article{bertopic,
    title = {Analyzing Narratives of Patient Experiences: A BERT Topic Modeling Approach},
    journal = {Acta Polytechnica Hungarica},
    year = {2023},
    author = {Osváth, Mátyás and Yang, Zijian Győző and Kósa, Karolina},
    pages = {153--171},
    volume = {20},
    number = {7}
}