---
license: apache-2.0
language:
- hu
library_name: sentence-transformers
tags:
- sentence-similarity
widget:
- source_sentence: "Szép napunk van."
  sentences:
  - "Jó az idő."
  - "Szép az autó."
  - "Elutazok egy napra."
  example_title: "Példa"
---
# Hungarian Experimental Sentence-BERT
The pre-trained [huBERT](https://huggingface.co/SZTAKI-HLT/hubert-base-cc) model was fine-tuned on the [Hunglish 2.0](http://mokk.bme.hu/resources/hunglishcorpus) parallel corpus to mimic the [bert-base-nli-stsb-mean-tokens](https://huggingface.co/sentence-transformers/bert-base-nli-stsb-mean-tokens) model provided by UKPLab. Sentence embeddings were obtained by applying mean pooling to the huBERT output. The data was split into training (98%) and validation (2%) sets. By the end of training, the mean squared error on the validation set was 0.106. Our code is based on the [Sentence-Transformers](https://www.sbert.net) library. Our model was trained for 2 epochs on a single GTX 1080Ti GPU with a batch size of 32; training took approximately 15 hours.
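The setup described above (a huBERT student with mean pooling, trained with an MSE objective to reproduce the teacher's sentence embeddings on parallel data) can be sketched with the Sentence-Transformers API roughly as follows. This is a minimal illustration, not the exact training script: the file name `hunglish-train.tsv` and the warm-up value are placeholders.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses
from sentence_transformers.datasets import ParallelSentencesDataset

# Student: huBERT with mean pooling over the token embeddings
word_embedding_model = models.Transformer('SZTAKI-HLT/hubert-base-cc', max_seq_length=128)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(),
                               pooling_mode='mean')
student = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Teacher whose sentence embeddings the student learns to reproduce
teacher = SentenceTransformer('sentence-transformers/bert-base-nli-stsb-mean-tokens')

# Parallel corpus: tab-separated sentence pairs, one pair per line
# ('hunglish-train.tsv' is a placeholder file name)
train_data = ParallelSentencesDataset(student_model=student, teacher_model=teacher)
train_data.load_data('hunglish-train.tsv')
train_dataloader = DataLoader(train_data, shuffle=True, batch_size=32)

# Mean squared error between student and teacher embeddings
train_loss = losses.MSELoss(model=student)

student.fit(train_objectives=[(train_dataloader, train_loss)],
            epochs=2,
            warmup_steps=1000)
```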
## Limitations
- max_seq_length = 128
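
As with other Sentence-Transformers models, inputs longer than the limit are truncated; the value is exposed through the `max_seq_length` attribute:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('NYTK/sentence-transformers-experimental-hubert-hungarian')
print(model.max_seq_length)  # 128; longer inputs are truncated to this many tokens
```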
## Usage
```python
from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]
model = SentenceTransformer('NYTK/sentence-transformers-experimental-hubert-hungarian')
embeddings = model.encode(sentences)
print(embeddings)
```
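
For sentence similarity, the embeddings can be compared with cosine similarity via `sentence_transformers.util`; a minimal sketch using the Hungarian widget sentences from above:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('NYTK/sentence-transformers-experimental-hubert-hungarian')

source = "Szép napunk van."
candidates = ["Jó az idő.", "Szép az autó.", "Elutazok egy napra."]

source_emb = model.encode(source, convert_to_tensor=True)
candidate_embs = model.encode(candidates, convert_to_tensor=True)

# Cosine similarity between the source sentence and each candidate
scores = util.cos_sim(source_emb, candidate_embs)
for sentence, score in zip(candidates, scores[0]):
    print(f"{sentence}\t{score.item():.4f}")
```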
## Citation
If you use this model, please cite the following paper:
```
@article{bertopic,
  title = {Analyzing Narratives of Patient Experiences: A BERT Topic Modeling Approach},
  journal = {Acta Polytechnica Hungarica},
  year = {2023},
  author = {Osváth, Mátyás and Yang, Zijian Győző and Kósa, Karolina},
  pages = {153--171},
  volume = {20},
  number = {7}
}
```