Update README.md
widget:
- "Szép az autó."
- "Elutazok egy napra."
  example_title: "Példa"
---

# Hungarian Experimental Sentence-BERT
The pre-trained [hubert-base-cc](https://huggingface.co/SZTAKI-HLT/hubert-base-cc) model was fine-tuned on the [Hunglish 2.0](http://mokk.bme.hu/resources/hunglishcorpus/) parallel corpus to mimic the [bert-base-nli-stsb-mean-tokens](https://huggingface.co/sentence-transformers/bert-base-nli-stsb-mean-tokens) model provided by UKPLab. Sentence embeddings are obtained by applying mean pooling to the huBERT token outputs. The data was split into training (98%) and validation (2%) sets; by the end of training, the model reached a mean squared error of 0.106 on the validation set. Our code is based on the [Sentence-Transformers](https://www.sbert.net) library. The model was trained for 2 epochs on a single GTX 1080Ti GPU with a batch size of 32; training took approximately 15 hours.
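The "mimic" objective described above corresponds to the Sentence-Transformers knowledge-distillation recipe for parallel data: the student is trained so that its embeddings of both sides of a translation pair match the teacher's embedding of the English side under an MSE loss. Below is a minimal sketch of such a setup, assuming that recipe; the file name `hunglish-train.tsv` is a placeholder for the prepared Hunglish 2.0 split, and hyperparameters other than those stated above are illustrative.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses
from sentence_transformers.datasets import ParallelSentencesDataset

# Teacher: the English SBERT model whose embeddings the student learns to reproduce.
teacher = SentenceTransformer("bert-base-nli-stsb-mean-tokens")

# Student: huBERT with mean pooling over token outputs, max_seq_length = 128.
word_embedding = models.Transformer("SZTAKI-HLT/hubert-base-cc", max_seq_length=128)
pooling = models.Pooling(word_embedding.get_word_embedding_dimension(), pooling_mode="mean")
student = SentenceTransformer(modules=[word_embedding, pooling])

# Parallel data: tab-separated "English sentence<TAB>Hungarian sentence" pairs.
# "hunglish-train.tsv" is a placeholder for the actual training file.
train_data = ParallelSentencesDataset(student_model=student, teacher_model=teacher)
train_data.load_data("hunglish-train.tsv")
train_dataloader = DataLoader(train_data, shuffle=True, batch_size=32)

# MSE between student and teacher embeddings; the 0.106 reported above
# is this loss measured on the validation split.
train_loss = losses.MSELoss(model=student)

student.fit(train_objectives=[(train_dataloader, train_loss)], epochs=2)
```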
## Limitations
- max_seq_length = 128
## Usage
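A minimal sketch of loading the model with the Sentence-Transformers library. The model id below is a placeholder; substitute this repository's id on the Hugging Face Hub.

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder: replace with this repository's model id on the Hugging Face Hub.
model = SentenceTransformer("<model-id>")

sentences = ["Szép az autó.", "Elutazok egy napra."]
embeddings = model.encode(sentences)

# Cosine similarity between the two sentence embeddings.
print(util.cos_sim(embeddings[0], embeddings[1]))
```

Note that inputs longer than max_seq_length = 128 tokens are truncated (see Limitations).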
## Citation
If you use this model, please cite the following paper:
```bibtex
@article{bertopic,
    title = {Analyzing Narratives of Patient Experiences: A BERT Topic Modeling Approach},
    journal = {Acta Polytechnica Hungarica},
    year = {2023},
    author = {Osváth, Mátyás and Yang, Zijian Győző and Kósa, Karolina},
    pages = {153--171},
    volume = {20},
    number = {7}
}
```