---
license: apache-2.0
extra_gated_fields:
  Name: text
  Country: country
  Institution: text
  E-mail: text
  Use case: text
extra_gated_prompt: >-
  Our models are intended for academic use only. If you are not affiliated with
  an academic institution, please provide a rationale for using our models.
---
[README UNDER CONSTRUCTION]
emBERT is a Hungarian text classification model that assigns sentences to one of seven emotions or a neutral state. It uses the huBERT tokenizer and was fine-tuned from a huBERT base model on a proprietary dataset of sentences from Hungarian online news sites. The sentences in the fine-tuning set were labelled manually by experts in a double-blind setup, and inconsistencies were resolved by hand. The validation results of the fine-tuning were:
emotion | precision | recall | f1-score |
---|---|---|---|
0 - Anger | 0.70 | 0.74 | 0.72 |
1 - Disgust | 0.72 | 0.73 | 0.73 |
2 - Fear | 0.61 | 0.47 | 0.53 |
3 - Happiness | 0.38 | 0.37 | 0.38 |
4 - Neutral | 0.65 | 0.62 | 0.63 |
5 - Sad | 0.74 | 0.72 | 0.73 |
6 - Successful | 0.79 | 0.81 | 0.80 |
7 - Trustful | 0.76 | 0.78 | 0.77 |
weighted avg | 0.73 | 0.74 | 0.73 |

Overall accuracy reached 74%.
The emotion categories follow Plutchik (1980), with anticipation replaced by a neutral category.
Proper use of the model:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("SZTAKI-HLT/hubert-base-cc")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/emBERT")
```
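A minimal inference sketch building on the loading snippet above. The `ID2LABEL` list mirrors the order of the evaluation table; whether the checkpoint ships its own `id2label` mapping in its config is an assumption to verify, and the example sentence is purely illustrative:

```python
import torch

# Label order assumed to match the evaluation table above.
ID2LABEL = ["Anger", "Disgust", "Fear", "Happiness",
            "Neutral", "Sad", "Successful", "Trustful"]

def classify(text, tokenizer, model):
    """Return the predicted emotion label and its probability for one sentence."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    idx = int(probs.argmax())
    return ID2LABEL[idx], float(probs[idx])

# Usage with the objects loaded above (downloads weights on first run):
# label, score = classify("Ez egy példa mondat.", tokenizer, model)
```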
The model was created by György Márk Kis, Orsolya Ring, and Miklós Sebők of the Center for Social Sciences.