---
license: apache-2.0
extra_gated_fields:
  Name: text
  Country: country
  Institution: text
  E-mail: text
  Use case: text
extra_gated_prompt: >-
  Our models are intended for academic use only. If you are not affiliated with
  an academic institution, please provide a rationale for using our models.
---
[README UNDER CONSTRUCTION]
emBERT is a Hungarian text classification model that assigns sentences to one of seven emotions or a neutral state. It uses the huBERT tokenizer and was fine-tuned from a huBERT base model on a proprietary dataset of sentences from Hungarian online news sites. The sentences in the fine-tuning set were labelled manually by experts in a double-blind setup, and inconsistencies were resolved by hand. The validation results of the fine-tuning were:
emotion | precision | recall | f1-score |
---|---|---|---|
0 - Anger | 0.70 | 0.74 | 0.72 |
1 - Disgust | 0.72 | 0.73 | 0.73 |
2 - Fear | 0.61 | 0.47 | 0.53 |
3 - Happiness | 0.38 | 0.37 | 0.38 |
4 - Neutral | 0.65 | 0.62 | 0.63 |
5 - Sad | 0.74 | 0.72 | 0.73 |
6 - Successful | 0.79 | 0.81 | 0.80 |
7 - Trustful | 0.76 | 0.78 | 0.77 |
weighted avg | 0.73 | 0.74 | 0.73 |

Overall accuracy reached 74%.
The emotion categories follow Plutchik (1980), with anticipation replaced by a neutral category.
Proper use of the model:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("SZTAKI-HLT/hubert-base-cc")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/emBERT")
```
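A minimal inference sketch building on the loading snippet above. The `ID2LABEL` list mirrors the order of the evaluation table; whether the checkpoint ships its own `id2label` mapping in its config is an assumption to verify, and the example sentence is purely illustrative:

```python
import torch

# Label order assumed to match the evaluation table above.
ID2LABEL = ["Anger", "Disgust", "Fear", "Happiness",
            "Neutral", "Sad", "Successful", "Trustful"]

def classify(text, tokenizer, model):
    """Return the predicted emotion label and its probability for one sentence."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    idx = int(probs.argmax())
    return ID2LABEL[idx], float(probs[idx])

# Usage with the objects loaded above (downloads weights on first run):
# label, score = classify("Ez egy példa mondat.", tokenizer, model)
```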
The model was created by György Márk Kis, Orsolya Ring, and Miklós Sebők of the Center for Social Sciences.