---
license: apache-2.0
language:
  - hu
metrics:
  - accuracy
model-index:
  - name: HunEmBERT3
    results:
      - task:
          type: text-classification
        metrics:
          - type: f1
            value: 0.91
widget:
  - text: >-
      A vegetációs időben az országban rendszeresen jelentkező jégesők ellen is
      van mód védekezni lokálisan, ki-ki a saját nagy értékű ültetvényén.
    example_title: Positive
  - text: >-
      Magyarország több évtizede küzd demográfiai válsággal, és egyre több
      gyermekre vágyó pár meddőségi problémákkal néz szembe.
    example_title: Negative
  - text: Tisztelt fideszes, KDNP-s Képviselőtársaim!
    example_title: Neutral
extra_gated_fields:
  Name: text
  Country: country
  Institution: text
  E-mail: text
  Use case: text
extra_gated_prompt: >-
  Our models are intended for academic use only. If you are not affiliated with
  an academic institution, please provide a rationale for using our models.
---

## Model description

A cased, fine-tuned BERT model for Hungarian, trained on manually annotated parliamentary pre-agenda speeches scraped from parlament.hu.

## Intended uses & limitations

The model can be used like any other (cased) BERT model. It has been tested on recognizing positive, negative, and neutral sentences in (parliamentary) pre-agenda speeches, with the following label mapping (see the pipeline sketch below):

- `Label_0`: Neutral
- `Label_1`: Positive
- `Label_2`: Negative
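
As a quick illustration of this mapping, here is a minimal sketch using the `transformers` pipeline API. The `id2label` dictionary is an assumption mirroring the list above (the pipeline may already return readable labels if the model config defines them), and since the model is gated, your Hugging Face account needs granted access. The example sentence is the neutral example from the widget section.

```python
from transformers import pipeline

# Text-classification pipeline for the Hungarian sentiment model.
classifier = pipeline("text-classification", model="poltextlab/HunEmBERT3")

# Hypothetical mapping, mirroring the label list above; the raw pipeline
# output may use LABEL_0/LABEL_1/LABEL_2 depending on the model config.
id2label = {"LABEL_0": "Neutral", "LABEL_1": "Positive", "LABEL_2": "Negative"}

result = classifier("Tisztelt fideszes, KDNP-s Képviselőtársaim!")[0]
print(id2label.get(result["label"], result["label"]), round(result["score"], 3))
```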

## Training

A fine-tuned version of the original huBERT model (SZTAKI-HLT/hubert-base-cc), trained on the HunEmPoli corpus. The table below shows the emotion categories in the corpus and how they were aggregated into sentiment classes; a minimal fine-tuning sketch follows the table.

| Category | Count | Ratio  | Sentiment | Count | Ratio  |
|----------|-------|--------|-----------|-------|--------|
| Neutral  | 351   | 1.85%  | Neutral   | 351   | 1.85%  |
| Fear     | 162   | 0.85%  | Negative  | 11180 | 58.84% |
| Sadness  | 4258  | 22.41% |           |       |        |
| Anger    | 643   | 3.38%  |           |       |        |
| Disgust  | 6117  | 32.19% |           |       |        |
| Success  | 6602  | 34.74% | Positive  | 7471  | 39.32% |
| Joy      | 441   | 2.32%  |           |       |        |
| Trust    | 428   | 2.25%  |           |       |        |
| Sum      | 19002 |        |           | 19002 |        |
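
For readers who want to reproduce a comparable setup, below is a minimal fine-tuning sketch with the `transformers` Trainer. The file `hunempoli.csv`, its `text`/`label` columns, and the hyperparameters are illustrative assumptions; the card does not specify the actual preprocessing or training configuration.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical CSV export of HunEmPoli with "text" and "label" (0/1/2) columns.
dataset = load_dataset("csv", data_files="hunempoli.csv")["train"]
splits = dataset.train_test_split(test_size=0.1)

tokenizer = AutoTokenizer.from_pretrained("SZTAKI-HLT/hubert-base-cc")
model = AutoModelForSequenceClassification.from_pretrained(
    "SZTAKI-HLT/hubert-base-cc", num_labels=3
)

def tokenize(batch):
    # Truncation length is an assumption, not a documented setting.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = splits.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="hunembert3-finetune", num_train_epochs=3),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,
)
trainer.train()
```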

## Eval results

| Class        | Precision | Recall | F-Score |
|--------------|-----------|--------|---------|
| Neutral      | 0.83      | 0.71   | 0.76    |
| Positive     | 0.87      | 0.91   | 0.90    |
| Negative     | 0.94      | 0.91   | 0.93    |
| Macro AVG    | 0.88      | 0.85   | 0.86    |
| Weighted AVG | 0.91      | 0.91   | 0.91    |
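
The averages above are the standard macro and weighted aggregates of per-class precision, recall, and F1. As a sketch, scikit-learn's `classification_report` produces a table with the same structure from gold and predicted labels (the arrays below are made-up placeholders, not the actual test data):

```python
from sklearn.metrics import classification_report

# Placeholder gold labels and predictions (0=Neutral, 1=Positive, 2=Negative).
y_true = [0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 1, 2, 1, 1, 2, 2]

# Prints per-class precision/recall/F1 plus macro and weighted averages.
print(classification_report(y_true, y_pred, target_names=["Neutral", "Positive", "Negative"]))
```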

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Loading the gated model requires a Hugging Face account with granted access.
tokenizer = AutoTokenizer.from_pretrained("poltextlab/HunEmBERT3")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/HunEmBERT3")
```
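
Continuing from the snippet above, a minimal inference sketch; the example sentence comes from the widget section, and the label order mirrors the mapping documented under "Intended uses & limitations":

```python
import torch

text = "Tisztelt fideszes, KDNP-s Képviselőtársaim!"
inputs = tokenizer(text, return_tensors="pt")

# Forward pass without gradient tracking, then pick the highest-scoring class.
with torch.no_grad():
    logits = model(**inputs).logits
predicted = logits.argmax(dim=-1).item()

labels = ["Neutral", "Positive", "Negative"]  # Label_0, Label_1, Label_2
print(labels[predicted])
```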

## BibTeX entry and citation info

If you use the model, please cite the following paper:

BibTeX:

```bibtex
@ARTICLE{10149341,
  author={{\"U}veges, Istv{\'a}n and Ring, Orsolya},
  journal={IEEE Access},
  title={HunEmBERT: a fine-tuned BERT-model for classifying sentiment and emotion in political communication},
  year={2023},
  volume={11},
  number={},
  pages={60267-60278},
  doi={10.1109/ACCESS.2023.3285536}
}
```