---
license: apache-2.0
language:
- hu
metrics:
- accuracy
model-index:
- name: HunEmBERT3
  results:
  - task:
      type: text-classification
    metrics:
    - type: f1
      value: 0.91
widget:
- text: "A vegetációs időben az országban rendszeresen jelentkező jégesők ellen is van mód védekezni lokálisan, ki-ki a saját nagy értékű ültetvényén."
  example_title: "Positive"
- text: "Magyarország több évtizede küzd demográfiai válsággal, és egyre több gyermekre vágyó pár meddőségi problémákkal néz szembe."
  example_title: "Negative"
- text: "Tisztelt fideszes, KDNP-s Képviselőtársaim!"
  example_title: "Neutral"
extra_gated_fields:
  Name: text
  Country: country
  Institution: text
  E-mail: text
  Use case: text
extra_gated_prompt: Our models are intended for academic use only. If you are not
  affiliated with an academic institution, please provide a rationale for using our
  models.
---
## Model description
Cased, fine-tuned BERT model for Hungarian, trained on manually annotated parliamentary pre-agenda speeches scraped from `parlament.hu`.
## Intended uses & limitations
The model can be used like any other (cased) BERT model. It has been tested on recognizing positive, negative, and neutral sentences in (parliamentary) pre-agenda speeches, with the following label mapping (see the quick-start sketch after the list):
* 'Label_0': Neutral
* 'Label_1': Positive
* 'Label_2': Negative
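For quick experimentation, predictions with these labels can be obtained through a `transformers` pipeline; a minimal sketch (the example sentence comes from the widget above, and the `LABEL_*` names follow the scheme listed here):
```py
from transformers import pipeline

# The pipeline resolves both the tokenizer and the model from the Hub
classifier = pipeline("text-classification", model="poltextlab/HunEmBERT3")

# Returns e.g. [{'label': 'LABEL_0', 'score': ...}],
# where LABEL_0 = Neutral, LABEL_1 = Positive, LABEL_2 = Negative
print(classifier("Tisztelt fideszes, KDNP-s Képviselőtársaim!"))
```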
## Training
HunEmBERT3 is a fine-tuned version of the original huBERT model (`SZTAKI-HLT/hubert-base-cc`), trained on the HunEmPoli corpus. The emotion categories of the training data, and their aggregation into sentiment classes, are as follows:
| Category | Count | Ratio  | Sentiment | Count | Ratio  |
| -------- | ----- | ------ | --------- | ----- | ------ |
| Neutral  | 351   | 1.85%  | Neutral   | 351   | 1.85%  |
| Fear     | 162   | 0.85%  | Negative  | 11180 | 58.84% |
| Sadness  | 4258  | 22.41% |           |       |        |
| Anger    | 643   | 3.38%  |           |       |        |
| Disgust  | 6117  | 32.19% |           |       |        |
| Success  | 6602  | 34.74% | Positive  | 7471  | 39.32% |
| Joy      | 441   | 2.32%  |           |       |        |
| Trust    | 428   | 2.25%  |           |       |        |
| Sum      | 19002 |        |           |       |        |
## Eval results
| Class        | Precision | Recall | F-Score |
| ------------ | --------- | ------ | ------- |
| Neutral      | 0.83      | 0.71   | 0.76    |
| Positive     | 0.87      | 0.91   | 0.90    |
| Negative     | 0.94      | 0.91   | 0.93    |
| Macro AVG    | 0.88      | 0.85   | 0.86    |
| Weighted AVG | 0.91      | 0.91   | 0.91    |
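Per-class scores of this kind can be reproduced with scikit-learn's `classification_report`; a minimal sketch, assuming `y_true` and `y_pred` hold gold and predicted class ids (the values below are illustrative placeholders, not the actual test set):
```py
from sklearn.metrics import classification_report

# Placeholder gold and predicted ids (0 = Neutral, 1 = Positive, 2 = Negative)
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

print(classification_report(
    y_true,
    y_pred,
    target_names=["Neutral", "Positive", "Negative"],
    digits=2,
))
```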
## Usage
```py
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned three-class sentiment model
tokenizer = AutoTokenizer.from_pretrained("poltextlab/HunEmBERT3")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/HunEmBERT3")
```
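A minimal end-to-end inference sketch building on the snippet above (the `id2label` mapping follows the label scheme in this card; the input sentence is one of the widget examples):
```py
import torch

# Tokenize a Hungarian sentence and run a forward pass without gradients
text = "Tisztelt fideszes, KDNP-s Képviselőtársaim!"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Map the winning class id back to a sentiment label
id2label = {0: "Neutral", 1: "Positive", 2: "Negative"}
print(id2label[logits.argmax(dim=-1).item()])
```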
## BibTeX entry and citation info
If you use the model, please cite the following paper:
```bibtex
@ARTICLE{10149341,
author={{"U}veges, Istv{\'a}n and Ring, Orsolya},
journal={IEEE Access},
title={HunEmBERT: A Fine-Tuned BERT-Model for Classifying Sentiment and Emotion in Political Communication},
year={2023},
volume={11},
number={},
pages={60267-60278},
doi={10.1109/ACCESS.2023.3285536}
}
```