---
license: apache-2.0
language:
- hu
metrics:
- accuracy
model-index:
- name: huBERTPlain
  results:
  - task:
      type: text-classification
    metrics:
    - type: f1
      value: 0.91
widget:
- text: "A vegetációs időben az országban rendszeresen jelentkező jégesők ellen is van mód védekezni lokálisan, ki-ki a saját nagy értékű ültetvényén."
  example_title: "Positive"
- text: "Magyarország több évtizede küzd demográfiai válsággal, és egyre több gyermekre vágyó pár meddőségi problémákkal néz szembe."
  example_title: "Negative"
- text: "Tisztelt fideszes, KDNP-s Képviselőtársaim!"
  example_title: "Neutral"
extra_gated_fields:
  Name: text
  Country: country
  Institution: text
  E-mail: text
  Use case: text
extra_gated_prompt: Our models are intended for academic use only. If you are not
  affiliated with an academic institution, please provide a rationale for using our
  models.
---

## Model description

A cased BERT model for Hungarian, fine-tuned on manually annotated parliamentary pre-agenda speeches scraped from `parlament.hu`.

## Intended uses & limitations

The model can be used like any other cased BERT model. It has been tested on recognizing positive, negative, and neutral sentences in parliamentary pre-agenda speeches, with the following label mapping:

* 'Label_0': Neutral
* 'Label_1': Positive
* 'Label_2': Negative

## Training

The model is a fine-tuned version of the original huBERT model (`SZTAKI-HLT/hubert-base-cc`), trained on the HunEmPoli corpus. The distribution of emotion categories and their corresponding sentiment classes is as follows:

| Category | Count | Ratio | Sentiment | Count | Ratio |
| -------- | ----- | ------ | --------- | ----- | ------ |
| Neutral | 351 | 1.85% | Neutral | 351 | 1.85% |
| Fear | 162 | 0.85% | Negative | 11180 | 58.84% |
| Sadness | 4258 | 22.41% | | | |
| Anger | 643 | 3.38% | | | |
| Disgust | 6117 | 32.19% | | | |
| Success | 6602 | 34.74% | Positive | 7471 | 39.32% |
| Joy | 441 | 2.32% | | | |
| Trust | 428 | 2.25% | | | |
| Sum | 19002 | | | | |
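
The sentiment columns can be derived from the emotion counts. The following is a minimal sketch of that arithmetic, assuming the emotion-to-sentiment grouping implied by the row layout of the table (Fear, Sadness, Anger, and Disgust as Negative; Success, Joy, and Trust as Positive). The names `EMOTION_COUNTS`, `EMOTION_TO_SENTIMENT`, and `sentiment_distribution` are illustrative only.

```python
# Emotion counts copied from the table above.
EMOTION_COUNTS = {
    "Neutral": 351,
    "Fear": 162,
    "Sadness": 4258,
    "Anger": 643,
    "Disgust": 6117,
    "Success": 6602,
    "Joy": 441,
    "Trust": 428,
}

# Assumption: grouping inferred from the table's row layout.
EMOTION_TO_SENTIMENT = {
    "Neutral": "Neutral",
    "Fear": "Negative",
    "Sadness": "Negative",
    "Anger": "Negative",
    "Disgust": "Negative",
    "Success": "Positive",
    "Joy": "Positive",
    "Trust": "Positive",
}


def sentiment_distribution(counts, mapping):
    """Sum emotion counts per sentiment and return (count, percent) pairs."""
    total = sum(counts.values())
    grouped = {}
    for emotion, n in counts.items():
        sentiment = mapping[emotion]
        grouped[sentiment] = grouped.get(sentiment, 0) + n
    return {s: (n, round(100 * n / total, 2)) for s, n in grouped.items()}


distribution = sentiment_distribution(EMOTION_COUNTS, EMOTION_TO_SENTIMENT)
for sentiment, (count, ratio) in distribution.items():
    print(f"{sentiment}: {count} ({ratio:.2f}%)")
```

Under this grouping, the computed counts and percentages match the Sentiment columns of the table.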

## Eval results

| Class | Precision | Recall | F-Score |
|-----|------------|------------|------|
| Neutral | 0.83 | 0.71 | 0.76 |
| Positive | 0.87 | 0.91 | 0.9 |
| Negative | 0.94 | 0.91 | 0.93 |
| Macro AVG | 0.88 | 0.85 | 0.86 |
| Weighted AVG | 0.91 | 0.91 | 0.91 |

## Usage

```py
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("poltextlab/HunEmBERT3")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/HunEmBERT3")
```
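
Once loaded, the model can classify a sentence. The following is a minimal inference sketch, assuming that output index `i` corresponds to `Label_i` in the mapping given under "Intended uses & limitations", that PyTorch is installed, and that you have been granted access to the gated repository and are authenticated with the Hub. The helpers `softmax`, `decode`, and `classify` are illustrative names, not part of the model or of `transformers`.

```python
import math

# Assumption: class index i corresponds to 'Label_i' from the model card.
ID2SENTIMENT = {0: "Neutral", 1: "Positive", 2: "Negative"}


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return the most probable sentiment name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2SENTIMENT[best], probs[best]


def classify(text, model_id="poltextlab/HunEmBERT3"):
    """Run one sentence through the model and decode the prediction."""
    # Heavy dependencies are imported lazily so that the decoding helpers
    # above can be used without them.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode(logits)


if __name__ == "__main__":
    # Example sentence taken from the widget examples of this model card.
    print(classify("Tisztelt fideszes, KDNP-s Képviselőtársaim!"))
```

Loading the model inside `classify` keeps the sketch short. For repeated calls, load the tokenizer and model once and reuse them.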

### BibTeX entry and citation info

If you use the model, please cite the following paper:

```bibtex
@ARTICLE{10149341,
  author={{\"U}veges, Istv{\'a}n and Ring, Orsolya},
  journal={IEEE Access},
  title={HunEmBERT: a fine-tuned BERT-model for classifying sentiment and emotion in political communication},
  year={2023},
  volume={11},
  number={},
  pages={60267-60278},
  doi={10.1109/ACCESS.2023.3285536}
}
```