
---
license: mit
language:
- de
pipeline_tag: text-classification
metrics:
- f1
library_name: transformers
---

# PopBERT

PopBERT is a model for German-language populism detection in political speeches within the German Bundestag, based on the deepset/gbert-large model: https://huggingface.co/deepset/gbert-large

It is a multilabel model trained on a manually curated dataset of sentences from the 18th and 19th legislative periods.
In addition to capturing the foundational dimensions of populism, namely "anti-elitism" and "people-centrism", the model was also fine-tuned to identify the underlying ideological orientation as either "left-wing" or "right-wing".

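# Prediction

For each input sentence, the model outputs a tensor of four logits, one per dimension. Because the model is multilabel, each logit is converted into a probability independently with a sigmoid, so several dimensions can apply to the same sentence.
The table maps each output index to its dimension.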

| Index | Dimension                |
|-------|--------------------------|
| 0     | Anti-Elitism             |
| 1     | People-Centrism          |
| 2     | Left-Wing Host-Ideology  |
| 3     | Right-Wing Host-Ideology |
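As a minimal sketch of how the four outputs relate to this table, the snippet below applies the logistic sigmoid to each logit separately, as is usual for multilabel classification, and pairs the resulting probabilities with the dimension names by index. The `DIMENSIONS` list, the `named_probabilities` helper, and the logit values are hypothetical and only for illustration; with the real model, the logits would come from `out.logits` in the usage example.

```python
import math

# Dimension names in index order (hypothetical constant mirroring the table).
DIMENSIONS = [
    "Anti-Elitism",
    "People-Centrism",
    "Left-Wing Host-Ideology",
    "Right-Wing Host-Ideology",
]


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def named_probabilities(logits: list[float]) -> dict[str, float]:
    """Hypothetical helper: map four logits to {dimension: probability}.

    Each label gets its own sigmoid, so the probabilities are independent
    and need not sum to 1 (unlike a softmax over mutually exclusive classes).
    """
    if len(logits) != len(DIMENSIONS):
        raise ValueError(f"expected {len(DIMENSIONS)} logits, got {len(logits)}")
    return {name: sigmoid(z) for name, z in zip(DIMENSIONS, logits)}


# Illustrative logits only, not real model output.
probs = named_probabilities([2.0, -0.5, 3.0, -4.0])
for name, p in probs.items():
    print(f"{name:<26} {p:.3f}")
```

Because every dimension is scored on its own, a sentence can be both anti-elitist and left-wing at the same time, and the probabilities can add up to more than 1.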

# Usage Example

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained("luerhard/PopBERT")

# load model
model = AutoModelForSequenceClassification.from_pretrained("luerhard/PopBERT")

# define text to be predicted
text = (
    "Das ist Klassenkampf von oben, das ist Klassenkampf im Interesse von "
    "Vermögenden und Besitzenden gegen die Mehrheit der Steuerzahlerinnen und "
    "Steuerzahler auf dieser Erde."
)

# encode text with tokenizer
encodings = tokenizer(text, return_tensors="pt")

# predict without tracking gradients
with torch.inference_mode():
    out = model(**encodings)

# get probabilities: one independent sigmoid per dimension (multilabel)
probs = torch.sigmoid(out.logits)
print(probs.numpy())
```

The script prints the four probabilities in the index order of the Prediction table:

```
[[0.8765146 0.34838045 0.983123 0.02148379]]
```


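# Performance

For best performance, the following decision thresholds are recommended, one per dimension and listed in the same index order as the Prediction table. A sentence is labeled with a dimension when the predicted probability for that dimension exceeds its threshold:

```
[0.415961, 0.295400, 0.429109, 0.302714]
```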
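A minimal sketch of applying these thresholds, assuming a dimension counts as detected when its probability is strictly greater than its threshold (the exact comparison operator is an assumption). The `THRESHOLDS` and `DIMENSIONS` constants and the `predict_labels` helper are hypothetical names; the probabilities are the example output from the usage section.

```python
from collections.abc import Sequence

# Recommended per-dimension thresholds, in the index order of the prediction table.
THRESHOLDS = (0.415961, 0.295400, 0.429109, 0.302714)

# Dimension names in the same index order (hypothetical constant).
DIMENSIONS = (
    "Anti-Elitism",
    "People-Centrism",
    "Left-Wing Host-Ideology",
    "Right-Wing Host-Ideology",
)


def predict_labels(
    probs: Sequence[float], thresholds: Sequence[float] = THRESHOLDS
) -> list[str]:
    """Hypothetical helper: return the names of all dimensions whose
    probability is strictly greater than its threshold (assumed comparison)."""
    if len(probs) != len(thresholds):
        raise ValueError("probs and thresholds must have the same length")
    return [
        name
        for name, p, t in zip(DIMENSIONS, probs, thresholds)
        if p > t
    ]


# Probabilities from the example output of the usage section.
example_probs = [0.8765146, 0.34838045, 0.983123, 0.02148379]
print(predict_labels(example_probs))
# prints ['Anti-Elitism', 'People-Centrism', 'Left-Wing Host-Ideology']
```

Under a uniform 0.5 cut-off, only Anti-Elitism and Left-Wing Host-Ideology would be assigned to the example sentence; the lower People-Centrism threshold (0.2954) also picks up that dimension at 0.348. For a PyTorch tensor of shape `(batch, 4)`, the same rule can be written as `probs > torch.tensor(THRESHOLDS)`, which broadcasts the thresholds across rows.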

Using these thresholds, the model achieves the following performance on the test set:

| Dimension                | Precision | Recall | F1   |
|--------------------------|-----------|--------|------|
| Anti-Elitism             | 0.81      | 0.88   | 0.84 |
| People-Centrism          | 0.70      | 0.73   | 0.71 |
| Left-Wing Host-Ideology  | 0.69      | 0.77   | 0.73 |
| Right-Wing Host-Ideology | 0.68      | 0.66   | 0.67 |
| Micro avg                | 0.75      | 0.80   | 0.77 |
| Macro avg                | 0.72      | 0.76   | 0.74 |
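As a consistency check on the table, the sketch below recomputes each dimension's F1 as the harmonic mean of its precision and recall, and the macro averages as unweighted means over the four dimensions. The micro averages cannot be reproduced this way, because they pool true and false positives across all labels and therefore depend on per-dimension support counts that are not reported here. The `SCORES` dictionary simply transcribes the table; small deviations come from the two-decimal rounding of the inputs.

```python
from statistics import mean

# Per-dimension (precision, recall, F1), transcribed from the table above.
SCORES = {
    "Anti-Elitism": (0.81, 0.88, 0.84),
    "People-Centrism": (0.70, 0.73, 0.71),
    "Left-Wing Host-Ideology": (0.69, 0.77, 0.73),
    "Right-Wing Host-Ideology": (0.68, 0.66, 0.67),
}


def f1(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Recompute each dimension's F1 from its (rounded) precision and recall.
for name, (p, r, reported) in SCORES.items():
    print(f"{name:<26} F1 from P/R = {f1(p, r):.2f} (reported {reported:.2f})")

# Macro averages: unweighted means over the four dimensions.
macro_p = mean(p for p, _, _ in SCORES.values())
macro_r = mean(r for _, r, _ in SCORES.values())
macro_f1 = mean(f for _, _, f in SCORES.values())
print(f"macro avg: P = {macro_p:.4f}, R = {macro_r:.4f}, F1 = {macro_f1:.4f}")
```

Each recomputed F1 rounds to the reported value, and the mean of the per-dimension F1 scores is 0.7375, which rounds to the reported macro F1 of 0.74.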