Lukas Erhard committed on
Commit b67ea4f · 1 Parent(s): 862494c

update readme for rev1 model

Files changed (1)
  1. README.md +84 -0
README.md ADDED
@@ -0,0 +1,84 @@
1
+ ---
2
+ license: mit
3
+ language:
4
+ - de
5
+ pipeline_tag: text-classification
6
+ metrics:
7
+ - f1
8
+ library_name: transformers
9
+ ---
10
+
11
+ # PopBERT
12
+
13
+ PopBERT is a model that detects populism in German-language political speeches from the German Bundestag. It is based on the deepset/gbert-large model: https://huggingface.co/deepset/gbert-large
14
+
15
+ It is a multilabel model trained on a manually curated dataset of sentences from the 18th and 19th legislative periods.
16
+ In addition to capturing the foundational dimensions of populism, namely "anti-elitism" and "people-centrism," the model was also fine-tuned to identify the underlying ideological orientation as either "left-wing" or "right-wing."
17
+
18
+ # Prediction
19
+
20
+ For each input text, the model outputs a tensor of 4 logits; applying a sigmoid to them yields one probability per dimension.
21
+ The table below maps each index of that tensor to its dimension.
22
+
23
+ | **Index** | **Dimension** |
24
+ |-----------|--------------------------|
25
+ | 0 | Anti-Elitism |
26
+ | 1 | People-Centrism |
27
+ | 2 | Left-Wing Host-Ideology |
28
+ | 3 | Right-Wing Host-Ideology |
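
To make the index mapping concrete, here is a minimal sketch that pairs each position of the output with its dimension name from the table. The helper `label_probabilities` is hypothetical and not part of the model or of `transformers`; the probabilities are copied from the example output in the usage section.

```python
# Index-to-dimension mapping, taken from the table above.
DIMENSIONS = (
    "Anti-Elitism",
    "People-Centrism",
    "Left-Wing Host-Ideology",
    "Right-Wing Host-Ideology",
)


def label_probabilities(probs):
    """Pair each probability with its dimension name (hypothetical helper)."""
    if len(probs) != len(DIMENSIONS):
        raise ValueError(f"expected {len(DIMENSIONS)} values, got {len(probs)}")
    return dict(zip(DIMENSIONS, probs))


# Probabilities copied from the example output in this README.
example = [0.87651485, 0.34838045, 0.983123, 0.02148381]
labeled = label_probabilities(example)

for name, prob in labeled.items():
    print(f"{name}: {prob:.3f}")
```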
29
+
30
+ # Usage Example
31
+
32
+ ```python
33
+ import torch
34
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
35
+
36
+ # load tokenizer
37
+ tokenizer = AutoTokenizer.from_pretrained("luerhard/PopBERT")
38
+
39
+ # load model with its classification head (needed for out.logits)
40
+ model = AutoModelForSequenceClassification.from_pretrained("luerhard/PopBERT")
41
+
42
+ # define text to be predicted
43
+ text = (
44
+ "Das ist Klassenkampf von oben, das ist Klassenkampf im Interesse von "
45
+ "Vermögenden und Besitzenden gegen die Mehrheit der Steuerzahlerinnen und "
46
+ "Steuerzahler auf dieser Erde."
47
+ )
48
+
49
+ # encode text with tokenizer
50
+ encodings = tokenizer(text, return_tensors="pt")
51
+
52
+ # predict
53
+ with torch.inference_mode():
54
+     out = model(**encodings)
55
+
56
+ # get probabilities
57
+ probs = torch.nn.functional.sigmoid(out.logits)
58
+ print(probs.detach().numpy())
59
+ ```
60
+
61
+ ```
62
+ array([[0.87651485, 0.34838045, 0.983123 , 0.02148381]], dtype=float32)
63
+ ```
64
+
65
+
66
+ # Performance
67
+
68
+ To maximize performance, apply the following decision thresholds, one per dimension:
69
+
70
+ ```
71
+ [0.415961, 0.295400, 0.429109, 0.302714]
72
+ ```
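
As a minimal sketch of how these thresholds might be applied, the snippet below compares each probability with the threshold at the same position. It rests on two assumptions that the README does not state explicitly: the thresholds are listed in index order (Anti-Elitism first), and a probability strictly above its threshold counts as a positive prediction. `apply_thresholds` is a hypothetical helper, and the probabilities are copied from the usage example output.

```python
# Recommended thresholds, assumed to follow the index order 0..3.
THRESHOLDS = [0.415961, 0.295400, 0.429109, 0.302714]


def apply_thresholds(probs, thresholds=THRESHOLDS):
    """Return one boolean per dimension (hypothetical helper).

    Assumption: a dimension is predicted when its probability is
    strictly greater than its threshold.
    """
    if len(probs) != len(thresholds):
        raise ValueError("probs and thresholds must have the same length")
    return [p > t for p, t in zip(probs, thresholds)]


# Probabilities copied from the usage example output.
probs = [0.87651485, 0.34838045, 0.983123, 0.02148381]
predictions = apply_thresholds(probs)
print(predictions)
```

Note that People-Centrism (index 1) is predicted here only because its threshold is 0.2954; a uniform cutoff of 0.5 would reject its probability of about 0.348.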
73
+
74
+ Using these thresholds, the model achieves the following performance on the test set:
75
+
76
+ | Dimension | Precision | Recall | F1 |
77
+ |---------------------|---------------|---------------|---------------|
78
+ | Anti-Elitism | 0.81 | 0.88 | 0.84 |
79
+ | People-Centrism | 0.70 | 0.73 | 0.71 |
80
+ | Left-Wing Ideology | 0.69 | 0.77 | 0.73 |
81
+ | Right-Wing Ideology | 0.68 | 0.66 | 0.67 |
82
+ | --- | --- | --- | --- |
83
+ | micro avg | 0.75 | 0.80 | 0.77 |
84
+ | macro avg | 0.72 | 0.76 | 0.74 |
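
The macro-average row can be checked by hand, since it is consistent with the unweighted mean of the four per-dimension rows. The sketch below redoes that arithmetic with the values from the table. It assumes macro F1 is the mean of the per-dimension F1 scores, which matches the reported numbers but is not stated in the README. The micro average cannot be recomputed this way because it depends on per-dimension counts that are not given.

```python
# (precision, recall, F1) per dimension, copied from the table above.
per_dimension = {
    "Anti-Elitism": (0.81, 0.88, 0.84),
    "People-Centrism": (0.70, 0.73, 0.71),
    "Left-Wing Ideology": (0.69, 0.77, 0.73),
    "Right-Wing Ideology": (0.68, 0.66, 0.67),
}


def macro_average(rows):
    """Unweighted mean of each metric column, rounded to two decimals."""
    n = len(rows)
    return tuple(round(sum(row[i] for row in rows) / n, 2) for i in range(3))


macro = macro_average(list(per_dimension.values()))
print(macro)
```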