---
language: da
tags:
- danish
- bert
- sentiment
- text-classification
- Maltehb/danish-bert-botxo
- Helsinki-NLP/opus-mt-en-da
- go-emotion
- Certainly
license: cc-by-4.0
datasets:
- go_emotions
metrics:
- Accuracy
widget:
- text: "Det er så sødt af dig at tænke på andre på den måde ved du det?"
- text: "Jeg vil gerne have en playstation."
- text: "Jeg elsker dig"
- text: "Hvordan håndterer jeg min irriterende nabo?"
---

# Danish-Bert-GoÆmotion

Danish Go-Emotions classifier. [Maltehb/danish-bert-botxo](https://huggingface.co/Maltehb/danish-bert-botxo) (uncased) fine-tuned on a machine translation of the [go_emotions](https://huggingface.co/datasets/go_emotions) dataset, translated with [Helsinki-NLP/opus-mt-en-da](https://huggingface.co/Helsinki-NLP/opus-mt-en-da). Performance therefore depends on the quality of the translation model.

## Training
- Translating the training data with MT: [Notebook](https://colab.research.google.com/github/RJuro/Da-HyggeBERT-finetuning/blob/main/HyggeBERT_translation_en_da.ipynb) (a minimal sketch of the approach is shown below)
- Fine-tuning danish-bert-botxo: coming soon...
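
The notebook linked above contains the full translation pipeline. For orientation, here is a minimal sketch of the approach, assuming the `simplified` go_emotions config and an arbitrary batch size (both are illustrative choices, not taken from the notebook):

```python
from datasets import load_dataset
from transformers import pipeline

# English -> Danish MT model used for translating the training data
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-da")

# go_emotions "simplified" config: text plus a reduced multi-label set
ds = load_dataset("go_emotions", "simplified", split="train")

def translate_batch(batch):
    # The translation pipeline returns dicts with a "translation_text" key
    outputs = translator(batch["text"], truncation=True)
    batch["text"] = [o["translation_text"] for o in outputs]
    return batch

# Replace the English texts with their Danish translations
ds_da = ds.map(translate_batch, batched=True, batch_size=32)
```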

## Training parameters

```
Num examples = 189900
Num Epochs = 3
Train batch = 8
Eval batch = 8
Learning Rate = 3e-5
Warmup steps = 4273
Total optimization steps = 71125
```
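
For reference, these parameters map directly onto a standard `transformers` `Trainer` setup. The sketch below is illustrative, not the actual training script; the `output_dir`, the multi-label problem framing, and the 28-label count (the go_emotions `simplified` label set) are assumptions:

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("Maltehb/danish-bert-botxo")
model = AutoModelForSequenceClassification.from_pretrained(
    "Maltehb/danish-bert-botxo",
    num_labels=28,  # assumption: go_emotions "simplified" (27 emotions + neutral)
    problem_type="multi_label_classification",  # assumption: multi-label framing
)

args = TrainingArguments(
    output_dir="da-hyggebert",  # hypothetical output directory
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=3e-5,
    warmup_steps=4273,
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=...,  # tokenized Danish go_emotions split
#                   eval_dataset=...)
# trainer.train()
```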

## Loss
### Training loss
![Training loss](wb_loss.png)

### Eval. loss
```
0.1178 (21100 examples)
```


## Using the model with `transformers`
The easiest way to use the model is with `transformers` and its `pipeline` API:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained('RJuro/Da-HyggeBERT')
tokenizer = AutoTokenizer.from_pretrained('RJuro/Da-HyggeBERT')

classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

classifier('jeg elsker dig')  # Danish for "I love you"
```

`[{'label': 'kærlighed', 'score': 0.9634820818901062}]`
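
Since Go-Emotions is a multi-label dataset, it can be useful to see scores for all labels rather than only the top one. With recent `transformers` releases this is a call-time argument (older releases use `return_all_scores=True` instead):

```python
# Score every emotion label for the input instead of only the best one
classifier('jeg elsker dig', top_k=None)
```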

## Using the model with `simpletransformers`

```python
from simpletransformers.classification import MultiLabelClassificationModel

# Load the fine-tuned model as a multi-label classifier
model = MultiLabelClassificationModel('bert', 'RJuro/Da-HyggeBERT')

# predict() expects a list of strings; with a pandas DataFrame column,
# pass df['text'].tolist()
predictions, raw_outputs = model.predict(['jeg elsker dig'])
```