---
language:
- da
tags:
- bert
- pytorch
- hatespeech
license: cc-by-sa-4.0
datasets:
- social media
metrics:
- f1
widget:
- text: "Senile gamle idiot"
---

# Danish BERT for hate speech classification

The BERT HateSpeech model classifies offensive Danish text into four categories:

* `Særlig opmærksomhed` (special attention, e.g. threats)
* `Personangreb` (personal attack)
* `Sprogbrug` (offensive language)
* `Spam & indhold` (spam)

This model is intended to be used after the [BERT HateSpeech detection model](https://huggingface.co/DaNLP/da-bert-hatespeech-detection), that is, on text that has already been identified as offensive.
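
The two-stage use described above can be sketched as follows. This is only an illustration of the control flow: `two_stage_classify`, `fake_detector`, and `fake_categorizer` are hypothetical names, and the stand-in callables do not reproduce the behaviour of either model. In practice each callable would wrap the corresponding model.

```python
from typing import Callable, Optional


def two_stage_classify(
    text: str,
    is_offensive: Callable[[str], bool],
    categorize: Callable[[str], str],
) -> Optional[str]:
    """Run the category model only on text the detector flags as offensive."""
    if not is_offensive(text):
        return None  # not offensive: no category is assigned
    return categorize(text)


# Stand-in callables for illustration only (hypothetical, not real model output).
def fake_detector(text: str) -> bool:
    return "idiot" in text.lower()


def fake_categorizer(text: str) -> str:
    return "Personangreb"


print(two_stage_classify("Senile gamle idiot", fake_detector, fake_categorizer))
print(two_stage_classify("God morgen", fake_detector, fake_categorizer))
```

Keeping the two stages behind plain callables means the detection step can be swapped or skipped without touching the category step.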

It is based on the pretrained [Danish BERT](https://github.com/certainlyio/nordic_bert) model by BotXO, which has been fine-tuned on social media data.

See the [DaNLP documentation](https://danlp-alexandra.readthedocs.io/en/latest/docs/tasks/hatespeech.html#bertdr) for more details.

Here is how to load the model and its tokenizer:

```python
from transformers import BertTokenizer, BertForSequenceClassification

# Load the fine-tuned classification model and its matching tokenizer from the Hugging Face Hub
model = BertForSequenceClassification.from_pretrained("DaNLP/da-bert-hatespeech-classification")
tokenizer = BertTokenizer.from_pretrained("DaNLP/da-bert-hatespeech-classification")
```
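
The snippet above only loads the model. A minimal sketch of running a prediction might look like the following. The helper names `softmax`, `top_label`, and `classify` are illustrative and not part of DaNLP or `transformers`. It also assumes that `model.config.id2label` carries the category names; if the configuration only contains generic names such as `LABEL_0`, the mapping has to be taken from the DaNLP documentation instead.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label):
    """Return the (label, probability) pair with the highest probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify(text, model_name="DaNLP/da-bert-hatespeech-classification"):
    """Download the model (network access required) and classify one text."""
    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertForSequenceClassification.from_pretrained(model_name)
    model.eval()  # disable dropout for inference

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():  # no gradients are needed for prediction
        logits = model(**inputs).logits[0].tolist()

    # Assumption: config.id2label maps class indices to category names.
    return top_label(logits, model.config.id2label)
```

Calling `classify("Senile gamle idiot")` would download the weights on first use and return a `(label, probability)` pair. Loading the model once and reusing it is preferable when classifying many texts.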

## Training data

The training data has not been made publicly available. It consists of social media data manually annotated in collaboration with Danmarks Radio.