language:
- da
tags:
- bert
- pytorch
- hatespeech
license: cc-by-sa-4.0
datasets:
- social media
metrics:
- f1
widget:
- text: Senile gamle idiot
Danish BERT for hate speech classification
The BERT HateSpeech model classifies offensive Danish text into 4 categories:
Særlig opmærksomhed (special attention, e.g. threat)
Personangreb (personal attack)
Sprogbrug (offensive language)
Spam & indhold (spam)
This model is intended to be used after the BERT HateSpeech detection model.
It is based on the pretrained Danish BERT model by BotXO, fine-tuned on social media data.
See the DaNLP documentation for more details.
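The intended two-stage use (detection first, then classification) could be wired together roughly as follows. This is only a sketch: `two_stage`, `is_offensive`, and `categorize` are hypothetical names introduced for illustration, and the two callables stand in for whatever wrappers you build around the detection model and this classification model.

```python
from typing import Callable, Optional


def two_stage(
    text: str,
    is_offensive: Callable[[str], bool],
    categorize: Callable[[str], str],
) -> Optional[str]:
    """Return a category only for texts the detector flags, otherwise None.

    Both callables are hypothetical stand-ins: `is_offensive` wraps the
    detection model, `categorize` wraps this classification model.
    """
    if not is_offensive(text):
        # Texts judged not offensive never reach the category classifier.
        return None
    return categorize(text)
```

Keeping the two stages as plain callables means either model can be swapped or mocked without touching the control flow.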
Here is how to use the model:
from transformers import BertTokenizer, BertForSequenceClassification

# Load the fine-tuned classifier and its matching tokenizer from the Hugging Face Hub
model = BertForSequenceClassification.from_pretrained("DaNLP/da-bert-hatespeech-classification")
tokenizer = BertTokenizer.from_pretrained("DaNLP/da-bert-hatespeech-classification")
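Loading the model does not yet produce a prediction. A minimal sketch of one way to run inference is shown below, assuming `torch` is installed alongside `transformers`. The helpers `softmax`, `top_category`, and `classify` are illustrative names, not part of DaNLP or `transformers`. Label names are read from `model.config.id2label`; if the checkpoint does not define them, generic names such as `LABEL_0` are returned, so the mapping to the four categories above should be checked against the DaNLP documentation.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_category(logits, labels):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def classify(text, model_name="DaNLP/da-bert-hatespeech-classification"):
    """Download the model and classify one text (needs torch and transformers)."""
    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertForSequenceClassification.from_pretrained(model_name)
    model.eval()  # disable dropout for inference

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    # Label names come from the checkpoint config, not from this sketch.
    labels = [model.config.id2label[i] for i in range(len(logits))]
    return top_category(logits, labels)
```

Calling `classify("Senile gamle idiot")` (the widget example from this card) would return a `(label, probability)` pair. The model is reloaded on every call here for brevity; in practice you would load it once and reuse it.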
Training data
The data used for training has not been made publicly available. It consists of social media data manually annotated in collaboration with Danmarks Radio.