language:
  - da
tags:
  - bert
  - pytorch
  - hatespeech
license: cc-by-sa-4.0
datasets:
  - social media
metrics:
  - f1
widget:
  - text: Senile gamle idiot

Danish BERT for hate speech classification

The BERT HateSpeech model classifies offensive Danish text into 4 categories:

  • Særlig opmærksomhed (special attention, e.g. threat)
  • Personangreb (personal attack)
  • Sprogbrug (offensive language)
  • Spam & indhold (spam)

This model is intended to be used after the BERT HateSpeech detection model.

It is based on the pretrained Danish BERT model by BotXO and has been fine-tuned on social media data.

See the DaNLP documentation for more details.

Here is how to use the model:

from transformers import BertTokenizer, BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("DaNLP/da-bert-hatespeech-classification")
tokenizer = BertTokenizer.from_pretrained("DaNLP/da-bert-hatespeech-classification")
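
Once the model and tokenizer are loaded, a single text can be classified roughly as follows. This is a minimal sketch, not part of the official DaNLP API: the helper name `classify` is hypothetical, the 512-token limit is the usual BERT default and is assumed here, and the category names are read from the checkpoint's `id2label` mapping, whose contents are not stated in this card.

```python
import torch


def classify(text, model, tokenizer):
    """Return (label, probability) for the highest-scoring category.

    Hypothetical helper for illustration; not part of DaNLP or transformers.
    """
    # Tokenize one string. max_length=512 is the usual BERT limit (assumed here).
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

    model.eval()  # disable dropout for inference
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_labels)

    # Convert logits to probabilities and pick the best-scoring class.
    probs = torch.softmax(logits, dim=-1)[0]
    idx = int(probs.argmax())

    # id2label comes from the model config. If the checkpoint does not define
    # category names, transformers falls back to generic ones such as "LABEL_0".
    return model.config.id2label[idx], float(probs[idx])
```

With the `model` and `tokenizer` loaded above, `label, score = classify("Senile gamle idiot", model, tokenizer)` returns the predicted category name and its probability. Because this model is intended to run after the detection model, it is best applied only to text that has already been flagged as offensive.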

Training data

The data used for training has not been made publicly available. It consists of social media data manually annotated in collaboration with Danmarks Radio.