metadata
language:
  - ru
tags:
  - russian
  - pretraining
  - conversational
license: mit
widget:
  - text: '[CLS] привет [SEP] привет! [SEP] как дела? [RESPONSE_TOKEN] норм'
    example_title: Dialog example 1

response-toxicity-classifier-base

BERT classifier from Skoltech, finetuned on contextual data with 4 labels.

Training

Skoltech/russian-inappropriate-messages was fine-tuned on multiclass data with four classes (check the exact mapping between index and label in model.config).

  1. OK label: the message is acceptable in context and is not intended to offend or otherwise harm the speaker's reputation.
  2. Toxic label: the message might be seen as offensive in the given context.
  3. Severe toxic label: the message is offensive, full of anger, and was written to provoke a fight or cause other discomfort.
  4. Risks label: the message touches on sensitive topics (e.g. religion, politics) and can harm the speaker's reputation.
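
To turn a score vector into one of the four labels, the index-to-label mapping from the model config is needed. The snippet below is a minimal sketch, assuming the config exposes the usual `id2label` dictionary. The mapping shown is a hypothetical placeholder, not the model's real one, and `decode_prediction` is an illustrative helper name.

```python
from typing import Mapping, Sequence

# HYPOTHETICAL mapping for illustration only; the real one should be read
# from model.config.id2label once the model is loaded.
EXAMPLE_ID2LABEL = {0: "LABEL_0", 1: "LABEL_1", 2: "LABEL_2", 3: "LABEL_3"}


def decode_prediction(scores: Sequence[float], id2label: Mapping[int, str]) -> str:
    """Return the label whose score is the highest."""
    best_idx = max(range(len(scores)), key=lambda i: scores[i])
    return id2label[best_idx]


# Index 1 holds the largest score, so the second label is selected.
print(decode_prediction([0.1, 0.7, 0.05, 0.15], EXAMPLE_ID2LABEL))
```

In practice, the placeholder dictionary would be replaced by the mapping stored in the loaded model's config.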

The model was fine-tuned on dialog datasets that will be published soon.

Evaluation results

The model achieves the following results on the validation datasets (to be published soon):

| | OK - F1-score | TOXIC - F1-score | SEVERE TOXIC - F1-score | RISKS - F1-score |
|---|---|---|---|---|
| internet dialogs | 0.896 | 0.348 | 0.490 | 0.591 |
| chatbot dialogs | 0.940 | 0.295 | 0.729 | 0.46 |
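
For readers unfamiliar with the metric, a per-class F1-score is the harmonic mean of precision and recall with one class treated as positive. The sketch below assumes this standard one-vs-rest definition. The label names and toy data are illustrative only and are not taken from the validation sets.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1-score for a single class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, made up for illustration.
y_true = ["ok", "ok", "toxic", "risks", "toxic"]
y_pred = ["ok", "toxic", "toxic", "risks", "ok"]
for label in ("ok", "toxic", "risks"):
    print(label, round(f1_for_class(y_true, y_pred, label), 3))
```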

Use in transformers

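import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')
model = AutoModelForSequenceClassification.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')

# Special tokens are written into the text by hand, so automatic insertion is disabled;
# max_length only takes effect when truncation is enabled
inputs = tokenizer('[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?', max_length=128, truncation=True, add_special_tokens=False, return_tensors='pt')
with torch.inference_mode():
    logits = model(**inputs).logits
    # Per-class scores in [0, 1]; sigmoid outputs need not sum to 1
    probas = torch.sigmoid(logits)[0].cpu().detach().numpy()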
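
The input string in the example follows a visible pattern: `[CLS]`, then the context turns joined by `[SEP]`, then `[RESPONSE_TOKEN]`, then the response to classify. A small helper can assemble it. This is a sketch inferred from the example string only, and `build_input` is a hypothetical name, not part of the model's API.

```python
from typing import Sequence


def build_input(context: Sequence[str], response: str) -> str:
    """Assemble the model input from dialog context turns and a response.

    Format inferred from the usage example:
    [CLS]turn1[SEP]turn2[SEP]...[RESPONSE_TOKEN]response
    """
    return "[CLS]" + "[SEP]".join(context) + "[RESPONSE_TOKEN]" + response


text = build_input(["привет", "привет!", "как дела?"], "норм, у тя как?")
print(text)
```

The resulting string can be passed to the tokenizer with `add_special_tokens=False`, as in the example.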

This work was done by Nikita Stepanov during an internship at Tinkoff.