---
language: ["ru"]
tags:
- russian
- pretraining
- conversational
license: mit
widget:
- text: "[CLS] привет [SEP] привет! [SEP] как дела? [RESPONSE_TOKEN] норм"
  example_title: "Dialog example 1"
- text: "[CLS] привет [SEP] привет! [SEP] как дела? [RESPONSE_TOKEN] ты *****"
  example_title: "Dialog example 2"
---
# response-toxicity-classifier-base
[BERT classifier from Skoltech](https://huggingface.co/Skoltech/russian-inappropriate-messages), finetuned on contextual data with 4 labels.
# Training
[*Skoltech/russian-inappropriate-messages*](https://huggingface.co/Skoltech/russian-inappropriate-messages) was fine-tuned on multiclass data with four classes (*the exact mapping between class index and label is in* `model.config`).
1) OK label: the message is acceptable in context and is not intended to offend or otherwise harm the reputation of the speaker.
2) Toxic label: the message might be seen as offensive in the given context.
3) Severe toxic label: the message is offensive, full of anger, and was written to provoke a fight or cause other discomfort.
4) Risks label: the message touches on sensitive topics and can harm the reputation of the speaker (e.g. religion, politics).
The model was fine-tuned on dialog datasets that will be published soon.
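Because the index-to-label mapping lives in the model config, a small helper can list the labels in index order. This is a minimal sketch: `ordered_labels` is a hypothetical helper name, and it assumes the config exposes the standard `id2label` dictionary that `transformers` configs carry. The actual label strings are not reproduced here and should be read from the config.

```python
def ordered_labels(id2label):
    """Return label names sorted by class index.

    `id2label` is expected to look like `model.config.id2label`;
    keys may be ints or numeric strings depending on how it was loaded.
    """
    pairs = sorted((int(idx), name) for idx, name in id2label.items())
    return [name for _, name in pairs]


# With the real model (downloads the config from the Hub):
#   from transformers import AutoConfig
#   config = AutoConfig.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')
#   print(ordered_labels(config.id2label))
```

The position of each name in the returned list matches the position of the corresponding probability in the model output.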
# Evaluation results
The model achieves the following results on the validation datasets (to be published soon):
| Dataset | OK - F1-score | TOXIC - F1-score | SEVERE TOXIC - F1-score | RISKS - F1-score |
|---------|---------------|------------------|-------------------------|------------------|
|internet dialogs | 0.896 | 0.348 | 0.490 | 0.591 |
|chatbot dialogs | 0.940 | 0.295 | 0.729 | 0.46 |
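For readers who want to reproduce per-class scores like those in the table on their own data, the sketch below computes a one-vs-rest F1-score for each class from integer labels. It is an illustration only: the card does not state how the reported scores were computed, and `per_class_f1` is a hypothetical helper, not part of the model's code.

```python
def per_class_f1(y_true, y_pred, num_classes=4):
    """Compute a one-vs-rest F1-score for every class index.

    F1 = 2*TP / (2*TP + FP + FN); a class with no true or predicted
    examples gets a score of 0.0.
    """
    scores = []
    pairs = list(zip(y_true, y_pred))
    for c in range(num_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Toy labels, not real evaluation data.
toy_scores = per_class_f1([0, 0, 1, 2, 3, 3], [0, 1, 1, 2, 3, 0])
```

Here `y_true` and `y_pred` are lists of class indices in the order given by `model.config`.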
# Use in transformers
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')
model = AutoModelForSequenceClassification.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')

# Special tokens are written into the string by hand, hence add_special_tokens=False.
inputs = tokenizer('[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?', max_length=128, truncation=True, add_special_tokens=False, return_tensors='pt')
with torch.inference_mode():
    logits = model(**inputs).logits
    # One probability per class; the class order is given in model.config.
    probas = torch.softmax(logits, dim=-1)[0].cpu().detach().numpy()
```
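The input string in the example above follows a fixed pattern: `[CLS]`, the context turns joined by `[SEP]`, then `[RESPONSE_TOKEN]` and the response to classify. The sketch below builds that string from a list of turns and picks the most probable label from the output. Both `build_input` and `top_label` are hypothetical helper names, and the label strings in the usage lines are placeholders rather than the model's real label names.

```python
def build_input(context, response):
    """Format dialog turns the same way as the example input string."""
    return '[CLS]' + '[SEP]'.join(context) + '[RESPONSE_TOKEN]' + response


def top_label(probas, id2label):
    """Return the (label, probability) pair with the highest probability."""
    best = max(range(len(probas)), key=lambda i: probas[i])
    return id2label[best], float(probas[best])


text = build_input(['привет', 'привет!', 'как дела?'], 'норм, у тя как?')

# Placeholder mapping and probabilities; use model.config.id2label and the
# `probas` array from the model in practice.
placeholder_labels = {0: 'label_0', 1: 'label_1', 2: 'label_2', 3: 'label_3'}
label, score = top_label([0.1, 0.7, 0.15, 0.05], placeholder_labels)
```

The resulting `text` can be passed to the tokenizer exactly as in the snippet above, with `add_special_tokens=False`.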
The work was done during an internship at Tinkoff by [Nikita Stepanov](https://huggingface.co/nikitast), mentored by [Alexander Markov](https://huggingface.co/amarkv).