File size: 2,578 Bytes
6b1f225 86a2fd4 8b1fccf 8bf71eb 8b1fccf 8bf71eb 8b1fccf 6b1f225 bb2f0ee 125b739 8539235 bb2f0ee 8539235 fa005e0 e81ff35 1ff1a29 e81ff35 bb2f0ee ecb53ee 1ff1a29 ecb53ee bb2f0ee 566945e bb2f0ee 2468cee fa005e0 2f33dc9 bb2f0ee e81ff35 fa005e0 2468cee fa005e0 e81ff35 bb2f0ee |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 |
---
license: mit
widget:
- text: "привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]супер, вот только проснулся, у тебя как?"
example_title: "Dialog example 1"
- text: "привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм"
example_title: "Dialog example 2"
- text: "привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?"
example_title: "Dialog example 3"
---
This classification model is based on [cointegrated/rubert-tiny2](https://huggingface.co/cointegrated/rubert-tiny2).
The model should be used to produce relevance and specificity of the last message in the context of a dialogue.
The labels explanation:
- `relevance`: is the last message in the dialogue relevant in the context of the full dialogue
- `specificity`: is the last message in the dialogue interesting and promotes the continuation of the dialogue
The preferable length of the dialogue is 4 where the last message is needed to be estimated
It is pretrained on a large corpus of dialog data in unsupervised manner: the model is trained to predict whether last response was in a real dialog, or it was pulled from some other dialog at random.
Then it was finetuned on manually labelled examples (dataset will be posted soon).
It is pretrained on corpus of dialog data and finetuned on [tinkoff-ai/context_similarity](https://huggingface.co/tinkoff-ai/context_similarity).
The performance of the model on validation split (dataset will be posted soon)[tinkoff-ai/context_similarity](https://huggingface.co/tinkoff-ai/context_similarity) (with the best thresholds for validation samples):
| | threshold | f0.5 | ROC AUC |
|:------------|------------:|-------:|----------:|
| relevance | 0.51 | 0.82 | 0.74 |
| specificity | 0.54 | 0.81 | 0.8 |
How to use:
```python
# pip install transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
tokenizer = AutoTokenizer.from_pretrained("tinkoff-ai/response-quality-classifier-tiny")
model = AutoModelForSequenceClassification.from_pretrained("tinkoff-ai/response-quality-classifier-tiny")
# model.cuda()
inputs = tokenizer('[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?', max_length=128, add_special_tokens=False, return_tensors='pt')
with torch.inference_mode():
logits = model(**inputs).logits
probas = torch.sigmoid(logits)[0].cpu().detach().numpy()
relevance, specificity = probas
``` |