---
license: mit
widget:
- text: "привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]супер, вот только проснулся, у тебя как?"
  example_title: "Dialog example 1"
- text: "привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм"
  example_title: "Dialog example 2"
- text: "привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?"
  example_title: "Dialog example 3"
---

This classification model is based on [cointegrated/rubert-tiny2](https://huggingface.co/cointegrated/rubert-tiny2).
The model estimates the relevance and specificity of the last message in the context of a dialogue.

Label definitions:
- `relevance`: whether the last message in the dialogue is relevant in the context of the full dialogue
- `specificity`: whether the last message in the dialogue is interesting and encourages the dialogue to continue

The preferred dialogue length is 4 messages, where the last message is the one being evaluated.

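
The widget examples on this card join the context messages with `[SEP]` and place `[RESPONSE_TOKEN]` before the message being scored, and the usage snippet prefixes the string with `[CLS]`. A minimal sketch of assembling such a string from a list of messages is shown below. The helper name `build_model_input` and its `max_context` parameter are hypothetical, and keeping only the 3 most recent context messages is an assumption drawn from the preferred dialogue length of 4.

```python
from typing import List

CLS = "[CLS]"
SEP = "[SEP]"
RESPONSE_TOKEN = "[RESPONSE_TOKEN]"


def build_model_input(context: List[str], response: str, max_context: int = 3) -> str:
    """Join context messages and the response into one input string (hypothetical helper)."""
    if not context:
        raise ValueError("context must contain at least one message")
    if max_context < 1:
        raise ValueError("max_context must be at least 1")
    # Assumption: keep only the most recent messages (3 context + 1 response = 4).
    recent = context[-max_context:]
    return CLS + SEP.join(recent) + RESPONSE_TOKEN + response


text = build_model_input(["привет", "привет!", "как дела?"], "норм, у тя как?")
print(text)
```

Because `[CLS]` is written into the string here, the result is meant to be tokenized with `add_special_tokens=False`, as in the usage snippet on this card.
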
The model was pretrained on a large corpus of dialogue data in an unsupervised manner: it was trained to predict whether the last response came from the real dialogue or was pulled at random from another dialogue.

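
The pretraining objective described above can be illustrated with a small sketch of how such training pairs might be built. This is not the authors' pipeline: the function name `make_pretraining_pairs`, the ratio of one negative per dialogue, and the 1/0 label encoding are all assumptions made for illustration.

```python
import random
from typing import List, Tuple


def make_pretraining_pairs(
    dialogs: List[List[str]], seed: int = 0
) -> List[Tuple[List[str], str, int]]:
    """Return (context, response, label) triples; label 1 = real, 0 = random."""
    if len(dialogs) < 2:
        raise ValueError("need at least two dialogs to sample negatives")
    rng = random.Random(seed)
    pairs = []
    for i, dialog in enumerate(dialogs):
        context, real_response = dialog[:-1], dialog[-1]
        # Positive example: the response that actually followed the context.
        pairs.append((context, real_response, 1))
        # Negative example: the last message of a different, randomly chosen dialog.
        other = rng.choice([j for j in range(len(dialogs)) if j != i])
        pairs.append((context, dialogs[other][-1], 0))
    return pairs


dialogs = [
    ["привет", "привет!", "как дела?", "норм, у тя как?"],
    ["ты смотрел новый фильм?", "нет, а стоит?", "да, очень советую"],
]
pairs = make_pretraining_pairs(dialogs)
```
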
It was then fine-tuned on manually labelled examples ([tinkoff-ai/context_similarity](https://huggingface.co/tinkoff-ai/context_similarity)).

Performance of the model on the validation split of [tinkoff-ai/context_similarity](https://huggingface.co/tinkoff-ai/context_similarity), using the thresholds that performed best on the validation samples:

|             |   threshold |   F0.5 |   ROC AUC |
|:------------|------------:|-------:|----------:|
| relevance   |        0.51 |   0.82 |      0.74 |
| specificity |        0.54 |   0.81 |      0.8  |

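
The table reports thresholds chosen on the validation samples together with an F0.5 score. The selection procedure itself is not described on this card, so the following is only a sketch of one common approach: compute the F-beta score with beta = 0.5 (which weights precision more heavily than recall) over a grid of candidate thresholds and keep the best one. The function names, the grid, and the toy data are hypothetical.

```python
from typing import Sequence, Tuple


def fbeta(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 0.5) -> float:
    """F-beta score for binary labels; returns 0.0 when it is undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    return (1 + beta**2) * tp / denom if denom else 0.0


def best_threshold(
    y_true: Sequence[int], scores: Sequence[float], grid: Sequence[float]
) -> Tuple[float, float]:
    """Return the (threshold, F0.5) pair with the highest F0.5 on the grid."""
    best_t, best_f = grid[0], -1.0
    for t in grid:
        # Assumption: a score at or above the threshold counts as positive.
        preds = [1 if s >= t else 0 for s in scores]
        f = fbeta(y_true, preds)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Toy validation data, made up for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.7, 0.4, 0.2, 0.55]
threshold, score = best_threshold(y_true, scores, grid=[0.3, 0.5, 0.65, 0.8])
```
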
How to use:

```python
# pip install transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("tinkoff-ai/response-quality-classifier-tiny")
model = AutoModelForSequenceClassification.from_pretrained("tinkoff-ai/response-quality-classifier-tiny")
# model.cuda()

# Context messages are joined with [SEP]; [RESPONSE_TOKEN] precedes the message being scored.
# [CLS] is written explicitly because add_special_tokens=False.
inputs = tokenizer('[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?', max_length=128, truncation=True, add_special_tokens=False, return_tensors='pt')
with torch.inference_mode():
    logits = model(**inputs).logits
    # Each label gets an independent sigmoid probability.
    probas = torch.sigmoid(logits)[0].cpu().detach().numpy()
relevance, specificity = probas
```

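
To turn the two probabilities into yes/no decisions, they can be compared against the thresholds from the validation table (0.51 for relevance, 0.54 for specificity). This is a minimal sketch, assuming that a probability at or above the threshold counts as positive. The helper `to_labels` is hypothetical, and thresholds tuned on the validation split may not transfer to other data.

```python
from typing import Dict, Sequence

# Thresholds taken from the validation table on this card.
THRESHOLDS = {"relevance": 0.51, "specificity": 0.54}


def to_labels(probas: Sequence[float]) -> Dict[str, bool]:
    """Map (relevance, specificity) probabilities to boolean decisions."""
    relevance, specificity = probas
    return {
        "relevance": bool(relevance >= THRESHOLDS["relevance"]),
        "specificity": bool(specificity >= THRESHOLDS["specificity"]),
    }


labels = to_labels([0.73, 0.41])  # made-up probabilities for illustration
print(labels)
```

The `probas` array from the usage snippet can be passed in directly as `to_labels(probas)`.
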