---
license: mit
widget:
- text: "привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]супер, вот только проснулся, у тебя как?"
  example_title: "Dialog example 1"
- text: "привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм"
  example_title: "Dialog example 2"
- text: "привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?"
  example_title: "Dialog example 3"
---
|
|
|
This classification model is based on [cointegrated/rubert-tiny2](https://huggingface.co/cointegrated/rubert-tiny2). |
|
The model predicts the relevance and specificity of the last message in the context of a dialogue.
|
|
|
Label descriptions:
- `relevance`: whether the last message in the dialogue is relevant in the context of the full dialogue
- `specificity`: whether the last message in the dialogue is interesting and promotes the continuation of the dialogue
|
|
|
The preferred dialogue length is 4 messages, where the last message is the one to be evaluated.
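
The input string joins the context messages with `[SEP]` and marks the message to be evaluated with `[RESPONSE_TOKEN]`, as in the widget examples above. A minimal sketch of a helper that builds such an input (the helper name is ours, not part of the model):

```python
def build_model_input(context, response):
    # context: list of preceding messages; response: the message to evaluate.
    # Messages are joined with [SEP]; [RESPONSE_TOKEN] precedes the response.
    return '[SEP]'.join(context) + '[RESPONSE_TOKEN]' + response

print(build_model_input(['привет', 'привет!', 'как дела?'], 'норм, у тя как?'))
# привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?
```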
|
|
|
It was pretrained on a corpus of dialogue data and fine-tuned on [tinkoff-ai/context_similarity](https://huggingface.co/tinkoff-ai/context_similarity).
|
The performance of the model on the validation split of [tinkoff-ai/context_similarity](https://huggingface.co/tinkoff-ai/context_similarity) (with the best thresholds selected on the validation samples):
|
|
|
|
|
| label       |   threshold |   F0.5 |   ROC AUC |
|:------------|------------:|-------:|----------:|
| relevance   |        0.51 |   0.82 |      0.74 |
| specificity |        0.54 |   0.81 |      0.80 |
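
These thresholds can be used to binarize the predicted probabilities. A small sketch, assuming the model outputs probabilities in the order `[relevance, specificity]` (matching the label list above); the input values are made up for illustration:

```python
THRESHOLDS = {'relevance': 0.51, 'specificity': 0.54}

def to_labels(probas):
    # probas: [P(relevance), P(specificity)] -- order assumed from the label list above.
    return {name: p >= THRESHOLDS[name]
            for name, p in zip(('relevance', 'specificity'), probas)}

print(to_labels([0.77, 0.39]))  # made-up probabilities
# {'relevance': True, 'specificity': False}
```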
|
|
|
|
|
Recommended usage:
|
|
|
```python
# pip install transformers
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("tinkoff-ai/response-quality-classifier-tiny")
model = AutoModelForSequenceClassification.from_pretrained("tinkoff-ai/response-quality-classifier-tiny")
# model.cuda()

# Context messages are joined with [SEP]; the response to be evaluated follows
# [RESPONSE_TOKEN]. The special tokens are already in the string, hence
# add_special_tokens=False.
inputs = tokenizer(
    'привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?',
    padding=True, max_length=128, truncation=True,
    add_special_tokens=False, return_tensors='pt',
)
with torch.inference_mode():
    logits = model(**inputs).logits
    # One independent sigmoid probability per label.
    probas = torch.sigmoid(logits)[0].cpu().detach().numpy()
print(probas)
```
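
The same pipeline can score several candidate responses for one context in a single batch, continuing from the snippet above (reusing `tokenizer` and `model`). Ranking by the product of the two probabilities is an illustrative choice, not something the model card prescribes:

```python
context = 'привет[SEP]привет![SEP]как дела?'
candidates = ['норм', 'норм, у тя как?', 'супер, вот только проснулся, у тебя как?']
texts = [context + '[RESPONSE_TOKEN]' + c for c in candidates]

batch = tokenizer(texts, padding=True, max_length=128, truncation=True,
                  add_special_tokens=False, return_tensors='pt')
with torch.inference_mode():
    probas = torch.sigmoid(model(**batch).logits)  # shape: (num_candidates, 2)

# Illustrative ranking rule: product of the two per-label probabilities.
scores = (probas[:, 0] * probas[:, 1]).tolist()
print(max(zip(candidates, scores), key=lambda pair: pair[1]))
```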