# BERT-based classifier (fine-tuned from rubert-tiny2)
Merged datasets:
The merged datasets were split into train, validation, and test sets in an 80/10/10 proportion. The metrics obtained on the test set are as follows:
|              | precision | recall | f1-score | support |
|--------------|-----------|--------|----------|---------|
| 0            | 0.9827    | 0.9827 | 0.9827   | 21216   |
| 1            | 0.9272    | 0.9274 | 0.9273   | 5054    |
| accuracy     |           |        | 0.9720   | 26270   |
| macro avg    | 0.9550    | 0.9550 | 0.9550   | 26270   |
| weighted avg | 0.9720    | 0.9720 | 0.9720   | 26270   |
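These figures follow the layout of scikit-learn's `classification_report`. As a minimal sketch of how such a report is produced (the label and prediction arrays below are hypothetical placeholders, not the actual test data):

```python
from sklearn.metrics import classification_report

# Hypothetical gold labels and model predictions (0 = non-toxic, 1 = toxic);
# the real report above was computed on the 26,270-example test split.
labels = [0, 0, 0, 1, 1]
preds  = [0, 0, 1, 1, 1]
print(classification_report(labels, preds, digits=4))
```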
## Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

PATH = 'khvatov/ru_toxicity_detector'

tokenizer = AutoTokenizer.from_pretrained(PATH)
model = AutoModelForSequenceClassification.from_pretrained(PATH)

# Move the model to GPU if one is available:
# if torch.cuda.is_available():
#     model.cuda()
model.to(torch.device("cpu"))


def get_toxicity_probs(text):
    """Return class probabilities [non-toxic, toxic] for a single text."""
    with torch.no_grad():
        inputs = tokenizer(text, return_tensors='pt', truncation=True, padding=True).to(model.device)
        proba = torch.nn.functional.softmax(model(**inputs).logits, dim=1).cpu().numpy()
    return proba[0]


TEXT = "Марк был хороший"  # "Mark was good"
print(f'text = {TEXT}, probs={get_toxicity_probs(TEXT)}')
# text = Марк был хороший, probs=[0.9940585 0.00594147]
```
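For scoring several texts at once, here is a minimal batched variant (a sketch reusing the `tokenizer` and `model` loaded above; the sample inputs are illustrative, not from the model card):

```python
# Sketch of batched inference, reusing the tokenizer and model from above.
texts = ["Марк был хороший", "Всё отлично"]  # "Mark was good", "Everything is great"
with torch.no_grad():
    batch = tokenizer(texts, return_tensors='pt', truncation=True, padding=True).to(model.device)
    probs = torch.nn.functional.softmax(model(**batch).logits, dim=1).cpu().numpy()
for text, p in zip(texts, probs):
    print(f'text = {text}, probs={p}')
```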
## Train
The model was trained with the Adam optimizer, a learning rate of 2e-5, and a batch size of 32 for 3 epochs.
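A minimal sketch of that setup in plain PyTorch follows; the `train_ds` dataset object and the loss handling are assumptions, as the card only states the optimizer, learning rate, batch size, and epoch count:

```python
import torch
from torch.utils.data import DataLoader

# Sketch of the stated hyperparameters: Adam, lr=2e-5, batch size 32, 3 epochs.
# `train_ds` is a hypothetical Dataset yielding tokenized batches with a 'labels' key.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
loader = DataLoader(train_ds, batch_size=32, shuffle=True)

model.train()
for epoch in range(3):
    for batch in loader:
        optimizer.zero_grad()
        outputs = model(**batch)   # HF models return .loss when 'labels' are present
        outputs.loss.backward()
        optimizer.step()
```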