This is a version of the paraphrase detector by DeepPavlov ([details in the documentation](http://docs.deeppavlov.ai/en/master/features/overview.html#ranking-model-docs)) ported to the `Transformers` format.
All credit goes to the authors of DeepPavlov.

The model has been trained on the dataset from http://paraphraser.ru/.

It classifies pairs of texts as paraphrases (class 1) or non-paraphrases (class 0).

```python
import torch
from transformers import AutoModelForSequenceClassification, BertTokenizer

model_name = 'cointegrated/rubert-base-cased-dp-paraphrase-detection'
# use a GPU if one is available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)
tokenizer = BertTokenizer.from_pretrained(model_name)

text1 = 'Сегодня на улице хорошая погода'  # "The weather outside is good today"
text2 = 'Сегодня на улице отвратительная погода'  # "The weather outside is awful today"
# encode the two texts as a single sentence pair
batch = tokenizer(text1, text2, return_tensors='pt').to(model.device)
with torch.inference_mode():
    # probabilities of class 0 (non-paraphrase) and class 1 (paraphrase)
    proba = torch.softmax(model(**batch).logits, -1).cpu().numpy()
print(proba)
# [[0.44876656 0.5512334 ]]
```
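The snippet above scores a single pair. To score many pairs at once, a minimal sketch might look like the following. The helper name `paraphrase_proba` and the `batch_size` default are illustrative choices of ours, not part of DeepPavlov or this model card. The sketch assumes `model` and `tokenizer` have been loaded as shown above, and relies on the tokenizer padding the pairs in each batch to a common length.

```python
from typing import List, Sequence, Tuple

import torch


def paraphrase_proba(
    pairs: Sequence[Tuple[str, str]],
    model,
    tokenizer,
    batch_size: int = 32,
) -> List[float]:
    """Return P(class 1 = paraphrase) for each (text1, text2) pair.

    Hypothetical helper: `model` and `tokenizer` are assumed to be the
    paraphrase detector and its tokenizer, loaded as in the example above.
    """
    scores: List[float] = []
    for start in range(0, len(pairs), batch_size):
        chunk = pairs[start:start + batch_size]
        first = [a for a, _ in chunk]
        second = [b for _, b in chunk]
        # pad the pairs in this batch to a common length; truncate overly long inputs
        batch = tokenizer(first, second, padding=True, truncation=True, return_tensors='pt')
        # move every input tensor to the device the model lives on
        batch = {name: tensor.to(model.device) for name, tensor in batch.items()}
        with torch.inference_mode():
            logits = model(**batch).logits
        # softmax over the two classes; column 1 is the paraphrase probability
        scores.extend(torch.softmax(logits, dim=-1)[:, 1].tolist())
    return scores
```

For example, `paraphrase_proba([(text1, text2)], model, tokenizer)` should return a one-element list that closely matches the second column of `proba` from the snippet above. Labeling a pair as a paraphrase when its score exceeds 0.5 is equivalent to taking the argmax over the two classes.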