This is a version of the paraphrase detector by DeepPavlov ([details in the documentation](http://docs.deeppavlov.ai/en/master/features/overview.html#ranking-model-docs)), ported to the `Transformers` format.

All credit goes to the authors of DeepPavlov.

The model was trained on the dataset from http://paraphraser.ru/.

It classifies a pair of texts as paraphrases (class 1) or non-paraphrases (class 0).

```python
import torch
from transformers import AutoModelForSequenceClassification, BertTokenizer

model_name = 'cointegrated/rubert-base-cased-dp-paraphrase-detection'
model = AutoModelForSequenceClassification.from_pretrained(model_name)
if torch.cuda.is_available():
    model.cuda()  # use the GPU when one is available
tokenizer = BertTokenizer.from_pretrained(model_name)

text1 = 'Сегодня на улице хорошая погода'          # 'The weather outside is good today'
text2 = 'Сегодня на улице отвратительная погода'   # 'The weather outside is disgusting today'

# Encode the two texts as a single pair and classify it
batch = tokenizer(text1, text2, return_tensors='pt').to(model.device)
with torch.inference_mode():
    proba = torch.softmax(model(**batch).logits, -1).cpu().numpy()
print(proba)
# [[0.44876656 0.5512334 ]] - P(non-paraphrase), P(paraphrase)
```
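The printed array holds the softmax probabilities for the two classes. As a minimal sketch (reusing the numbers from the example output above), the predicted label is simply the argmax over that pair:

```python
# Probabilities from the example output: index 0 = non-paraphrase, index 1 = paraphrase
proba = [0.44876656, 0.5512334]

# argmax without numpy: the index with the largest probability
label = max(range(len(proba)), key=proba.__getitem__)
print('paraphrase' if label == 1 else 'non-paraphrase')  # → paraphrase
```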