This is a version of the paraphrase detector by DeepPavlov ([details in the documentation](http://docs.deeppavlov.ai/en/master/features/overview.html#ranking-model-docs)), ported to the `Transformers` format.

All credit goes to the authors of DeepPavlov.

The model was trained on the dataset from http://paraphraser.ru/.

It classifies a pair of texts as paraphrases (class 1) or non-paraphrases (class 0).

```python
import torch
from transformers import AutoModelForSequenceClassification, BertTokenizer

model_name = 'cointegrated/rubert-base-cased-dp-paraphrase-detection'
model = AutoModelForSequenceClassification.from_pretrained(model_name)
if torch.cuda.is_available():
    model.cuda()  # use the GPU when one is available
tokenizer = BertTokenizer.from_pretrained(model_name)

text1 = 'Сегодня на улице хорошая погода'          # 'The weather outside is good today'
text2 = 'Сегодня на улице отвратительная погода'   # 'The weather outside is disgusting today'

# Encode the two texts as a single pair and classify it
batch = tokenizer(text1, text2, return_tensors='pt').to(model.device)
with torch.inference_mode():
    proba = torch.softmax(model(**batch).logits, -1).cpu().numpy()
print(proba)
# [[0.44876656 0.5512334 ]] - P(non-paraphrase), P(paraphrase)
```
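The printed array holds the softmax probabilities for the two classes. As a minimal sketch (reusing the numbers from the example output above), the predicted label is simply the argmax over that pair:

```python
# Probabilities from the example output: index 0 = non-paraphrase, index 1 = paraphrase
proba = [0.44876656, 0.5512334]

# argmax without numpy: the index with the largest probability
label = max(range(len(proba)), key=proba.__getitem__)
print('paraphrase' if label == 1 else 'non-paraphrase')  # → paraphrase
```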