cointegrated
committed on
Commit
·
967fbbf
1
Parent(s):
d34f966
Update README.md
README.md
CHANGED
@@ -26,13 +26,16 @@ It is based on [sentence-transformers/LaBSE](https://huggingface.co/sentence-tra
|
|
26 |
- Masked language modelling on `myv` monolingual data;
|
27 |
- Sentence pair classification to distinguish correct `ru-myv` translations from random pairs.
|
28 |
|
|
|
|
|
|
|
29 |
```python
|
30 |
import torch
|
31 |
from transformers import AutoTokenizer, AutoModel
|
32 |
-
tokenizer = AutoTokenizer.from_pretrained("
|
33 |
-
model = AutoModel.from_pretrained("
|
34 |
sentences = ["Hello World", "Привет Мир", "Шумбратадо Мастор"]
|
35 |
-
encoded_input = tokenizer(sentences, padding=True, truncation=True,
|
36 |
with torch.no_grad():
|
37 |
model_output = model(**encoded_input)
|
38 |
embeddings = model_output.pooler_output
|
@@ -40,4 +43,4 @@ embeddings = torch.nn.functional.normalize(embeddings)
|
|
40 |
print(embeddings.shape) # torch.Size([3, 768])
|
41 |
```
|
42 |
|
43 |
-
|
|
|
26 |
- Masked language modelling on `myv` monolingual data;
|
27 |
- Sentence pair classification to distinguish correct `ru-myv` translations from random pairs.
|
28 |
|
29 |
+
The model can be used as a sentence encoder or a masked language modelling predictor for Erzya, or fine-tuned for any downstream NLU task.
|
30 |
+
|
31 |
+
Sentence embeddings can be produced with the code below:
|
32 |
```python
|
33 |
import torch
|
34 |
from transformers import AutoTokenizer, AutoModel
|
35 |
+
tokenizer = AutoTokenizer.from_pretrained("slone/LaBSE-en-ru-myv-v1")
|
36 |
+
model = AutoModel.from_pretrained("slone/LaBSE-en-ru-myv-v1")
|
37 |
sentences = ["Hello World", "Привет Мир", "Шумбратадо Мастор"]
|
38 |
+
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
|
39 |
with torch.no_grad():
|
40 |
model_output = model(**encoded_input)
|
41 |
embeddings = model_output.pooler_output
|
|
|
43 |
print(embeddings.shape) # torch.Size([3, 768])
|
44 |
```
|
45 |
|
46 |
+
|
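Since the README snippet L2-normalizes the embeddings, comparing two sentences reduces to a dot product of their vectors. The following is a minimal sketch of that comparison, not part of the commit: `similarity_matrix` is a hypothetical helper name, and a random tensor stands in for the model output so the snippet runs without downloading the model.

```python
import torch


def similarity_matrix(embeddings: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine similarities between rows (hypothetical helper)."""
    # Re-normalizing is harmless if the rows are already unit-length.
    normalized = torch.nn.functional.normalize(embeddings, dim=-1)
    # For unit vectors, the dot product equals the cosine similarity.
    return normalized @ normalized.T


# Stand-in for the model output (assumption): 3 random vectors of size 768.
# In practice, pass the `embeddings` tensor produced by the model instead.
torch.manual_seed(0)
embeddings = torch.randn(3, 768)

sims = similarity_matrix(embeddings)
print(sims.shape)  # torch.Size([3, 3])
```

With real model output, entry `[i, j]` of the result would be the similarity between sentence `i` and sentence `j`; the diagonal is always 1 because each sentence is compared with itself.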