cointegrated committed on
Commit 967fbbf · 1 parent: d34f966

Update README.md

Files changed (1)
  1. README.md +7 -4
README.md CHANGED
@@ -26,13 +26,16 @@ It is based on [sentence-transformers/LaBSE](https://huggingface.co/sentence-tra
 - Masked language modelling on `myv` monolingual data;
 - Sentence pair classification to distinguish correct `ru-myv` translations from random pairs.
 
+The model can be used as a sentence encoder or a masked language modelling predictor for Erzya, or fine-tuned for any downstream NLU task.
+
+Sentence embeddings can be produced with the code below:
 ```python
 import torch
 from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained("cointegrated/LaBSE-en-ru")
-model = AutoModel.from_pretrained("cointegrated/LaBSE-en-ru")
+tokenizer = AutoTokenizer.from_pretrained("slone/LaBSE-en-ru-myv-v1")
+model = AutoModel.from_pretrained("slone/LaBSE-en-ru-myv-v1")
 sentences = ["Hello World", "Привет Мир", "Шумбратадо Мастор"]
-encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=64, return_tensors='pt')
+encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
 with torch.no_grad():
     model_output = model(**encoded_input)
 embeddings = model_output.pooler_output
@@ -40,4 +43,4 @@ embeddings = torch.nn.functional.normalize(embeddings)
 print(embeddings.shape) # torch.Size([3, 768])
 ```
 
-The model can be used as a sentence encoder or fine-tuned for any downstream NLU task.
+
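
The README code L2-normalizes the embeddings with `torch.nn.functional.normalize`. As a result, the cosine similarity between two sentences is just the dot product of their vectors. The sketch below is not part of the commit. It assumes the embeddings have been converted to nested Python lists (for example with `embeddings.tolist()`). The helpers `l2_normalize`, `similarity_matrix` and `best_matches` are hypothetical names chosen for illustration. Together they pick, for each source sentence, the most similar target sentence, which is one plausible way to use the encoder for matching `ru` and `myv` translations.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(vec: Vector) -> List[float]:
    """Scale a vector to unit Euclidean length; a zero vector stays zero."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def similarity_matrix(src: Sequence[Vector], tgt: Sequence[Vector]) -> List[List[float]]:
    """Cosine similarity of every src row with every tgt row.

    After L2 normalization, cosine similarity reduces to a plain dot product.
    """
    src_n = [l2_normalize(v) for v in src]
    tgt_n = [l2_normalize(v) for v in tgt]
    return [[sum(a * b for a, b in zip(s, t)) for t in tgt_n] for s in src_n]


def best_matches(src: Sequence[Vector], tgt: Sequence[Vector]) -> List[int]:
    """Hypothetical helper: index of the most similar tgt row for each src row."""
    sims = similarity_matrix(src, tgt)
    return [max(range(len(row)), key=row.__getitem__) for row in sims]


# Toy 2-d vectors standing in for the 768-d sentence embeddings
ru_vecs = [[1.0, 0.1], [0.0, 2.0]]
myv_vecs = [[0.1, 3.0], [0.5, 0.0]]
print(best_matches(ru_vecs, myv_vecs))  # [1, 0]
```

With the torch tensor produced by the README code, the same similarity matrix is simply `embeddings @ embeddings.T`. The pure-Python version only makes the arithmetic explicit.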
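
The second training objective listed in the README needs negative examples. That objective is sentence pair classification of correct `ru-myv` translations against random pairs. The README does not say how the random pairs were sampled. The following is only a minimal sketch built around a hypothetical helper, `make_pair_examples`, which:

- keeps every aligned pair as a positive (label 1);
- pairs each Russian sentence with an Erzya sentence drawn at random from a *different* pair as a negative (label 0).

```python
import random
from typing import List, Optional, Sequence, Tuple


def make_pair_examples(
    pairs: Sequence[Tuple[str, str]],
    seed: Optional[int] = None,
) -> List[Tuple[str, str, int]]:
    """Build (ru, myv, label) triples: 1 = true translation, 0 = random pair.

    Hypothetical helper; the sampling actually used to train the model is not documented.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to sample a mismatched negative")
    rng = random.Random(seed)  # seeded generator for reproducible sampling
    examples: List[Tuple[str, str, int]] = []
    for i, (ru, myv) in enumerate(pairs):
        examples.append((ru, myv, 1))  # positive: the aligned translation
        # Draw from the other len(pairs) - 1 indices so i is never chosen
        j = rng.randrange(len(pairs) - 1)
        if j >= i:
            j += 1
        examples.append((ru, pairs[j][1], 0))  # negative: a mismatched Erzya sentence
    rng.shuffle(examples)
    return examples
```

Skipping index `i` when drawing the negative guarantees that a negative never reuses its own aligned partner. Duplicate Erzya sentences in the corpus could still produce a textually correct "negative", so deduplicating the input first would be a sensible precaution.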