system HF staff committed on
Commit 16b93fe · 1 Parent(s): b8ea8b9

Update model_cards/labse-README.md

Files changed (1)
  1. model_cards/labse-README.md +47 -0
model_cards/labse-README.md ADDED
@@ -0,0 +1,47 @@
+ ---
+ language: en
+ thumbnail:
+ tags:
+ - bert
+ - embeddings
+ license: Apache-2.0
+ ---
+
+ # LaBSE BERT
+
+ ## Model description
+
+ Model for the paper "Language-agnostic BERT Sentence Embedding" by Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, and Wei Wang. The model is available on [TensorFlow Hub](https://tfhub.dev/google/LaBSE/1).
+
+ ## Intended uses & limitations
+
+ #### How to use
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+ import torch
+
+ # Mean pooling, adapted from sentence-transformers: average the token
+ # embeddings, counting only positions where attention_mask is 1.
+ def mean_pooling(model_output, attention_mask):
+     token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
+     input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+     sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
+     sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)  # Avoid division by zero
+     return sum_embeddings / sum_mask
+
+ tokenizer = AutoTokenizer.from_pretrained("pvl/labse_bert", do_lower_case=False)
+ model = AutoModel.from_pretrained("pvl/labse_bert")
+
+ sentences = ['This framework generates embeddings for each input sentence',
+              'Sentences are passed as a list of string.',
+              'The quick brown fox jumps over the lazy dog.']
+
+ # Tokenize the batch, padding to the longest sentence and truncating at 128 tokens
+ encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')
+
+ # Run the model without tracking gradients (inference only)
+ with torch.no_grad():
+     model_output = model(**encoded_input)
+
+ # One embedding vector per input sentence
+ sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
+ ```
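
Reviewer note (not part of the commit): the masked averaging done by `mean_pooling` can be checked by hand on a toy input. The sketch below is a dependency-free re-implementation of the same arithmetic for a single sentence. The name `masked_mean` and the toy vectors are hypothetical and exist only for illustration; the README itself uses the batched `torch` version.

```python
def masked_mean(token_embeddings, attention_mask):
    """Average token vectors over positions where attention_mask is 1.

    token_embeddings: list of equal-length float lists, one per token.
    attention_mask:   list of 0/1 flags, one per token (0 marks padding).
    """
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    # Same guard as torch.clamp(..., min=1e-9): avoids division by zero
    # when every position is masked out.
    denom = max(count, 1e-9)
    return [t / denom for t in total]


# Two real tokens followed by one padding token; the padding vector is ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(masked_mean(tokens, mask))  # [2.0, 3.0]
```

Because padded positions contribute nothing to the sum or the count, a sentence's pooled vector should not depend on how much padding the batch adds.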