iamgroot42 committed: Update README.md

README.md CHANGED
@@ -2,4 +2,52 @@
 license: mit
 ---
 
+## Usage
+
+Code example
+
+```python
+import torch.nn.functional as F
+from torch import Tensor
+from transformers import AutoTokenizer, AutoModel
+
+def average_pool(last_hidden_states: Tensor,
+                 attention_mask: Tensor) -> Tensor:
+    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
+    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]
+
+input_texts = [
+    "what is the capital of Japan?",
+    "Kyoto",
+    "Tokyo",
+    "Beijing"
+]
+
+tokenizer = AutoTokenizer.from_pretrained("iamgroot42/rover_nexus")
+model = AutoModel.from_pretrained("iamgroot42/rover_nexus")
+
+# Tokenize the input texts
+batch_dict = tokenizer(input_texts, max_length=512, padding=True, truncation=True, return_tensors='pt')
+
+outputs = model(**batch_dict)
+embeddings = average_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
+
+# (Optionally) normalize embeddings
+embeddings = F.normalize(embeddings, p=2, dim=1)
+scores = (embeddings[:1] @ embeddings[1:].T) * 100
+print(scores.tolist())
+```
+
+Use with sentence-transformers:
+```python
+from sentence_transformers import SentenceTransformer
+from sentence_transformers.util import cos_sim
+
+sentences = ['That is a happy person', 'That is a sad person']
+
+model = SentenceTransformer('iamgroot42/rover_nexus')
+embeddings = model.encode(sentences)
+print(cos_sim(embeddings[0], embeddings[1]))
+```
+
 Model training details and data will be uploaded soon!
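
The `average_pool` helper and the scoring line in the usage snippet can be hard to follow without running the model. The sketch below mirrors the same three steps (masked mean pooling, L2 normalization, scaled dot product) on a hand-made toy batch, using plain Python lists instead of torch tensors. It is an illustration only, not the README's implementation: the helper names `l2_normalize` and `dot` and all of the toy values are assumptions made up for this example.

```python
import math

def average_pool(hidden, mask):
    """Mean of the token vectors whose mask entry is 1, computed per sequence."""
    pooled = []
    for tokens, flags in zip(hidden, mask):
        # Drop padding positions (mask 0) so they do not affect the mean.
        kept = [vec for vec, flag in zip(tokens, flags) if flag]
        dim = len(tokens[0])
        pooled.append([sum(vec[i] for vec in kept) / len(kept) for i in range(dim)])
    return pooled

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))

# Toy batch (made-up values): 3 sequences, 3 token positions, hidden size 2.
# Positions with mask 0 are padding and hold junk that pooling must ignore.
hidden = [
    [[1.0, 0.0], [3.0, 0.0], [5.0, 0.0]],  # stands in for the query
    [[0.0, 2.0], [0.0, 4.0], [9.0, 9.0]],  # candidate 1, last position padded
    [[3.0, 4.0], [7.0, 7.0], [7.0, 7.0]],  # candidate 2, last two positions padded
]
mask = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
]

pooled = average_pool(hidden, mask)
unit = [l2_normalize(vec) for vec in pooled]

# First row is the query; the rest are candidates, as in embeddings[:1] @ embeddings[1:].T
scores = [dot(unit[0], candidate) * 100 for candidate in unit[1:]]
print(pooled)
print(scores)
```

Because the padded positions are excluded from both the sum and the count, the junk values in them never reach the pooled vectors. After normalization the dot product is a cosine similarity, so the factor of 100 only rescales the scores and does not change their ranking.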