Add Sentence Transformers snippet to README
#2 opened by tomaarsen

README.md CHANGED
@@ -3223,7 +3223,7 @@ Jina Embeddings V2 [technical report](https://arxiv.org/abs/2310.19923)

### Why mean pooling?

-`mean
It has been proven to be the most effective way to produce high-quality sentence embeddings.
We offer an `encode` function to deal with this.
@@ -3256,7 +3256,7 @@ embeddings = F.normalize(embeddings, p=2, dim=1)

</p>
</details>

-You can use Jina Embedding models directly from transformers package:
```python
!pip install transformers
from transformers import AutoModel
@@ -3277,7 +3277,22 @@ embeddings = model.encode(

)
```

-

1. _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).
2. _Private and high-performance deployment_: Get started by picking from our suite of models and deploy them on [AWS Sagemaker](https://aws.amazon.com/marketplace/seller-profile?id=seller-stch2ludm6vgy).

### Why mean pooling?

+`mean pooling` takes all token embeddings from the model output and averages them at the sentence/paragraph level.
It has been proven to be the most effective way to produce high-quality sentence embeddings.
We offer an `encode` function to deal with this.
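To make the mean-pooling description above concrete, here is a minimal sketch with made-up two-dimensional token vectors. `mean_pool` is an illustrative helper, not the model's actual `encode` implementation; in practice the tokenizer's attention mask is what excludes padding tokens from the average:

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the embeddings of real tokens, skipping padding (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, mask in zip(token_embeddings, attention_mask):
        if mask:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / count for t in total]

# Three token vectors; the last one is padding and is ignored by the mask.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
print(mean_pool(tokens, [1, 1, 0]))  # [2.0, 3.0]
```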

</p>
</details>

+You can use Jina Embedding models directly from the `transformers` package:
```python
!pip install transformers
from transformers import AutoModel

)
```

+Or you can use the model with the `sentence-transformers` package:
+```python
+from sentence_transformers import SentenceTransformer, util
+
+model = SentenceTransformer("jinaai/jina-embeddings-v2-base-es", trust_remote_code=True)
+embeddings = model.encode(['How is the weather today?', '¿Qué tiempo hace hoy?'])
+print(util.cos_sim(embeddings[0], embeddings[1]))
+```
+
+And if you only need to handle shorter sequences, such as 2k tokens, you can set `model.max_seq_length`:
+
+```python
+model.max_seq_length = 2048
+```
+
+## Alternatives to Transformers and Sentence Transformers

1. _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).
2. _Private and high-performance deployment_: Get started by picking from our suite of models and deploy them on [AWS Sagemaker](https://aws.amazon.com/marketplace/seller-profile?id=seller-stch2ludm6vgy).
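The `util.cos_sim` call in the added snippet scores how similar the two sentence embeddings are via cosine similarity. For reference, here is a plain-Python sketch of that computation (assuming non-zero vectors; `cosine` is an illustrative name, not part of any library):

```python
import math

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

A well-trained bilingual embedding model should score a translated pair like the Spanish/English example above close to 1, and unrelated sentences closer to 0.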