feat: sentence transformers
README.md CHANGED
@@ -2902,7 +2902,9 @@ base_model:
 
 # ModernBERT Embed
 
-ModernBERT Embed is an embedding model trained from [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base), bringing the new advances of ModernBERT to embeddings!
+ModernBERT Embed is an embedding model trained from [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base), bringing the new advances of ModernBERT to embeddings!
+
+Trained on the [Nomic Embed](https://arxiv.org/abs/2402.01613) weakly-supervised and supervised datasets, `modernbert-embed` also supports Matryoshka Representation Learning dimensions of 256, reducing memory by 3x with minimal performance loss.
 
 ## Performance
 
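The new paragraph in the hunk above advertises Matryoshka Representation Learning at 256 dimensions. As a minimal sketch of how that is typically used, and not part of this commit, the snippet below keeps only the first 256 dimensions of the full embeddings and re-normalizes them; it assumes the leading dimensions form the Matryoshka prefix and that cosine similarity is the intended metric.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/modernbert-embed")

sentences = [
    "search_query: What is TSNE?",
    "search_query: Who is Laurens van der Maaten?",
]

# Full-size embeddings as a (num_sentences, dim) numpy array.
full = model.encode(sentences)

# Keep the first 256 Matryoshka dimensions and re-normalize so that
# dot products are again cosine similarities.
truncated = full[:, :256]
truncated = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)

print(truncated.shape)          # (2, 256)
print(truncated @ truncated.T)  # cosine similarity matrix of the truncated embeddings
```

Recent sentence-transformers releases also expose a `truncate_dim` argument on the `SentenceTransformer` constructor; the manual slice above just makes the mechanics explicit.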
@@ -2958,6 +2960,24 @@ embeddings = F.normalize(embeddings, p=2, dim=1)
 print(embeddings)
 ```
 
+### Sentence Transformers
+
+```python
+from sentence_transformers import SentenceTransformer
+
+model = SentenceTransformer(
+    "nomic-ai/modernbert-embed",
+)
+
+# Verify that everything works as expected
+embeddings = model.encode(['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?'])
+print(embeddings.shape)
+
+similarities = model.similarity(embeddings, embeddings)
+print(similarities)
+```
+
+
 ## Training
 
 Click the Nomic Atlas map below to visualize a 5M sample of our contrastive pretraining data!
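As a follow-up to the added Sentence Transformers snippet: the `search_query:` strings indicate that inputs carry task prefixes. The sketch below is not part of this commit; it assumes the model follows the Nomic Embed convention of `search_query:` for queries and `search_document:` for passages (plausible since it is trained on the Nomic Embed datasets, but confirm the exact prefixes against the model card), and the document texts are invented for illustration.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/modernbert-embed")

# Hypothetical corpus; the 'search_document:' prefix is assumed from the
# Nomic Embed convention and should be checked against the model card.
documents = [
    "search_document: t-SNE is a technique for visualizing high-dimensional data.",
    "search_document: ModernBERT is a long-context encoder-only transformer.",
]
query = "search_query: What is TSNE?"

doc_embeddings = model.encode(documents)
query_embedding = model.encode([query])

# similarity() returns a (num_queries, num_documents) score matrix,
# so the highest-scoring column is the best match for the query.
scores = model.similarity(query_embedding, doc_embeddings)
print(scores)
print(documents[int(scores.argmax())])
```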