Update README.md
README.md
CHANGED
@@ -98,22 +98,11 @@ language:
 
 # SILMA Arabic Matryoshka Embedding Model 0.1
 
 
-
-- **Model Type:** Sentence Transformer
-- **Base model:** [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02) <!-- at revision 016fb9d6768f522a59c6e0d2d5d5d43a4e1bff60 -->
-- **Maximum Sequence Length:** 512 tokens
-- **Output Dimensionality:** 768 tokens
-- **Similarity Function:** Cosine Similarity
 
-### Full Model Architecture
-
-```
-SentenceTransformer(
-  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
-  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
-)
-```
 ## Usage
 
 ### Direct Usage (Sentence Transformers)
@@ -137,7 +126,11 @@ model = SentenceTransformer(model_name)
 
 ### Samples
 
-
 
 #### [+] Short Sentence Similarity
 
@@ -304,6 +297,15 @@ This produced a finetuned `Matryoshka` model based on [aubmindlab/bert-base-arab
 - Datasets: 3.0.1
 - Tokenizers: 0.20.1
 
 ### Citation:
 
 #### BibTeX:
 
 # SILMA Arabic Matryoshka Embedding Model 0.1
 
+The **SILMA Arabic Matryoshka Embedding Model 0.1** is an advanced Arabic text embedding model designed to produce powerful, contextually rich representations of text,
+facilitating a wide range of applications, from semantic search to document classification.
 
+This model leverages the innovative **Matryoshka** embedding technique, which can be used at different dimensions to optimize the speed, storage, and accuracy trade-offs.
 
 ## Usage
 
 ### Direct Usage (Sentence Transformers)
 
 ### Samples
 
+Using Matryoshka, you can specify the first `n` dimensions to represent each text.
+
+In the following samples, you can see how each dimension affects the `cosine similarity` between a query and the two inputs.
+
+You can notice that in most cases, even a very low dimension (e.g. 8) can produce acceptable semantic similarity scores.
 
 #### [+] Short Sentence Similarity
 
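The dimension-truncation idea described above can be sketched as follows. This is a toy NumPy sketch added for illustration, not part of the model card: random 768-dim vectors stand in for real model outputs, and `truncate_and_normalize` is a hypothetical helper name. The key point it demonstrates is that a Matryoshka embedding is truncated to its first `n` dimensions and re-normalized before cosine similarity is computed.

```python
import numpy as np

def truncate_and_normalize(emb: np.ndarray, n: int) -> np.ndarray:
    """Keep the first n dimensions and L2-normalize the result."""
    truncated = emb[..., :n]
    norms = np.linalg.norm(truncated, axis=-1, keepdims=True)
    return truncated / norms

rng = np.random.default_rng(0)
query = rng.normal(size=768)             # stand-in for a query embedding
doc_a = rng.normal(size=768)             # unrelated document
doc_b = query + 0.1 * rng.normal(size=768)  # near-duplicate of the query

# Cosine similarity at several Matryoshka dimensions: for normalized
# vectors, the dot product IS the cosine similarity.
for n in (8, 64, 768):
    q = truncate_and_normalize(query, n)
    sim_b = float(q @ truncate_and_normalize(doc_b, n))
    sim_a = float(q @ truncate_and_normalize(doc_a, n))
    print(n, round(sim_b, 3), round(sim_a, 3))
```

Even at `n = 8`, the near-duplicate pair keeps a clearly higher score than the unrelated pair, which is the behaviour the samples below illustrate with real model outputs.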
 - Datasets: 3.0.1
 - Tokenizers: 0.20.1
 
+### Full Model Architecture
+
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+)
+```
+
 ### Citation:
 
 #### BibTeX:
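The `Pooling` module in the architecture above has `pooling_mode_mean_tokens: True`, i.e. the 768-dim sentence embedding is the mean of the token embeddings, with padding positions excluded via the attention mask. A toy NumPy sketch of that step, added for illustration (random arrays stand in for real BERT token embeddings; `mean_pool` is a hypothetical helper name):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions."""
    mask = attention_mask[:, None].astype(float)    # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)  # (768,)
    return summed / mask.sum()                      # divide by real-token count

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 768))   # 6 token embeddings, 768-dim each
mask = np.array([1, 1, 1, 1, 0, 0])  # last two positions are padding

sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding.shape)  # (768,)
```

Only the first four (non-padding) token embeddings contribute to the average, matching the masked mean pooling the module performs.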