bakrianoo committed
Commit 036228c
1 Parent(s): 06c686a

Update README.md

Files changed (1):
  1. README.md +17 -15

README.md CHANGED
@@ -98,22 +98,11 @@ language:
 
 # SILMA Arabic Matryoshka Embedding Model 0.1
 
-### Model Description
-- **Model Type:** Sentence Transformer
-- **Base model:** [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02) <!-- at revision 016fb9d6768f522a59c6e0d2d5d5d43a4e1bff60 -->
-- **Maximum Sequence Length:** 512 tokens
-- **Output Dimensionality:** 768 tokens
-- **Similarity Function:** Cosine Similarity
+The **SILMA Arabic Matryoshka Embedding Model 0.1** is an advanced Arabic text embedding model designed to produce powerful, contextually rich representations of text,
+facilitating a wide range of applications, from semantic search to document classification.
 
-### Full Model Architecture
-
-```
-SentenceTransformer(
-  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
-  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
-)
-```
+This model leverages the innovative **Matryoshka** embedding technique, which can be used at different dimensions to optimize the speed, storage, and accuracy trade-offs.
 
 ## Usage
 
 ### Direct Usage (Sentence Transformers)
@@ -137,7 +126,11 @@ model = SentenceTransformer(model_name)
 
 ### Samples
 
-### Samples
+Using Matryoshka, you can specify the first `n` dimensions to represent each text.
+
+In the following samples, you can check how each dimension affects the `cosine similarity` between a query and the two inputs.
+
+You can notice that, in most cases, even a very low dimension (e.g. 8) can produce acceptable semantic similarity scores.
 
 #### [+] Short Sentence Similarity
@@ -304,6 +297,15 @@ This produced a finetuned `Matryoshka` model based on [aubmindlab/bert-base-arabertv02]
 - Datasets: 3.0.1
 - Tokenizers: 0.20.1
 
+### Full Model Architecture
+
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+)
+```
+
 ### Citation:
 
 #### BibTeX:
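The Matryoshka mechanics the README text above describes — taking only the first `n` of the model's 768 output dimensions and still computing meaningful cosine similarities — can be sketched with a small, self-contained example. This is illustrative only: it uses random vectors as stand-ins for the real model's embeddings, since running the actual model requires downloading it.

```python
import numpy as np

def truncate_and_normalize(emb: np.ndarray, n: int) -> np.ndarray:
    """Keep the first n Matryoshka dimensions and re-normalize to unit length."""
    v = emb[:n]
    return v / np.linalg.norm(v)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two unit-length vectors is just their dot product."""
    return float(np.dot(a, b))

rng = np.random.default_rng(0)
# Stand-ins for 768-dimensional sentence embeddings; in practice these would
# come from model.encode(...) on the query and the candidate texts.
query = rng.normal(size=768)
doc = rng.normal(size=768)

# Compare the query/document similarity at several Matryoshka dimensions.
for n in (768, 256, 64, 8):
    q = truncate_and_normalize(query, n)
    d = truncate_and_normalize(doc, n)
    print(f"dim={n:4d}  cosine={cosine(q, d):+.4f}")
```

With a Matryoshka-trained model, the scores at the smaller dimensions stay close to the full 768-dimension score, which is what makes the speed/storage trade-off practical.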