---
base_model: aubmindlab/bert-base-arabertv02
datasets: []
language:
- ar
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:1000000
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: فتى يرتدي اللون الأحمر ينزلق على متن عربة نفخة
  sentences:
  - اثنان من الشباب الآسيويين يتسكعون
  - فتى يلعب على عربة نفخة
  - فتى يثقب سكيناً في عربة نفخة
- source_sentence: عامل بناء يقف على رافعة يضع ذراعًا كبيرًا على قمة قمة قيد الإنشاء.
  sentences:
  - الاطفال يركبون عربة متعة
  - شخص يقف
  - لا أحد يقف
- source_sentence: رجل مع حفرة طاقة كبيرة يقف بجانب ابنته مع خرطوم المكنسة الكهربائية.
  sentences:
  - جنديان يحملان أسلحة
  - رجل يحمل مثقاب يقف بجانب فتاة تحمل خرطوم كهربائي
  - الرجل والفتاة يرسمون الجدران
- source_sentence: رجل يرتدي قميص أسود يعزف على الجيتار.
  sentences:
  - الرجل يرتدي الأسود.
  - هناك رجل يفرغ
  - الرجل يرتدي قميصاً أزرق.
- source_sentence: رجل يرتدي قميص (فيجاس) الأحمر يجلس على طاولة ويلعب بالكاميرا
  sentences:
  - رجل يلعب بالكاميرا
  - فتى يقفز في الهواء
  - الرجل يقف ويأخذ الصور
model-index:
- name: SentenceTransformer based on aubmindlab/bert-base-arabertv02
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test 768
      type: sts-test-768
    metrics:
    - type: pearson_cosine
      value: 0.8137491067613172
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8139804248887779
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.805239691712325
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.8071457719582591
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.8053105962459932
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8078084689219578
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.8019135317246738
      name: Pearson Dot
    - type: spearman_dot
      value: 0.7961388104098682
      name: Spearman Dot
    - type: pearson_max
      value: 0.8137491067613172
      name: Pearson Max
    - type: spearman_max
      value: 0.8139804248887779
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test 512
      type: sts-test-512
    metrics:
    - type: pearson_cosine
      value: 0.8127890716639393
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.813769735512917
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8045619532064516
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.806084784718251
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.8047817340341926
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8067787363048019
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.7985706834990611
      name: Pearson Dot
    - type: spearman_dot
      value: 0.7926669266198092
      name: Spearman Dot
    - type: pearson_max
      value: 0.8127890716639393
      name: Pearson Max
    - type: spearman_max
      value: 0.813769735512917
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test 256
      type: sts-test-256
    metrics:
    - type: pearson_cosine
      value: 0.810388221021721
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8138356923403065
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8015100804443567
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.8026219149891689
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.8016089017435591
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8030480833628191
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.792265476718613
      name: Pearson Dot
    - type: spearman_dot
      value: 0.787067391010805
      name: Spearman Dot
    - type: pearson_max
      value: 0.810388221021721
      name: Pearson Max
    - type: spearman_max
      value: 0.8138356923403065
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test 128
      type: sts-test-128
    metrics:
    - type: pearson_cosine
      value: 0.8071777671061434
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8128987608664245
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.7969339482985063
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.7972524285093451
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.7971979787664204
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.797866628579141
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.7752745908442699
      name: Pearson Dot
    - type: spearman_dot
      value: 0.7685950685903284
      name: Spearman Dot
    - type: pearson_max
      value: 0.8071777671061434
      name: Pearson Max
    - type: spearman_max
      value: 0.8128987608664245
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test 64
      type: sts-test-64
    metrics:
    - type: pearson_cosine
      value: 0.7992861493805723
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.809205854296297
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.7841737408240652
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.7848704254075567
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.7865782078684138
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.7874610680426495
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.7341564461014968
      name: Pearson Dot
    - type: spearman_dot
      value: 0.7244607540987561
      name: Spearman Dot
    - type: pearson_max
      value: 0.7992861493805723
      name: Pearson Max
    - type: spearman_max
      value: 0.809205854296297
      name: Spearman Max
---
# SentenceTransformer based on aubmindlab/bert-base-arabertv02
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02). It maps Arabic sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
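Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it depends only on their direction. A minimal NumPy sketch of the pairwise matrix that `model.similarity` is expected to return under this configuration (the helper name `cosine_similarity_matrix` and the toy 4-dimensional vectors are illustrative; real embeddings have 768 dimensions):

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `a` and the rows of `b`.

    Assumes no row is the zero vector (its norm would be 0).
    """
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Toy vectors standing in for 768-dimensional sentence embeddings.
emb = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 0.0],  # length 3: cosine similarity ignores vector length
])
sims = cosine_similarity_matrix(emb, emb)
print(np.round(sims, 3))
# Row 0 vs row 1 are 45 degrees apart (about 0.707); row 0 vs row 2 are orthogonal (0).
```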
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
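In the `Pooling` configuration above only `pooling_mode_mean_tokens` is enabled, so the sentence embedding is the average of the BERT token embeddings, with padding positions excluded via the attention mask. A minimal NumPy sketch of that masked mean (the function `mean_pool` is illustrative rather than the library's code, and the tiny arrays stand in for real 768-dimensional BERT outputs):

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over real (non-padding) tokens.

    token_embeddings: (batch, seq_len, hidden) transformer outputs.
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding.
    """
    mask = attention_mask[:, :, np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)  # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # (batch, 1); guards against 0-division
    return summed / counts


# Two toy "sentences" with 3 token positions and hidden size 2;
# the first sentence has one padding position whose values must be ignored.
tokens = np.array([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],
    [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])
pooled = mean_pool(tokens, mask)
print(pooled)  # first row averages only the two real tokens -> [2, 3]; second row -> [4, 4]
```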
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Omartificial-Intelligence-Space/Arabert-matro-v4")
# Run inference
sentences = [
    'رجل يرتدي قميص (فيجاس) الأحمر يجلس على طاولة ويلعب بالكاميرا',  # "A man in a red (Vegas) shirt sits at a table playing with a camera"
    'رجل يلعب بالكاميرا',  # "A man is playing with a camera"
    'الرجل يقف ويأخذ الصور',  # "The man is standing and taking pictures"
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
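Because the model was trained with `MatryoshkaLoss` on the dimensions 768, 512, 256, 128 and 64 (see Training Details), a prefix of each embedding is meant to be usable as a smaller embedding, at some cost in accuracy (compare the `sts-test-*` tables). Recent Sentence Transformers versions accept a `truncate_dim` argument when loading a model for this purpose; the sketch below instead shows equivalent post-processing on an existing embedding array in plain NumPy, with random numbers standing in for real `model.encode` output (`truncate_and_normalize` is a hypothetical helper):

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each embedding and rescale rows to unit length."""
    truncated = embeddings[:, :dim]
    return truncated / np.linalg.norm(truncated, axis=1, keepdims=True)


# Random stand-in for `model.encode(sentences)`, which has shape (3, 768) for this model.
rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768))

for dim in (768, 512, 256, 128, 64):  # the Matryoshka dimensions used in training
    small = truncate_and_normalize(full, dim)
    sims = small @ small.T  # dot product of unit vectors equals cosine similarity
    print(dim, small.shape, np.round(sims[0], 3))
```

Re-normalizing is not required for cosine similarity, which ignores vector length, but it makes a plain dot product equal to cosine similarity, which is convenient for vector indexes.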
## Evaluation
### Metrics
#### Semantic Similarity
* Dataset: `sts-test-768`
* Evaluated with [`EmbeddingSimilarityEvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|:--------------------|:----------|
| pearson_cosine | 0.8137 |
| **spearman_cosine** | **0.814** |
| pearson_manhattan | 0.8052 |
| spearman_manhattan | 0.8071 |
| pearson_euclidean | 0.8053 |
| spearman_euclidean | 0.8078 |
| pearson_dot | 0.8019 |
| spearman_dot | 0.7961 |
| pearson_max | 0.8137 |
| spearman_max | 0.814 |
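These tables measure how well the model's similarity scores agree with human-annotated STS scores. The sketch below assumes the usual `EmbeddingSimilarityEvaluator` procedure: each sentence pair is scored with cosine similarity, negated Manhattan and Euclidean distances (so that a higher score always means more similar) and the dot product; each list of scores is then correlated with the gold scores using Pearson and Spearman, and the `*_max` rows take the best value across the four functions. It is a NumPy illustration with made-up embeddings and gold scores, not the library's code:

```python
import numpy as np


def pearson(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.corrcoef(x, y)[0, 1])


def ranks(v: np.ndarray) -> np.ndarray:
    # Rank positions 0..n-1; ties are NOT averaged here (scipy's spearmanr does average them).
    return np.argsort(np.argsort(v)).astype(float)


def spearman(x: np.ndarray, y: np.ndarray) -> float:
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))


def pair_scores(e1: np.ndarray, e2: np.ndarray) -> dict:
    """Score pairs (e1[i], e2[i]); distances are negated so that higher = more similar."""
    dot = np.sum(e1 * e2, axis=1)
    return {
        "cosine": dot / (np.linalg.norm(e1, axis=1) * np.linalg.norm(e2, axis=1)),
        "manhattan": -np.abs(e1 - e2).sum(axis=1),
        "euclidean": -np.linalg.norm(e1 - e2, axis=1),
        "dot": dot,
    }


# Made-up data: 5 sentence pairs whose second embedding drifts further from the first,
# with gold scores on a 0-5 STS-style scale that decrease accordingly.
rng = np.random.default_rng(42)
e1 = rng.normal(size=(5, 8))
noise = np.array([0.1, 0.5, 1.0, 2.0, 4.0])[:, None]
e2 = e1 + noise * rng.normal(size=(5, 8))
gold = np.array([5.0, 4.0, 3.0, 2.0, 1.0])

results = {}
for name, scores in pair_scores(e1, e2).items():
    results[f"pearson_{name}"] = pearson(scores, gold)
    results[f"spearman_{name}"] = spearman(scores, gold)
# The *_max rows report the best correlation over the four scoring functions.
results["pearson_max"] = max(v for k, v in results.items() if k.startswith("pearson_"))
results["spearman_max"] = max(v for k, v in results.items() if k.startswith("spearman_"))

for key, value in results.items():
    print(f"{key:<20} {value:.4f}")
```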
#### Semantic Similarity
* Dataset: `sts-test-512`
* Evaluated with [`EmbeddingSimilarityEvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| pearson_cosine | 0.8128 |
| **spearman_cosine** | **0.8138** |
| pearson_manhattan | 0.8046 |
| spearman_manhattan | 0.8061 |
| pearson_euclidean | 0.8048 |
| spearman_euclidean | 0.8068 |
| pearson_dot | 0.7986 |
| spearman_dot | 0.7927 |
| pearson_max | 0.8128 |
| spearman_max | 0.8138 |
#### Semantic Similarity
* Dataset: `sts-test-256`
* Evaluated with [`EmbeddingSimilarityEvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| pearson_cosine | 0.8104 |
| **spearman_cosine** | **0.8138** |
| pearson_manhattan | 0.8015 |
| spearman_manhattan | 0.8026 |
| pearson_euclidean | 0.8016 |
| spearman_euclidean | 0.803 |
| pearson_dot | 0.7923 |
| spearman_dot | 0.7871 |
| pearson_max | 0.8104 |
| spearman_max | 0.8138 |
#### Semantic Similarity
* Dataset: `sts-test-128`
* Evaluated with [`EmbeddingSimilarityEvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| pearson_cosine | 0.8072 |
| **spearman_cosine** | **0.8129** |
| pearson_manhattan | 0.7969 |
| spearman_manhattan | 0.7973 |
| pearson_euclidean | 0.7972 |
| spearman_euclidean | 0.7979 |
| pearson_dot | 0.7753 |
| spearman_dot | 0.7686 |
| pearson_max | 0.8072 |
| spearman_max | 0.8129 |
#### Semantic Similarity
* Dataset: `sts-test-64`
* Evaluated with [`EmbeddingSimilarityEvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| pearson_cosine | 0.7993 |
| **spearman_cosine** | **0.8092** |
| pearson_manhattan | 0.7842 |
| spearman_manhattan | 0.7849 |
| pearson_euclidean | 0.7866 |
| spearman_euclidean | 0.7875 |
| pearson_dot | 0.7342 |
| spearman_dot | 0.7245 |
| pearson_max | 0.7993 |
| spearman_max | 0.8092 |
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 1,000,000 training samples
* Columns: `anchor`, `positive`, and `negative`
* Approximate statistics based on the first 1000 samples:

| | anchor | positive | negative |
|:--------|:-------|:---------|:---------|
| type | string | string | string |

* Samples:

| anchor | positive | negative |
|:-------|:---------|:---------|
| ما الذي تتجنبه؟ | ما الذي تحاولين تجنبه دائماً؟ | أنا في حالة اكتئاب ماذا يجب أن أفعل؟ |
| رجل يقف عند لافتة صفراء | رجل يقترب من علامة | رجل بجانب لافتة زرقاء |
| لماذا قام (مودي) بحظر أوراق نقدية بقيمة 500 و 1000 روبية؟ | لماذا قام مودي بإلغاء عملة الـ 500 و 1000 روبية؟ وما سبب إدخال عملة الـ 2000 روبية فجأة؟ | ما هو أفضل خيار بعد الانتهاء من البكالوريوس في الهندسة الميكانيكية؟ |

* Loss: [`MatryoshkaLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
```json
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}
```
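With these parameters, each training step computes `MultipleNegativesRankingLoss` on the first 768, 512, 256, 128 and 64 dimensions of the same embeddings and adds the five losses with weight 1 each (`n_dims_per_step: -1` means every dimension is used at every step). The forward-pass-only NumPy sketch below illustrates that objective for (`anchor`, `positive`, `negative`) triplets: anchor *i* must pick positive *i* out of every positive and hard negative in the batch. It assumes cosine similarity with a scale of 20, which we believe to be the `MultipleNegativesRankingLoss` default; the function names are hypothetical, and this is not the library's implementation (which works on PyTorch tensors and back-propagates):

```python
import numpy as np

MATRYOSHKA_DIMS = (768, 512, 256, 128, 64)
MATRYOSHKA_WEIGHTS = (1, 1, 1, 1, 1)


def cos_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def mnrl(anchor: np.ndarray, positive: np.ndarray, negative: np.ndarray, scale: float = 20.0) -> float:
    """In-batch softmax cross-entropy: row i of `anchor` should match row i of `positive`."""
    candidates = np.concatenate([positive, negative], axis=0)  # (2B, d)
    logits = scale * cos_sim(anchor, candidates)  # (B, 2B)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    rows = np.arange(len(anchor))
    return float(-log_probs[rows, rows].mean())  # correct candidate for anchor i is column i


def matryoshka_loss(anchor, positive, negative, dims=MATRYOSHKA_DIMS, weights=MATRYOSHKA_WEIGHTS) -> float:
    """Weighted sum of MNRL computed on truncated prefixes of the embeddings."""
    return sum(
        w * mnrl(anchor[:, :d], positive[:, :d], negative[:, :d])
        for d, w in zip(dims, weights)
    )


# Toy batch of 4 triplets: positives are near-copies of the anchors, negatives are random.
rng = np.random.default_rng(0)
anchor = rng.normal(size=(4, 768))
positive = anchor + 0.1 * rng.normal(size=(4, 768))
negative = rng.normal(size=(4, 768))

print(matryoshka_loss(anchor, positive, negative))  # small: positives are easy to find
print(matryoshka_loss(anchor, negative, positive))  # large: roles swapped on purpose
```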
### Evaluation Dataset
#### Omartificial-Intelligence-Space/arabic-n_li-triplet
* Dataset: Omartificial-Intelligence-Space/arabic-n_li-triplet
* Size: 6,584 evaluation samples
* Columns: `anchor`, `positive`, and `negative`
* Approximate statistics based on the first 1000 samples:

| | anchor | positive | negative |
|:--------|:-------|:---------|:---------|
| type | string | string | string |

* Samples:

| anchor | positive | negative |
|:-------|:---------|:---------|
| امرأتان يتعانقان بينما يحملان حزمة | إمرأتان يحملان حزمة | الرجال يتشاجرون خارج مطعم |
| طفلين صغيرين يرتديان قميصاً أزرق، أحدهما يرتدي الرقم 9 والآخر يرتدي الرقم 2 يقفان على خطوات خشبية في الحمام ويغسلان أيديهما في المغسلة. | طفلين يرتديان قميصاً مرقماً يغسلون أيديهم | طفلين يرتديان سترة يذهبان إلى المدرسة |
| رجل يبيع الدونات لعميل خلال معرض عالمي أقيم في مدينة أنجليس | رجل يبيع الدونات لعميل | امرأة تشرب قهوتها في مقهى صغير |

* Loss: [`MatryoshkaLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
```json
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}
```
### Training Hyperparameters
#### Non-Default Hyperparameters
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `warmup_ratio`: 0.1
- `fp16`: True
- `batch_sampler`: no_duplicates
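The `no_duplicates` batch sampler matters for `MultipleNegativesRankingLoss`: every other text in a batch is used as a negative, so a text repeated within one batch would appear as a negative identical to an anchor or positive, giving the loss contradictory targets. A simplified pure-Python sketch of the idea, assuming a greedy strategy that defers any triplet sharing a text with the current batch to a later batch (the library's own sampler follows the same principle, but details such as shuffling and the final partial batch may differ; `no_duplicate_batches` is a hypothetical name):

```python
from typing import Iterator, List, Sequence, Set, Tuple

Triplet = Tuple[str, str, str]  # (anchor, positive, negative)


def no_duplicate_batches(samples: Sequence[Triplet], batch_size: int) -> Iterator[List[int]]:
    """Yield batches of sample indices in which no text occurs twice."""
    remaining = list(range(len(samples)))
    while remaining:
        batch: List[int] = []
        seen: Set[str] = set()
        deferred: List[int] = []
        for idx in remaining:
            texts = set(samples[idx])
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)  # retry this sample in a later batch
        # The first remaining sample always fits into an empty batch, so the loop terminates.
        yield batch
        remaining = deferred


samples = [
    ("a", "b", "c"),
    ("a", "d", "e"),  # shares anchor "a" with sample 0
    ("f", "g", "h"),
    ("i", "j", "k"),
]
batches = list(no_duplicate_batches(samples, batch_size=2))
print(batches)  # [[0, 2], [1, 3]]
```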
#### All Hyperparameters