SentenceTransformer based on FacebookAI/xlm-roberta-base

This is a sentence-transformers model fine-tuned from FacebookAI/xlm-roberta-base on the en-ar, en-fr, en-de, en-es, en-tr and en-it datasets. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: FacebookAI/xlm-roberta-base
Maximum Sequence Length: 128 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Datasets:
- en-ar
- en-fr
- en-de
- en-es
- en-tr
- en-it
Languages: en, multilingual, ar, bg, ca, cs, da, de, el, es, et, fa, fi, fr, gl, gu, he, hi, hr, hu, hy, id, it, ja, ka, ko, ku, lt, lv, mk, mn, mr, ms, my, nb, nl, pl, pt, ro, ru, sk, sl, sq, sr, sv, th, tr, uk, ur, vi, zh

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
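The Pooling module above has only `pooling_mode_mean_tokens` enabled, so each sentence embedding is the average of its token embeddings. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy 2-dimensional vectors are illustrative assumptions (the real model works on 768-dimensional vectors).

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of one sentence, skipping padding positions.

    token_embeddings: list of per-token vectors (lists of floats)
    attention_mask:   list of 0/1 flags, 1 for real tokens, 0 for padding
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    n_real = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            n_real += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    n_real = max(n_real, 1)
    return [s / n_real for s in summed]


# Toy example: two real tokens and one padding token (hypothetical values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 3.0]
```

Because padding positions are masked out, the padded row does not affect the result.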

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/xlm-roberta-base-multilingual-en-ar-fr-de-es-tr-it")
# Run inference
sentences = [
    'Wir sind eins.',
    'Das versuchen wir zu bieten.',
    'Ihre Gehirne sind ungefähr 100 Millionen Mal komplizierter.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the pairwise similarity scores between all embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
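The similarity scores in the snippet above come from the model's configured similarity function, listed earlier as cosine similarity. As a rough illustration of what a 3 x 3 matrix of such scores contains, here is a small standard-library sketch; the helper names and the toy 2-dimensional vectors are assumptions made for the example and are not part of the library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine similarities; entry [i][j] compares vectors i and j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Hypothetical 2-dimensional "embeddings" standing in for the 768-dim ones.
toy_embeddings = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(value, 4) for value in row])
```

The diagonal is 1 (each vector compared with itself), orthogonal vectors score 0, and vector length does not matter, only direction.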

Evaluation

Metrics

Knowledge Distillation

Dataset: en-ar
Evaluated with MSEEvaluator

Metric	Value
negative_mse	-20.3955
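The `negative_mse` value appears to be the mean squared error between the student's embeddings and the teacher's target embeddings, multiplied by 100 and negated so that higher is better. That scaling is consistent with the training logs, where an en-ar loss of 0.2040 sits next to a `negative_mse` of -20.3955. A minimal sketch under that assumption, with hypothetical toy vectors:

```python
def negative_mse(student_embeddings, teacher_embeddings, scale=100.0):
    """Negated, scaled mean squared error over all vector components.

    The factor of 100 is an assumption inferred from the logged values.
    """
    total = 0.0
    count = 0
    for s_vec, t_vec in zip(student_embeddings, teacher_embeddings):
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    return -scale * total / count


# Two toy sentences with 2-dimensional embeddings (hypothetical values).
student = [[0.5, 0.0], [1.0, 1.0]]
teacher = [[0.0, 0.0], [1.0, 0.0]]
score = negative_mse(student, teacher)
print(score)  # -31.25
```

A perfect student that reproduces the teacher's vectors exactly would score 0, the maximum possible value.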

Translation

Dataset: en-ar
Evaluated with TranslationEvaluator

Metric	Value
src2trg_accuracy	0.7603
trg2src_accuracy	0.7825
mean_accuracy	0.7714
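These accuracies can be read as retrieval scores over parallel sentences: for each source sentence, the most similar target sentence should be its own translation (`src2trg`), and the reverse for `trg2src`. The mean is the simple average of the two, e.g. (0.7603 + 0.7825) / 2 = 0.7714. The sketch below illustrates the idea with cosine similarity; the function names and toy vectors are assumptions for the example, not the evaluator's code.

```python
import math


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieval_accuracy(queries, candidates):
    """Fraction of queries whose most similar candidate sits at the same index."""
    correct = 0
    for i, query in enumerate(queries):
        scores = [_cosine(query, candidate) for candidate in candidates]
        best = max(range(len(scores)), key=scores.__getitem__)
        correct += int(best == i)
    return correct / len(queries)


# Hypothetical embeddings: row i of `src` and row i of `trg` are translations.
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
trg = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]

src2trg = retrieval_accuracy(src, trg)
trg2src = retrieval_accuracy(trg, src)
mean_accuracy = (src2trg + trg2src) / 2
print(src2trg, trg2src, mean_accuracy)
```

The two directions can differ, as they do in this toy data, because the nearest neighbour relation is not symmetric.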

Semantic Similarity

Dataset: sts17-en-ar-test
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.4098
spearman_cosine	0.4425
pearson_manhattan	0.4069
spearman_manhattan	0.4194
pearson_euclidean	0.3801
spearman_euclidean	0.3865
pearson_dot	0.4078
spearman_dot	0.3768
pearson_max	0.4098
spearman_max	0.4425
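Each row above correlates a similarity score computed from the embeddings (cosine, Manhattan, Euclidean or dot product) with human gold scores, using either Pearson (linear) or Spearman (rank) correlation. The `_max` rows match the largest value across the four similarity functions. The following standard-library sketch shows both correlations on hypothetical toy scores; the helper names are assumptions, and only the final dictionary uses numbers from the table.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """Ranks from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical gold scores and model cosine scores for four sentence pairs.
gold_scores = [0.0, 1.5, 3.0, 5.0]
cosine_scores = [0.10, 0.35, 0.30, 0.90]
rho = spearman(gold_scores, cosine_scores)
print(round(rho, 4))

# The *_max rows: best Spearman value over the four similarity functions
# (numbers copied from the sts17-en-ar-test table).
spearman_by_metric = {"cosine": 0.4425, "manhattan": 0.4194, "euclidean": 0.3865, "dot": 0.3768}
spearman_max = max(spearman_by_metric.values())
```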

Knowledge Distillation

Dataset: en-fr
Evaluated with MSEEvaluator

Metric	Value
negative_mse	-19.6232

Translation

Dataset: en-fr
Evaluated with TranslationEvaluator

Metric	Value
src2trg_accuracy	0.8982
trg2src_accuracy	0.8901
mean_accuracy	0.8942

Semantic Similarity

Dataset: sts17-fr-en-test
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.5018
spearman_cosine	0.5334
pearson_manhattan	0.4461
spearman_manhattan	0.4547
pearson_euclidean	0.4431
spearman_euclidean	0.4481
pearson_dot	0.4017
spearman_dot	0.4134
pearson_max	0.5018
spearman_max	0.5334

Knowledge Distillation

Dataset: en-de
Evaluated with MSEEvaluator

Metric	Value
negative_mse	-19.7279

Translation

Dataset: en-de
Evaluated with TranslationEvaluator

Metric	Value
src2trg_accuracy	0.892
trg2src_accuracy	0.891
mean_accuracy	0.8915

Semantic Similarity

Dataset: sts17-en-de-test
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.5263
spearman_cosine	0.5618
pearson_manhattan	0.5085
spearman_manhattan	0.5218
pearson_euclidean	0.5055
spearman_euclidean	0.5206
pearson_dot	0.3742
spearman_dot	0.3691
pearson_max	0.5263
spearman_max	0.5618

Knowledge Distillation

Dataset: en-es
Evaluated with MSEEvaluator

Metric	Value
negative_mse	-19.4724

Translation

Dataset: en-es
Evaluated with TranslationEvaluator

Metric	Value
src2trg_accuracy	0.9434
trg2src_accuracy	0.9465
mean_accuracy	0.9449

Semantic Similarity

Dataset: sts17-es-en-test
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.4945
spearman_cosine	0.5021
pearson_manhattan	0.4445
spearman_manhattan	0.4284
pearson_euclidean	0.4357
spearman_euclidean	0.417
pearson_dot	0.3751
spearman_dot	0.3796
pearson_max	0.4945
spearman_max	0.5021

Knowledge Distillation

Dataset: en-tr
Evaluated with MSEEvaluator

Metric	Value
negative_mse	-20.7547

Translation

Dataset: en-tr
Evaluated with TranslationEvaluator

Metric	Value
src2trg_accuracy	0.7432
trg2src_accuracy	0.7432
mean_accuracy	0.7432

Semantic Similarity

Dataset: sts17-en-tr-test
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.5545
spearman_cosine	0.5819
pearson_manhattan	0.5104
spearman_manhattan	0.5088
pearson_euclidean	0.5046
spearman_euclidean	0.5053
pearson_dot	0.4726
spearman_dot	0.4298
pearson_max	0.5545
spearman_max	0.5819

Knowledge Distillation

Dataset: en-it
Evaluated with MSEEvaluator

Metric	Value
negative_mse	-19.7699

Translation

Dataset: en-it
Evaluated with TranslationEvaluator

Metric	Value
src2trg_accuracy	0.8781
trg2src_accuracy	0.8832
mean_accuracy	0.8807

Semantic Similarity

Dataset: sts17-it-en-test
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.5064
spearman_cosine	0.525
pearson_manhattan	0.4517
spearman_manhattan	0.4623
pearson_euclidean	0.4423
spearman_euclidean	0.4507
pearson_dot	0.4202
spearman_dot	0.4225
pearson_max	0.5064
spearman_max	0.525

Training Details

Training Datasets

en-ar

Dataset: en-ar at d366ddd
Size: 5,000 training samples
Columns: non_english and label
Approximate statistics based on the first 1000 samples:

	non_english	label
type	string	list
details	min: 4 tokens mean: 27.3 tokens max: 128 tokens	size: 768 elements

Samples:

non_english	label
`حسناً ان ما نقوم به اليوم .. هو ان نجبر الطلاب لتعلم الرياضيات`	`[0.3943225145339966, 0.18910610675811768, -0.3788299858570099, 0.4386662542819977, 0.2727023661136627, ...]`
`انها المادة الاهم ..`	`[0.6257511377334595, -0.1750679910182953, -0.5734405517578125, 0.11480475962162018, 1.1682192087173462, ...]`
`انا لا انفي لدقيقة واحدة ان الذين يهتمون بالحسابات اليدوية والذين هوايتهم القيام بذلك .. او القيام بالطرق التقليدية في اي مجال ان يقوموا بذلك كما يريدون .`	`[-0.04564047232270241, 0.4971524775028229, 0.28066301345825195, -0.726702094078064, -0.17846377193927765, ...]`

Loss: MSELoss

en-fr

Dataset: en-fr at d366ddd
Size: 5,000 training samples
Columns: non_english and label
Approximate statistics based on the first 1000 samples:

	non_english	label
type	string	list
details	min: 3 tokens mean: 30.18 tokens max: 128 tokens	size: 768 elements

Samples:

non_english	label
`Je ne crois pas que ce soit justifié.`	`[-0.361753910779953, 0.7323777079582214, 0.6518164277076721, -0.8461216688156128, -0.007496988866478205, ...]`
`Je fais cette distinction entre ce qu'on force les gens à faire et les matières générales, et la matière que quelqu'un va apprendre parce que ça lui plait et peut-être même exceller dans ce domaine.`	`[0.3047865629196167, 0.5270194411277771, 0.26616284251213074, 0.2612147927284241, 0.1950961947441101, ...]`
`Quels sont les problèmes en relation avec ça?`	`[0.2123892903327942, -0.09616081416606903, -0.41965243220329285, -0.5469444394111633, -0.6056491136550903, ...]`

Loss: MSELoss

en-de

Dataset: en-de at d366ddd
Size: 5,000 training samples
Columns: non_english and label
Approximate statistics based on the first 1000 samples:

	non_english	label
type	string	list
details	min: 4 tokens mean: 27.04 tokens max: 128 tokens	size: 768 elements

Samples:

non_english	label
`Ich denke, dass es sich aus diesem Grund lohnt, den Leuten das Rechnen von Hand beizubringen.`	`[0.0960279330611229, 0.7833179831504822, -0.09527698159217834, 0.8104371428489685, 0.7545774579048157, ...]`
`Außerdem gibt es ein paar bestimmte konzeptionelle Dinge, die das Rechnen per Hand rechtfertigen, aber ich glaube es sind sehr wenige.`	`[-0.5939837098121643, 0.9714100956916809, 0.6800686717033386, -0.21585524082183838, -0.7509503364562988, ...]`
`Eine Sache, die ich mich oft frage, ist Altgriechisch, und wie das zusammengehört.`	`[-0.09777048230171204, 0.07093209028244019, -0.42989012598991394, -0.1457514613866806, 1.4382753372192383, ...]`

Loss: MSELoss

en-es

Dataset: en-es at d366ddd
Size: 5,000 training samples
Columns: non_english and label
Approximate statistics based on the first 1000 samples:

	non_english	label
type	string	list
details	min: 4 tokens mean: 25.42 tokens max: 128 tokens	size: 768 elements

Samples:

non_english	label
`Y luego hay ciertas aspectos conceptuales que pueden beneficiarse del cálculo a mano pero creo que son relativamente pocos.`	`[-0.5939835906028748, 0.9714106917381287, 0.6800685524940491, -0.2158554196357727, -0.7509507536888123, ...]`
`Algo que pregunto a menudo es sobre el griego antiguo y cómo se relaciona.`	`[-0.09777048230171204, 0.07093209028244019, -0.42989012598991394, -0.1457514613866806, 1.4382753372192383, ...]`
`Vean, lo que estamos haciendo ahora es forzar a la gente a aprender matemáticas.`	`[0.3943225145339966, 0.18910610675811768, -0.3788299858570099, 0.4386662542819977, 0.2727023661136627, ...]`

Loss: MSELoss
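With MSELoss in this setup, each `label` appears to be a teacher embedding that the student must reproduce for the `non_english` sentence, in line with the knowledge distillation paper cited at the end of this card. One visible consequence is that translations of the same sentence carry (almost) the same label. The check below compares the first five label components printed for the matching German and Spanish samples; the tolerance of 1e-6 is an assumption chosen for the illustration.

```python
# First five label components as printed in the en-de and en-es sample rows
# for the sentence about conceptual aspects of calculating by hand.
label_de = [-0.5939837098121643, 0.9714100956916809, 0.6800686717033386,
            -0.21585524082183838, -0.7509503364562988]
label_es = [-0.5939835906028748, 0.9714106917381287, 0.6800685524940491,
            -0.2158554196357727, -0.7509507536888123]

# Largest absolute difference between the two label prefixes.
max_gap = max(abs(a - b) for a, b in zip(label_de, label_es))
same_target = max_gap < 1e-6
print(same_target)  # True
```

The small remaining differences look like floating-point noise rather than different targets, which fits the idea that both rows point at one shared target vector.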

en-tr

Dataset: en-tr at d366ddd
Size: 5,000 training samples
Columns: non_english and label
Approximate statistics based on the first 1000 samples:

	non_english	label
type	string	list
details	min: 4 tokens mean: 24.72 tokens max: 128 tokens	size: 768 elements

Samples:

non_english	label
`Eğer insanlar elle hesaba ilgililerse ya da öğrenmek için özel amaçları varsa konu ne kadar acayip olursa olsun bunu öğrenmeliler, engellemeyi bir an için bile önermiyorum.`	`[-0.04564047232270241, 0.4971524775028229, 0.28066301345825195, -0.726702094078064, -0.17846377193927765, ...]`
`İnsanların kendi ilgi alanlarını takip etmeleri, kesinlikle doğru bir şeydir.`	`[0.2061387449502945, 0.5284574031829834, 0.3577779233455658, 0.28818392753601074, 0.17228049039840698, ...]`
`Ben bir biçimde Antik Yunan hakkında ilgiliyimdir. ancak tüm nüfusu Antik Yunan gibi bir konu hakkında bilgi edinmeye zorlamamalıyız.`	`[0.12050342559814453, 0.15652479231357574, 0.48636534810066223, -0.13693244755268097, 0.42764803767204285, ...]`

Loss: MSELoss

en-it

Dataset: en-it at d366ddd
Size: 5,000 training samples
Columns: non_english and label
Approximate statistics based on the first 1000 samples:

	non_english	label
type	string	list
details	min: 3 tokens mean: 26.41 tokens max: 128 tokens	size: 768 elements

Samples:

non_english	label
`Non credo che sia giustificato.`	`[-0.36175352334976196, 0.7323781251907349, 0.651816189289093, -0.8461223840713501, -0.007496151141822338, ...]`
`Perciò faccio distinzione tra quello che stiamo facendo fare alle persone, le materie che si ritengono principali, e le materie che le persone potrebbero seguire per loro interesse o forse a volte anche incitate a farlo.`	`[0.3047865927219391, 0.5270194411277771, 0.26616284251213074, 0.2612147927284241, 0.1950961947441101, ...]`
`Ma che argomenti porta la gente su questi temi?`	`[0.2123885154724121, -0.09616123884916306, -0.4196523427963257, -0.5469440817832947, -0.6056501865386963, ...]`

Loss: MSELoss

Evaluation Datasets

en-ar

Dataset: en-ar at d366ddd
Size: 993 evaluation samples
Columns: non_english and label
Approximate statistics based on the first 1000 samples:

	non_english	label
type	string	list
details	min: 3 tokens mean: 28.03 tokens max: 128 tokens	size: 768 elements

Samples:

non_english	label
`شكرا جزيلا كريس.`	`[-0.4331263303756714, 1.0602688789367676, -0.07791043072938919, -0.4170420169830322, 1.6768444776535034, ...]`
`انه فعلا شرف عظيم لي ان أصعد المنصة للمرة الثانية. أنا في غاية الامتنان.`	`[0.27005696296691895, 0.5391750335693359, -0.2580486238002777, -0.6613674759864807, 0.6738830804824829, ...]`
`لقد بهرت فعلا بهذا المؤتمر, وأريد أن أشكركم جميعا على تعليقاتكم الطيبة على ما قلته تلك الليلة.`	`[-0.25320106744766235, 0.04791366308927536, -0.13174884021282196, -0.7357578277587891, 0.2366354614496231, ...]`

Loss: MSELoss

en-fr

Dataset: en-fr at d366ddd
Size: 992 evaluation samples
Columns: non_english and label
Approximate statistics based on the first 1000 samples:

	non_english	label
type	string	list
details	min: 4 tokens mean: 30.72 tokens max: 128 tokens	size: 768 elements

Samples:

non_english	label
`Merci beaucoup, Chris.`	`[-0.4331263303756714, 1.0602688789367676, -0.07791043072938919, -0.4170420169830322, 1.6768444776535034, ...]`
`C'est vraiment un honneur de pouvoir venir sur cette scène une deuxième fois. Je suis très reconnaissant.`	`[0.27005696296691895, 0.5391750335693359, -0.2580486238002777, -0.6613674759864807, 0.6738830804824829, ...]`
`J'ai été très impressionné par cette conférence, et je tiens à vous remercier tous pour vos nombreux et sympathiques commentaires sur ce que j'ai dit l'autre soir.`	`[-0.25320106744766235, 0.04791366308927536, -0.13174884021282196, -0.7357578277587891, 0.2366354614496231, ...]`

Loss: MSELoss

en-de

Dataset: en-de at d366ddd
Size: 991 evaluation samples
Columns: non_english and label
Approximate statistics based on the first 1000 samples:

	non_english	label
type	string	list
details	min: 4 tokens mean: 27.71 tokens max: 128 tokens	size: 768 elements

Samples:

non_english	label
`Vielen Dank, Chris.`	`[-0.4331263303756714, 1.0602688789367676, -0.07791043072938919, -0.4170420169830322, 1.6768444776535034, ...]`
`Es ist mir wirklich eine Ehre, zweimal auf dieser Bühne stehen zu dürfen. Tausend Dank dafür.`	`[0.27005696296691895, 0.5391750335693359, -0.2580486238002777, -0.6613674759864807, 0.6738830804824829, ...]`
`Ich bin wirklich begeistert von dieser Konferenz, und ich danke Ihnen allen für die vielen netten Kommentare zu meiner Rede vorgestern Abend.`	`[-0.25320106744766235, 0.04791366308927536, -0.13174884021282196, -0.7357578277587891, 0.2366354614496231, ...]`

Loss: MSELoss

en-es

Dataset: en-es at d366ddd
Size: 990 evaluation samples
Columns: non_english and label
Approximate statistics based on the first 1000 samples:

	non_english	label
type	string	list
details	min: 4 tokens mean: 26.47 tokens max: 128 tokens	size: 768 elements

Samples:

non_english	label
`Muchas gracias Chris.`	`[-0.4331263303756714, 1.0602688789367676, -0.07791043072938919, -0.4170420169830322, 1.6768444776535034, ...]`
`Y es en verdad un gran honor tener la oportunidad de venir a este escenario por segunda vez. Estoy extremadamente agradecido.`	`[0.27005696296691895, 0.5391750335693359, -0.2580486238002777, -0.6613674759864807, 0.6738830804824829, ...]`
`He quedado conmovido por esta conferencia, y deseo agradecer a todos ustedes sus amables comentarios acerca de lo que tenía que decir la otra noche.`	`[-0.25320106744766235, 0.04791366308927536, -0.13174884021282196, -0.7357578277587891, 0.2366354614496231, ...]`

Loss: MSELoss

en-tr

Dataset: en-tr at d366ddd
Size: 993 evaluation samples
Columns: non_english and label
Approximate statistics based on the first 1000 samples:

	non_english	label
type	string	list
details	min: 4 tokens mean: 25.4 tokens max: 128 tokens	size: 768 elements

Samples:

non_english	label
`Çok teşekkür ederim Chris.`	`[-0.4331263303756714, 1.0602688789367676, -0.07791043072938919, -0.4170420169830322, 1.6768444776535034, ...]`
`Bu sahnede ikinci kez yer alma fırsatına sahip olmak gerçekten büyük bir onur. Çok minnettarım.`	`[0.27005696296691895, 0.5391750335693359, -0.2580486238002777, -0.6613674759864807, 0.6738830804824829, ...]`
`Bu konferansta çok mutlu oldum, ve anlattıklarımla ilgili güzel yorumlarınız için sizlere çok teşekkür ederim.`	`[-0.25320106744766235, 0.04791366308927536, -0.13174884021282196, -0.7357578277587891, 0.2366354614496231, ...]`

Loss: MSELoss

en-it

Dataset: en-it at d366ddd
Size: 993 evaluation samples
Columns: non_english and label
Approximate statistics based on the first 1000 samples:

	non_english	label
type	string	list
details	min: 4 tokens mean: 27.94 tokens max: 128 tokens	size: 768 elements

Samples:

non_english	label
`Grazie mille, Chris.`	`[-0.4331263303756714, 1.0602688789367676, -0.07791043072938919, -0.4170420169830322, 1.6768444776535034, ...]`
`E’ veramente un grande onore venire su questo palco due volte. Vi sono estremamente grato.`	`[0.27005696296691895, 0.5391750335693359, -0.2580486238002777, -0.6613674759864807, 0.6738830804824829, ...]`
`Sono impressionato da questa conferenza, e voglio ringraziare tutti voi per i tanti, lusinghieri commenti, anche perché... Ne ho bisogno!!`	`[-0.25320106744766235, 0.04791366308927536, -0.13174884021282196, -0.7357578277587891, 0.2366354614496231, ...]`

Loss: MSELoss

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 64
per_device_eval_batch_size: 64
learning_rate: 2e-05
num_train_epochs: 5
warmup_ratio: 0.1
fp16: True
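Together with the linear scheduler listed under All Hyperparameters, `learning_rate: 2e-05` and `warmup_ratio: 0.1` describe a learning rate that ramps up linearly over the first 10% of steps and then decays linearly to zero. The sketch below shows that shape; the function name and the round `total_steps=1000` are assumptions for illustration, not values taken from this training run.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: grow from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: shrink from base_lr to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run of 1000 steps, so warmup covers the first 100 steps.
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, total_steps=1000))
```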

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: False
per_device_train_batch_size: 64
per_device_eval_batch_size: 64
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 5
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: None
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	en-ar loss	en-it loss	en-de loss	en-fr loss	en-es loss	en-tr loss	en-ar_mean_accuracy	en-ar_negative_mse	en-de_mean_accuracy	en-de_negative_mse	en-es_mean_accuracy	en-es_negative_mse	en-fr_mean_accuracy	en-fr_negative_mse	en-it_mean_accuracy	en-it_negative_mse	en-tr_mean_accuracy	en-tr_negative_mse	sts17-en-ar-test_spearman_max	sts17-en-de-test_spearman_max	sts17-en-tr-test_spearman_max	sts17-es-en-test_spearman_max	sts17-fr-en-test_spearman_max	sts17-it-en-test_spearman_max
0.2110	100	0.5581	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.4219	200	0.3071	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.6329	300	0.2675	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.8439	400	0.2606	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
1.0549	500	0.2589	0.2519	0.2498	0.2511	0.2488	0.2503	0.2512	0.1254	-25.1903	0.2523	-25.1089	0.2591	-25.0276	0.2409	-24.8803	0.2180	-24.9768	0.1158	-25.1219	0.0308	0.1281	0.1610	0.1465	0.0552	0.0518
1.2658	600	0.2504	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
1.4768	700	0.2427	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
1.6878	800	0.2337	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
1.8987	900	0.2246	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
2.1097	1000	0.2197	0.2202	0.2157	0.2151	0.2147	0.2139	0.2218	0.5841	-22.0204	0.8012	-21.5087	0.8495	-21.3935	0.7959	-21.4660	0.7815	-21.5699	0.6007	-22.1778	0.3346	0.4013	0.4727	0.3353	0.3827	0.3292
2.3207	1100	0.2163	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
2.5316	1200	0.2123	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
2.7426	1300	0.2069	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
2.9536	1400	0.2048	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
3.1646	1500	0.2009	0.2086	0.2029	0.2022	0.2012	0.2002	0.2111	0.7367	-20.8567	0.8739	-20.2247	0.9303	-20.0215	0.8755	-20.1213	0.8600	-20.2900	0.7165	-21.1119	0.4087	0.5473	0.5551	0.4724	0.4882	0.4690
3.3755	1600	0.2019	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
3.5865	1700	0.1989	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
3.7975	1800	0.196	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
4.0084	1900	0.1943	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
4.2194	2000	0.194	0.2040	0.1977	0.1973	0.1962	0.1947	0.2075	0.7714	-20.3955	0.8915	-19.7279	0.9449	-19.4724	0.8942	-19.6232	0.8807	-19.7699	0.7432	-20.7547	0.4425	0.5618	0.5819	0.5021	0.5334	0.5250
4.4304	2100	0.1951	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
4.6414	2200	0.1928	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
4.8523	2300	0.1909	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
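The Epoch column can be reproduced from the dataset sizes and batch size reported above. The arithmetic below assumes that each of the six datasets is batched separately and that the last partial batch is kept (`dataloader_drop_last: False`); this is an inference that happens to match the logged values, not a documented guarantee.

```python
import math

samples_per_dataset = 5000  # "Size: 5,000 training samples" per dataset
num_datasets = 6            # en-ar, en-fr, en-de, en-es, en-tr, en-it
batch_size = 64             # per_device_train_batch_size
epochs = 5                  # num_train_epochs

# Assumption: batches never mix datasets, and partial batches are kept.
batches_per_dataset = math.ceil(samples_per_dataset / batch_size)
steps_per_epoch = batches_per_dataset * num_datasets
total_steps = steps_per_epoch * epochs


def epoch_at(step):
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)


print(batches_per_dataset, steps_per_epoch, total_steps)  # 79 474 2370
print(epoch_at(500), epoch_at(2000), epoch_at(2300))      # 1.0549 4.2194 4.8523
```

Under these assumptions the run has 2370 steps in total, which is why the last logged step (2300) falls just short of epoch 5.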

Environmental Impact

Carbon emissions were measured using CodeCarbon.

Energy Consumed: 0.060 kWh
Carbon Emitted: 0.023 kg of CO2
Hours Used: 0.179 hours

Training Hardware

On Cloud: No
GPU Model: 1 x NVIDIA GeForce RTX 3090
CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
RAM Size: 31.78 GB

Framework Versions

Python: 3.11.6
Sentence Transformers: 3.0.0.dev0
Transformers: 4.41.0.dev0
PyTorch: 2.3.0+cu121
Accelerate: 0.26.1
Datasets: 2.18.0
Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MSELoss

@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}
