Tags: Sentence Similarity · sentence-transformers · Safetensors · Ukrainian · English · xlm-roberta · feature-extraction · Generated from Trainer · dataset_size:523982 · loss:MSELoss · Eval Results · text-embeddings-inference

SentenceTransformer based on FacebookAI/xlm-roberta-base

This is a sentence-transformers model finetuned from FacebookAI/xlm-roberta-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. The model was trained by knowledge distillation on English–Ukrainian parallel sentences (see Training Details), so English and Ukrainian inputs are embedded into a shared vector space.

👉 Check out the model on GitHub.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
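
The same stack can be assembled by hand with the sentence-transformers module API. This is only a sketch showing how the architecture listing above maps to code; in practice you should load the finetuned checkpoint directly, as in the Usage section below.

from sentence_transformers import SentenceTransformer, models

# XLM-RoBERTa encoder with a 512-token window, followed by mean pooling
# over token embeddings (matches the Pooling settings listed above).
word_embedding_model = models.Transformer("FacebookAI/xlm-roberta-base", max_seq_length=512)
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),  # 768
    pooling_mode="mean",
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])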

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("panalexeu/xlm-roberta-ua-distilled")
# Run inference
sentences = [
    "You'd better consult the doctor.",
    'Краще проконсультуйся у лікаря.',
    'Їх позначають як Aufklärungsfahrzeug 93 та Aufklärungsfahrzeug 97 відповідно.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
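
Because English and Ukrainian sentences share one vector space, the model can also be used for cross-lingual semantic search. Below is a minimal sketch built on sentence-transformers' util.semantic_search; the corpus and query simply reuse sentences from this card and are illustrative only.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("panalexeu/xlm-roberta-ua-distilled")

# Ukrainian corpus, English query: no translation step is needed.
corpus = [
    "Краще проконсультуйся у лікаря.",
    "Я загубив гаманець.",
    "Це фармацевтичний продукт.",
]
query = "You'd better consult the doctor."

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# One ranked hit list per query; each hit carries 'corpus_id' and 'score'.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 4))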

Evaluation

Metrics

Knowledge Distillation

Metric Value
negative_mse -1.1089
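
The negative_mse score comes from the knowledge-distillation setup: the student's embeddings of the parallel sentences are compared against the teacher's English embeddings, and the negated mean squared error is reported so that higher is better. A minimal sketch of the computation; the 100x scaling follows the convention of the library's MSEEvaluator and should be treated as an assumption here.

import numpy as np

def negative_mse(teacher_embeddings: np.ndarray, student_embeddings: np.ndarray) -> float:
    # Mean squared error between teacher and student vectors, scaled by 100
    # and negated so that a larger (less negative) value means a closer fit.
    mse = np.mean((teacher_embeddings - student_embeddings) ** 2)
    return float(-mse * 100)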

Semantic Similarity

Metric sts17-en-en sts17-en-ua sts17-ua-ua
pearson_cosine 0.6785 0.5926 0.6159
spearman_cosine 0.7308 0.6198 0.6446
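
These scores are Pearson and Spearman correlations between the cosine similarity of each sentence pair's embeddings and human similarity judgments on STS17 (English–English, English–Ukrainian, Ukrainian–Ukrainian). A hedged sketch of such an evaluation with the library's EmbeddingSimilarityEvaluator; the pairs and gold scores below are placeholders, not the actual STS17 data.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("panalexeu/xlm-roberta-ua-distilled")

# Placeholder cross-lingual pairs with gold similarity scores in [0, 1].
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["I have lost my wallet.", "It's a pharmaceutical product.", "You'd better consult the doctor."],
    sentences2=["Я загубив гаманець.", "Це фармацевтичний продукт.", "Я загубив гаманець."],
    scores=[1.0, 1.0, 0.1],
    name="sts17-en-ua",
)
print(evaluator(model))  # recent versions return a dict including pearson_cosine and spearman_cosine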

Training Details

Training Dataset

  • Dataset: parallel-sentences-talks, parallel-sentences-wikimatrix, parallel-sentences-tatoeba
  • Size: 523,982 training samples
  • Columns: english, non_english, and label
  • Approximate statistics based on the first 1000 samples:
                english               non_english           label
    type        string                string                list
    details     min: 5 tokens         min: 4 tokens         size: 768 elements
                mean: 21.11 tokens    mean: 23.15 tokens
                max: 254 tokens       max: 293 tokens
  • Samples:
    english non_english label
    Her real name is Lydia (リディア, Ridia), but she was mistaken for a boy and called Ricard. Справжнє ім'я — Лідія, але її помилково сприйняли за хлопчика і назвали Рікард. [0.15217968821525574, -0.17830222845077515, -0.12677159905433655, 0.22082313895225525, 0.40085524320602417, ...]
    (Applause) So he didn't just learn water. (Аплодисменти) Він не тільки вивчив слово "вода". [-0.1058148592710495, -0.08846072107553482, -0.2684604823589325, -0.105219267308712, 0.3050258755683899, ...]
    It is tightly integrated with SAM, the Storage and Archive Manager, and hence is often referred to as SAM-QFS. Вона тісно інтегрована з SAM (Storage and Archive Manager), тому часто називається SAM-QFS. [0.03270340710878372, -0.45798248052597046, -0.20090211927890778, 0.006579531356692314, -0.03178019821643829, ...]
  • Loss: MSELoss
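
The label column holds 768-dimensional teacher embeddings of the English sentence; during training the student learns to reproduce that vector for both the English sentence and its Ukrainian translation. A hedged sketch of how such a dataset can be assembled; the teacher checkpoint named below is an illustrative assumption, not necessarily the teacher used for this model.

from datasets import Dataset
from sentence_transformers import SentenceTransformer

# Assumed teacher: any strong English sentence encoder with 768-dim output.
teacher = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

pairs = [
    ("I have lost my wallet.", "Я загубив гаманець."),
    ("It's a pharmaceutical product.", "Це фармацевтичний продукт."),
]
english, non_english = zip(*pairs)

# Teacher embeddings of the English side become the regression targets ("label").
labels = teacher.encode(list(english))

train_dataset = Dataset.from_dict({
    "english": list(english),
    "non_english": list(non_english),
    "label": [vec.tolist() for vec in labels],
})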

Evaluation Dataset

  • Dataset: parallel-sentences-talks, parallel-sentences-wikimatrix, parallel-sentences-tatoeba
  • Size: 3,838 evaluation samples
  • Columns: english, non_english, and label
  • Approximate statistics based on the first 1000 samples:
                english               non_english           label
    type        string                string                list
    details     min: 5 tokens         min: 5 tokens         size: 768 elements
                mean: 15.64 tokens    mean: 16.98 tokens
                max: 143 tokens       max: 148 tokens
  • Samples:
    english non_english label
    I have lost my wallet. Я загубив гаманець. [-0.11186987161636353, -0.03419225662946701, -0.31304317712783813, 0.0838347002863884, 0.108644500374794, ...]
    It's a pharmaceutical product. Це фармацевтичний продукт. [0.04133488982915878, -0.4182000756263733, -0.30786487460136414, -0.09351564198732376, -0.023946482688188553, ...]
    We've all heard of the Casual Friday thing. Всі ми чули про «джинсову п’ятницю» (вільна форма одягу). [-0.10697802156209946, 0.21002227067947388, -0.2513434886932373, -0.3718843460083008, 0.06871984899044037, ...]
  • Loss: MSELoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 3
  • num_train_epochs: 4
  • warmup_ratio: 0.1
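
For completeness, a hedged sketch of a training run that mirrors the non-default hyperparameters above using the sentence-transformers v3 trainer API; train_dataset and eval_dataset are assumed to be distillation datasets with the english / non_english / label columns described earlier, and are not defined in this snippet.

from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MSELoss

# Student starts from the base checkpoint; MSELoss regresses its embeddings
# onto the teacher vectors stored in the "label" column.
student = SentenceTransformer("FacebookAI/xlm-roberta-base")
loss = MSELoss(student)

args = SentenceTransformerTrainingArguments(
    output_dir="xlm-roberta-ua-distilled",
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=3,
    num_train_epochs=4,
    warmup_ratio=0.1,
)

trainer = SentenceTransformerTrainer(
    model=student,
    args=args,
    train_dataset=train_dataset,  # assumed: columns english, non_english, label
    eval_dataset=eval_dataset,    # assumed: same columns
    loss=loss,
)
trainer.train()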

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 3
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss mse-en-ua_negative_mse sts17-en-en_spearman_cosine sts17-en-ua_spearman_cosine sts17-ua-ua_spearman_cosine
0.0938 1024 0.3281 0.0297 -2.9592 0.2325 0.1547 0.2265
0.1876 2048 0.1136 0.2042 -21.6693 0.0553 0.0429 0.2442
0.2814 3072 0.1008 0.0273 -2.7461 0.2666 0.0758 0.2613
0.3752 4096 0.0843 0.0243 -2.4623 0.2541 0.0012 0.3680
0.4690 5120 0.0756 0.0216 -2.2095 0.3933 0.2535 0.4342
0.5628 6144 0.0661 0.0187 -1.9539 0.5739 0.4222 0.5056
0.6566 7168 0.0579 0.0164 -1.7513 0.6184 0.4897 0.5826
0.7504 8192 0.0526 0.0153 -1.6546 0.6219 0.4568 0.5842
0.8442 9216 0.0488 0.0142 -1.5525 0.6160 0.5012 0.5884
0.9380 10240 0.046 0.0135 -1.4957 0.6361 0.5046 0.5969
1.0318 11264 0.0437 0.0130 -1.4506 0.6453 0.5093 0.5939
1.1256 12288 0.0419 0.0125 -1.4049 0.6403 0.5054 0.6020
1.2194 13312 0.0404 0.0122 -1.3794 0.6654 0.5442 0.6182
1.3132 14336 0.0394 0.0118 -1.3434 0.6800 0.5790 0.6291
1.4070 15360 0.0383 0.0115 -1.3184 0.6836 0.5805 0.6301
1.5008 16384 0.0375 0.0114 -1.3067 0.6742 0.5555 0.6055
1.5946 17408 0.0368 0.0111 -1.2864 0.6909 0.5765 0.6256
1.6884 18432 0.036 0.0109 -1.2633 0.6875 0.5801 0.6178
1.7822 19456 0.0353 0.0107 -1.2490 0.7060 0.5959 0.6322
1.8760 20480 0.035 0.0106 -1.2357 0.7127 0.6047 0.6389
1.9698 21504 0.0344 0.0105 -1.2265 0.7265 0.6233 0.6459
2.0636 22528 0.0335 0.0103 -1.2108 0.7184 0.6151 0.6438
2.1574 23552 0.0327 0.0103 -1.2101 0.7122 0.6074 0.6427
2.2512 24576 0.0324 0.0102 -1.1972 0.7232 0.6174 0.6447
2.3450 25600 0.0322 0.0100 -1.1813 0.7217 0.6166 0.6457
2.4388 26624 0.032 0.0099 -1.1745 0.7308 0.6272 0.6534
2.5326 27648 0.0316 0.0098 -1.1673 0.7289 0.6125 0.6441
2.6264 28672 0.0314 0.0098 -1.1622 0.7222 0.6105 0.6365
2.7202 29696 0.0312 0.0097 -1.1593 0.7175 0.6121 0.6348
2.8140 30720 0.0308 0.0096 -1.1457 0.7204 0.6044 0.6377
2.9078 31744 0.0307 0.0095 -1.1411 0.7230 0.6175 0.6353
3.0016 32768 0.0305 0.0095 -1.1414 0.7130 0.6052 0.6340
3.0954 33792 0.0296 0.0095 -1.1360 0.7234 0.6160 0.6411
3.1892 34816 0.0295 0.0094 -1.1317 0.7220 0.6131 0.6396
3.2830 35840 0.0294 0.0094 -1.1306 0.7315 0.6167 0.6505
3.3768 36864 0.0293 0.0094 -1.1263 0.7219 0.6089 0.6450
3.4706 37888 0.0292 0.0093 -1.1225 0.7236 0.6141 0.6451
3.5644 38912 0.0291 0.0093 -1.1204 0.7331 0.6179 0.6460
3.6582 39936 0.029 0.0092 -1.1147 0.7226 0.6127 0.6406
3.7520 40960 0.029 0.0092 -1.1118 0.7245 0.6184 0.6425
3.8458 41984 0.0289 0.0092 -1.1102 0.7279 0.6179 0.6465
3.9396 43008 0.0288 0.0092 -1.1099 0.7298 0.6191 0.6438
3.9997 43664 - 0.0092 -1.1089 0.7308 0.6198 0.6446

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.51.1
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.5.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MSELoss

@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}