SentenceTransformer based on intfloat/multilingual-e5-base

This is a sentence-transformers model finetuned from intfloat/multilingual-e5-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
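
The three modules correspond to (0) the XLM-R encoder producing token embeddings, (1) attention-mask-aware mean pooling over those tokens, and (2) L2 normalization, so cosine similarity between outputs reduces to a dot product. As a rough sketch of the same pipeline with plain transformers (an illustration, not the library's internal code):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("iddqd21/fine-tuned-e5-semantic-similarity")
encoder = AutoModel.from_pretrained("iddqd21/fine-tuned-e5-semantic-similarity")

batch = tokenizer(
    ["Carboxyhemoglobin/Hemoglobin.total"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (0) Transformer

mask = batch["attention_mask"].unsqueeze(-1).float()
mean_pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)          # (1) mean pooling
sentence_embedding = torch.nn.functional.normalize(mean_pooled, p=2, dim=1)   # (2) Normalize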

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("iddqd21/fine-tuned-e5-semantic-similarity")
# Run inference
sentences = [
    'Karboksühemoglobiin/hemoglobiin.üld',
    'Carboxyhemoglobin/Hemoglobin.total',
    'Procainamide+N-acetylprocainamide',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
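
Beyond pairwise similarity, the same embeddings support semantic search. A small illustration using the sentence_transformers.util.semantic_search helper (the corpus below reuses the example strings above and is purely illustrative):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("iddqd21/fine-tuned-e5-semantic-similarity")

corpus = [
    "Carboxyhemoglobin/Hemoglobin.total",
    "Procainamide+N-acetylprocainamide",
    "Apolipoprotein A-I/Apolipoprotein B",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("Karboksühemoglobiin/hemoglobiin.üld", convert_to_tensor=True)

# Rank the corpus by cosine similarity to the query and keep the top 2 hits
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 4))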

Training Details

Training Dataset

Unnamed Dataset

  • Size: 78,879 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string; min: 3 tokens, mean: 11.64 tokens, max: 36 tokens
    • sentence_1: string; min: 3 tokens, mean: 10.26 tokens, max: 32 tokens
    • label: float; min: 0.0, mean: 0.59, max: 1.0
  • Samples:
    sentence_0                           | sentence_1                          | label
    Rakud.CD3+HLA-DR+/100 raku kohta     | Cells.CD3+HLA-DR+/100 cells         | 1.0
    Zellen.FMC7/100 Zellen               | Cells.FMC7/100 cells                | 1.0
    Apolipoprotéine AI/apolipoprotéine B | Apolipoprotein A-I/Apolipoprotein B | 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 10
  • multi_dataset_batch_sampler: round_robin
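
Putting the dataset, loss, and non-default hyperparameters above together, training can be reproduced roughly as follows. This is a minimal sketch: the 78,879-pair training set is not published with this card, so the rows below are illustrative examples taken from the samples table, and the output path is hypothetical.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CosineSimilarityLoss

model = SentenceTransformer("intfloat/multilingual-e5-base")

# Illustrative rows; the real dataset has 78,879 (sentence_0, sentence_1, label) pairs.
train_dataset = Dataset.from_dict({
    "sentence_0": ["Rakud.CD3+HLA-DR+/100 raku kohta", "Zellen.FMC7/100 Zellen"],
    "sentence_1": ["Cells.CD3+HLA-DR+/100 cells", "Cells.FMC7/100 cells"],
    "label": [1.0, 1.0],
})

# CosineSimilarityLoss regresses the cosine similarity of the two embeddings
# onto the label using torch.nn.MSELoss, matching the loss parameters listed above.
loss = CosineSimilarityLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="fine-tuned-e5-semantic-similarity",  # hypothetical output directory
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=10,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()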

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss
0.1014 500 0.0633
0.2028 1000 0.0332
0.3043 1500 0.0296
0.4057 2000 0.0266
0.5071 2500 0.024
0.6085 3000 0.0239
0.7099 3500 0.0216
0.8114 4000 0.0205
0.9128 4500 0.0187
1.0142 5000 0.0185
1.1156 5500 0.0149
1.2170 6000 0.015
1.3185 6500 0.0142
1.4199 7000 0.0152
1.5213 7500 0.0138
1.6227 8000 0.0131
1.7241 8500 0.014
1.8256 9000 0.0133
1.9270 9500 0.0125
2.0284 10000 0.0128
2.1298 10500 0.0093
2.2312 11000 0.0091
2.3327 11500 0.0097
2.4341 12000 0.0096
2.5355 12500 0.0097
2.6369 13000 0.0093
2.7383 13500 0.0099
2.8398 14000 0.0104
2.9412 14500 0.009
3.0426 15000 0.0084
3.1440 15500 0.0065
3.2454 16000 0.0062
3.3469 16500 0.0062
3.4483 17000 0.0068
3.5497 17500 0.0076
3.6511 18000 0.0078
3.7525 18500 0.0068
3.8540 19000 0.008
3.9554 19500 0.0076
4.0568 20000 0.0057
4.1582 20500 0.0054
4.2596 21000 0.0052
4.3611 21500 0.0052
4.4625 22000 0.0056
4.5639 22500 0.0055
4.6653 23000 0.0057
4.7667 23500 0.006
4.8682 24000 0.0054
4.9696 24500 0.0052
5.0710 25000 0.0045
5.1724 25500 0.0039
5.2738 26000 0.0043
5.3753 26500 0.004
5.4767 27000 0.0044
5.5781 27500 0.0045
5.6795 28000 0.0039
5.7809 28500 0.0043
5.8824 29000 0.0047
5.9838 29500 0.0049
6.0852 30000 0.003
6.1866 30500 0.0034
6.2880 31000 0.003
6.3895 31500 0.0031
6.4909 32000 0.0033
6.5923 32500 0.0035
6.6937 33000 0.0037
6.7951 33500 0.0039
6.8966 34000 0.004
6.9980 34500 0.003
7.0994 35000 0.0024
7.2008 35500 0.0026
7.3022 36000 0.0029
7.4037 36500 0.0029
7.5051 37000 0.0025
7.6065 37500 0.0026
7.7079 38000 0.0032
7.8093 38500 0.0032
7.9108 39000 0.0029
8.0122 39500 0.0028
8.1136 40000 0.0024
8.2150 40500 0.0021
8.3164 41000 0.0022
8.4178 41500 0.0022
8.5193 42000 0.0024
8.6207 42500 0.0025
8.7221 43000 0.0023
8.8235 43500 0.0021
8.9249 44000 0.0026
9.0264 44500 0.0025
9.1278 45000 0.0021
9.2292 45500 0.0017
9.3306 46000 0.0022
9.4320 46500 0.002
9.5335 47000 0.0021
9.6349 47500 0.0019
9.7363 48000 0.0021
9.8377 48500 0.002
9.9391 49000 0.0021

Framework Versions

  • Python: 3.9.20
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.1
  • PyTorch: 2.5.1+rocm6.2
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}