SentenceTransformer based on intfloat/multilingual-e5-small

This is a sentence-transformers model finetuned from intfloat/multilingual-e5-small. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
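In the Pooling configuration above, only `pooling_mode_mean_tokens` is enabled, so the token embeddings are averaged with padding ignored. Normalize() then scales the result to unit length. The sketch below shows those two steps in pure Python. The 3-dimensional token vectors are hypothetical stand-ins for the model's 384-dimensional BERT outputs.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length, as the Normalize() module does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Hypothetical token outputs for a 3-token input whose last token is padding.
tokens = [[1.0, 2.0, 2.0], [3.0, 2.0, 0.0], [100.0, 100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)   # [2.0, 2.0, 1.0]; the padding vector is ignored
embedding = l2_normalize(pooled)   # pooled has norm 3.0, so this divides by 3
print(embedding)
```

Because every output vector has unit length, the cosine similarity listed under Model Details reduces to a plain dot product.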

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("srikarvar/fine_tuned_model_6")
# Run inference
sentences = [
    'What is the speed of a racing drone?',
    'What is the speed of a racing car?',
    'Who was the first person to swim across the Atlantic?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
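For n input sentences, `model.similarity` returns an n × n matrix. Entry [i][j] compares sentence i with sentence j, so the matrix is symmetric and its diagonal is 1. The dependency-free sketch below shows that computation with cosine similarity and then uses one row for a tiny semantic search. The 2-dimensional embeddings are hypothetical and stand in for the model's 384-dimensional outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def similarity_matrix(embeddings):
    """Pairwise cosine similarities; entry [i][j] compares input i with input j."""
    return [[cosine(u, v) for v in embeddings] for u in embeddings]

# Hypothetical unit-length embeddings for three sentences.
embeddings = [[0.6, 0.8], [0.8, 0.6], [-0.8, 0.6]]
sims = similarity_matrix(embeddings)
for row in sims:
    print([round(s, 2) for s in row])

# Rank the other sentences by similarity to sentence 0 (a minimal semantic search).
ranking = sorted(range(1, len(embeddings)), key=lambda j: sims[0][j], reverse=True)
print(ranking)  # [1, 2]
```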

Evaluation

Metrics

Binary Classification

Similarity  accuracy  accuracy_threshold  f1      f1_threshold  precision  recall  ap
cosine      0.8773    0.8647              0.8683  0.8647        0.8725     0.8641  0.9228
dot         0.8773    0.8647              0.8683  0.8647        0.8725     0.8641  0.9228
manhattan   0.8773    8.0259              0.8704  9.0067        0.8319     0.9126  0.9221
euclidean   0.8773    0.5201              0.8683  0.5201        0.8725     0.8641  0.9228
max         0.8773    8.0259              0.8704  9.0067        0.8725     0.9126  0.9228
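The cosine and dot metrics above are identical, and the Euclidean threshold tracks the cosine threshold. Both follow from the Normalize() module. For unit vectors u and v, u·v = cos(u, v) and ‖u − v‖ = √(2 − 2·cos(u, v)). The check below converts the reported cosine threshold into a Euclidean distance and applies it to hypothetical embedding pairs. Two choices here belong to this sketch rather than to the evaluator: a pair counts as similar when its score is at or above the threshold, and the example vectors are invented.

```python
import math

COSINE_THRESHOLD = 0.8647  # cosine_accuracy_threshold reported above

def euclidean_from_cosine(cos_sim):
    """Euclidean distance between two unit vectors with the given cosine similarity."""
    return math.sqrt(2.0 - 2.0 * cos_sim)

def is_paraphrase(u, v, threshold=COSINE_THRESHOLD):
    """Label a pair of unit-normalized embeddings as similar (1) or not (0)."""
    cos_sim = sum(a * b for a, b in zip(u, v))  # dot product equals cosine for unit vectors
    return 1 if cos_sim >= threshold else 0

# Prints 0.5202; the gap from the reported 0.5201 comes from rounding the thresholds.
print(round(euclidean_from_cosine(COSINE_THRESHOLD), 4))

u = [1.0, 0.0]
v = [0.9, math.sqrt(1.0 - 0.9 ** 2)]  # unit vector with cos(u, v) = 0.9
w = [0.8, 0.6]                        # unit vector with cos(u, w) = 0.8
print(is_paraphrase(u, v), is_paraphrase(u, w))                    # 1 0
print(math.isclose(math.dist(u, v), euclidean_from_cosine(0.9)))  # True
```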

Binary Classification

Similarity  accuracy  accuracy_threshold  f1      f1_threshold  precision  recall  ap
cosine      0.8773    0.8647              0.8683  0.8647        0.8725     0.8641  0.9228
dot         0.8773    0.8647              0.8683  0.8647        0.8725     0.8641  0.9228
manhattan   0.8773    8.0259              0.8704  9.0067        0.8319     0.9126  0.9221
euclidean   0.8773    0.5201              0.8683  0.5201        0.8725     0.8641  0.9228
max         0.8773    8.0259              0.8704  9.0067        0.8725     0.9126  0.9228
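The threshold entries in these tables, such as cosine_accuracy_threshold, are the score cut-offs that maximize the named metric on the evaluation pairs. The brute-force sketch below finds an accuracy-maximizing threshold by trying the midpoint between each pair of consecutive sorted scores. It illustrates the idea only and is not the sentence-transformers evaluator's exact code. The scores and labels are made up, and the function assumes at least two distinct scores.

```python
def best_accuracy_threshold(scores, labels):
    """Return (threshold, accuracy) maximizing accuracy when score > threshold means label 1.

    Candidate thresholds are midpoints between consecutive distinct sorted scores.
    """
    ordered = sorted(set(scores), reverse=True)
    best_threshold, best_accuracy = None, -1.0
    for higher, lower in zip(ordered, ordered[1:]):
        threshold = (higher + lower) / 2
        predictions = [1 if s > threshold else 0 for s in scores]
        correct = sum(p == y for p, y in zip(predictions, labels))
        accuracy = correct / len(labels)
        if accuracy > best_accuracy:  # on ties, keep the first (highest) threshold
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy

# Hypothetical cosine scores for five pairs and their gold labels.
scores = [0.95, 0.90, 0.70, 0.60, 0.30]
labels = [1, 1, 0, 1, 0]
threshold, accuracy = best_accuracy_threshold(scores, labels)
print(round(threshold, 4), accuracy)  # 0.8 0.8
```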

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,972 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 6, mean: 12.22, max: 53 tokens
    • sentence2 (string): min: 4, mean: 11.89, max: 48 tokens
    • label (int): 0: ~51.60%, 1: ~48.40%
  • Samples (sentence1 | sentence2 | label):
    • What is the distance between the Earth and Mars? | What is the distance between the Earth and Saturn? | 0
    • Tell me a joke | Make me laugh with a joke | 1
    • How can I make money online with free of cost? | How do I to make money online? | 1
  • Loss: OnlineContrastiveLoss
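OnlineContrastiveLoss is a contrastive loss that, within each batch, penalizes only the hard cases. Hard positives are similar pairs (label 1) that sit farther apart than the closest dissimilar pair. Hard negatives are dissimilar pairs (label 0) that sit closer than the farthest similar pair. The pure-Python sketch below follows that description. It assumes the library defaults as we understand them: cosine distance (1 − cosine similarity) and a margin of 0.5. It also mirrors our reading of the reference implementation, which falls back to the batch mean when a class has at most one pair. The batch distances are invented for illustration.

```python
def online_contrastive_loss(distances, labels, margin=0.5):
    """Contrastive loss restricted to hard positives and hard negatives in one batch.

    distances: per-pair distances, e.g. 1 - cosine similarity (assumed default metric)
    labels:    1 for similar pairs, 0 for dissimilar pairs
    """
    pos = [d for d, y in zip(distances, labels) if y == 1]
    neg = [d for d, y in zip(distances, labels) if y == 0]

    # Hard negatives: dissimilar pairs closer than the farthest similar pair
    # (mean negative distance is the cut-off when there is at most one positive).
    hard_neg = []
    if neg:
        cut = max(pos) if len(pos) > 1 else sum(neg) / len(neg)
        hard_neg = [d for d in neg if d < cut]

    # Hard positives: similar pairs farther apart than the closest dissimilar pair
    # (mean positive distance is the cut-off when there is at most one negative).
    hard_pos = []
    if pos:
        cut = min(neg) if len(neg) > 1 else sum(pos) / len(pos)
        hard_pos = [d for d in pos if d > cut]

    positive_loss = sum(d ** 2 for d in hard_pos)                     # pull similar pairs together
    negative_loss = sum(max(0.0, margin - d) ** 2 for d in hard_neg)  # push dissimilar pairs past the margin
    return positive_loss + negative_loss

# Invented batch: two similar pairs (labels 1) and two dissimilar pairs (labels 0).
loss = online_contrastive_loss([0.1, 0.4, 0.3, 0.8], [1, 1, 0, 0])
print(loss)  # only 0.4 (hard positive) and 0.3 (hard negative) contribute: 0.4**2 + (0.5 - 0.3)**2
```

Because easy pairs contribute nothing, a well-separated batch such as distances [0.05, 0.1, 0.9, 0.95] with labels [1, 1, 0, 0] produces zero loss.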

Evaluation Dataset

Unnamed Dataset

  • Size: 220 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Statistics based on the first 1000 samples (here, all 220):
    • sentence1 (string): min: 6, mean: 12.44, max: 44 tokens
    • sentence2 (string): min: 5, mean: 12.4, max: 55 tokens
    • label (int): 0: ~53.18%, 1: ~46.82%
  • Samples (sentence1 | sentence2 | label):
    • Who discovered the structure of DNA? | Scientist who identified the double helix | 1
    • How to create a website from scratch? | How to create a blog from scratch? | 0
    • What is the population of New York City? | What is the population of Chicago? | 0
  • Loss: OnlineContrastiveLoss
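The label percentages are easier to read as counts. For the 220 evaluation pairs, ~53.18% and ~46.82% correspond to 117 dissimilar and 103 similar pairs. A classifier that always predicted label 0 would therefore reach about 0.5318 accuracy on this set. That figure is a useful reference point if the binary-classification metrics above were computed on these pairs, which is an assumption rather than something the card states.

```python
eval_size = 220
label_share = {0: 0.5318, 1: 0.4682}  # label percentages from the statistics above

# Convert the rounded percentages back into whole pair counts.
counts = {label: round(share * eval_size) for label, share in label_share.items()}
# Accuracy of always predicting the most frequent label.
majority_baseline = max(counts.values()) / eval_size

print(counts)                        # {0: 117, 1: 103}
print(round(majority_baseline, 4))   # 0.5318
```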

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • gradient_accumulation_steps: 2
  • warmup_ratio: 0.1
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
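The settings above imply the following step counts for the 1,972 training pairs:

  • a per-device batch size of 32 and 2 gradient-accumulation steps give an effective batch of 64;
  • the optimizer then takes 31 steps per epoch, or 93 steps over 3 epochs, which matches the step numbers in the training log;
  • a warmup_ratio of 0.1 gives 10 warmup steps.

The arithmetic below assumes a single device, that the last partial batch is kept (dataloader_drop_last: False), and that the Trainer rounds the warmup step count up.

```python
import math

train_samples = 1972
per_device_batch_size = 32
gradient_accumulation_steps = 2
num_train_epochs = 3
warmup_ratio = 0.1

batches_per_epoch = math.ceil(train_samples / per_device_batch_size)        # last partial batch kept
steps_per_epoch = batches_per_epoch // gradient_accumulation_steps          # optimizer updates per epoch
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)                        # assumed to round up
effective_batch_size = per_device_batch_size * gradient_accumulation_steps  # single device assumed

print(batches_per_epoch, steps_per_epoch, total_steps, warmup_steps, effective_batch_size)
# 62 31 93 10 64
```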

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
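With learning_rate: 5e-05 and lr_scheduler_type: linear, the learning rate climbs linearly from 0 to 5e-05 during warmup and then decays linearly to 0 at the final step. The sketch below computes that schedule. It takes the 93 total optimizer steps from the training log and assumes 10 warmup steps (warmup_ratio 0.1 × 93, rounded up).

```python
PEAK_LR = 5e-05
TOTAL_STEPS = 93   # from the training log
WARMUP_STEPS = 10  # assumed: ceil(0.1 * 93)

def linear_lr(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at an optimizer step under linear warmup followed by linear decay."""
    if step < warmup:
        return peak * step / max(1, warmup)                       # warmup: ramp up from 0
    return peak * max(0.0, (total - step) / max(1, total - warmup))  # decay: ramp down to 0

for step in (0, 5, 10, 50, 93):
    print(step, linear_lr(step))
```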

Training Logs

Epoch   Step  Training Loss  Validation Loss  pair-class-dev_max_ap  pair-class-test_max_ap
0       0     -              -                0.6615                 -
0.3226  10    1.7113         -                -                      -
0.6452  20    0.9588         -                -                      -
0.9677  30    0.9243         -                -                      -
1.0     31    -              0.8485           0.8985                 -
1.2903  40    0.689          -                -                      -
1.6129  50    0.4289         -                -                      -
1.9355  60    0.4655         -                -                      -
2.0     62    -              0.8143           0.9203                 -
2.2581  70    0.4183         -                -                      -
2.5806  80    0.3038         -                -                      -
2.9032  90    0.2979         -                -                      -
3.0     93    -              0.8121           0.9228                 0.9228
  • The saved checkpoint is the final row (epoch 3.0, step 93), which has the lowest validation loss.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
Downloads last month
3

Model Size

  • Parameters: 118M (Safetensors)
  • Tensor type: F32
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for srikarvar/fine_tuned_model_6

Finetuned
(58)
this model

Evaluation results