SentenceTransformer based on intfloat/multilingual-e5-small

This is a sentence-transformers model finetuned from intfloat/multilingual-e5-small. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
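
The Pooling and Normalize stages listed above can be illustrated with a small, library-free sketch. It assumes mean pooling over non-padding tokens followed by L2 normalization, as the configuration suggests; the helper names and the toy 2-dimensional vectors are illustrative only (the real model works with 384-dimensional token vectors).

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length, as the Normalize() module does."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy input: three token vectors of width 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 2.0], [100.0, 100.0]]
mask = [1, 1, 0]

sentence_embedding = l2_normalize(mean_pool(tokens, mask))
print(sentence_embedding)
```

Because the final vector has unit length, the dot product of two embeddings equals their cosine similarity.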

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("srikarvar/fine_tuned_model_2")
# Run inference
sentences = [
    'How do you make a paper boat?',
    'How do you make a paper airplane?',
    'What are the benefits of using solar energy?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])

Evaluation

Metrics

Binary Classification

Metric Value
cosine_accuracy 0.9478
cosine_accuracy_threshold 0.6633
cosine_f1 0.9559
cosine_f1_threshold 0.6633
cosine_precision 0.9155
cosine_recall 1.0
cosine_ap 0.9777
dot_accuracy 0.9478
dot_accuracy_threshold 0.6633
dot_f1 0.9559
dot_f1_threshold 0.6633
dot_precision 0.9155
dot_recall 1.0
dot_ap 0.9777
manhattan_accuracy 0.9391
manhattan_accuracy_threshold 9.6031
manhattan_f1 0.9489
manhattan_f1_threshold 12.6607
manhattan_precision 0.9028
manhattan_recall 1.0
manhattan_ap 0.9756
euclidean_accuracy 0.9478
euclidean_accuracy_threshold 0.8205
euclidean_f1 0.9559
euclidean_f1_threshold 0.8205
euclidean_precision 0.9155
euclidean_recall 1.0
euclidean_ap 0.9777
max_accuracy 0.9478
max_accuracy_threshold 9.6031
max_f1 0.9559
max_f1_threshold 12.6607
max_precision 0.9155
max_recall 1.0
max_ap 0.9777
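
As a rough illustration of how a threshold from the table above could be applied, the sketch below labels a pair as a paraphrase when its cosine similarity exceeds cosine_accuracy_threshold. The function names and toy vectors are hypothetical, and the strict ">" comparison is an assumption; in practice the inputs would be embeddings from model.encode.

```python
import math

COSINE_THRESHOLD = 0.6633  # cosine_accuracy_threshold from the table above

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_paraphrase(emb_a, emb_b, threshold=COSINE_THRESHOLD):
    """Predict label 1 when the similarity exceeds the threshold."""
    return cosine_similarity(emb_a, emb_b) > threshold

# Toy unit vectors standing in for real sentence embeddings.
close_pair = ([1.0, 0.0], [0.8, 0.6])   # cosine similarity 0.8
far_pair = ([1.0, 0.0], [0.0, 1.0])     # cosine similarity 0.0

print(is_paraphrase(*close_pair), is_paraphrase(*far_pair))
```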

Binary Classification

Metric Value
cosine_accuracy 0.9478
cosine_accuracy_threshold 0.7873
cosine_f1 0.9559
cosine_f1_threshold 0.6543
cosine_precision 0.9155
cosine_recall 1.0
cosine_ap 0.9777
dot_accuracy 0.9478
dot_accuracy_threshold 0.7873
dot_f1 0.9559
dot_f1_threshold 0.6543
dot_precision 0.9155
dot_recall 1.0
dot_ap 0.9777
manhattan_accuracy 0.9478
manhattan_accuracy_threshold 11.1232
manhattan_f1 0.9559
manhattan_f1_threshold 12.8623
manhattan_precision 0.9155
manhattan_recall 1.0
manhattan_ap 0.9774
euclidean_accuracy 0.9478
euclidean_accuracy_threshold 0.6522
euclidean_f1 0.9559
euclidean_f1_threshold 0.8315
euclidean_precision 0.9155
euclidean_recall 1.0
euclidean_ap 0.9777
max_accuracy 0.9478
max_accuracy_threshold 11.1232
max_f1 0.9559
max_f1_threshold 12.8623
max_precision 0.9155
max_recall 1.0
max_ap 0.9777
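
The cosine and dot rows in the tables above are identical, and the Euclidean thresholds track the cosine ones. A plausible explanation is the Normalize() layer: for unit vectors the dot product equals the cosine similarity, and the squared Euclidean distance is 2 - 2·cos. The sketch below checks that relation against the reported thresholds; agreement is only expected to about three decimals because the table values are rounded.

```python
import math

def euclidean_from_cosine(cos_sim):
    """Distance between two unit vectors with the given cosine similarity."""
    return math.sqrt(2.0 - 2.0 * cos_sim)

# (cosine threshold, reported Euclidean threshold) pairs from the tables above.
reported = [(0.6633, 0.8205), (0.7873, 0.6522), (0.6543, 0.8315)]

for cos_t, euc_t in reported:
    derived = euclidean_from_cosine(cos_t)
    print(f"cosine={cos_t}  derived={derived:.4f}  reported={euc_t}")
```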

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,030 training samples
  • Columns: label, sentence2, and sentence1
  • Approximate statistics based on the first 1000 samples:
    • label (int): 0: ~49.60%, 1: ~50.40%
    • sentence2 (string): min: 4 tokens, mean: 10.27 tokens, max: 22 tokens
    • sentence1 (string): min: 6 tokens, mean: 10.9 tokens, max: 22 tokens
  • Samples:
    • label: 1; sentence2: "Speed of sound in air"; sentence1: "What is the speed of sound?"
    • label: 1; sentence2: "World's most popular tourist destination"; sentence1: "What is the most visited tourist attraction in the world?"
    • label: 1; sentence2: "How do I write a resume?"; sentence1: "How do I create a resume?"
  • Loss: ContrastiveLoss with these parameters:
    {
        "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
        "margin": 0.6,
        "size_average": true
    }
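
To make the parameters above concrete, here is a minimal, library-free sketch of a contrastive loss with cosine distance (1 minus cosine similarity), margin 0.6 and mean reduction. It follows the usual formulation from Hadsell et al. and is an illustration under those assumptions, not the library's own code; the function names are hypothetical.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity, so identical directions give distance 0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def contrastive_loss(pairs, labels, margin=0.6, size_average=True):
    """Pull positive pairs together; push negatives apart up to the margin."""
    losses = []
    for (a, b), label in zip(pairs, labels):
        d = cosine_distance(a, b)
        positive_term = label * d ** 2
        negative_term = (1 - label) * max(0.0, margin - d) ** 2
        losses.append(0.5 * (positive_term + negative_term))
    total = sum(losses)
    return total / len(losses) if size_average else total

same = ([1.0, 0.0], [1.0, 0.0])        # distance 0
orthogonal = ([1.0, 0.0], [0.0, 1.0])  # distance 1

print(contrastive_loss([same, orthogonal], [1, 0]))
```

Negative pairs whose distance already exceeds the margin contribute nothing, so training effort goes to the negatives that are still too close.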
    

Evaluation Dataset

Unnamed Dataset

  • Size: 115 evaluation samples
  • Columns: label, sentence2, and sentence1
  • Approximate statistics based on the first 1000 samples:
    • label (int): 0: ~43.48%, 1: ~56.52%
    • sentence2 (string): min: 5 tokens, mean: 10.04 tokens, max: 15 tokens
    • sentence1 (string): min: 6 tokens, mean: 10.81 tokens, max: 20 tokens
  • Samples:
    • label: 0; sentence2: "What methods are used to measure a nation's GDP?"; sentence1: "How is the GDP of a country measured?"
    • label: 0; sentence2: "What is the currency of Japan?"; sentence1: "What is the currency of China?"
    • label: 1; sentence2: "Steps to cultivate tomatoes at home"; sentence1: "How to grow tomatoes in a garden?"
  • Loss: ContrastiveLoss with these parameters:
    {
        "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
        "margin": 0.6,
        "size_average": true
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • gradient_accumulation_steps: 2
  • weight_decay: 0.01
  • num_train_epochs: 8
  • lr_scheduler_type: reduce_lr_on_plateau
  • warmup_ratio: 0.1
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
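
The non-default values above could be passed to the trainer roughly as follows. This is a sketch, not the author's training script: the output path is hypothetical, and save_strategy="epoch" is an assumption added because load_best_model_at_end normally requires the save and evaluation strategies to match. The import is deferred so the dictionary can be inspected without the library installed.

```python
# Values copied from the "Non-Default Hyperparameters" list above.
NON_DEFAULT_ARGS = {
    "eval_strategy": "epoch",
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 32,
    "gradient_accumulation_steps": 2,
    "weight_decay": 0.01,
    "num_train_epochs": 8,
    "lr_scheduler_type": "reduce_lr_on_plateau",
    "warmup_ratio": 0.1,
    "load_best_model_at_end": True,
    "optim": "adamw_torch_fused",
    "batch_sampler": "no_duplicates",
}

# Samples contributing to each optimizer update on a single device.
effective_batch_size = (
    NON_DEFAULT_ARGS["per_device_train_batch_size"]
    * NON_DEFAULT_ARGS["gradient_accumulation_steps"]
)

def build_training_args(output_dir="output/fine_tuned_model"):  # hypothetical path
    """Create the arguments object; requires sentence-transformers >= 3.0."""
    from sentence_transformers import SentenceTransformerTrainingArguments

    return SentenceTransformerTrainingArguments(
        output_dir=output_dir,
        save_strategy="epoch",  # assumption: not listed in this card
        **NON_DEFAULT_ARGS,
    )

print(effective_batch_size)
```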

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 8
  • max_steps: -1
  • lr_scheduler_type: reduce_lr_on_plateau
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss loss pair-class-dev_max_ap pair-class-test_max_ap
0 0 - - 0.7625 -
0.6061 10 0.0417 - - -
0.9697 16 - 0.0119 0.9695 -
1.2121 20 0.0189 - - -
1.8182 30 0.0148 - - -
2.0 33 - 0.0102 0.9741 -
2.4242 40 0.0114 - - -
2.9697 49 - 0.0098 0.9752 -
3.0303 50 0.009 - - -
3.6364 60 0.008 - - -
4.0 66 - 0.0095 0.9778 -
4.2424 70 0.0065 - - -
4.8485 80 0.0056 - - -
4.9697 82 - 0.0092 0.9749 -
5.4545 90 0.0056 - - -
6.0 99 - 0.0088 0.9766 -
6.0606 100 0.0045 - - -
6.6667 110 0.0044 - - -
6.9697 115 - 0.0087 0.9777 -
7.2727 120 0.0038 - - -
7.7576 128 - 0.0090 0.9777 0.9777
  • The bold row denotes the saved checkpoint.
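
The fractional epoch values in the log above appear consistent with the dataset size and batch settings reported earlier: 1,030 samples in batches of 32 give 33 batches per epoch, and with 2 gradient accumulation steps each logged step covers 2 batches. The sketch below reproduces that arithmetic; it assumes a single device and that the sampler does not drop batches.

```python
import math

TRAIN_SAMPLES = 1030   # training set size from this card
BATCH_SIZE = 32        # per_device_train_batch_size
GRAD_ACCUM = 2         # gradient_accumulation_steps

batches_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)  # 33

def epoch_at_step(optimizer_step):
    """Fraction of epochs completed after a given optimizer step."""
    return optimizer_step * GRAD_ACCUM / batches_per_epoch

for step in (10, 16, 33, 128):
    print(step, round(epoch_at_step(step), 4))
```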

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ContrastiveLoss

@inproceedings{hadsell2006dimensionality,
    author={Hadsell, R. and Chopra, S. and LeCun, Y.},
    booktitle={2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)}, 
    title={Dimensionality Reduction by Learning an Invariant Mapping}, 
    year={2006},
    volume={2},
    number={},
    pages={1735-1742},
    doi={10.1109/CVPR.2006.100}
}

Model size: 118M params (Safetensors, tensor type F32)