SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
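
The Pooling and Normalize modules above can be illustrated with a minimal NumPy sketch. It assumes mask-aware mean pooling followed by L2 normalization, which is what `pooling_mode_mean_tokens: True` and `Normalize()` suggest. The function name `mean_pool_and_normalize` and the random toy inputs are hypothetical and stand in for real MPNet token embeddings.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions, then L2-normalize."""
    # Expand the mask to (batch, seq_len, 1) so it broadcasts over the hidden dimension
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # guard against empty sequences
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)

# Toy input: 2 sentences, 4 token positions, 768 dimensions (random stand-in values)
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 768))
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])  # 0 marks padding

sentence_vectors = mean_pool_and_normalize(tokens, mask)
print(sentence_vectors.shape)
```

Because every output vector has unit length, the dot product of two sentence vectors equals their cosine similarity.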

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("mspy/twitter-paraphrase-embeddings")
# Run inference
sentences = [
    'Calum I love you plz follow me',
    'CALUM PLEASE BE MY FIRST CELEBRITY TO FOLLOW ME',
    'Walking around downtown Chicago in a dress and listening to the new Iggy Pop',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine 0.6949
spearman_cosine 0.6626
pearson_manhattan 0.6881
spearman_manhattan 0.6631
pearson_euclidean 0.688
spearman_euclidean 0.6626
pearson_dot 0.6949
spearman_dot 0.6626
pearson_max 0.6949
spearman_max 0.6631
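
The cosine and dot rows in this table are identical, which is consistent with the `Normalize()` module in the architecture: for unit-length vectors the dot product and the cosine similarity coincide. The sketch below checks this on random stand-in vectors and shows one way to compute Pearson and Spearman correlations with NumPy alone. The helper names and the made-up gold scores are assumptions for illustration, not the evaluator the authors used.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def ranks(values):
    """Rank positions (0 = smallest); assumes there are no tied values."""
    return np.argsort(np.argsort(values)).astype(float)

def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

rng = np.random.default_rng(0)
a = rng.normal(size=(6, 768))
b = rng.normal(size=(6, 768))
# L2-normalize, mirroring the Normalize() module
a /= np.linalg.norm(a, axis=1, keepdims=True)
b /= np.linalg.norm(b, axis=1, keepdims=True)

dot = (a * b).sum(axis=1)
cosine = dot / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
labels = rng.permutation(6) / 5.0  # made-up gold scores in [0, 1], all distinct

print(np.allclose(dot, cosine))
print(pearson(cosine, labels), spearman(cosine, labels))
```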

Training Details

Training Dataset

Unnamed Dataset

  • Size: 13,063 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 7 tokens, mean 11.16 tokens, max 28 tokens
    • sentence2 (string): min 7 tokens, mean 12.31 tokens, max 22 tokens
    • label (float): min 0.0, mean 0.33, max 1.0
  • Samples:
    sentence1 | sentence2 | label
    EJ Manuel the 1st QB to go in this draft | But my bro from the 757 EJ Manuel is the 1st QB gone | 1.0
    EJ Manuel the 1st QB to go in this draft | Can believe EJ Manuel went as the 1st QB in the draft | 1.0
    EJ Manuel the 1st QB to go in this draft | EJ MANUEL IS THE 1ST QB what | 0.6
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
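
As a rough illustration of what this loss measures, the sketch below computes the mean squared error between the cosine similarity of each sentence pair and its gold label. It is a NumPy approximation under that assumption, not the library's implementation. The function name `cosine_similarity_mse` and the 2-dimensional toy embeddings are hypothetical.

```python
import numpy as np

def cosine_similarity_mse(emb1, emb2, labels):
    """MSE between per-pair cosine similarity and the gold similarity label."""
    numerator = (emb1 * emb2).sum(axis=1)
    denominator = np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1)
    cosine = numerator / np.clip(denominator, 1e-8, None)
    return float(np.mean((cosine - labels) ** 2))

# Two toy pairs: the first is identical (cosine 1), the second is orthogonal (cosine 0)
emb1 = np.array([[1.0, 0.0], [0.0, 1.0]])
emb2 = np.array([[1.0, 0.0], [1.0, 0.0]])
labels = np.array([1.0, 0.6])

loss = cosine_similarity_mse(emb1, emb2, labels)
print(loss)  # approximately 0.18: squared errors are 0 and 0.36
```

Minimizing this quantity pushes the cosine similarity of each pair toward its label, so pairs labeled 1.0 are pulled together and pairs labeled 0.0 are pushed toward orthogonality.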
    

Evaluation Dataset

Unnamed Dataset

  • Size: 4,727 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 7 tokens, mean 10.04 tokens, max 16 tokens
    • sentence2 (string): min 7 tokens, mean 12.22 tokens, max 26 tokens
    • label (float): min 0.0, mean 0.33, max 1.0
  • Samples:
    sentence1 | sentence2 | label
    A Walk to Remember is the definition of true love | A Walk to Remember is on and Im in town and Im upset | 0.2
    A Walk to Remember is the definition of true love | A Walk to Remember is the cutest thing | 0.6
    A Walk to Remember is the definition of true love | A walk to remember is on ABC family youre welcome | 0.2
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • gradient_accumulation_steps: 2
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • fp16: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch, Step, Training Loss, Validation Loss, spearman_cosine ("-" means the training loss was not logged at that step)
0.1225 100 - 0.0729 0.6058
0.2449 200 - 0.0646 0.6340
0.3674 300 - 0.0627 0.6397
0.4899 400 - 0.0621 0.6472
0.6124 500 0.0627 0.0626 0.6496
0.7348 600 - 0.0621 0.6446
0.8573 700 - 0.0593 0.6695
0.9798 800 - 0.0636 0.6440
1.1023 900 - 0.0618 0.6525
1.2247 1000 0.0383 0.0604 0.6639
1.3472 1100 - 0.0608 0.6590
1.4697 1200 - 0.0620 0.6504
1.5922 1300 - 0.0617 0.6467
1.7146 1400 - 0.0615 0.6574
1.8371 1500 0.0293 0.0622 0.6536
1.9596 1600 - 0.0609 0.6599
2.0821 1700 - 0.0605 0.6658
2.2045 1800 - 0.0615 0.6588
2.3270 1900 - 0.0615 0.6575
2.4495 2000 0.0215 0.0614 0.6598
2.5720 2100 - 0.0603 0.6681
2.6944 2200 - 0.0606 0.6669
2.8169 2300 - 0.0605 0.6642
2.9394 2400 - 0.0606 0.6630
3.0618 2500 0.018 0.0611 0.6616
3.1843 2600 - 0.0611 0.6619
3.3068 2700 - 0.0611 0.6608
3.4293 2800 - 0.0608 0.6632
3.5517 2900 - 0.0608 0.6623
3.6742 3000 0.014 0.0615 0.6596
3.7967 3100 - 0.0612 0.6616
3.9192 3200 - 0.0610 0.6626
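
The Epoch column can be reproduced from the hyperparameters with a little arithmetic. The sketch below assumes single-device training and the usual Hugging Face Trainer accounting, where one optimizer step consumes `gradient_accumulation_steps` micro-batches. The variable names are illustrative, and the warmup step count is an estimate derived from `warmup_ratio`, not a value reported in the card.

```python
import math

train_samples = 13_063   # training set size
per_device_batch = 8     # per_device_train_batch_size
grad_accum = 2           # gradient_accumulation_steps
epochs = 4               # num_train_epochs
warmup_ratio = 0.1

# dataloader_drop_last is False, so the final partial batch still counts
micro_batches_per_epoch = math.ceil(train_samples / per_device_batch)
optimizer_steps_per_epoch = micro_batches_per_epoch // grad_accum
total_steps = epochs * optimizer_steps_per_epoch
warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding rule
effective_batch = per_device_batch * grad_accum

def epoch_at(step):
    """Fractional epoch reached after a given number of optimizer steps."""
    return step * grad_accum / micro_batches_per_epoch

print(micro_batches_per_epoch, optimizer_steps_per_epoch, total_steps, warmup_steps)
print(round(epoch_at(100), 4), round(epoch_at(3200), 4))
```

Under these assumptions the effective batch size is 16 and training runs for 3,264 optimizer steps, which would explain why the last evaluation logged at a 100-step interval is step 3200.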

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.1
  • Transformers: 4.43.3
  • PyTorch: 2.4.0+cu121
  • Accelerate: 0.33.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Model size: 109M params (Safetensors, tensor type F32)