---
base_model: sentence-transformers/all-mpnet-base-v2
datasets: []
language: []
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:13063
  - loss:CosineSimilarityLoss
widget:
  - source_sentence: I cant wait to leave Chicago
    sentences:
      - This is the shit Chicago needs to be recognized for not Keef
      - is candice singing again tonight
      - half time Chelsea were losing 10
  - source_sentence: Andre miller best lobbing pg in the game
    sentences:
      - Am I the only one who dont get Amber alert
      - Backstrom hurt in warmup Harding could start
      - Andre miller is even slower in person
  - source_sentence: Bayless couldve dunked that from the free throw
    sentences:
      - but what great finger roll by Bayless
      - Wow Bayless has to make EspnSCTop with that end of 3rd
      - i mean calum u didnt follow
  - source_sentence: Backstrom Hurt in warmups Harding gets the start
    sentences:
      - Should I go to Nashville or Chicago for my 17th birthday
      - I hate Chelsea possibly more than most
      - Of course Backstrom would get injured during warmups
  - source_sentence: Calum I love you plz follow me
    sentences:
      - CALUM PLEASE BE MY FIRST CELEBRITY TO FOLLOW ME
      - >-
        Walking around downtown Chicago in a dress and listening to the new Iggy
        Pop
      - I think Candice has what it takes to win American Idol AND Angie too
model-index:
  - name: SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: pearson_cosine
            value: 0.6949485250178733
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.6626359968437283
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.688092975176289
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.6630998028133662
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.6880277270034267
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.6626358741747785
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.694948520847878
            name: Pearson Dot
          - type: spearman_dot
            value: 0.6626359082695851
            name: Spearman Dot
          - type: pearson_max
            value: 0.6949485250178733
            name: Pearson Max
          - type: spearman_max
            value: 0.6630998028133662
            name: Spearman Max
---

SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
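
Because the model ends in a Normalize() module, its embeddings are unit-length, so dot product and cosine similarity coincide. For illustration, here is a minimal sketch of what the three-module stack above computes, using plain transformers and torch (the model ID is this repo; the rest is standard masked mean pooling, not code from this repo):

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Sketch of the Transformer -> Pooling(mean) -> Normalize stack above
tokenizer = AutoTokenizer.from_pretrained("mspy/twitter-paraphrase-embeddings")
model = AutoModel.from_pretrained("mspy/twitter-paraphrase-embeddings")

encoded = tokenizer(
    ["Calum I love you plz follow me"],
    padding=True, truncation=True, max_length=384, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # (batch, seq_len, 768)

# Pooling: mean over real tokens, masking out padding positions
mask = encoded["attention_mask"].unsqueeze(-1).float()
embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# Normalize: unit-length vectors, so dot product equals cosine similarity
embedding = F.normalize(embedding, p=2, dim=1)
print(embedding.shape)  # torch.Size([1, 768])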

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("mspy/twitter-paraphrase-embeddings")
# Run inference
sentences = [
    'Calum I love you plz follow me',
    'CALUM PLEASE BE MY FIRST CELEBRITY TO FOLLOW ME',
    'Walking around downtown Chicago in a dress and listening to the new Iggy Pop',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
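
The same embeddings also drive the other use cases listed above. For example, paraphrase mining over a batch of tweets can use the paraphrase_mining helper from sentence_transformers.util; a small sketch with illustrative sentences:

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import paraphrase_mining

model = SentenceTransformer("mspy/twitter-paraphrase-embeddings")
tweets = [
    "Calum I love you plz follow me",
    "CALUM PLEASE BE MY FIRST CELEBRITY TO FOLLOW ME",
    "Backstrom hurt in warmup Harding could start",
    "Of course Backstrom would get injured during warmups",
]
# Returns [score, i, j] triples, sorted by decreasing cosine similarity
for score, i, j in paraphrase_mining(model, tweets):
    print(f"{score:.3f} | {tweets[i]} | {tweets[j]}")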

Evaluation

Metrics

Semantic Similarity

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.6949 |
| spearman_cosine    | 0.6626 |
| pearson_manhattan  | 0.6881 |
| spearman_manhattan | 0.6631 |
| pearson_euclidean  | 0.6880 |
| spearman_euclidean | 0.6626 |
| pearson_dot        | 0.6949 |
| spearman_dot       | 0.6626 |
| pearson_max        | 0.6949 |
| spearman_max       | 0.6631 |
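
These scores come from an EmbeddingSimilarityEvaluator; the dot-product and cosine rows are near-identical because the embeddings are L2-normalized. Below is a sketch of running such an evaluation yourself, with stand-in pairs since the evaluation split is not published with this card:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("mspy/twitter-paraphrase-embeddings")

# Stand-in data; substitute your own sentence pairs and gold scores in [0, 1]
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["A Walk to Remember is the definition of true love"] * 3,
    sentences2=[
        "A Walk to Remember is on and Im in town and Im upset",
        "A Walk to Remember is the cutest thing",
        "A walk to remember is on ABC family youre welcome",
    ],
    scores=[0.2, 0.6, 0.2],
)
print(evaluator(model))  # dict of pearson/spearman metrics per similarity function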

Training Details

Training Dataset

Unnamed Dataset

  • Size: 13,063 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:

    |         | sentence1                                         | sentence2                                         | label                          |
    |:--------|:--------------------------------------------------|:--------------------------------------------------|:-------------------------------|
    | type    | string                                            | string                                            | float                          |
    | details | min: 7 tokens, mean: 11.16 tokens, max: 28 tokens | min: 7 tokens, mean: 12.31 tokens, max: 22 tokens | min: 0.0, mean: 0.33, max: 1.0 |

  • Samples:

    | sentence1                                | sentence2                                             | label |
    |:-----------------------------------------|:------------------------------------------------------|:------|
    | EJ Manuel the 1st QB to go in this draft | But my bro from the 757 EJ Manuel is the 1st QB gone  | 1.0   |
    | EJ Manuel the 1st QB to go in this draft | Can believe EJ Manuel went as the 1st QB in the draft | 1.0   |
    | EJ Manuel the 1st QB to go in this draft | EJ MANUEL IS THE 1ST QB what                          | 0.6   |
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
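
Concretely, CosineSimilarityLoss computes the cosine similarity of each embedding pair and regresses it onto the gold label with the MSELoss shown above; a minimal functional sketch (not the library's own code):

import torch.nn.functional as F

def cosine_similarity_loss(emb1, emb2, labels):
    # Predicted similarity: cosine between the paired sentence embeddings
    pred = F.cosine_similarity(emb1, emb2)
    # Regress onto the gold labels with MSE, per the loss_fct above
    return F.mse_loss(pred, labels)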
    

Evaluation Dataset

Unnamed Dataset

  • Size: 4,727 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:

    |         | sentence1                                         | sentence2                                         | label                          |
    |:--------|:--------------------------------------------------|:--------------------------------------------------|:-------------------------------|
    | type    | string                                            | string                                            | float                          |
    | details | min: 7 tokens, mean: 10.04 tokens, max: 16 tokens | min: 7 tokens, mean: 12.22 tokens, max: 26 tokens | min: 0.0, mean: 0.33, max: 1.0 |

  • Samples:

    | sentence1                                          | sentence2                                            | label |
    |:---------------------------------------------------|:-----------------------------------------------------|:------|
    | A Walk to Remember is the definition of true love  | A Walk to Remember is on and Im in town and Im upset | 0.2   |
    | A Walk to Remember is the definition of true love  | A Walk to Remember is the cutest thing               | 0.6   |
    | A Walk to Remember is the definition of true love  | A walk to remember is on ABC family youre welcome    | 0.2   |
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • gradient_accumulation_steps: 2
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • fp16: True
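
These settings map one-to-one onto SentenceTransformerTrainingArguments. Below is a sketch of the likely training setup; the output path and the tiny inline dataset are placeholders, not taken from this repo:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CosineSimilarityLoss

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Placeholder data; the real splits have 13,063 / 4,727 rows
train_dataset = Dataset.from_dict({
    "sentence1": ["EJ Manuel the 1st QB to go in this draft"],
    "sentence2": ["EJ MANUEL IS THE 1ST QB what"],
    "label": [0.6],
})

args = SentenceTransformerTrainingArguments(
    output_dir="twitter-paraphrase-embeddings",  # placeholder path
    eval_strategy="steps",
    gradient_accumulation_steps=2,
    learning_rate=2e-5,
    num_train_epochs=4,
    warmup_ratio=0.1,
    fp16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # placeholder; use the real eval split
    loss=CosineSimilarityLoss(model),
)
trainer.train()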

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

| Epoch  | Step | Training Loss | Validation Loss | spearman_cosine |
|:-------|:-----|:--------------|:----------------|:----------------|
| 0.1225 | 100  | -             | 0.0729          | 0.6058          |
| 0.2449 | 200  | -             | 0.0646          | 0.6340          |
| 0.3674 | 300  | -             | 0.0627          | 0.6397          |
| 0.4899 | 400  | -             | 0.0621          | 0.6472          |
| 0.6124 | 500  | 0.0627        | 0.0626          | 0.6496          |
| 0.7348 | 600  | -             | 0.0621          | 0.6446          |
| 0.8573 | 700  | -             | 0.0593          | 0.6695          |
| 0.9798 | 800  | -             | 0.0636          | 0.6440          |
| 1.1023 | 900  | -             | 0.0618          | 0.6525          |
| 1.2247 | 1000 | 0.0383        | 0.0604          | 0.6639          |
| 1.3472 | 1100 | -             | 0.0608          | 0.6590          |
| 1.4697 | 1200 | -             | 0.0620          | 0.6504          |
| 1.5922 | 1300 | -             | 0.0617          | 0.6467          |
| 1.7146 | 1400 | -             | 0.0615          | 0.6574          |
| 1.8371 | 1500 | 0.0293        | 0.0622          | 0.6536          |
| 1.9596 | 1600 | -             | 0.0609          | 0.6599          |
| 2.0821 | 1700 | -             | 0.0605          | 0.6658          |
| 2.2045 | 1800 | -             | 0.0615          | 0.6588          |
| 2.3270 | 1900 | -             | 0.0615          | 0.6575          |
| 2.4495 | 2000 | 0.0215        | 0.0614          | 0.6598          |
| 2.5720 | 2100 | -             | 0.0603          | 0.6681          |
| 2.6944 | 2200 | -             | 0.0606          | 0.6669          |
| 2.8169 | 2300 | -             | 0.0605          | 0.6642          |
| 2.9394 | 2400 | -             | 0.0606          | 0.6630          |
| 3.0618 | 2500 | 0.018         | 0.0611          | 0.6616          |
| 3.1843 | 2600 | -             | 0.0611          | 0.6619          |
| 3.3068 | 2700 | -             | 0.0611          | 0.6608          |
| 3.4293 | 2800 | -             | 0.0608          | 0.6632          |
| 3.5517 | 2900 | -             | 0.0608          | 0.6623          |
| 3.6742 | 3000 | 0.014         | 0.0615          | 0.6596          |
| 3.7967 | 3100 | -             | 0.0612          | 0.6616          |
| 3.9192 | 3200 | -             | 0.0610          | 0.6626          |

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.1
  • Transformers: 4.43.3
  • PyTorch: 2.4.0+cu121
  • Accelerate: 0.33.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1
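
To reproduce this environment, pin the versions above at install time:

pip install sentence-transformers==3.0.1 transformers==4.43.3 accelerate==0.33.0 datasets==2.20.0 tokenizers==0.19.1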

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}