SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-mpnet-base-v2
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
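Since the model ends in a Normalize module (see the architecture below), every embedding has unit length, so cosine similarity reduces to a plain dot product:

$$\cos(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert} = u \cdot v \quad \text{when } \lVert u \rVert = \lVert v \rVert = 1.$$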
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
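Reading the modules top to bottom: MPNet produces one 768-dimensional vector per token, the Pooling module mean-averages those vectors over the attention mask, and Normalize rescales the result to unit length. Below is a minimal sketch of the same pipeline in plain transformers and torch (not the official API; it assumes the checkpoint name used in the Usage section):

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

name = "armaniii/all-mpnet-base-v2-augmentation-indomain-bm25-sts"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

sentences = ["Example sentence one.", "Example sentence two."]
encoded = tokenizer(sentences, padding=True, truncation=True,
                    max_length=512, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # (batch, seq_len, 768)

# (1) Pooling: mean of token embeddings, ignoring padding via the attention mask
mask = encoded["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# (2) Normalize: unit-length vectors, so dot product equals cosine similarity
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 768])
```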
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("armaniii/all-mpnet-base-v2-augmentation-indomain-bm25-sts")

# Run inference
sentences = [
    'Fanatics of the pro – life argument are sometimes so focused on the fetus that they put no value to the mother ’s life and do not even consider the viability of the fetus .',
    'Life is life , whether it s outside the womb or not .',
    'Legalization of marijuana is phasing out black markets and taking money away from drug cartels, organized crime, and street gangs.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
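Because the vectors are unit-normalized, the same model.similarity call can rank a corpus against a query for semantic search. A small sketch reusing the sentences list above (the query string is illustrative):

```python
# Embed an illustrative query and score it against the corpus above
query_embedding = model.encode(["Does legalizing marijuana hurt criminal organizations?"])
scores = model.similarity(query_embedding, embeddings)  # shape: torch.Size([1, 3])

# Higher cosine similarity = more related; take the best match
best = scores.argmax().item()
print(sentences[best], scores[0, best].item())
```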
Evaluation
Metrics
Semantic Similarity
- Dataset: sts-test
- Evaluated with EmbeddingSimilarityEvaluator (see the sketch after the tables)
Metric | Value |
---|---|
pearson_cosine | 0.7295 |
spearman_cosine | 0.7235 |
pearson_manhattan | 0.7104 |
spearman_manhattan | 0.7118 |
pearson_euclidean | 0.7212 |
spearman_euclidean | 0.7235 |
pearson_dot | 0.7295 |
spearman_dot | 0.7235 |
pearson_max | 0.7295 |
spearman_max | 0.7235 |
Semantic Similarity
- Dataset: sts-test
- Evaluated with EmbeddingSimilarityEvaluator

This second table appears to be a later evaluation pass on the same split: its spearman_cosine of 0.6886 matches the final-epoch (3.0) row of the training logs below, while the table above matches the best values logged near step 3200.
Metric | Value |
---|---|
pearson_cosine | 0.7146 |
spearman_cosine | 0.6886 |
pearson_manhattan | 0.707 |
spearman_manhattan | 0.6837 |
pearson_euclidean | 0.7115 |
spearman_euclidean | 0.6886 |
pearson_dot | 0.7146 |
spearman_dot | 0.6886 |
pearson_max | 0.7146 |
spearman_max | 0.6886 |
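The scores in both tables come from embedding each side of every pair and correlating the predicted similarities with the gold scores. A hedged sketch of running such an evaluation yourself, with invented stand-in pairs rather than the actual sts-test split:

```python
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Stand-in evaluation pairs; the real sts-test split is not bundled with the model
sentences1 = ["A man is playing a guitar.", "A dog runs through the field."]
sentences2 = ["Someone is playing an instrument.", "A cat sleeps on the couch."]
gold_scores = [0.8, 0.1]  # gold similarity in [0, 1]

evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, gold_scores, name="sts-test")
print(evaluator(model))  # dict of pearson/spearman metrics per similarity function
```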
Training Details
Training Dataset
Unnamed Dataset
- Size: 17,093 training samples
- Columns: sentence1, sentence2, and score
- Approximate statistics based on the first 1000 samples:
Column | Type | Details |
---|---|---|
sentence1 | string | min: 7 tokens, mean: 33.23 tokens, max: 97 tokens |
sentence2 | string | min: 4 tokens, mean: 30.75 tokens, max: 96 tokens |
score | float | min: 0.09, mean: 0.55, max: 0.95 |
- Samples:
sentence1 | sentence2 | score |
---|---|---|
It is true that a Colorado study found a post-legalization increase in youths being treated for marijuana exposure . | In Colorado , recent figures correlate with the years since marijuana legalization to show a dramatic decrease in overall highway fatalities – and a two-fold increase in the frequency of marijuana-positive drivers in fatal auto crashes . | 0.4642857142857143 |
The idea of a school uniform is that students wear the uniform at school , but do not wear the uniform , say , at a disco or other events outside school . | If it means that the schoolrooms will be more orderly , more disciplined , and that our young people will learn to evaluate themselves by what they are on the inside instead of what they 're wearing on the outside , then our public schools should be able to require their students to wear school uniforms . " | 0.5714285714285714 |
The resulting embryonic stem cells could then theoretically be grown into adult cells to replace the ailing person 's mutated cells . | However , there is a more serious , less cartoonish objection to turning procreation into manufacturing . | 0.4464285714285714 |
- Loss: CosineSimilarityLoss with these parameters: { "loss_fct": "torch.nn.modules.loss.MSELoss" }
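CosineSimilarityLoss embeds both sentences, takes the cosine similarity of the two vectors, and fits it to the gold score with the MSE loss named above. A minimal sketch of wiring up a dataset with these three columns (the rows are invented placeholders, not actual training data):

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Placeholder rows with the same columns as the table above
train_dataset = Dataset.from_dict({
    "sentence1": ["First premise ...", "Second premise ..."],
    "sentence2": ["First hypothesis ...", "Second hypothesis ..."],
    "score": [0.46, 0.57],
})

# MSE between cos(u, v) and the gold score; torch.nn.MSELoss is the default loss_fct
train_loss = losses.CosineSimilarityLoss(model)
```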
Evaluation Dataset
Unnamed Dataset
- Size: 340 evaluation samples
- Columns: sentence1, sentence2, and score
- Approximate statistics based on the first 1000 samples:
Column | Type | Details |
---|---|---|
sentence1 | string | min: 8 tokens, mean: 33.76 tokens, max: 105 tokens |
sentence2 | string | min: 6 tokens, mean: 31.86 tokens, max: 102 tokens |
score | float | min: 0.09, mean: 0.5, max: 0.89 |
- Samples:
sentence1 | sentence2 | score |
---|---|---|
[ quoting himself from Furman v. Georgia , 408 U.S. 238 , 257 ( 1972 ) ] As such it is a penalty that ' subjects the individual to a fate forbidden by the principle of civilized treatment guaranteed by the [ Clause ] . ' | It provides a deterrent for prisoners already serving a life sentence . | 0.3214285714285714 |
Of those savings , $ 25.7 billion would accrue to state and local governments , while $ 15.6 billion would accrue to the federal government . | Jaime Smith , deputy communications director for the governor ’s office , said , “ The legalization initiative was not driven by a desire for a revenue , but it has provided a small assist for our state budget . ” | 0.5357142857142857 |
If the uterus is designed to sustain an unborn child ’s life , do n’t unborn children have a right to receive nutrition and shelter through the one organ designed to provide them with that ordinary care ? | We as parents are supposed to protect our children at all costs whether they are in the womb or not . | 0.7678571428571428 |
- Loss: CosineSimilarityLoss with these parameters: { "loss_fct": "torch.nn.modules.loss.MSELoss" }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- warmup_ratio: 0.1
- bf16: True
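These non-default values map directly onto the v3 training API. A minimal sketch, assuming model, train_dataset, and train_loss from the sketches above plus an eval_dataset built the same way:

```python
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)

args = SentenceTransformerTrainingArguments(
    output_dir="mpnet-sts-finetune",  # hypothetical output path
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_ratio=0.1,
    bf16=True,
    eval_strategy="steps",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=train_loss,
)
trainer.train()
```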
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | Validation Loss | sts-test_spearman_cosine |
---|---|---|---|---|
0.0935 | 100 | 0.0151 | 0.0098 | 0.7013 |
0.1871 | 200 | 0.0069 | 0.0112 | 0.6857 |
0.2806 | 300 | 0.0058 | 0.0106 | 0.6860 |
0.3742 | 400 | 0.0059 | 0.0102 | 0.6915 |
0.4677 | 500 | 0.0057 | 0.0097 | 0.6903 |
0.5613 | 600 | 0.0049 | 0.0100 | 0.6797 |
0.6548 | 700 | 0.0055 | 0.0101 | 0.6766 |
0.7484 | 800 | 0.0049 | 0.0116 | 0.6529 |
0.8419 | 900 | 0.0049 | 0.0105 | 0.6572 |
0.9355 | 1000 | 0.0051 | 0.0115 | 0.6842 |
1.0290 | 1100 | 0.0038 | 0.0094 | 0.7000 |
1.1225 | 1200 | 0.0029 | 0.0091 | 0.7027 |
1.2161 | 1300 | 0.0026 | 0.0093 | 0.7016 |
1.3096 | 1400 | 0.0027 | 0.0088 | 0.7192 |
1.4032 | 1500 | 0.0027 | 0.0097 | 0.7065 |
1.4967 | 1600 | 0.0028 | 0.0091 | 0.7011 |
1.5903 | 1700 | 0.0027 | 0.0095 | 0.7186 |
1.6838 | 1800 | 0.0026 | 0.0087 | 0.7277 |
1.7774 | 1900 | 0.0024 | 0.0085 | 0.7227 |
1.8709 | 2000 | 0.0025 | 0.0086 | 0.7179 |
1.9645 | 2100 | 0.0022 | 0.0086 | 0.7195 |
2.0580 | 2200 | 0.0017 | 0.0088 | 0.7183 |
2.1515 | 2300 | 0.0014 | 0.0088 | 0.7229 |
2.2451 | 2400 | 0.0014 | 0.0086 | 0.7200 |
2.3386 | 2500 | 0.0013 | 0.0088 | 0.7248 |
2.4322 | 2600 | 0.0014 | 0.0085 | 0.7286 |
2.5257 | 2700 | 0.0015 | 0.0085 | 0.7283 |
2.6193 | 2800 | 0.0014 | 0.0085 | 0.7263 |
2.7128 | 2900 | 0.0014 | 0.0085 | 0.7248 |
2.8064 | 3000 | 0.0013 | 0.0087 | 0.7191 |
2.8999 | 3100 | 0.0011 | 0.0086 | 0.7225 |
2.9935 | 3200 | 0.0012 | 0.0085 | 0.7235 |
3.0 | 3207 | - | - | 0.6886 |
Framework Versions
- Python: 3.9.2
- Sentence Transformers: 3.0.1
- Transformers: 4.43.1
- PyTorch: 2.3.1+cu121
- Accelerate: 0.34.2
- Datasets: 2.14.7
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```