MPNet base trained on GooAQ triplets with hard negatives

This is a sentence-transformers model finetuned from microsoft/mpnet-base on the train dataset, which pairs GooAQ questions with their answers and five hard-negative answers each. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: microsoft/mpnet-base
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- train
Language: en
License: apache-2.0
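Cosine similarity compares only the direction of two embeddings, so scaling a vector does not change its score. The architecture below contains no normalization module, so embedding lengths vary and cosine and dot-product scores can rank results differently; this is consistent with the evaluation reporting both. A minimal NumPy sketch of the cosine computation on toy vectors (the helper name is hypothetical, not a library API):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional vectors standing in for 768-dimensional embeddings
u = np.array([1.0, 2.0, 0.0, 1.0])
v = np.array([2.0, 4.0, 0.0, 2.0])  # same direction as u, twice the length
w = np.array([0.0, 0.0, 3.0, 0.0])  # orthogonal to u

print(cosine_similarity(u, v))  # approximately 1.0
print(cosine_similarity(u, w))  # 0.0
```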

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
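The Pooling module above enables only pooling_mode_mean_tokens, so each sentence embedding is the average of its token embeddings, with padding positions excluded through the attention mask. A minimal NumPy sketch of that averaging on toy arrays follows; it illustrates the computation and is not the library's PyTorch implementation, and mean_pool is a hypothetical name:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                         # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                         # avoid division by zero
    return summed / counts

# Two sequences, 3 token positions, 2 dimensions; the second sequence ends in a padding token
tokens = np.array([
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 0.0], [4.0, 2.0], [100.0, 100.0]],  # last position is padding and must be ignored
])
mask = np.array([[1, 1, 1], [1, 1, 0]])
print(mean_pool(tokens, mask))  # rows [3, 4] and [3, 1]
```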

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/mpnet-base-gooaq-hard-negatives")
# Run inference
sentences = [
    'are hard seltzers malt liquor?',
    'Seltzer is carbonated water. “Hard seltzer” is a flavored malt beverage — essentially the same as a Lime-A-Rita or a Colt 45 or a Smirnoff Ice. These products derive their alcohol from fermented malted grains and are then carbonated, flavored and sweetened.',
    'Bleaching action of chlorine is based on oxidation while that of sulphur is based on reduction. Chlorine acts with water to produce nascent oxygen. ... Sulphour dioxide removes oxygen from the coloured substance and makes it colourless.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
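For semantic search, the same cosine scores can rank a corpus of answers against a query. The sketch below assumes NumPy, uses toy vectors in place of real model.encode outputs, and top_k_cosine is a hypothetical helper; sentence_transformers.util.semantic_search provides a library version of this idea.

```python
import numpy as np

def top_k_cosine(query: np.ndarray, corpus: np.ndarray, k: int) -> list[tuple[int, float]]:
    """Return (corpus_index, cosine_score) pairs for the k most similar corpus rows."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                    # cosine similarity with every corpus row
    order = np.argsort(-scores)[:k]   # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]

# Toy stand-ins for model.encode(...) outputs
query_emb = np.array([1.0, 0.0, 1.0])
corpus_emb = np.array([
    [0.9, 0.1, 1.1],  # close to the query
    [0.0, 1.0, 0.0],  # unrelated
    [1.0, 0.0, 0.0],  # partially related
])
for idx, score in top_k_cosine(query_emb, corpus_emb, k=2):
    print(idx, round(score, 3))
```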

Evaluation

Metrics

Information Retrieval

Dataset: gooaq-dev
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@1	0.7413
cosine_accuracy@3	0.8697
cosine_accuracy@5	0.9055
cosine_accuracy@10	0.9427
cosine_precision@1	0.7413
cosine_precision@3	0.2899
cosine_precision@5	0.1811
cosine_precision@10	0.0943
cosine_recall@1	0.7413
cosine_recall@3	0.8697
cosine_recall@5	0.9055
cosine_recall@10	0.9427
cosine_ndcg@10	0.8442
cosine_mrr@10	0.8124
cosine_map@100	0.8148
dot_accuracy@1	0.7384
dot_accuracy@3	0.8669
dot_accuracy@5	0.9039
dot_accuracy@10	0.9389
dot_precision@1	0.7384
dot_precision@3	0.289
dot_precision@5	0.1808
dot_precision@10	0.0939
dot_recall@1	0.7384
dot_recall@3	0.8669
dot_recall@5	0.9039
dot_recall@10	0.9389
dot_ndcg@10	0.8411
dot_mrr@10	0.8095
dot_map@100	0.812
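The values above are averages over queries. They are consistent with each query having exactly one relevant answer: accuracy@k equals recall@k, and precision@k equals accuracy@k divided by k (for example 0.8697 / 3 ≈ 0.2899). A per-query sketch of the standard binary-relevance definitions follows; it is an illustration, the function name is hypothetical, and InformationRetrievalEvaluator may differ in details (it also reports MAP@100, which is omitted here):

```python
import math

def metrics_at_k(ranked_ids: list[str], relevant: set[str], k: int) -> dict[str, float]:
    """Per-query retrieval metrics at cutoff k with binary relevance."""
    hits = [doc in relevant for doc in ranked_ids[:k]]
    num_hits = sum(hits)
    # Reciprocal rank of the first relevant document within the cutoff (0 if none)
    rr = next((1.0 / (rank + 1) for rank, hit in enumerate(hits) if hit), 0.0)
    dcg = sum(1.0 / math.log2(rank + 2) for rank, hit in enumerate(hits) if hit)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant),
        "mrr": rr,
        "ndcg": dcg / idcg if idcg > 0 else 0.0,
    }

# One query whose single correct answer is ranked second
ranking = ["a7", "a3", "a9", "a1"]
print(metrics_at_k(ranking, {"a3"}, k=3))
```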

Training Details

Training Dataset

train

Dataset: train at 87594a1
Size: 2,286,783 training samples
Columns: question, answer, negative_1, negative_2, negative_3, negative_4, and negative_5

Approximate statistics based on the first 1000 samples:

	question	answer	negative_1	negative_2	negative_3	negative_4	negative_5
type	string	string	string	string	string	string	string
details	min: 8 tokens mean: 11.84 tokens max: 23 tokens	min: 13 tokens mean: 59.41 tokens max: 158 tokens	min: 13 tokens mean: 59.09 tokens max: 139 tokens	min: 14 tokens mean: 58.61 tokens max: 139 tokens	min: 14 tokens mean: 58.98 tokens max: 173 tokens	min: 15 tokens mean: 59.43 tokens max: 137 tokens	min: 13 tokens mean: 60.03 tokens max: 146 tokens

Samples:

question	answer	negative_1	negative_2	negative_3	negative_4	negative_5
`is toprol xl the same as metoprolol?`	`Metoprolol succinate is also known by the brand name Toprol XL. It is the extended-release form of metoprolol. Metoprolol succinate is approved to treat high blood pressure, chronic chest pain, and congestive heart failure.`	`Secondly, metoprolol and metoprolol ER have different brand-name equivalents: Brand version of metoprolol: Lopressor. Brand version of metoprolol ER: Toprol XL.`	`Pill with imprint 1 is White, Round and has been identified as Metoprolol Tartrate 25 mg.`	`Interactions between your drugs No interactions were found between Allergy Relief and metoprolol. This does not necessarily mean no interactions exist. Always consult your healthcare provider.`	`Metoprolol is a type of medication called a beta blocker. It works by relaxing blood vessels and slowing heart rate, which improves blood flow and lowers blood pressure. Metoprolol can also improve the likelihood of survival after a heart attack.`	`Metoprolol starts to work after about 2 hours, but it can take up to 1 week to fully take effect. You may not feel any different when you take metoprolol, but this doesn't mean it's not working. It's important to keep taking your medicine.`
`are you experienced cd steve hoffman?`	`The Are You Experienced album was apparently mastered from the original stereo UK master tapes (according to Steve Hoffman - one of the very few who has heard both the master tapes and the CDs produced over the years). ... The CD booklets were a little sparse, but at least they stayed true to the album's original design.`	`I Saw the Light. Showcasing the unique talent and musical influence of country-western artist Hank Williams, this candid biography also sheds light on the legacy of drug abuse and tormented relationships that contributes to the singer's legend.`	`(Read our ranking of his top 10.) And while Howard dresses the part of director, any notion of him as a tortured auteur or dictatorial taskmasker — the clichés of the Hollywood director — are tossed aside. He's very nice.`	`He was a music star too. Where're you people born and brought up? We 're born and brought up here in Anambra State at Nkpor town, near Onitsha.`	`At the age of 87 he has now retired from his live shows and all the traveling involved. And although he still picks up his Martin Guitar and does a show now and then, his life is now devoted to writing his memoirs.`	`The owner of the mysterious voice behind all these videos is a man who's seen a lot, visiting a total of 56 intimate celebrity spaces over the course of five years. His name is Joe Sabia — that's him in the photo — and he's currently the VP of creative development at Condé Nast Entertainment.`
`how are babushka dolls made?`	`Matryoshka dolls are made of wood from lime, balsa, alder, aspen, and birch trees; lime is probably the most common wood type. ... After cutting, the trees are stripped of most of their bark, although a few inner rings of bark are left to bind the wood and keep it from splitting.`	`A quick scan of the auction and buy-it-now listings on eBay finds porcelain doll values ranging from around $5 and $10 to several thousand dollars or more but no dolls listed above $10,000.`	`Japanese dolls are called as ningyō in Japanese and literally translates to 'human form'.`	`Matyoo: All Fresno Girl dolls come just as real children are born.`	`As of September 2016, there are over 100 characters. The main toy line includes 13-inch Dolls, the mini-series, and a variety of mini play-sets and plush dolls as well as Lalaloopsy Littles, smaller siblings of the 13-inch dolls. A spin-off known as "Lala-Oopsies" came out in late 2012.`	`LOL dolls are little baby dolls that come wrapped inside a surprise toy ball. Each ball has layers that contain stickers, secret messages, mix and match accessories–and finally–a doll. ... The doll on the ball is almost never the doll inside. Dolls are released in series, so not every doll is available all the time.`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
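MultipleNegativesRankingLoss treats each question as a query that must pick out its own answer among every answer and negative in the batch; scale multiplies the cosine similarities before a softmax cross-entropy. The NumPy sketch below re-derives that computation on random toy data; it is not the library's code, and the function names are hypothetical. Passing negatives=None gives the in-batch-only setting, which is what applies to an evaluation dataset with only question and answer columns.

```python
import numpy as np

def cos_sim_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def mnrl_loss(anchors, positives, negatives=None, scale=20.0) -> float:
    """Mean cross-entropy in which anchor i must rank positive i first.

    anchors, positives: (batch, dim); negatives: optional list of (batch, dim) arrays.
    Every positive and every negative in the batch is a candidate for every anchor.
    """
    candidates = np.concatenate([positives] + list(negatives or []), axis=0)
    logits = scale * cos_sim_matrix(anchors, candidates)  # (batch, n_candidates)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(anchors))                         # positive i sits in column i
    return float(-log_probs[idx, idx].mean())

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))
positives = anchors + 0.01 * rng.normal(size=(4, 8))     # near-copies of the anchors
negatives = [rng.normal(size=(4, 8)) for _ in range(5)]  # five negative columns, like negative_1..negative_5

matched = mnrl_loss(anchors, positives, negatives)
mismatched = mnrl_loss(anchors, positives[::-1], negatives)  # answers paired with the wrong questions
print(matched < mismatched)  # True
```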

Evaluation Dataset

sentence-transformers/gooaq

Dataset: sentence-transformers/gooaq at b089f72
Size: 10,000 evaluation samples
Columns: question and answer
Approximate statistics based on the first 1000 samples:

	question	answer
type	string	string
details	min: 8 tokens mean: 11.89 tokens max: 22 tokens	min: 14 tokens mean: 59.65 tokens max: 131 tokens

Samples:

question	answer
`how to transfer data from ipad to usb?`	`First, in “Locations,” tap the “On My iPhone” or “On My iPad” section. Here, tap and hold the empty space, and then select “New Folder.” Name it, and then tap “Done” to create a new folder for the files you want to transfer. Now, from the “Locations” section, select your USB flash drive.`
`what quorn products are syn free?`	`['bacon style pieces.', 'bacon style rashers, chilled.', 'BBQ sliced fillets.', 'beef style and red onion burgers.', 'pieces.', 'chicken style slices.', 'fajita strips.', 'family roast.']`
`what is the difference between turmeric ginger?`	`Ginger offers a sweet and spicy zing to dishes. Turmeric provides a golden yellow colour and a warm and bitter taste with a peppery aroma.`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 32
per_device_eval_batch_size: 32
learning_rate: 2e-05
num_train_epochs: 1
warmup_ratio: 0.1
bf16: True
batch_sampler: no_duplicates
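With a linear scheduler and warmup_ratio 0.1, the learning rate ramps from 0 up to 2e-05 over the first 10% of optimizer steps, then decays linearly back to 0. A small sketch of that schedule, assuming warmup steps are rounded up and using the 71,462 total steps from the training logs; linear_warmup_lr is a hypothetical name:

```python
import math

def linear_warmup_lr(step: int, total_steps: int, base_lr: float = 2e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total = 71_462  # optimizer steps in the single epoch, per the training logs
for s in (0, 7_147, total):
    print(s, linear_warmup_lr(s, total))
```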

All Hyperparameters


overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 32
per_device_eval_batch_size: 32
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
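MultipleNegativesRankingLoss uses the other rows in a batch as negatives, so if the same text appeared twice in one batch it would be scored as a negative for a pair it actually matches. The no_duplicates batch sampler is meant to prevent this. Below is a simplified, hypothetical greedy version of the idea in plain Python; the library's own sampler may differ in details such as shuffling and how leftover rows are handled.

```python
def no_duplicate_batches(rows: list[dict[str, str]], batch_size: int) -> list[list[int]]:
    """Greedily group row indices so no text value appears twice within a batch.

    Rows that would introduce a duplicate (or overflow the batch) are deferred to a later batch.
    """
    remaining = list(range(len(rows)))
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            values = set(rows[idx].values())
            if len(batch) < batch_size and not (values & seen):
                batch.append(idx)
                seen |= values
            else:
                deferred.append(idx)
        batches.append(batch)  # never empty: the first remaining row always fits
        remaining = deferred
    return batches

rows = [
    {"question": "q1", "answer": "a1"},
    {"question": "q1", "answer": "a2"},  # repeats question q1
    {"question": "q2", "answer": "a3"},
    {"question": "q3", "answer": "a1"},  # repeats answer a1
]
print(no_duplicate_batches(rows, batch_size=2))  # [[0, 2], [1, 3]]
```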

Training Logs


Epoch	Step	Training Loss	Validation Loss	gooaq-dev_cosine_map@100
0	0	-	-	0.1405
0.2869	20500	0.5303	-	-
0.2939	21000	0.5328	-	-
0.3009	21500	0.515	-	-
0.3079	22000	0.5264	0.0297	0.7919
0.3149	22500	0.5189	-	-
0.3218	23000	0.5284	-	-
0.3288	23500	0.5308	-	-
0.3358	24000	0.509	0.0281	0.7932
0.3428	24500	0.5074	-	-
0.3498	25000	0.5196	-	-
0.3568	25500	0.5041	-	-
0.3638	26000	0.4976	0.0291	0.7950
0.3708	26500	0.5025	-	-
0.3778	27000	0.5175	-	-
0.3848	27500	0.4921	-	-
0.3918	28000	0.4924	0.0298	0.7938
0.3988	28500	0.49	-	-
0.4058	29000	0.4924	-	-
0.4128	29500	0.4902	-	-
0.4198	30000	0.4846	0.0269	0.7966
0.4268	30500	0.4815	-	-
0.4338	31000	0.4881	-	-
0.4408	31500	0.4848	-	-
0.4478	32000	0.4882	0.0264	0.8004
0.4548	32500	0.4809	-	-
0.4618	33000	0.4896	-	-
0.4688	33500	0.4744	-	-
0.4758	34000	0.4827	0.0252	0.8038
0.4828	34500	0.4703	-	-
0.4898	35000	0.4765	-	-
0.4968	35500	0.4625	-	-
0.5038	36000	0.4698	0.0269	0.8025
0.5108	36500	0.4666	-	-
0.5178	37000	0.4594	-	-
0.5248	37500	0.4621	-	-
0.5318	38000	0.4538	0.0266	0.8047
0.5387	38500	0.4576	-	-
0.5457	39000	0.4594	-	-
0.5527	39500	0.4503	-	-
0.5597	40000	0.4538	0.0265	0.8038
0.5667	40500	0.4521	-	-
0.5737	41000	0.4575	-	-
0.5807	41500	0.4544	-	-
0.5877	42000	0.4462	0.0245	0.8077
0.5947	42500	0.4491	-	-
0.6017	43000	0.4651	-	-
0.6087	43500	0.4549	-	-
0.6157	44000	0.4461	0.0262	0.8046
0.6227	44500	0.4571	-	-
0.6297	45000	0.4478	-	-
0.6367	45500	0.4482	-	-
0.6437	46000	0.4439	0.0244	0.8070
0.6507	46500	0.4384	-	-
0.6577	47000	0.446	-	-
0.6647	47500	0.4425	-	-
0.6717	48000	0.4308	0.0248	0.8067
0.6787	48500	0.4374	-	-
0.6857	49000	0.4342	-	-
0.6927	49500	0.4455	-	-
0.6997	50000	0.4322	0.0242	0.8077
0.7067	50500	0.4288	-	-
0.7137	51000	0.4317	-	-
0.7207	51500	0.4295	-	-
0.7277	52000	0.4291	0.0231	0.8130
0.7347	52500	0.4279	-	-
0.7417	53000	0.4287	-	-
0.7486	53500	0.4252	-	-
0.7556	54000	0.4341	0.0243	0.8112
0.7626	54500	0.419	-	-
0.7696	55000	0.4323	-	-
0.7766	55500	0.4252	-	-
0.7836	56000	0.4313	0.0264	0.8107
0.7906	56500	0.4222	-	-
0.7976	57000	0.4226	-	-
0.8046	57500	0.4152	-	-
0.8116	58000	0.4222	0.0236	0.8131
0.8186	58500	0.4184	-	-
0.8256	59000	0.4144	-	-
0.8326	59500	0.4242	-	-
0.8396	60000	0.4148	0.0242	0.8125
0.8466	60500	0.4222	-	-
0.8536	61000	0.4184	-	-
0.8606	61500	0.4138	-	-
0.8676	62000	0.4119	0.0240	0.8133
0.8746	62500	0.411	-	-
0.8816	63000	0.4172	-	-
0.8886	63500	0.4145	-	-
0.8956	64000	0.4168	0.0240	0.8137
0.9026	64500	0.4071	-	-
0.9096	65000	0.4119	-	-
0.9166	65500	0.403	-	-
0.9236	66000	0.4092	0.0238	0.8141
0.9306	66500	0.4079	-	-
0.9376	67000	0.4129	-	-
0.9446	67500	0.4082	-	-
0.9516	68000	0.4054	0.0235	0.8149
0.9586	68500	0.4129	-	-
0.9655	69000	0.4085	-	-
0.9725	69500	0.414	-	-
0.9795	70000	0.4075	0.0239	0.8142
0.9865	70500	0.4104	-	-
0.9935	71000	0.4087	-	-
1.0	71462	-	-	0.8148
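The final step count in the log follows from the dataset size and batch size. Assuming a single GPU and no gradient accumulation, as the hardware and hyperparameter sections indicate, a quick check:

```python
import math

num_samples = 2_286_783  # training samples (Training Dataset section)
batch_size = 32          # per_device_train_batch_size on 1 GPU, gradient_accumulation_steps = 1

steps_per_epoch = math.ceil(num_samples / batch_size)  # dataloader_drop_last is False, so keep the partial batch
print(steps_per_epoch)                     # 71462
print(round(20_500 / steps_per_epoch, 4))  # 0.2869, the epoch value logged at step 20500
```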

Environmental Impact

Carbon emissions were measured using CodeCarbon.

Energy Consumed: 3.989 kWh
Carbon Emitted: 1.551 kg of CO2
Hours Used: 11.599 hours
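Two derived figures follow from these measurements: the average power draw over the run and the implied carbon intensity of the electricity used. They are simple ratios of the reported numbers, not values reported by CodeCarbon itself:

```python
energy_kwh = 3.989
co2_kg = 1.551
hours = 11.599

avg_power_kw = energy_kwh / hours           # average draw over the run
intensity_kg_per_kwh = co2_kg / energy_kwh  # implied carbon intensity of the electricity
print(f"{avg_power_kw:.3f} kW, {intensity_kg_per_kwh:.3f} kg CO2/kWh")  # 0.344 kW, 0.389 kg CO2/kWh
```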

Training Hardware

On Cloud: No
GPU Model: 1 x NVIDIA GeForce RTX 3090
CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
RAM Size: 31.78 GB

Framework Versions

Python: 3.11.6
Sentence Transformers: 3.1.0.dev0
Transformers: 4.41.2
PyTorch: 2.3.0+cu121
Accelerate: 0.31.0
Datasets: 2.20.0
Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
