SentenceTransformer based on sentence-transformers/msmarco-distilbert-base-v2
This is a sentence-transformers model finetuned from sentence-transformers/msmarco-distilbert-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/msmarco-distilbert-base-v2
- Maximum Sequence Length: 350 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 350, 'do_lower_case': False}) with Transformer model: DistilBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
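Under the hood, the Transformer module produces per-token embeddings and the Pooling module averages them (masking padding) into a single 768-dimensional sentence vector. A minimal sketch of that computation using `transformers` directly, for illustration only; loading via `SentenceTransformer` as shown below remains the recommended path:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustration of Transformer -> mean pooling done by hand.
tokenizer = AutoTokenizer.from_pretrained("kperkins411/msmarco-distilbert-base-v2_triplet_legal")
model = AutoModel.from_pretrained("kperkins411/msmarco-distilbert-base-v2_triplet_legal")

encoded = tokenizer(
    ["A short legal clause."],
    padding=True, truncation=True, max_length=350, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # (batch, seq_len, 768)

# Mean pooling: average the token vectors, ignoring padding via the attention mask.
mask = encoded["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])
```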
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("kperkins411/msmarco-distilbert-base-v2_triplet_legal")
# Run inference
sentences = [
'In what circumstances can FCE assume responsibility for a Program Patent?',
'Notwithstanding the foregoing, in the event ExxonMobil decides not to prosecute, defend, enforce, maintain or decides to abandon any Program Patent, then ExxonMobil will provide notice thereof to FCE, and FCE will then have the right, but not the obligation, to prosecute or maintain the Program Patent and sole responsibility for the continuing costs, taxes, legal fees, maintenance fees and other fees associated with that Program Patent.',
'4. Limitation of Liability of the Sponsor. The Sponsor shall not be liable for any error of judgment or mistake of law or for any act or omission in the oversight, administration or management of the Trust or the performance of its duties hereunder, except for willful misfeasance, bad faith or gross negligence in the performance of its duties, or by reason of the reckless disregard of its obligations and duties hereunder. As used in this Section 4, the term "Sponsor" shall include Domini and/or any of its affiliates and the directors, officers and employees of Domini and/or any of its affiliates.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
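Because this model was finetuned for legal retrieval, a typical pattern is to embed a query and a set of candidate clauses and rank by cosine similarity. A minimal sketch under assumed inputs; the query and corpus strings below are made up for illustration:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("kperkins411/msmarco-distilbert-base-v2_triplet_legal")

# Hypothetical corpus of contract clauses and a query about them.
corpus = [
    "Each Party will retain such records for at least three (3) years.",
    "Either party may terminate this Agreement upon six (6) months' notice.",
]
query = "How long must records be kept?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Rank by similarity; model.similarity uses the configured function (cosine).
scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 2]
best = scores.argmax().item()
print(f"{scores[0, best]:.4f}  {corpus[best]}")
```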
Evaluation
Metrics
Information Retrieval
- Dataset: `msmarco-distilbert-base-v2`
- Evaluated with `InformationRetrievalEvaluator`
Metric | Value |
---|---|
cosine_accuracy@1 | 0.3953 |
cosine_accuracy@3 | 0.5377 |
cosine_accuracy@5 | 0.5945 |
cosine_accuracy@10 | 0.6736 |
cosine_precision@1 | 0.3953 |
cosine_precision@3 | 0.1792 |
cosine_precision@5 | 0.1189 |
cosine_precision@10 | 0.0674 |
cosine_recall@1 | 0.3953 |
cosine_recall@3 | 0.5377 |
cosine_recall@5 | 0.5945 |
cosine_recall@10 | 0.6736 |
cosine_ndcg@10 | 0.5277 |
cosine_mrr@10 | 0.4819 |
cosine_map@100 | 0.489 |
dot_accuracy@1 | 0.3964 |
dot_accuracy@3 | 0.5335 |
dot_accuracy@5 | 0.5933 |
dot_accuracy@10 | 0.6744 |
dot_precision@1 | 0.3964 |
dot_precision@3 | 0.1778 |
dot_precision@5 | 0.1187 |
dot_precision@10 | 0.0674 |
dot_recall@1 | 0.3964 |
dot_recall@3 | 0.5335 |
dot_recall@5 | 0.5933 |
dot_recall@10 | 0.6744 |
dot_ndcg@10 | 0.5275 |
dot_mrr@10 | 0.4815 |
dot_map@100 | 0.4885 |
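These numbers come from the library's `InformationRetrievalEvaluator`, which embeds a query set and a corpus and scores ranked retrieval (accuracy/precision/recall@k, NDCG, MRR, MAP) under both cosine and dot-product similarity. A sketch of how such an evaluation is wired up; the queries, corpus, and relevance judgments here are hypothetical placeholders, not the held-out legal set behind the table above:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("kperkins411/msmarco-distilbert-base-v2_triplet_legal")

# Hypothetical placeholders: ids and texts are for illustration only.
queries = {"q1": "How long must records be kept?"}
corpus = {
    "d1": "Each Party will retain such records for at least three (3) years.",
    "d2": "Either party may terminate this Agreement upon six (6) months' notice.",
}
relevant_docs = {"q1": {"d1"}}  # query id -> set of relevant corpus ids

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="msmarco-distilbert-base-v2",
)
results = evaluator(model)  # dict of cosine/dot accuracy@k, precision@k, recall@k, NDCG, MRR, MAP
print(results)
```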
Training Details
Training Dataset
Unnamed Dataset
- Size: 88,018 training samples
- Columns: `anchor`, `positive`, and `negative`
- Approximate statistics based on the first 1000 samples:

 | anchor | positive | negative |
---|---|---|---|
type | string | string | string |
details | min: 7 tokens, mean: 17.42 tokens, max: 104 tokens | min: 6 tokens, mean: 102.85 tokens, max: 350 tokens | min: 6 tokens, mean: 103.73 tokens, max: 350 tokens |
- Samples:

anchor | positive | negative |
---|---|---|
What happens if a Party fails to retain records for the required period? | Each Party will retain such records for at least three (3) years following expiration or termination of this Agreement or such longer period as may be required by applicable law or regulation. | Either party hereto may terminate this Agreement after the Initial Period upon at least six (6) months' prior written notice to the other party thereof. |
What happens if a Party fails to retain records for the required period? | Each Party will retain such records for at least three (3) years following expiration or termination of this Agreement or such longer period as may be required by applicable law or regulation. | The Agreement may be terminated by both Parties with a notification period of *** before the end of the Initial Term of the Agreement. |
What happens if a Party fails to retain records for the required period? | Each Party will retain such records for at least three (3) years following expiration or termination of this Agreement or such longer period as may be required by applicable law or regulation. | For twelve (12) months after delivery of the Master Copy of each Licensed Product to Licensee, Licensor warrants that the media in which the Licensed Products are stored shall be free from defects in materials and workmanship, assuming normal use. Licensee may return any defective media to Licensor for replacement free of charge during such twelve (12) month period. |
- Loss: `TripletLoss` with these parameters: `{"distance_metric": "TripletDistanceMetric.EUCLIDEAN", "triplet_margin": 5}`
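With the Euclidean distance metric and a margin of 5, this loss is max(d(anchor, positive) - d(anchor, negative) + 5, 0): a triplet stops contributing once the negative sits at least 5 units farther from the anchor than the positive. A minimal sketch of constructing the same loss object:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import TripletLoss, TripletDistanceMetric

model = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-v2")

# Pushes each negative at least 5 (Euclidean) units farther from the anchor
# than the corresponding positive.
loss = TripletLoss(
    model=model,
    distance_metric=TripletDistanceMetric.EUCLIDEAN,
    triplet_margin=5,
)
```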
Evaluation Dataset
Unnamed Dataset
- Size: 1,084 evaluation samples
- Columns: `anchor`, `positive`, and `negative`
- Approximate statistics based on the first 1000 samples:

 | anchor | positive | negative |
---|---|---|---|
type | string | string | string |
details | min: 6 tokens, mean: 20.24 tokens, max: 124 tokens | min: 6 tokens, mean: 97.01 tokens, max: 350 tokens | min: 6 tokens, mean: 105.03 tokens, max: 350 tokens |
- Samples:

anchor | positive | negative |
---|---|---|
Are Capital Contributions categorized as either 'Initial' or 'Additional' in the accounts? | Capital Accounts<br>An individual capital account (the "Capital Accounts") will be maintained for each Participant and their Initial Capital Contribution will be credited to this account. Any Additional Capital Contributions made by any Participant will be credited to that Participant's individual Capital Account. | Section 4.3 Deposits and Payments 19 |
Are Capital Contributions categorized as either 'Initial' or 'Additional' in the accounts? | Capital Accounts<br>An individual capital account (the "Capital Accounts") will be maintained for each Participant and their Initial Capital Contribution will be credited to this account. Any Additional Capital Contributions made by any Participant will be credited to that Participant's individual Capital Account. | Section 2.1 The Fund agrees at its own expense to execute any and all documents, to furnish any and all information, and to take any other actions that may be reasonably necessary in connection with the qualification of the Shares for sale in those states that Integrity may designate. |
Are Capital Contributions categorized as either 'Initial' or 'Additional' in the accounts? | Capital Accounts<br>An individual capital account (the "Capital Accounts") will be maintained for each Participant and their Initial Capital Contribution will be credited to this account. Any Additional Capital Contributions made by any Participant will be credited to that Participant's individual Capital Account. | Section 1.9 Integrity shall prepare and deliver reports to the Treasurer of the Fund and to the Investment Adviser on a regular, at least quarterly, basis, showing the distribution expenses incurred pursuant to this Agreement and the Plan and the purposes therefore, as well as any supplemental reports as the Trustees, from time to time, may reasonably request. |
- Loss: `TripletLoss` with these parameters: `{"distance_metric": "TripletDistanceMetric.EUCLIDEAN", "triplet_margin": 5}`
Training Hyperparameters
Non-Default Hyperparameters
- `eval_strategy`: epoch
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `learning_rate`: 2e-05
- `num_train_epochs`: 6
- `warmup_ratio`: 0.1
- `fp16`: True
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates
All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 6
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
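Taken together, the non-default hyperparameters above map onto the library's trainer API roughly as follows. This is a sketch, not the author's original script; the output directory and the tiny inline datasets are placeholders standing in for the real 88,018-triplet training set:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import TripletLoss, TripletDistanceMetric
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-v2")
loss = TripletLoss(model, distance_metric=TripletDistanceMetric.EUCLIDEAN, triplet_margin=5)

# Placeholder triplets; the real run used 88,018 training / 1,084 evaluation samples.
train_dataset = Dataset.from_dict({
    "anchor": ["How long must records be kept?"],
    "positive": ["Each Party will retain such records for at least three (3) years."],
    "negative": ["Either party may terminate this Agreement upon six (6) months' notice."],
})
eval_dataset = train_dataset

args = SentenceTransformerTrainingArguments(
    output_dir="msmarco-distilbert-base-v2_triplet_legal",  # placeholder path
    num_train_epochs=6,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="epoch",
    save_strategy="epoch",  # assumption: must match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()
```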
Training Logs
Epoch | Step | Training Loss | Validation Loss | msmarco-distilbert-base-v2_cosine_map@100 |
---|---|---|---|---|
0 | 0 | - | - | 0.4145 |
0.1453 | 100 | 1.7626 | - | - |
0.2907 | 200 | 0.9595 | - | - |
0.4360 | 300 | 0.7263 | - | - |
0.5814 | 400 | 0.6187 | - | - |
0.7267 | 500 | 0.5571 | - | - |
0.8721 | 600 | 0.4885 | - | - |
1.0131 | 697 | - | 0.3676 | - |
1.0044 | 700 | 0.4283 | - | - |
1.1497 | 800 | 0.3956 | - | - |
1.2951 | 900 | 0.2941 | - | - |
1.4404 | 1000 | 0.2437 | - | - |
1.5858 | 1100 | 0.1988 | - | - |
1.7311 | 1200 | 0.185 | - | - |
1.8765 | 1300 | 0.1571 | - | - |
2.0131 | 1394 | - | 0.2679 | - |
2.0087 | 1400 | 0.1409 | - | - |
2.1541 | 1500 | 0.1368 | - | - |
2.2994 | 1600 | 0.111 | - | - |
2.4448 | 1700 | 0.0994 | - | - |
2.5901 | 1800 | 0.0837 | - | - |
2.7355 | 1900 | 0.076 | - | - |
2.8808 | 2000 | 0.0645 | - | - |
3.0131 | 2091 | - | 0.2412 | - |
3.0131 | 2100 | 0.0607 | - | - |
3.1584 | 2200 | 0.0609 | - | - |
3.3038 | 2300 | 0.0503 | - | - |
3.4491 | 2400 | 0.0483 | - | - |
3.5945 | 2500 | 0.0402 | - | - |
3.7398 | 2600 | 0.0397 | - | - |
3.8852 | 2700 | 0.0305 | - | - |
4.0131 | 2788 | - | 0.2196 | - |
4.0174 | 2800 | 0.0304 | - | - |
4.1628 | 2900 | 0.0307 | - | - |
4.3081 | 3000 | 0.0256 | - | - |
4.4535 | 3100 | 0.0258 | - | - |
4.5988 | 3200 | 0.0212 | - | - |
4.7442 | 3300 | 0.0213 | - | - |
4.8895 | 3400 | 0.0174 | - | - |
5.0131 | 3485 | - | 0.2036 | - |
5.0218 | 3500 | 0.0191 | - | - |
5.1672 | 3600 | 0.0198 | - | - |
5.3125 | 3700 | 0.0161 | - | - |
5.4578 | 3800 | 0.0166 | - | - |
5.6032 | 3900 | 0.0135 | - | - |
5.7485 | 4000 | 0.0145 | - | - |
5.8939 | 4100 | 0.0129 | - | - |
**5.9346** | **4128** | **-** | **0.1966** | **0.489** |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.11.9
- Sentence Transformers: 3.1.0.dev0
- Transformers: 4.41.2
- PyTorch: 2.1.2+cu121
- Accelerate: 0.31.0
- Datasets: 2.19.1
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
TripletLoss
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}