SentenceTransformer based on answerdotai/ModernBERT-large

This is a sentence-transformers model finetuned from answerdotai/ModernBERT-large. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: answerdotai/ModernBERT-large
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
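
The Pooling module averages the token embeddings (mean pooling) to produce the 1024-dimensional sentence vector. A minimal sketch of what that step computes, in plain PyTorch (the function and tensor names here are illustrative, not the library's internals):

import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 1024); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).float()    # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)  # sum over real (non-padding) tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)       # number of real tokens per sequence
    return summed / counts                         # (batch, 1024)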

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("BlackBeenie/ModernBERT-large-msmarco-v3-bpr")
# Run inference
sentences = [
    'what county is phillips wi',
    'Phillips is a city in Price County, Wisconsin, United States. The population was 1,675 at the 2000 census. It is the county seat of Price County. Phillips is located at 45°41′30″N 90°24′7″W / 45.69167°N 90.40194°W / 45.69167; -90.40194 (45.691560, -90.401915). It is on highway SR 13, 77 miles north of Marshfield, and 74 miles south of Ashland.',
    "Motto: It's not what you show, it's what you grow.. Location within Phillips County and Colorado. Holyoke is the Home Rule Municipality that is the county seat and the most populous municipality of Phillips County, Colorado, United States. The city population was 2,313 at the 2010 census.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
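
Since the model was trained on MS MARCO, it can also be dropped into a simple semantic search loop. A sketch using the library's util.semantic_search helper (the corpus and query strings below are made up for illustration):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BlackBeenie/ModernBERT-large-msmarco-v3-bpr")

corpus = [
    "Phillips is a city in Price County, Wisconsin, United States.",
    "Holyoke is the county seat of Phillips County, Colorado.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("what county is phillips wi", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
print(hits[0])
# [{'corpus_id': 0, 'score': ...}, {'corpus_id': 1, 'score': ...}]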

Training Details

Training Dataset

Unnamed Dataset

  • Size: 498,970 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:

                  sentence_0   sentence_1   sentence_2
    type          string       string       string
    min tokens    4            23           16
    mean tokens   9.24         83.71        80.18
    max tokens    27           279          262
  • Samples:

    Sample 1
      sentence_0: what is tongkat ali
      sentence_1: Tongkat Ali is a very powerful herb that acts as a sex enhancer by naturally increasing the testosterone levels, and revitalizing sexual impotence, performance and pleasure. Tongkat Ali is also effective in building muscular volume & strength resulting to a healthy physique.
      sentence_2: However, unlike tongkat ali extract, tongkat ali chipped root and root powder are not sterile. Thus, the raw consumption of root powder is not recommended. The traditional preparation in Indonesia and Malaysia is to boil chipped roots as a tea.

    Sample 2
      sentence_0: cost to install engineered hardwood flooring
      sentence_1: Burton says his customers typically spend about $8 per square foot for engineered hardwood flooring; add an additional $2 per square foot for installation. Minion says consumers should expect to pay $7 to $12 per square foot for quality hardwood flooring. “If the homeowner buys the wood and you need somebody to install it, usually an installation goes for about $2 a square foot,” Bill LeBeau, owner of LeBeau’s Hardwood Floors of Huntersville, North Carolina, says.
      sentence_2: Engineered Wood Flooring Installation - Average Cost Per Square Foot. Expect to pay in the higher end of the price range for a licensed, insured and reputable pro - and for complex or rush projects. To lower Engineered Wood Flooring Installation costs: combine related projects, minimize options/extras and be flexible about project scheduling.

    Sample 3
      sentence_0: define pollute
      sentence_1: pollutes; polluted; polluting. Learner's definition of POLLUTE. [+ object] : to make (land, water, air, etc.) dirty and not safe or suitable to use. Waste from the factory had polluted [=contaminated] the river. Miles of beaches were polluted by the oil spill. Car exhaust pollutes the air.
      sentence_2: Chemical water pollution. Industrial and agricultural work involves the use of many different chemicals that can run-off into water and pollute it. 1 Metals and solvents from industrial work can pollute rivers and lakes. 2 These are poisonous to many forms of aquatic life and may slow their development, make them infertile or even result in death. Industrial and agricultural work involves the use of many different chemicals that can run-off into water and pollute it. 1 Metals and solvents from industrial work can pollute rivers and lakes.
  • Loss: beir.losses.bpr_loss.BPRLoss
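
The BPR loss comes from Binary Passage Retrieval (Yamada et al., 2021), which trains the model to work both as a dense retriever and through compact binary hash codes. A simplified sketch of the two-part objective, assuming the beir implementation follows the paper (a margin ranking loss on binary codes for candidate generation, plus an in-batch cross-entropy loss of dense queries against binary passage codes for reranking); the exact beir.losses.bpr_loss.BPRLoss code may differ in details:

import torch
import torch.nn.functional as F

def bpr_style_loss(q, pos, neg, margin: float = 0.1):
    # q, pos, neg: (batch, dim) dense embeddings of queries, positives, negatives.
    # Hash embeddings to ±1 codes. The real implementation anneals tanh toward sign
    # so gradients can flow; plain sign() is used here only for readability.
    q_bin, pos_bin, neg_bin = torch.sign(q), torch.sign(pos), torch.sign(neg)

    # (1) Candidate-generation loss: margin ranking on binary-code scores.
    cand_loss = F.relu(
        margin - (q_bin * pos_bin).sum(-1) + (q_bin * neg_bin).sum(-1)
    ).mean()

    # (2) Reranking loss: in-batch cross-entropy of dense queries vs. binary passage codes.
    scores = q @ torch.cat([pos_bin, neg_bin]).T       # (batch, 2 * batch)
    labels = torch.arange(q.size(0), device=q.device)  # positive for query i is column i
    rerank_loss = F.cross_entropy(scores, labels)

    return cand_loss + rerank_loss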

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 6
  • multi_dataset_batch_sampler: round_robin
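
For orientation, here is a hypothetical sketch of how these non-default values map onto the sentence-transformers v3 trainer API. The output directory, dataset construction, and loss wiring are illustrative assumptions, not the author's actual training script:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.training_args import SentenceTransformerTrainingArguments
from beir.losses.bpr_loss import BPRLoss  # loss class named in the card

model = SentenceTransformer("answerdotai/ModernBERT-large")

# Tiny stand-in for the real 498,970-triplet dataset (columns as listed above).
train_dataset = Dataset.from_dict({
    "sentence_0": ["what is tongkat ali"],
    "sentence_1": ["Tongkat Ali is a very powerful herb ..."],
    "sentence_2": ["However, unlike tongkat ali extract ..."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="modernbert-large-msmarco-bpr",  # hypothetical path
    eval_strategy="steps",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=6,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=BPRLoss(model=model),  # assumed constructor; check beir's source for exact arguments
)
trainer.train()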

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 6
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss
0.0641 500 1.4036
0.1283 1000 0.36
0.1924 1500 0.3305
0.2565 2000 0.2874
0.3206 2500 0.2732
0.3848 3000 0.2446
0.4489 3500 0.2399
0.5130 4000 0.2302
0.5771 4500 0.231
0.6413 5000 0.2217
0.7054 5500 0.2192
0.7695 6000 0.2087
0.8337 6500 0.2104
0.8978 7000 0.2069
0.9619 7500 0.2071
1.0 7797 -
1.0260 8000 0.1663
1.0902 8500 0.1213
1.1543 9000 0.1266
1.2184 9500 0.1217
1.2825 10000 0.1193
1.3467 10500 0.1198
1.4108 11000 0.1258
1.4749 11500 0.1266
1.5391 12000 0.1334
1.6032 12500 0.1337
1.6673 13000 0.1258
1.7314 13500 0.1268
1.7956 14000 0.1249
1.8597 14500 0.1256
1.9238 15000 0.1238
1.9879 15500 0.1274
2.0 15594 -
2.0521 16000 0.0776
2.1162 16500 0.0615
2.1803 17000 0.0647
2.2445 17500 0.0651
2.3086 18000 0.0695
2.3727 18500 0.0685
2.4368 19000 0.0685
2.5010 19500 0.0707
2.5651 20000 0.073
2.6292 20500 0.0696
2.6933 21000 0.0694
2.7575 21500 0.0701
2.8216 22000 0.0668
2.8857 22500 0.07
2.9499 23000 0.0649
3.0 23391 -
3.0140 23500 0.0589
3.0781 24000 0.0316
3.1422 24500 0.0377
3.2064 25000 0.039
3.2705 25500 0.0335
3.3346 26000 0.0387
3.3987 26500 0.0367
3.4629 27000 0.0383
3.5270 27500 0.0407
3.5911 28000 0.0372
3.6553 28500 0.0378
3.7194 29000 0.0359
3.7835 29500 0.0394
3.8476 30000 0.0388
3.9118 30500 0.0422
3.9759 31000 0.0391
4.0 31188 -
4.0400 31500 0.0251
4.1041 32000 0.0199
4.1683 32500 0.0261
4.2324 33000 0.021
4.2965 33500 0.0196
4.3607 34000 0.0181
4.4248 34500 0.0228
4.4889 35000 0.0195
4.5530 35500 0.02
4.6172 36000 0.0251
4.6813 36500 0.0213
4.7454 37000 0.0208
4.8095 37500 0.0192
4.8737 38000 0.0204
4.9378 38500 0.0176
5.0 38985 -
5.0019 39000 0.0184
5.0661 39500 0.0136
5.1302 40000 0.0102
5.1943 40500 0.0122
5.2584 41000 0.0124
5.3226 41500 0.013
5.3867 42000 0.0105
5.4508 42500 0.0135
5.5149 43000 0.0158
5.5791 43500 0.015
5.6432 44000 0.0128
5.7073 44500 0.0105
5.7715 45000 0.014
5.8356 45500 0.0125
5.8997 46000 0.0139
5.9638 46500 0.0137
6.0 46782 -

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.48.0.dev0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0
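
To approximate this environment, the released versions can be pinned as below (Transformers 4.48.0.dev0 was a development build; the nearest stable release should behave similarly):

pip install sentence-transformers==3.3.1 torch==2.5.1 accelerate==1.2.1 datasets==3.2.0 tokenizers==0.21.0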

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}