SentenceTransformer based on Alibaba-NLP/gte-modernbert-base
This is a sentence-transformers model finetuned from Alibaba-NLP/gte-modernbert-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Alibaba-NLP/gte-modernbert-base
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
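As a rough illustration of the similarity function listed above, the sketch below computes cosine similarity in plain Python. The 3-dimensional vectors are made-up stand-ins for the model's 768-dimensional embeddings, and `cosine_similarity` is a hypothetical helper name, not part of the library.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy vectors (assumption: real embeddings would have 768 dimensions).
query = [1.0, 2.0, 0.0]
same_direction = [2.0, 4.0, 0.0]   # same direction, different length
orthogonal = [0.0, 0.0, 5.0]       # shares no direction with the query

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))
```

Because the score depends only on direction, scaling an embedding does not change its similarity to other embeddings.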
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
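The Pooling module above has only `pooling_mode_cls_token` enabled, so the sentence embedding is the vector of the first token. A minimal sketch of that idea follows, with mean pooling shown only for contrast. The token vectors are toy 2-dimensional values, and both helper names are hypothetical.

```python
def cls_pooling(token_embeddings):
    """Return the first token's vector, as with pooling_mode_cls_token=True."""
    return list(token_embeddings[0])

def mean_pooling(token_embeddings, attention_mask):
    """Average the vectors of non-padding tokens (not what this model uses)."""
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    dim = len(token_embeddings[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

# Toy sequence: four tokens with 2-dimensional vectors; the last token is padding.
tokens = [[0.5, -1.0], [1.0, 1.0], [3.0, 0.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

print(cls_pooling(tokens))         # only the first token contributes
print(mean_pooling(tokens, mask))  # every unmasked token contributes
```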
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
"How is 'associated undertaking' defined, and what criteria determine the significant influence of one undertaking over another in terms of voting rights?",
# The passage below is written as two adjacent string literals, which Python joins into one string.
"▼B\n\n(6)\n\n‘purchase price’ means the price payable and any incidental expenses minus any incidental reductions in the cost of acquisition;\n\n(7)\n\n‘production cost’ means the purchase price of raw materials, consumables and other costs directly attributable to the item in question. Member States shall permit or require the inclusion of a reasonable proportion of fixed or variable overhead costs indirectly attributable to the item in question, to the extent that they relate to the period of production. Distribution costs shall not be included;\n\n(8)\n\n‘value adjustment’ means the adjustments intended to take account of changes in the values of individual assets established at the balance sheet date, whether the change is final or not;\n\n(9)\n\n‘parent undertaking’ means an undertaking which controls one or more subsidiary undertakings;\n\n(10)\n\n‘subsidiary undertaking’ means an undertaking controlled by a parent undertaking, including any subsidiary undertaking of an ultimate parent undertaking;\n\n(11)\n\n‘group’ means a parent undertaking and all its subsidiary undertakings;\n\n(12)\n\n‘affiliated undertakings’ means any two or more undertakings within a group;\n\n(13)\n\n‘associated undertaking’ means an undertaking in which another undertaking has a participating interest, and over whose operating and financial policies that other undertaking exercises significant influence. "
"An undertaking is presumed to exercise a significant influence over another undertaking where it has 20 % or more of the shareholders' or members' voting rights in that other undertaking;\n\n(14)\n\n‘investment undertakings’ means:\n\n(a)\n\nundertakings the sole object of which is to invest their funds in various securities, real property and other assets, with the sole aim of spreading investment risks and giving their shareholders the benefit of the results of the management of their assets,\n\n(b)\n\nundertakings associated with investment undertakings with fixed capital, if the sole object of those associated undertakings is to acquire fully paid shares issued by those investment undertakings without prejudice to point (h) of Article 22(1) of Directive 2012/30/EU;\n\n(15)",
'and non-European non-financial corporations not subject to the disclosure obligations laid down in Directive 2013/34/EU. That information may be disclosed only once, based on counterparties’ turnover alignment for the general-purpose lending loans, as in the case of the GAR. The first disclosure reference date of this template is as of 31 December 2024. Institutions are not required to disclose this information before 1 January 2025. ---|---|---',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
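In the example above, the first sentence is a question and the other two are passages, so row 0 of `similarities` can be read as retrieval scores for that question. The sketch below shows one way to turn such a row into a ranking. The scores are invented for illustration and `rank_passages` is a hypothetical helper, not a library function.

```python
def rank_passages(similarity_row, query_index, k):
    """Return up to k (index, score) pairs, best first, skipping the query itself."""
    candidates = [(i, s) for i, s in enumerate(similarity_row) if i != query_index]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]

# Made-up scores standing in for row 0 of `similarities` (query vs. all three texts).
row = [1.0, 0.62, 0.17]
ranked = rank_passages(row, query_index=0, k=2)
print(ranked)
```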
Evaluation
Metrics
Information Retrieval
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@1 | 0.691 |
cosine_accuracy@3 | 0.9109 |
cosine_accuracy@5 | 0.9461 |
cosine_accuracy@10 | 0.9743 |
cosine_precision@1 | 0.691 |
cosine_precision@3 | 0.3036 |
cosine_precision@5 | 0.1892 |
cosine_precision@10 | 0.0974 |
cosine_recall@1 | 0.691 |
cosine_recall@3 | 0.9109 |
cosine_recall@5 | 0.9461 |
cosine_recall@10 | 0.9743 |
cosine_ndcg@10 | 0.8472 |
cosine_mrr@10 | 0.8048 |
cosine_map@100 | 0.8061 |
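In the table above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k. That pattern is what one would expect if every evaluation query has exactly one relevant passage. The card does not state this, so treat it as an inference. Under that assumption, the metrics can be sketched as follows; `retrieval_metrics` and the toy rankings are hypothetical.

```python
import math

def retrieval_metrics(rankings, relevant_ids, k):
    """Metrics at cutoff k, assuming exactly one relevant document per query."""
    hits, reciprocal_ranks, ndcg = 0, 0.0, 0.0
    for ranked, relevant in zip(rankings, relevant_ids):
        top_k = ranked[:k]
        if relevant in top_k:
            rank = top_k.index(relevant) + 1      # 1-based position of the hit
            hits += 1
            reciprocal_ranks += 1.0 / rank
            ndcg += 1.0 / math.log2(rank + 1)     # ideal DCG is 1 with a single relevant doc
    n = len(rankings)
    accuracy = hits / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,   # at most one of the k results can be relevant
        "recall": accuracy,          # the single relevant doc is either found or not
        "mrr": reciprocal_ranks / n,
        "ndcg": ndcg / n,
    }

# Toy data: document ids ranked best-first for four hypothetical queries.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "a", "b"], ["a", "b", "d"]]
relevant_ids = ["a", "a", "c", "z"]
toy = retrieval_metrics(rankings, relevant_ids, k=3)
print(toy)

# Consistency check against the reported values: precision@k == accuracy@k / k.
for k, accuracy_at_k in [(3, 0.9109), (5, 0.9461), (10, 0.9743)]:
    print(k, round(accuracy_at_k / k, 4))
```

The final loop reproduces the reported precision values (0.3036, 0.1892, 0.0974) from the reported accuracy values, which is consistent with the one-relevant-passage reading.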
Training Details
Training Dataset
Unnamed Dataset
- Size: 46,338 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:
 | sentence_0 | sentence_1 |
---|---|---|
type | string | string |
details | min: 13 tokens, mean: 34.18 tokens, max: 251 tokens | min: 7 tokens, mean: 231.33 tokens, max: 2146 tokens |
- Samples:
sentence_0 | sentence_1 |
---|---|
How is 'energy efficiency' defined in the context of Directive (EU) 2018/2001? | of Directive (EU) 2018/2001; --- --- (8) ‘energy efficiency’ means the ratio of output of performance, service, goods or energy to input of energy; --- --- (9) ‘energy savings’ means an amount of saved energy determined by measuring or estimating consumption, or both,, before and after the implementation of an energy efficiency improvement measure, whilst ensuring normalisation for external conditions that affect energy consumption; --- --- (10) ‘energy efficiency improvement’ means an increase in energy efficiency as a result of any technological, behavioural or economic changes; --- --- (11) ‘energy service’ means the physical benefit, utility or good derived from a combination of energy with energy-efficient technology or with action, |
What are the sources of information that the external experts will use to create the list of conflict-affected and high-risk areas? | 2. The Commission shall call upon external expertise that will provide an indicative, non-exhaustive, regularly updated list of conflict-affected and high-risk areas. That list shall be based on the external experts' analysis of the handbook referred to in paragraph 1 and existing information from, inter alia, academics and supply chain due diligence schemes. Union importers sourcing from areas which are not mentioned on that list shall also maintain their responsibility to comply with the due diligence obligations under this Regulation. Article 15 Committee procedure 1. The Commission shall be assisted by a committee. That committee shall be a committee within the meaning of Regulation (EU) No 182/2011. 2. |
What is the maximum time frame for completing the undertaking according to the technical specifications set out in Annexes II and III after the Directive enters into force? | is undertaken according to the technical specifications set out in Annexes II and III and that it is completed at the latest four years after the date of entry into force of this Directive. 2. The analyses and reviews mentioned under paragraph 1 shall be reviewed, and if necessary updated at the latest 13 years after the date of entry into force of this Directive and every six years thereafter. Article 6 Register of protected areas |
- Loss: MatryoshkaLoss with these parameters:
{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 768, 512, 256, 128, 64 ], "matryoshka_weights": [ 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }
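To make the loss configuration above more concrete, here is a plain-Python sketch of the general idea: an in-batch-negatives ranking loss is evaluated on the leading `dim` components of each embedding for every entry in `matryoshka_dims`, and the weighted results are summed. This is not the library's implementation. The cosine scoring and `scale=20.0` are assumptions about typical defaults that the card does not list, and the 4-dimensional vectors are toy data.

```python
import math

def _cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mnr_loss(anchors, positives, scale=20.0):
    """Cross-entropy over in-batch candidates: positives[i] is the target for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * _cos(anchor, p) for p in positives]
        max_logit = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss on truncated embeddings."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [a[:dim] for a in anchors]
        truncated_positives = [p[:dim] for p in positives]
        total += weight * mnr_loss(truncated_anchors, truncated_positives)
    return total

# Toy batch of two (question, passage) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.9, 0.0, 0.3]]
dims, weights = [4, 2], [1, 1]  # the card uses [768, 512, 256, 128, 64] with weights of 1

aligned = matryoshka_loss(anchors, positives, dims, weights)
swapped = matryoshka_loss(anchors, positives[::-1], dims, weights)
print(aligned, swapped)
```

The loss is small when each question is closest to its own passage at every truncation size and grows when the pairs are mismatched. This is the property that should let truncated embeddings (for example the first 256 dimensions) remain usable for retrieval.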
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- num_train_epochs: 4
- multi_dataset_batch_sampler: round_robin
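The batch size and epoch count above determine how optimizer steps map to epochs in the Training Logs table. The arithmetic below is a sketch that assumes a single training device, which the card does not state. It uses `gradient_accumulation_steps: 1` and `dataloader_drop_last: False` from the full hyperparameter list.

```python
import math

train_samples = 46_338   # "Size" in the Training Dataset section
batch_size = 4           # per_device_train_batch_size
epochs = 4               # num_train_epochs
num_devices = 1          # assumption: not stated in the card

# The last partial batch is kept (dataloader_drop_last is False), hence the ceiling.
steps_per_epoch = math.ceil(train_samples / (batch_size * num_devices))
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)
print(round(500 / steps_per_epoch, 4))     # epoch value logged at step 500
print(round(42000 / steps_per_epoch, 4))   # epoch value logged at step 42000
```

Under that assumption, one epoch is 11,585 steps, which matches the step logged at epoch 1.0, and the last logged step (42000) falls before the 46,340 steps that four full epochs would take.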
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 4
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
Training Logs
Epoch | Step | Training Loss | cosine_ndcg@10 |
---|---|---|---|
0.0432 | 500 | 0.358 | - |
0.0863 | 1000 | 0.1048 | - |
0.1295 | 1500 | 0.0827 | - |
0.1726 | 2000 | 0.067 | 0.7969 |
0.2158 | 2500 | 0.0491 | - |
0.2590 | 3000 | 0.0831 | - |
0.3021 | 3500 | 0.062 | - |
0.3453 | 4000 | 0.0657 | 0.8050 |
0.3884 | 4500 | 0.0522 | - |
0.4316 | 5000 | 0.049 | - |
0.4748 | 5500 | 0.0426 | - |
0.5179 | 6000 | 0.0708 | 0.8215 |
0.5611 | 6500 | 0.0236 | - |
0.6042 | 7000 | 0.024 | - |
0.6474 | 7500 | 0.0256 | - |
0.6905 | 8000 | 0.041 | 0.8105 |
0.7337 | 8500 | 0.0285 | - |
0.7769 | 9000 | 0.0249 | - |
0.8200 | 9500 | 0.0368 | - |
0.8632 | 10000 | 0.0588 | 0.8118 |
0.9063 | 10500 | 0.0386 | - |
0.9495 | 11000 | 0.0456 | - |
0.9927 | 11500 | 0.0399 | - |
1.0 | 11585 | - | 0.8184 |
1.0358 | 12000 | 0.0424 | 0.8239 |
1.0790 | 12500 | 0.0107 | - |
1.1221 | 13000 | 0.0279 | - |
1.1653 | 13500 | 0.0236 | - |
1.2085 | 14000 | 0.024 | 0.8193 |
1.2516 | 14500 | 0.0143 | - |
1.2948 | 15000 | 0.0118 | - |
1.3379 | 15500 | 0.0078 | - |
1.3811 | 16000 | 0.023 | 0.8217 |
1.4243 | 16500 | 0.0239 | - |
1.4674 | 17000 | 0.0335 | - |
1.5106 | 17500 | 0.0119 | - |
1.5537 | 18000 | 0.0411 | 0.8292 |
1.5969 | 18500 | 0.0168 | - |
1.6401 | 19000 | 0.0059 | - |
1.6832 | 19500 | 0.0234 | - |
1.7264 | 20000 | 0.0184 | 0.8366 |
1.7695 | 20500 | 0.0128 | - |
1.8127 | 21000 | 0.0166 | - |
1.8558 | 21500 | 0.0181 | - |
1.8990 | 22000 | 0.0148 | 0.8353 |
1.9422 | 22500 | 0.0225 | - |
1.9853 | 23000 | 0.0158 | - |
2.0 | 23170 | - | 0.8360 |
2.0285 | 23500 | 0.0123 | - |
2.0716 | 24000 | 0.0173 | 0.8329 |
2.1148 | 24500 | 0.0167 | - |
2.1580 | 25000 | 0.0125 | - |
2.2011 | 25500 | 0.013 | - |
2.2443 | 26000 | 0.0079 | 0.8338 |
2.2874 | 26500 | 0.007 | - |
2.3306 | 27000 | 0.0171 | - |
2.3738 | 27500 | 0.0058 | - |
2.4169 | 28000 | 0.0048 | 0.8405 |
2.4601 | 28500 | 0.005 | - |
2.5032 | 29000 | 0.0141 | - |
2.5464 | 29500 | 0.0132 | - |
2.5896 | 30000 | 0.006 | 0.8461 |
2.6327 | 30500 | 0.0095 | - |
2.6759 | 31000 | 0.0061 | - |
2.7190 | 31500 | 0.0107 | - |
2.7622 | 32000 | 0.0157 | 0.8451 |
2.8054 | 32500 | 0.005 | - |
2.8485 | 33000 | 0.0087 | - |
2.8917 | 33500 | 0.0064 | - |
2.9348 | 34000 | 0.005 | 0.8449 |
2.9780 | 34500 | 0.0115 | - |
3.0 | 34755 | - | 0.8451 |
3.0211 | 35000 | 0.0079 | - |
3.0643 | 35500 | 0.0045 | - |
3.1075 | 36000 | 0.0029 | 0.8443 |
3.1506 | 36500 | 0.0161 | - |
3.1938 | 37000 | 0.0144 | - |
3.2369 | 37500 | 0.0076 | - |
3.2801 | 38000 | 0.0157 | 0.8500 |
3.3233 | 38500 | 0.0039 | - |
3.3664 | 39000 | 0.0045 | - |
3.4096 | 39500 | 0.0033 | - |
3.4527 | 40000 | 0.0064 | 0.8434 |
3.4959 | 40500 | 0.0054 | - |
3.5391 | 41000 | 0.0061 | - |
3.5822 | 41500 | 0.0051 | - |
3.6254 | 42000 | 0.0019 | 0.8472 |
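The log above mixes training-loss rows with occasional evaluation rows, and rows without an evaluation show `-`. The sketch below parses a short excerpt of the table and picks out the best and the last evaluated steps. `parse_eval_rows` is a hypothetical helper, and the excerpt covers only some of the final rows.

```python
log_excerpt = """\
3.1075 | 36000 | 0.0029 | 0.8443 |
3.1506 | 36500 | 0.0161 | - |
3.2801 | 38000 | 0.0157 | 0.8500 |
3.4527 | 40000 | 0.0064 | 0.8434 |
3.5822 | 41500 | 0.0051 | - |
3.6254 | 42000 | 0.0019 | 0.8472 |
"""

def parse_eval_rows(table_text):
    """Return (step, ndcg) for every row that has an evaluation score."""
    rows = []
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) < 4 or cells[3] in ("", "-"):
            continue  # blank line, or a row logged without evaluation
        rows.append((int(cells[1]), float(cells[3])))
    return rows

evals = parse_eval_rows(log_excerpt)
best_step, best_ndcg = max(evals, key=lambda row: row[1])
last_step, last_ndcg = evals[-1]
print(best_step, best_ndcg)
print(last_step, last_ndcg)
```

In the full table the highest cosine_ndcg@10 is also the 0.8500 logged at step 38000, while the value at the last logged step (0.8472) equals the cosine_ndcg@10 reported in the Metrics section. With `load_best_model_at_end` set to False, this suggests, though the card does not confirm it, that the published metrics describe the step-42000 state and not the best-scoring one.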
Framework Versions
- Python: 3.10.15
- Sentence Transformers: 3.4.1
- Transformers: 4.49.0
- PyTorch: 2.6.0+cu126
- Accelerate: 1.5.2
- Datasets: 3.4.1
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Model tree for amentaphd/gte-modernbert-base
- Base model: answerdotai/ModernBERT-base
- Finetuned from it: Alibaba-NLP/gte-modernbert-base (the direct parent of this model, per the Model Description)