BGE base En v1.5 Phase 5

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
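
Read top to bottom, the three modules run BERT over the input, keep the first ([CLS]) token vector as the sentence embedding, and scale that vector to unit length. The last two steps can be sketched in plain Python. The 4-dimensional token vectors below are made up and stand in for the real 768-dimensional BERT outputs, and the function names are illustrative rather than part of the library.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector, mirroring pooling_mode_cls_token=True."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length, mirroring the Normalize() module."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # leave an all-zero vector unchanged
    return [x / norm for x in vector]

# Toy per-token outputs for a 3-token input (invented values).
token_embeddings = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS]
    [1.0, 2.0, 3.0, 4.0],  # a word piece
    [0.5, 0.5, 0.5, 0.5],  # [SEP]
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)  # [0.6, 0.0, 0.8, 0.0]
```

Because every output vector has unit length, cosine similarity between two embeddings reduces to their dot product.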

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("RishuD7/bge-base-en-v1.5-65-keys-phase-5-exp_v1")
# Run inference
sentences = [
    'SECTION THREE RENT A. Base Rent. Tenant shall pay the following Base Rent, without reduction or set-off, during the Term: (i) Initial Term. For the one (1) year period commencing on the Commencement Date the Base Rent shall be Eleven Thousand Seven Hundred and Four Dollars and Fifty Cents ($11,704.50) per month, payable in advance on the first (1") day of each month (the "due date"), provided however, that Rent for the first (1") month shall be due and payable immediately upon Tenant\'s execution of this Lease. During each subsequent one (I) year period during the Initial Term, the monthly Base Rent shall be increased by three percent (3%). Rent received after the tenth (IOlh) day of the month it is due shall result in an 2 additional late charge equal to five percent (5%) of the monthly Base Rent that is late, and such late charge shall be due and payable as Additional Rent with the Base Rent payment for the following calendar month, or if the Initial Term has expired, within fifteen (15) days of Landlord\'s written demand for payment. (ii) Extension Term. For the one (1) year period commencing on the first (1 ") day of the First Extension Term, the Base Rent shall be Twenty-One Thousand One Hundred Thirty-Nine Dollars and Sixty-Three Cents ($21,139.63) per month, payable in advance on the first (1 ") day of each month (the "due date"). During each subsequent one (1) year period during the First Extension Term and the Second Extension Term, the monthly Base Rent shall be increased by three percent (3%). Rent received after the tenth (1oth) day of the month it is due shall result in an additional late charge equal to five percent (5%) of the monthly Base Rent that is late, and such late charge shall be due and payable as Additional Rent with the Base Rent payment for the following calendar month, or if the applicable Extension Term has expired, within fifteen (15) days of Landlord\'s written demand for payment. \
(iii) Holdover Tenancy.\nAny holdover tenancy shall be month to month, and can be terminated by either party upon thirty (30) days advanced written notice. The monthly Base Rent during the period of any holdover tenancy shall be an amount equal to two (2) times the monthly Base Rent provided in (i) or (ii) above for the most recently completed month of the Initial Term or any applicable Extension Term, payable in advance on the first (1 ") day of each month. Rent received after the tenth (10th) day of the month it is due shall result in an additional late charge equal to five percent (5%) of the monthly Base Rent that is late, and such late charge shall be due and payable as Additional Rent with the Base Rent payment for the following calendar month, or at Landlord\'s option, which Landlord may exercise at any time, within fifteen (15) days of Landlord\'s written demand for payment. Tenant shall be responsible for paying all items of Additional Rent, as specified below (and described elsewhere in this Lease), during the period of any holdover tenancy. (iv) Rent Exhibit. The Base Rent during the Initial Term and Extension Terms is set forth on Exhibit A attached hereto and incorporated herein by this reference. B. Additional Rent. The following items shall be paid by Tenant as Additional Rent during the entire Term, and during the period of any holdover tenancy. In the event this Lease is terminated in a manner permitted by this Lease, or in the event of a holdover tenancy, annual items of Additional Rent shall be pro-rated, and Tenant shall only be responsible for its proportionate share of such Additional Rent for periods prior to the termination of the Lease andlor any holdover tenancy. (i) Late Rent. The late charges specified in A. above for rental payments that are late shall be an item of Additional Rent due with the rental payment for the calendar month immediately following the month in which the late payment was incurred.',
    'Late Payment Trigger Period (days)',
    'Late Payment Trigger Period Details',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
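
The training pairs in this card match short field labels to lease passages, so a likely use is ranking passages for a given label. The sketch below shows that ranking step on its own, in plain Python. The 3-dimensional vectors are invented stand-ins for `model.encode(...)` output, and the passage IDs and the `rank_passages` helper are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_passages(query_vec, passage_vecs):
    """Return (passage_id, score) pairs sorted from most to least similar."""
    scored = [
        (passage_id, cosine_similarity(query_vec, vec))
        for passage_id, vec in passage_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Invented embedding for the label "Late Payment Trigger Period (days)".
query_vec = [0.1, 0.8, 0.3]

# Invented embeddings for three hypothetical lease passages.
passage_vecs = {
    "rent_clause": [0.2, 0.9, 0.1],
    "termination_clause": [0.7, 0.3, 0.6],
    "parties_clause": [0.9, 0.1, 0.2],
}

ranking = rank_passages(query_vec, passage_vecs)
for passage_id, score in ranking:
    print(f"{score:.3f}  {passage_id}")
```

With real embeddings, `model.similarity(query_embeddings, passage_embeddings)` returns the same kind of score matrix in one call.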

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.0056
cosine_accuracy@3 0.0152
cosine_accuracy@5 0.028
cosine_accuracy@10 0.0583
cosine_precision@1 0.0056
cosine_precision@3 0.0051
cosine_precision@5 0.0056
cosine_precision@10 0.0058
cosine_recall@1 0.0056
cosine_recall@3 0.0152
cosine_recall@5 0.028
cosine_recall@10 0.0583
cosine_ndcg@10 0.0261
cosine_mrr@10 0.0166
cosine_map@100 0.0297
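
For readers unfamiliar with these metrics, the sketch below computes the per-query versions on a toy ranking; the reported numbers are averages over all queries. The passage IDs are invented and the function names are illustrative. The last part checks an observation about the table: accuracy@k equals recall@k, and precision@k equals recall@k divided by k, which is what one would expect if each query has exactly one relevant passage. That is an inference from the numbers, not something the card states.

```python
import math

def accuracy_at_k(ranked, relevant, k):
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0

def precision_at_k(ranked, relevant, k):
    """Fraction of the top k that is relevant."""
    return sum(doc in relevant for doc in ranked[:k]) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items found in the top k."""
    return sum(doc in relevant for doc in ranked[:k]) / len(relevant)

def mrr_at_k(ranked, relevant, k):
    """Reciprocal rank of the first relevant item in the top k, else 0.0."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal

# Toy query: the single relevant passage is retrieved at rank 3.
ranked = ["p7", "p2", "p9", "p4", "p1"]
relevant = {"p9"}

print(accuracy_at_k(ranked, relevant, 1))   # 0.0
print(accuracy_at_k(ranked, relevant, 3))   # 1.0
print(recall_at_k(ranked, relevant, 3))     # 1.0
print(mrr_at_k(ranked, relevant, 10))       # 0.3333333333333333
print(ndcg_at_k(ranked, relevant, 10))      # 0.5

# Recall values copied from the table; dividing by k reproduces the
# reported precision values (0.0051, 0.0056, 0.0058).
reported_recall = {3: 0.0152, 5: 0.028, 10: 0.0583}
implied_precision = {k: round(recall / k, 4) for k, recall in reported_recall.items()}
print(implied_precision)
```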

Training Details

Training Dataset

Unnamed Dataset

  • Size: 7,315 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    column type min mean max
    positive string 63 tokens 345.31 tokens 512 tokens
    anchor string 4 tokens 6.58 tokens 10 tokens
  • Samples:
    positive:
    · 12/14/2005 11.28 IFAX Chwkfaxlslimanstander.com "* Kelly Ramsden lg]001/013
    .
    THIS LEASE, dated for reference the 1st day of December, 2004, is
    .
    BETWEEN:
    .
    OXFORD DEVELOPMENTS LTD. a company duly incorporated under the
    laws of the Province of British Columbia under nwnber 640355 and having it>
    registered and records office at 201-45793 Luckakuck Way, Chilliwack, B.C.
    .
    V2R5P9
    (hereinafter called the "Landlord")
    .
    OF THE FIRST PART
    .
    AND:
    .
    HUB INTERNATIONAL BARTON LIMITED having an office
    at 45710 Airport Road, Chilliwack, B.C. V2P 6Z9
    (hcrcinatlcr called the "Tenant")
    .
    ...
    anchor: Lessee Legal Name

    positive:
    Tenant shall pay to Landlord, as a fee (the "Termination Fee"), an amount equal to the Unamortized Portion (as hereinafter defined) of the following amounts, plus interest thereon at the rate of 7.5% per annum, compounded monthly (collectively, in the aggregate "Transaction Costs"): (1) brokerage commissions incurred by Landlord, and (2) Landlord's reasonable attorney's fees, in each case in connection with entering into this Lease. Tenant shall pay fifty percent (50%) of the Termination Fee to Landlord within thirty (30) days following Tenant's delivery of Tenant's termination notice, and the remaining fifty percent (50%) of the Termination Fee shall be paid by Tenant on or before July 1, 2024
    anchor: Early Termination Costs for Lessee

    positive:
    Any expenses related to subsequent revisions will be the expense of Tenant Landlord(s):__________ 16 of 20 PA Tenant(s):__________ C. Termination Option Tenant shall have the right to cancel the lease (the “Termination Option”) effective on February 28, 2019 (the “Termination Date”). Tenant shall provide Landlord notice of its intent to exercise this Termination Option no later than May 31, 2018, and shall pay Landlord by the Termination Date a termination fee equal to the unamortized portion of the leasing commission and Tenant Improvement Expenses (as defined in Exhibit B) incurred by Landlord as a result of this lease transaction, plus a “remarketing fee” of $5,000.00. IN WITNESS WHEREOF, the parties have executed this Lease as of the date hereof..
    LANDLORD:
    S and S Crossroads, LLC
    By:
    ...
    anchor: Early Termination Notice
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
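
MultipleNegativesRankingLoss treats the other positives in a batch as negatives: each anchor is scored against every positive with scaled cosine similarity, and cross-entropy is applied with the matching positive as the target. A minimal plain-Python sketch of that computation follows, using the `scale` of 20.0 listed above. The 2-dimensional vectors are invented, and the function is an illustration, not the library's implementation.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over anchors, where row i's correct class is positive i."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, positive) for positive in positives]
        # Numerically stable log-sum-exp over the row of logits.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Invented embeddings for a batch of two (anchor, positive) pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_positives = [[1.0, 0.0], [0.0, 1.0]]   # each anchor matches its own positive
swapped_positives = [[0.0, 1.0], [1.0, 0.0]]   # each anchor matches the other's positive

loss_aligned = multiple_negatives_ranking_loss(anchors, aligned_positives)
loss_swapped = multiple_negatives_ranking_loss(anchors, swapped_positives)
print(loss_aligned, loss_swapped)  # near 0 when aligned, near 20 when swapped
```

Since every other positive in the batch counts as a negative, a duplicated passage would act as a false negative, which is presumably why the `no_duplicates` batch sampler is used.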
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 30
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • tf32: False
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
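
The batch size and gradient accumulation settings interact, so it helps to work out the effective batch size and step counts. The arithmetic below assumes a single device and that the sampler yields every batch, neither of which the card states; the floor division is also an assumption about how the trainer counts optimizer steps per epoch. The results do line up with the Training Logs: step 10 is logged at epoch 0.6987, and the last logged step is 420.

```python
import math

# Values taken from this card.
train_samples = 7315
per_device_train_batch_size = 32
gradient_accumulation_steps = 16
num_train_epochs = 30

# Assumption: single-device training (not stated in the card).
num_devices = 1

# Samples contributing to each optimizer update.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# Forward/backward batches per epoch (the last partial batch is kept).
batches_per_epoch = math.ceil(train_samples / (per_device_train_batch_size * num_devices))

# Fractional optimizer updates per epoch, used for the logged epoch value.
updates_per_epoch = batches_per_epoch / gradient_accumulation_steps

# Assumption: whole updates per epoch are counted with floor division.
total_steps = math.ceil(num_train_epochs * (batches_per_epoch // gradient_accumulation_steps))

epoch_at_step_10 = 10 / updates_per_epoch

print(effective_batch_size)        # 512
print(batches_per_epoch)           # 229
print(total_steps)                 # 420
print(round(epoch_at_step_10, 4))  # 0.6987
```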

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 30
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: False
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
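
With `lr_scheduler_type: cosine` and `warmup_ratio: 0.1`, the learning rate rises linearly from zero to `learning_rate` and then decays along a half cosine. A sketch of that shape is below. The total of 420 steps is an assumption taken from the last step in the Training Logs, and the function is an illustration of the standard formula rather than the trainer's own code.

```python
import math

def cosine_lr_with_warmup(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Assumption: 420 optimizer steps in total (the last step in the training log).
total_steps = 420

for step in (0, 21, 42, 231, 420):
    print(step, cosine_lr_with_warmup(step, total_steps))
```

Under these assumptions warmup covers the first 42 steps, the rate peaks at 2e-05 at step 42, and it falls to half of that at step 231.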

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10
0.6987 10 2.5547 -
1.3974 20 1.0737 -
2.0961 30 0.0724 -
2.7948 40 0.0 -
3.4934 50 0.0 -
3.7729 54 - 0.0239
1.3537 60 0.9932 -
2.0524 70 1.193 -
2.7511 80 0.0518 -
3.4498 90 0.0009 -
4.1485 100 0.0 -
4.7773 109 - 0.0228
2.0087 110 0.0154 -
2.7074 120 1.0959 -
3.4061 130 0.2585 -
4.1048 140 0.0006 -
4.8035 150 0.0 -
5.5022 160 0.0 -
5.7817 164 - 0.0274
3.3624 170 0.5192 -
4.0611 180 0.5537 -
4.7598 190 0.0037 -
5.4585 200 0.0 -
6.1572 210 0.0 -
6.786 219 - 0.0283
4.0175 220 0.0219 -
4.7162 230 0.756 -
5.4148 240 0.156 -
6.1135 250 0.0002 -
6.8122 260 0.0 -
7.5109 270 0.0 -
7.7904 274 - 0.0264
5.3712 280 0.4501 -
6.0699 290 0.4103 -
6.7686 300 0.0009 -
7.4672 310 0.0 -
8.1659 320 0.0 -
8.7948 329 - 0.0280
6.0262 330 0.0287 -
6.7249 340 0.6199 -
7.4236 350 0.1078 -
8.1223 360 0.0001 -
8.8210 370 0.0 -
9.5197 380 0.0 -
9.7991 384 - 0.0263
7.3799 390 0.3923 -
8.0786 400 0.3161 -
8.7773 410 0.0006 -
9.4760 420 0.0 0.0261
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 1.1.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}