SentenceTransformer based on BAAI/bge-small-en

This is a sentence-transformers model finetuned from BAAI/bge-small-en. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-small-en
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
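
Read top to bottom, the module list above says that the sentence embedding is the hidden state of the first ([CLS]) token, scaled to unit length. The following is a minimal sketch of those last two steps in plain Python. It uses made-up 4-dimensional token vectors rather than real 384-dimensional BERT outputs, and the helper names (`cls_pool`, `l2_normalize`, `dot`, `cosine`) are illustrative, not part of the library.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector (the [CLS] position)."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy token-level outputs (hypothetical values): 3 tokens x 4 dimensions each.
tokens_a = [[3.0, 4.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 0.0, 2.0]]
tokens_b = [[0.0, 4.0, 3.0, 0.0], [2.0, 0.0, 2.0, 0.0], [1.0, 0.0, 0.0, 1.0]]

emb_a = l2_normalize(cls_pool(tokens_a))
emb_b = l2_normalize(cls_pool(tokens_b))

# After normalization, the dot product equals the cosine similarity.
print(dot(emb_a, emb_b), cosine(cls_pool(tokens_a), cls_pool(tokens_b)))
```

Because Normalize() yields unit-length vectors, dot product and cosine similarity coincide, which is consistent with the identical cosine_* and dot_* values reported under Evaluation.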

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Areeb-02/bge-small-en-MultiplrRankingLoss-Tax-dataset")
# Run inference
sentences = [
    'Based on the context information provided, what are the different gross receipts tax rates for businesses in San Francisco for tax years 2022, 2023, and 2024?',
    '$9.75 per $1,000) for taxable gross receipts over $25,000,000\n44SANCO\n2024 NAY LO\n(D) For tax year 2024 if the Controller certifies under Section 953.10 that the\nDEPARTMENT OF\n95% gross receipts threshold has been met for tax year 2024, and for tax years beginning on or after\nJanuary 1, 2025:\n0.814% (e.g. $8.14 per $1,000) for taxable gross receipts between $0 and $1,000,000\n0.853% (e.g. $8.53 per $1,000) for taxable gross receipts between $1,000,000.01 and\n$2,500,000\n0.93% (e.g. $9.30 per $1,000) for taxable gross receipts between $2,500,000.01 and\n$25,000,000\n1.008% (e.g. $10.08 per $1,000) for taxable gross receipts over $25,000,000\n(3) For all business activities not otherwise exempt and not elsewhere\nsubjected to a gross receipts tax rate or an administrative office tax by this Article 12-A-1:\n(B) For tax years 2022 and, if the Controller does not certify under\nSection 953.10 that the 90% gross receipts threshold has been met for tax year 2023, for tax\nyear 2023:\n0.788% (e.g. $7.88 per $1,000) for taxable gross receipts between $0 and $1,000,000\n0.825% (e.g. $8.25 per $1,000) for taxable gross receipts between $1,000,000.01 and\n$2,500,000\n0.9% (e.g. $9 per $1,000) for taxable gross receipts between $2,500,000.01 and\n$25,000,000\n0.975% (e.g. $9.75 per $1,000) for taxable gross receipts over $25,000,000\n(C) For tax year 2023 if the Controller certifies under Section 953.10 that the\n90% gross receipts threshold has been met for tax year 2023,',
    '(d) In no event shall the credit under this Section 960.4 reduce a person or combined group\'s\nGross Receipts Tax liability to less than $0 for any tax year. The credit under this Section shall not be\nrefundable and may not be carried forward to a subsequent year.\nSEC. 966. CONTROLLER REPORTS.\nThe Controller shall prepare reports by September 1, 2026, and September 1, 2027,\nrespectively, that discuss current economic conditions in the City and the performance of the tax system\nrevised by the voters in the ordinance adding this Section 966.\nSection 6. Article 21 of the Business and Tax Regulations Code is hereby amended by\nrevising Section 2106 to read as follows:\nSEC. 2106. SMALL BUSINESS EXEMPTION.\n(a) For tax years ending on or before December 31, 2024, nNotwithstanding any other\nprovision of this Article 21, a person or combined group exempt from payment of the gross\nreceipts tax under Section 954.1 of Article 12-A-1, as amended from time to time, shall also\nbe exempt from payment of the Early Care and Education Commercial Rents Tax.\n79SAN\nDL W(b) For tax years beginning on or after January 1, 2025, notwithstanding any other provision\nof this Article 21, a "small business enterprise" shall be exempt from payment of the Early Care and\nEducation Commercial Rents Tax. For purposes of this subsection (b), the term "small business\nenterprise" shall mean any person or combined group whose gross receipts within the City, determined\nunder Article 12-A-1, did not exceed $2,325,000, adjusted annually in accordance with the increase in\nthe Consumer Price Index: All Urban Consumers for the San Francisco/Oakland/Hayward Area for All\nItems as reported by the United States Bureau of Labor Statistics, or any successor to that index, as of\nDecember 31 of the calendar year two years prior to the tax year, beginning with tax year 2026, and\nrounded to the nearest $10,000. This subsection (b) shall not apply to a person or combined group\nsubject to a tax on administrative office business activities in Section 953.8 of Article 12-A-1.\nSection 7.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
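
In the example above, the first sentence is a question and the other two are passages, so the first row of the similarity matrix can be read as retrieval scores. The sketch below shows one way to turn such a row into a ranking. The scores and passage labels are invented for illustration and are not real model output, and `rank_passages` is a hypothetical helper.

```python
def rank_passages(query_scores, passages):
    """Pair each passage with its score and sort from most to least similar."""
    return sorted(zip(passages, query_scores), key=lambda pair: pair[1], reverse=True)

# Hypothetical similarity scores of one query against two passages.
passages = ["small business exemption", "gross receipts tax rates"]
scores = [0.42, 0.71]

ranking = rank_passages(scores, passages)
for passage, score in ranking:
    print(f"{score:.2f}  {passage}")
```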

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.6408
cosine_accuracy@3 0.8155
cosine_accuracy@5 0.8641
cosine_accuracy@10 0.932
cosine_precision@1 0.6408
cosine_precision@3 0.2718
cosine_precision@5 0.1728
cosine_precision@10 0.0932
cosine_recall@1 0.6408
cosine_recall@3 0.8155
cosine_recall@5 0.8641
cosine_recall@10 0.932
cosine_ndcg@10 0.7826
cosine_mrr@10 0.7351
cosine_map@100 0.7398
dot_accuracy@1 0.6408
dot_accuracy@3 0.8155
dot_accuracy@5 0.8641
dot_accuracy@10 0.932
dot_precision@1 0.6408
dot_precision@3 0.2718
dot_precision@5 0.1728
dot_precision@10 0.0932
dot_recall@1 0.6408
dot_recall@3 0.8155
dot_recall@5 0.8641
dot_recall@10 0.932
dot_ndcg@10 0.7826
dot_mrr@10 0.7351
dot_map@100 0.7398
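
The numbers above follow a pattern: accuracy@k equals recall@k, and precision@k equals recall@k divided by k (for example, 0.8155 / 3 ≈ 0.2718). That is what one would expect if each query has exactly one relevant passage, although the card does not state this. The sketch below gives per-query definitions of these metrics under that reading. The function names and the toy ranking are assumptions for illustration, not the evaluator's actual code.

```python
def accuracy_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any relevant document appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant_ids for doc in ranked_ids[:k]) else 0.0

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top k results that are relevant."""
    return sum(doc in relevant_ids for doc in ranked_ids[:k]) / k

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents found in the top k."""
    return sum(doc in relevant_ids for doc in ranked_ids[:k]) / len(relevant_ids)

def reciprocal_rank_at_k(ranked_ids, relevant_ids, k):
    """1 / rank of the first relevant document in the top k, or 0.0 if none."""
    for rank, doc in enumerate(ranked_ids[:k], start=1):
        if doc in relevant_ids:
            return 1.0 / rank
    return 0.0

# Hypothetical query: four retrieved passage ids, one of which is relevant.
ranked = ["p7", "p2", "p9", "p4"]
relevant = {"p2"}

print(accuracy_at_k(ranked, relevant, 1), accuracy_at_k(ranked, relevant, 3))
print(precision_at_k(ranked, relevant, 3), recall_at_k(ranked, relevant, 3))
print(reciprocal_rank_at_k(ranked, relevant, 10))
```

Averaging each function over all queries gives the corresponding aggregate; the mean of the reciprocal ranks is the MRR.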

Training Details

Training Dataset

Unnamed Dataset

  • Size: 238 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    Column      Type    Min        Mean          Max
    sentence_0  string  5 tokens   41.95 tokens  219 tokens
    sentence_1  string  63 tokens  426.3 tokens  512 tokens
  • Samples:
    sentence_0 sentence_1
    sentence_0: What types of businesses are subject to the gross receipts tax in San Francisco, and how is their San Francisco gross receipts calculated? What are the current rates for this tax, and are there any exemptions or scheduled increases?
    sentence_1: The Way It Is Now
    CHANGES TO BUSINESS TAXES
    The City collects various business taxes on an annual basis including:
    O

    SAN FRANCISCO
    FILED
    2024 MAY 15 PM 3:10
    DEPARTMENT OF ELECTIONS
    A gross receipts tax that is a percentage of a business's San Francisco gross receipts.
    Depending on business type, the City determines a business's San Francisco gross
    receipts based on sales in San Francisco, payroll expenses for employees working there,
    or both. Rates range from 0.053% to 1.008% and are scheduled to increase in coming
    years. Rates depend on business type, and higher rates apply as a business generates
    more gross receipts. For 2023, most businesses with gross receipts up to $2.19 million
    are exempt.
    A homelessness gross receipts tax that is an additional tax on businesses with San
    Francisco gross receipts over $50 million. Rates range from 0.175% to 0.69%.
    An overpaid executive gross receipts tax that is an additional tax on businesses that pay
    their highest-paid managerial employee much higher than the median compensation they
    pay their San Francisco employees. Rates are between 0.1% and 0.6%.
    A business registration fee that is an additional tax. For most businesses the fee is
    currently between $47 and $45,150, based on business type and amount of gross receipts.
    • An administrative office tax on payroll expenses that certain large businesses pay instead
    of these other business taxes. The combined rates in 2024 range from 3.04% to 5.44%,
    and in 2025 are scheduled to range from 3.11% to 5.51%. Business registration fees for
    these businesses currently range from $19,682 to $45,928.
    State law limits the total revenue, including tax revenue, the City may spend each year. The
    voters may approve increases to this limit for up to four years.
    sentence_0: What is the homelessness gross receipts tax, and which businesses are required to pay it? What are the current rates for this tax, and how do they vary based on the amount of San Francisco gross receipts? Are there any exemptions or scheduled increases for this tax?
    sentence_1: The Way It Is Now
    CHANGES TO BUSINESS TAXES
    The City collects various business taxes on an annual basis including:
    O

    SAN FRANCISCO
    FILED
    2024 MAY 15 PM 3:10
    DEPARTMENT OF ELECTIONS
    A gross receipts tax that is a percentage of a business's San Francisco gross receipts.
    Depending on business type, the City determines a business's San Francisco gross
    receipts based on sales in San Francisco, payroll expenses for employees working there,
    or both. Rates range from 0.053% to 1.008% and are scheduled to increase in coming
    years. Rates depend on business type, and higher rates apply as a business generates
    more gross receipts. For 2023, most businesses with gross receipts up to $2.19 million
    are exempt.
    A homelessness gross receipts tax that is an additional tax on businesses with San
    Francisco gross receipts over $50 million. Rates range from 0.175% to 0.69%.
    An overpaid executive gross receipts tax that is an additional tax on businesses that pay
    their highest-paid managerial employee much higher than the median compensation they
    pay their San Francisco employees. Rates are between 0.1% and 0.6%.
    A business registration fee that is an additional tax. For most businesses the fee is
    currently between $47 and $45,150, based on business type and amount of gross receipts.
    • An administrative office tax on payroll expenses that certain large businesses pay instead
    of these other business taxes. The combined rates in 2024 range from 3.04% to 5.44%,
    and in 2025 are scheduled to range from 3.11% to 5.51%. Business registration fees for
    these businesses currently range from $19,682 to $45,928.
    State law limits the total revenue, including tax revenue, the City may spend each year. The
    voters may approve increases to this limit for up to four years.
    sentence_0: What is the proposed measure that voters may approve to change the City's business taxes in San Francisco?
    sentence_1: The
    voters may approve increases to this limit for up to four years.
    The Proposal
    The proposed measure would change the City's business taxes to:

    For the gross receipts tax:
    ○ recategorize business types, reducing the number from 14 to seven;
    determine San Francisco gross receipts for some businesses based less on payroll
    expenses and more on sales;
    o change rates to between 0.1% and 3.716%; and
    exempt most businesses with gross receipts up to $5 million (increased by
    inflation).
    Apply the homelessness gross receipts tax on business activities with San Francisco gross
    receipts over $25 million, at rates between 0.162% and 1.64%.
    Modify how the City calculates the overpaid executive gross receipts tax and who pays
    that tax, and set rates between 0.02% and 0.129%.
    Adjust business registration fees to between $55 and $60,000 (increased by inflation).Adjust the administrative office tax rates for certain large businesses to range from 2.97%
    to 3.694%, and the business registration fees for these taxpayers to between $500 and
    $35,000 (increased by inflation).
    Make administrative and other changes to the City's business taxes.
    The homelessness gross receipts tax would continue to fund services for people experiencing
    homelessness and homelessness prevention. The City would use the other taxes for general
    government purposes.
    All these taxes would apply indefinitely until repealed.
    This proposal would increase the City's spending limit for four years.SALITA CO
    2024 MAY 10 PH 1:27
    DEPARTMENT OF ELECTI
    "Local Small Business Tax Cut Ordinance"
    Be it ordained by the People of the City and County of San Francisco:
    NOTE:
    Unchanged Code text and uncodified text are in plain font.
    Additions to Codes are in single-underline italics Times New Roman font.
    Deletions to Codes are in strikethrough italics Times New Roman font.
    Asterisks (* * * *) indicate the omission of unchanged Code
    subsections or parts of tables.
    Section 1. Title. This initiative is known and may be referred to as the "Local Small
    Business Tax Cut Ordinance."
    Section 2. Article 2 of the Business and Tax Regulations Code is hereby amended by
    revising Section 76.3 to read as follows:
    SEC. 76.3.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
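
    As a rough illustration of what these two parameters do: the loss treats each (sentence_0, sentence_1) pair in a batch as a positive match and every other sentence_1 in the same batch as a negative. Cosine similarities are multiplied by the scale (20.0) and passed through a softmax cross-entropy whose target is the matching pair. The plain-Python sketch below follows that description. It is not the library's implementation, and the 2-dimensional embeddings are invented.

```python
import math

def cos_sim(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch serves as an in-batch negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the scaled similarities.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

# Toy embeddings (hypothetical): a batch of two pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each anchor matches its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each anchor matches the wrong positive

loss_aligned = multiple_negatives_ranking_loss(anchors, aligned)
loss_swapped = multiple_negatives_ranking_loss(anchors, swapped)
print(loss_aligned, loss_swapped)
```

    With a batch size of 10, each question is contrasted against 9 in-batch negatives, so a larger batch gives the loss more negatives per step.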
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 10
  • per_device_eval_batch_size: 10
  • num_train_epochs: 10
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 10
  • per_device_eval_batch_size: 10
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step cosine_map@100
0 0 0.7167
1.0 24 0.7352
2.0 48 0.7564
3.0 72 0.7669
4.0 96 0.7456
5.0 120 0.7225
6.0 144 0.7319
7.0 168 0.7499
8.0 192 0.7302
9.0 216 0.7254
10.0 240 0.7398
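
The step counts in the log can be reproduced from the settings listed earlier: 238 samples at a batch size of 10, with dataloader_drop_last set to False and no gradient accumulation, give 24 steps per epoch and 240 steps over 10 epochs. The sketch below redoes that arithmetic and also evaluates a linear schedule with no warmup at the logged steps. It assumes a single device, and `linear_lr` is a hand-written approximation of a linear decay schedule rather than the trainer's own code.

```python
import math

def steps_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of optimizer steps per epoch with gradient_accumulation_steps = 1."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)

def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining

per_epoch = steps_per_epoch(238, 10)
total = per_epoch * 10

print(per_epoch, total)
for step in (0, 72, 120, 240):
    print(step, linear_lr(step, total))
```

In the log above, cosine_map@100 peaks at 0.7669 after epoch 3 (step 72) and ends at 0.7398 after epoch 10.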

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.0.dev0
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.19.2
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

  • Model size: 33.4M params (Safetensors)
  • Tensor type: F32