SentenceTransformer based on microsoft/mpnet-base
This is a sentence-transformers model finetuned from microsoft/mpnet-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: microsoft/mpnet-base
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
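The Pooling module above has only pooling_mode_mean_tokens enabled, so a sentence embedding is the average of the token embeddings, with padding positions left out. The following is a minimal pure-Python sketch of that idea; the function name mean_pool and the toy inputs are hypothetical, and the library itself does this on PyTorch tensors rather than on lists.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    count = max(len(kept), 1)  # guard against an all-padding input
    return [sum(vec[d] for vec in kept) / count for d in range(dim)]


# Toy input: three token slots with 2-dimensional vectors; the last slot is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

In this model the same averaging runs over 768-dimensional token vectors from the MPNet encoder.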
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Areeb-02/mpnet-base-GISTEmbedLoss-MSEE_Evaluator-salestax-docs")
# Run inference
sentences = [
'Based on the context information provided, what are the different gross receipts tax rates for businesses in San Francisco for tax years 2022, 2023, and 2024?',
'$9.75 per $1,000) for taxable gross receipts over $25,000,000\n44SANCO\n2024 NAY LO\n(D) For tax year 2024 if the Controller certifies under Section 953.10 that the\nDEPARTMENT OF\n95% gross receipts threshold has been met for tax year 2024, and for tax years beginning on or after\nJanuary 1, 2025:\n0.814% (e.g. $8.14 per $1,000) for taxable gross receipts between $0 and $1,000,000\n0.853% (e.g. $8.53 per $1,000) for taxable gross receipts between $1,000,000.01 and\n$2,500,000\n0.93% (e.g. $9.30 per $1,000) for taxable gross receipts between $2,500,000.01 and\n$25,000,000\n1.008% (e.g. $10.08 per $1,000) for taxable gross receipts over $25,000,000\n(3) For all business activities not otherwise exempt and not elsewhere\nsubjected to a gross receipts tax rate or an administrative office tax by this Article 12-A-1:\n(B) For tax years 2022 and, if the Controller does not certify under\nSection 953.10 that the 90% gross receipts threshold has been met for tax year 2023, for tax\nyear 2023:\n0.788% (e.g. $7.88 per $1,000) for taxable gross receipts between $0 and $1,000,000\n0.825% (e.g. $8.25 per $1,000) for taxable gross receipts between $1,000,000.01 and\n$2,500,000\n0.9% (e.g. $9 per $1,000) for taxable gross receipts between $2,500,000.01 and\n$25,000,000\n0.975% (e.g. $9.75 per $1,000) for taxable gross receipts over $25,000,000\n(C) For tax year 2023 if the Controller certifies under Section 953.10 that the\n90% gross receipts threshold has been met for tax year 2023,',
'(d) In no event shall the credit under this Section 960.4 reduce a person or combined group\'s\nGross Receipts Tax liability to less than $0 for any tax year. The credit under this Section shall not be\nrefundable and may not be carried forward to a subsequent year.\nSEC. 966. CONTROLLER REPORTS.\nThe Controller shall prepare reports by September 1, 2026, and September 1, 2027,\nrespectively, that discuss current economic conditions in the City and the performance of the tax system\nrevised by the voters in the ordinance adding this Section 966.\nSection 6. Article 21 of the Business and Tax Regulations Code is hereby amended by\nrevising Section 2106 to read as follows:\nSEC. 2106. SMALL BUSINESS EXEMPTION.\n(a) For tax years ending on or before December 31, 2024, nNotwithstanding any other\nprovision of this Article 21, a person or combined group exempt from payment of the gross\nreceipts tax under Section 954.1 of Article 12-A-1, as amended from time to time, shall also\nbe exempt from payment of the Early Care and Education Commercial Rents Tax.\n79SAN\nDL W(b) For tax years beginning on or after January 1, 2025, notwithstanding any other provision\nof this Article 21, a "small business enterprise" shall be exempt from payment of the Early Care and\nEducation Commercial Rents Tax. For purposes of this subsection (b), the term "small business\nenterprise" shall mean any person or combined group whose gross receipts within the City, determined\nunder Article 12-A-1, did not exceed $2,325,000, adjusted annually in accordance with the increase in\nthe Consumer Price Index: All Urban Consumers for the San Francisco/Oakland/Hayward Area for All\nItems as reported by the United States Bureau of Labor Statistics, or any successor to that index, as of\nDecember 31 of the calendar year two years prior to the tax year, beginning with tax year 2026, and\nrounded to the nearest $10,000. This subsection (b) shall not apply to a person or combined group\nsubject to a tax on administrative office business activities in Section 953.8 of Article 12-A-1.\nSection 7.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
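The model description lists cosine similarity as the similarity function, so each entry of the matrix returned by model.similarity is the cosine of the angle between two embeddings. Below is a small sketch of that computation on toy vectors, assuming nothing beyond the standard library; cosine_similarity and similarity_matrix are hypothetical helper names, not part of the Sentence Transformers API.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors: List[List[float]]) -> List[List[float]]:
    """Pairwise cosine similarities; entry [i][j] compares vector i with vector j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy 2-dimensional "embeddings" standing in for the 768-dimensional model outputs.
toy = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
matrix = similarity_matrix(toy)
for row in matrix:
    print([round(value, 4) for value in row])
```

The diagonal is always 1 because each vector is compared with itself, which is why the off-diagonal entries are the ones to inspect when ranking passages against a query.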
Evaluation
Metrics
Knowledge Distillation
- Dataset: stsb-dev
- Evaluated with MSEEvaluator
Metric | Value |
---|---|
negative_mse | -2.4282 |
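The negative_mse value is the mean squared error between this model's embeddings and a set of reference embeddings, negated so that higher is better. The sketch below shows the arithmetic on toy vectors. The function name negative_mse is hypothetical, and the scale factor of 100 reflects my understanding of how the MSEEvaluator reports its score; treat it as an assumption rather than a confirmed detail.

```python
from typing import List


def negative_mse(student: List[List[float]], teacher: List[List[float]], scale: float = 100.0) -> float:
    """Negated mean squared error over all embedding components, multiplied by `scale`."""
    total = 0.0
    count = 0
    for s_vec, t_vec in zip(student, teacher):
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    return -scale * total / count


# Toy embeddings: two sentences, two dimensions each.
student = [[0.0, 1.0], [1.0, 0.0]]
teacher = [[0.0, 0.5], [1.0, 0.5]]
score = negative_mse(student, teacher)
print(score)
```

A score of 0 would mean the two sets of embeddings are identical; more negative values mean a larger gap.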
Training Details
Training Dataset
Unnamed Dataset
- Size: 238 training samples
- Columns: sentence1 and sentence2
- Approximate statistics based on the first 1000 samples:
Property | sentence1 | sentence2 |
---|---|---|
type | string | string |
min | 5 tokens | 63 tokens |
mean | 41.95 tokens | 426.3 tokens |
max | 219 tokens | 512 tokens |
- Samples:
sentence1: What types of businesses are subject to the gross receipts tax in San Francisco, and how is their San Francisco gross receipts calculated? What are the current rates for this tax, and are there any exemptions or scheduled increases?
sentence2:
The Way It Is Now
CHANGES TO BUSINESS TAXES
The City collects various business taxes on an annual basis including:
O
•
SAN FRANCISCO
FILED
2024 MAY 15 PM 3:10
DEPARTMENT OF ELECTIONS
A gross receipts tax that is a percentage of a business's San Francisco gross receipts.
Depending on business type, the City determines a business's San Francisco gross
receipts based on sales in San Francisco, payroll expenses for employees working there,
or both. Rates range from 0.053% to 1.008% and are scheduled to increase in coming
years. Rates depend on business type, and higher rates apply as a business generates
more gross receipts. For 2023, most businesses with gross receipts up to $2.19 million
are exempt.
A homelessness gross receipts tax that is an additional tax on businesses with San
Francisco gross receipts over $50 million. Rates range from 0.175% to 0.69%.
An overpaid executive gross receipts tax that is an additional tax on businesses that pay
their highest-paid managerial employee much higher than the median compensation they
pay their San Francisco employees. Rates are between 0.1% and 0.6%.
A business registration fee that is an additional tax. For most businesses the fee is
currently between $47 and $45,150, based on business type and amount of gross receipts.
• An administrative office tax on payroll expenses that certain large businesses pay instead
of these other business taxes. The combined rates in 2024 range from 3.04% to 5.44%,
and in 2025 are scheduled to range from 3.11% to 5.51%. Business registration fees for
these businesses currently range from $19,682 to $45,928.
State law limits the total revenue, including tax revenue, the City may spend each year. The
voters may approve increases to this limit for up to four years.
sentence1: What is the homelessness gross receipts tax, and which businesses are required to pay it? What are the current rates for this tax, and how do they vary based on the amount of San Francisco gross receipts? Are there any exemptions or scheduled increases for this tax?
sentence2:
The Way It Is Now
CHANGES TO BUSINESS TAXES
The City collects various business taxes on an annual basis including:
O
•
SAN FRANCISCO
FILED
2024 MAY 15 PM 3:10
DEPARTMENT OF ELECTIONS
A gross receipts tax that is a percentage of a business's San Francisco gross receipts.
Depending on business type, the City determines a business's San Francisco gross
receipts based on sales in San Francisco, payroll expenses for employees working there,
or both. Rates range from 0.053% to 1.008% and are scheduled to increase in coming
years. Rates depend on business type, and higher rates apply as a business generates
more gross receipts. For 2023, most businesses with gross receipts up to $2.19 million
are exempt.
A homelessness gross receipts tax that is an additional tax on businesses with San
Francisco gross receipts over $50 million. Rates range from 0.175% to 0.69%.
An overpaid executive gross receipts tax that is an additional tax on businesses that pay
their highest-paid managerial employee much higher than the median compensation they
pay their San Francisco employees. Rates are between 0.1% and 0.6%.
A business registration fee that is an additional tax. For most businesses the fee is
currently between $47 and $45,150, based on business type and amount of gross receipts.
• An administrative office tax on payroll expenses that certain large businesses pay instead
of these other business taxes. The combined rates in 2024 range from 3.04% to 5.44%,
and in 2025 are scheduled to range from 3.11% to 5.51%. Business registration fees for
these businesses currently range from $19,682 to $45,928.
State law limits the total revenue, including tax revenue, the City may spend each year. The
voters may approve increases to this limit for up to four years.
sentence1: What is the proposed measure that voters may approve to change the City's business taxes in San Francisco?
sentence2:
The
voters may approve increases to this limit for up to four years.
The Proposal
The proposed measure would change the City's business taxes to:
•
For the gross receipts tax:
○ recategorize business types, reducing the number from 14 to seven;
determine San Francisco gross receipts for some businesses based less on payroll
expenses and more on sales;
o change rates to between 0.1% and 3.716%; and
exempt most businesses with gross receipts up to $5 million (increased by
inflation).
Apply the homelessness gross receipts tax on business activities with San Francisco gross
receipts over $25 million, at rates between 0.162% and 1.64%.
Modify how the City calculates the overpaid executive gross receipts tax and who pays
that tax, and set rates between 0.02% and 0.129%.
Adjust business registration fees to between $55 and $60,000 (increased by inflation).Adjust the administrative office tax rates for certain large businesses to range from 2.97%
to 3.694%, and the business registration fees for these taxpayers to between $500 and
$35,000 (increased by inflation).
Make administrative and other changes to the City's business taxes.
The homelessness gross receipts tax would continue to fund services for people experiencing
homelessness and homelessness prevention. The City would use the other taxes for general
government purposes.
All these taxes would apply indefinitely until repealed.
This proposal would increase the City's spending limit for four years.SALITA CO
2024 MAY 10 PH 1:27
DEPARTMENT OF ELECTI
"Local Small Business Tax Cut Ordinance"
Be it ordained by the People of the City and County of San Francisco:
NOTE:
Unchanged Code text and uncodified text are in plain font.
Additions to Codes are in single-underline italics Times New Roman font.
Deletions to Codes are in strikethrough italics Times New Roman font.
Asterisks (* * * *) indicate the omission of unchanged Code
subsections or parts of tables.
Section 1. Title. This initiative is known and may be referred to as the "Local Small
Business Tax Cut Ordinance."
Section 2. Article 2 of the Business and Tax Regulations Code is hereby amended by
revising Section 76.3 to read as follows:
SEC. 76.3.
- Loss: GISTEmbedLoss with these parameters:
{'guide': SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
), 'temperature': 0.01}
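GISTEmbedLoss pairs the model being trained with a separate guide model (here a BERT-based encoder with CLS pooling and normalization) and a temperature of 0.01. The idea described in the GISTEmbed paper is that in-batch negatives which the guide scores higher than the true positive are probably false negatives, so they are left out of the contrastive loss. The sketch below illustrates that selection step on precomputed similarity matrices. It is a simplification under stated assumptions: it only considers anchor-to-positive similarities, the function name guided_contrastive_loss is hypothetical, and the toy numbers are invented for illustration.

```python
import math
from typing import List


def guided_contrastive_loss(
    student_sim: List[List[float]],
    guide_sim: List[List[float]],
    temperature: float = 0.01,
) -> float:
    """Mean cross-entropy over anchors, skipping negatives the guide flags as likely false.

    student_sim[i][j] and guide_sim[i][j] hold the similarity between anchor i
    and candidate j, as scored by the trained model and by the guide model.
    Candidate i is the true positive of anchor i.
    """
    n = len(student_sim)
    total = 0.0
    for i in range(n):
        positive_logit = student_sim[i][i] / temperature
        logits = [positive_logit]
        for j in range(n):
            if j == i:
                continue
            # Drop a negative when the guide rates it above the true positive.
            if guide_sim[i][j] > guide_sim[i][i]:
                continue
            logits.append(student_sim[i][j] / temperature)
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - positive_logit
    return total / n


student = [[0.9, 0.8], [0.2, 0.7]]
guide_flags_negative = [[0.8, 0.95], [0.1, 0.9]]  # guide prefers candidate 1 for anchor 0
guide_keeps_negative = [[0.8, 0.10], [0.1, 0.9]]  # guide agrees with the labels

masked_loss = guided_contrastive_loss(student, guide_flags_negative)
unmasked_loss = guided_contrastive_loss(student, guide_keeps_negative)
print(masked_loss, unmasked_loss)
```

With a temperature as low as 0.01, small similarity gaps become large logit gaps, so removing a near-duplicate negative can change the loss for that anchor noticeably.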
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 1
- warmup_ratio: 0.1
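These settings, together with the 238 training samples and the default linear scheduler and learning rate of 5e-05 listed under All Hyperparameters, imply a very short run. The sketch below works out the step counts and the learning rate at each step. It assumes a single device, no gradient accumulation, and that warmup steps are rounded up from the ratio, which matches my understanding of the Transformers Trainer but is not confirmed by this card; lr_at is a hypothetical helper.

```python
import math

# Values taken from the card.
num_samples = 238
batch_size = 16
num_epochs = 1
warmup_ratio = 0.1
base_lr = 5e-05

# Assumption: the last partial batch is kept (dataloader_drop_last is False).
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
# Assumption: warmup steps are the ratio of total steps, rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining


print(total_steps, warmup_steps)
for step in range(total_steps + 1):
    print(step, lr_at(step))
```

Under these assumptions the run has 15 optimizer steps, 2 of which are warmup.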
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | stsb-dev_negative_mse |
---|---|---|
0 | 0 | -2.4282 |
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.3.0+cu121
- Accelerate: 0.31.0
- Datasets: 2.20.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
GISTEmbedLoss
@misc{solatorio2024gistembed,
title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning},
author={Aivin V. Solatorio},
year={2024},
eprint={2402.16829},
archivePrefix={arXiv},
primaryClass={cs.LG}
}