metadata
base_model: BAAI/bge-small-en-v1.5
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:29545
  - loss:MultipleNegativesSymmetricRankingLoss
widget:
  - source_sentence: >-
      In terms of audited accounts submission for an Applicant, could you
      clarify the scenarios in which the Regulator might agree that a reviewed
      pro forma statement of financial position is not needed, and what factors
      would be considered in making that determination?
    sentences:
      - "DocumentID: 1 | PassageID: 4.2.1.(3) | Passage: Where the regulator in another jurisdiction does not permit the implementation of policies, procedures, systems and controls consistent with these Rules, the Relevant Person must:\n(a)\tinform the Regulator in writing immediately; and\n(b)\tapply appropriate additional measures to manage the money laundering risks posed by the relevant branch or subsidiary."
      - "DocumentID: 11 | PassageID: 2.3.15.(4) | Passage: The Applicant must submit to the Regulator the following records, as applicable:\n(a)\tAudited accounts, for the purposes of this Rule and Rule 2.3.2(1), for the last three full financial years, noting that:\n(i)\tif the Applicant applies for admission less than ninety days after the end of its last financial year, unless the Applicant has audited accounts for its latest full financial year, the accounts may be for the three years to the end of the previous financial year, but must also include audited or reviewed accounts for its most recent semi-annual financial reporting period; and\n(ii)\tif the Applicant applies for admission more than six months and seventy-five days after the end of its last financial year, audited or reviewed accounts for its most recent semi-annual financial reporting period (or longer period if available).\n(b)\tUnless the Regulator agrees it is not needed, a reviewed pro forma statement of financial position. The review must be conducted by an accredited professional auditor of the company or an independent accountant."
      - >
        DocumentID: 36 | PassageID: D.1.3. | Passage: Principle 1 – Oversight
        and responsibility of climate-related financial risk exposures.Certain
        functions related to the management of climate-related financial risks
        may be delegated, but, as with other risks, the board is ultimately
        responsible and accountable for monitoring, managing and overseeing
        climate-related risks for the financial firm.
  - source_sentence: >-
      A financial institution is interested in multiple designations, including
      the ADGM Green Fund and ADGM Green Bond. For each application, what fee
      will the institution incur?
    sentences:
      - >
        DocumentID: 31 | PassageID: 63) | Passage: INITIAL DISCLOSURE OF
        MATERIAL ESTIMATES.

        Disclosure of material estimates of Contingent Resources

        Section 2.3 of the PRMS Guidelines states that Contingent Resources may
        be assigned for Petroleum Projects that are dependent on ‘technology
        under development’, and further recommended that a number of guidelines
        are followed in order to distinguish these estimates from those that
        should be classified as Unrecoverable Petroleum.  By way of Rule
        12.10.1(3), the FSRA fully supports and requires compliance with what is
        set out in the PRMS Guidelines.
      - >
        DocumentID: 19 | PassageID: 40) | Passage: REGULATORY REQUIREMENTS FOR
        AUTHORISED PERSONS ENGAGED IN REGULATED ACTIVITIES IN RELATION TO
        VIRTUAL ASSETS

        Anti-Money Laundering and Countering Financing of Terrorism

        On 21 June 2019, FATF released a revised Guidance for a Risk-Based
        Approach (RBA) for VAs and VASPs, as well as an Interpretative Note for
        Recommendation 15. This built upon previous FATF statements by
        clarifying a RBA for Anti-Money Laundering and Countering the Financing
        of Terrorism (“AML/CFT”) purposes.   The basic principle underlying the
        FATF Guidelines is that VASPs are expected to “identify, assess, and
        take effective action to mitigate their ML/TF risks” with respect to
        VAs.
      - "DocumentID: 4 | PassageID: 10.1.1 | Passage: A Person applying to the Regulator for any of the following designations:\n(a)\tADGM Green Fund;\n(b)\tADGM Climate Transition Fund;\n(c)\tADGM Green Portfolio;\n(d)\tADGM Climate Transition Portfolio;\n(e)\tADGM Green Bond; or\n(f)\tADGM Sustainability Linked Bond\nmust pay to the Regulator an application fee of $2,000."
  - source_sentence: >-
      How does the ADGM expect Authorised Persons to incorporate the eligibility
      of collateral types into their overall risk management framework,
      particularly concerning Islamic finance principles?
    sentences:
      - >-
        DocumentID: 17 | PassageID: Schedule 1.Part 2.Chapter 5.42.(2) |
        Passage: In determining for the purposes of sub-paragraph ‎(1)‎(b)
        whether Deposits are accepted only on particular occasions, regard is to
        be had to the frequency of those occasions and to any characteristics
        distinguishing them from each other.
      - "DocumentID: 9 | PassageID: 6.8.5 | Passage: \n(a)\tA Fund Manager of an Islamic REIT may obtain financing either directly or through its Special Purpose Vehicle up to 65% of the total gross asset value of the Fund provided that such financing is provided in a Shari'a-compliant manner.\n(b)\tUpon becoming aware that the borrowing limit set out in 6.8.5(a) has been exceeded, the Fund Manager shall:\n(c)\timmediately inform Unitholders and the Regulator of the details of the breach and the proposed remedial action;\n(d)\tuse its best endeavours to reduce the excess borrowings;\n(e)\tnot permit the Fund to engage in additional borrowing; and\n(f)\tinform Unitholders and the Regulator on a regular basis as to the progress of the remedial action."
      - >-
        DocumentID: 9 | PassageID: 5.1.1.Guidance.(ii) | Passage: The prudential
        Category for Islamic Financial Institutions and other Authorised Persons
        (acting through an Islamic Window) undertaking the Regulated Activity of
        Managing PSIAs (which may be either a Restricted PSIA or an Unrestricted
        PSIA) is determined in accordance with PRU Rule 1.3.  An Authorised
        Person which Manages PSIAs (whether as an Islamic Financial Institution
        or through an Islamic Window) must comply with the requirements in PRU
        in relation to specific prudential requirements relating to Trading Book
        and Non-Trading Book activities, including Credit Risk, Market Risk,
        Liquidity Risk and Group Risk.
  - source_sentence: >-
      Can you please detail the specific Anti-Money Laundering (AML) and
      Countering Financing of Terrorism (CFT) measures and controls that our
      firm must have in place when dealing with Spot Commodities as per the
      FSRA's requirements?
    sentences:
      - >
        DocumentID: 34 | PassageID: 65) | Passage: REGULATORY REQUIREMENTS -
        SPOT COMMODITY ACTIVITIES

        Sanctions

        Pursuant to AML Rule 11.2.1(1), an Authorised Person must have
        arrangements in place to ensure that only Spot Commodities that are not
        subject to sanctions or associated with an entity in the supply chain
        that is itself subject to a sanction, are used as part of its Regulated
        Activities, or utilised as part of a delivery and/or storage facility
        operated by itself (or by any third parties it uses).  In demonstrating
        compliance with the Rule, an Authorised Person must have powers to
        resolve any breach in a timely fashion, such as taking emergency action
        itself or by compelling the delivery and/or storage facility to take
        appropriate action.  The FSRA expects this to include the Authorised
        Person having the ability to sanction a Member, market participant or
        the delivery and/or storage facility for acts or omissions that
        compromise compliance with applicable sanctions.
      - "DocumentID: 18 | PassageID: 3.2 | Passage: Financial Services Permissions. VC Managers operating in ADGM require a Financial Services Permission (“FSP”) to undertake any Regulated Activity pertaining to VC Funds and/or co-investments by third parties in VC Funds. The Regulated Activities covered by the FSP will be dependent on the VC Managers’ investment strategy and business model.\n(a)\tManaging a Collective Investment Fund: this includes carrying out fund management activities in respect of a VC Fund.\n(b)\tAdvising on Investments or Credit : for VC Managers these activities will be restricted to activities related to co-investment alongside a VC Fund which the VC Manager manages, such as recommending that a client invest in an investee company alongside the VC Fund and on the strategy and structure required to make the investment.\n(c)\tArranging Deals in Investments: VC Managers may also wish to make arrangements to facilitate co-investments in the investee company.\nAuthorisation fees and supervision fees for a VC Manager are capped at USD 10,000 regardless of whether one or both of the additional Regulated Activities in b) and c) above in relation to co-investments are included in its FSP. The FSP will include restrictions appropriate to the business model of a VC Manager."
      - >-
        DocumentID: 24 | PassageID: 3.9 | Passage: Principle 2 – High Standards
        for Authorisation. This discerning approach is shown by the FSRA’s power
        to only permit VAs that it deems ‘acceptable’, as determined by risk
        factors such as security and traceability, in order to prevent the
        build-up of risk from illiquid or immature assets. Additionally, we do
        not permit stablecoins based on the algorithmic model of valuation to
        the underlying fiat currency.
  - source_sentence: >-
      What are the common scenarios or instances where assets and liabilities
      are not covered by the bases of accounting in Rule 5.3.2, and how should
      an Insurer address these in their reporting?
    sentences:
      - >-
        DocumentID: 1 | PassageID: 14.4.1.Guidance.1. | Passage: Relevant
        Persons are reminded that in accordance with Federal AML Legislation,
        Relevant Persons or any of their Employees must not tip off any Person,
        that is, inform any Person that he is being scrutinised, or investigated
        by any other competent authority, for possible involvement in suspicious
        Transactions or activity related to money laundering or terrorist
        financing.
      - "DocumentID: 12 | PassageID: 5.3.1.Guidance | Passage: \nThe exceptions provided in this Chapter relate to the following:\na.\tspecific Rules in respect of certain assets and liabilities, intended to achieve a regulatory objective not achieved by application of either or both of the bases of accounting set out in Rule ‎5.3.2;\nb.\tassets and liabilities that are not dealt with in either or both of the bases of accounting set out in Rule ‎5.3.2; and\nc.\tthe overriding power of the Regulator, set out in Rule ‎5.1.6, to require an Insurer to adopt a particular measurement for a specific asset or liability."
      - >+
        DocumentID: 1 | PassageID: 6.2.1.Guidance.2. | Passage: The risk
        assessment under Rule ‎6.2.1(c) should identify actions to mitigate
        risks associated with undertaking NFTF business generally, and the use
        of eKYC specifically. This is because distinct risks are often likely to
        arise where business is conducted entirely in an NFTF manner, compared
        to when the business relationship includes a mix of face-to-face and
        NFTF interactions. The assessment should make reference to risk
        mitigation measures recommended by the Regulator, a competent authority
        of the U.A.E., FATF, and other relevant bodies.

SentenceTransformer based on BAAI/bge-small-en-v1.5

This is a sentence-transformers model fine-tuned from BAAI/bge-small-en-v1.5 on the csv dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-small-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • csv

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
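The pooling and normalization stages listed above can be illustrated with a minimal NumPy sketch. It assumes the Transformer stage has already produced token embeddings of shape (batch, seq_len, 384); the function name `cls_pool_and_normalize` and the random input are hypothetical stand-ins, not part of the library.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Keep the first ([CLS]) token vector of each sequence and L2-normalize it.

    token_embeddings: array of shape (batch, seq_len, hidden), standing in for
    the Transformer module's output (hidden = 384 for this model).
    """
    # Stage (1): pooling_mode_cls_token=True selects the first token only.
    cls_vectors = token_embeddings[:, 0, :]
    # Stage (2): Normalize() rescales each vector to unit length.
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)

# Random stand-in for real token embeddings (assumption: 2 sequences, 5 tokens).
rng = np.random.default_rng(0)
fake_tokens = rng.normal(size=(2, 5, 384))
sentence_embeddings = cls_pool_and_normalize(fake_tokens)
print(sentence_embeddings.shape)
```

Because the outputs have unit length, the cosine similarity between two embeddings reduces to their dot product.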

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("jebish7/bge-small-en-v1.5_MNSR_10")
# Run inference
sentences = [
    'What are the common scenarios or instances where assets and liabilities are not covered by the bases of accounting in Rule 5.3.2, and how should an Insurer address these in their reporting?',
    'DocumentID: 12 | PassageID: 5.3.1.Guidance | Passage: \nThe exceptions provided in this Chapter relate to the following:\na.\tspecific Rules in respect of certain assets and liabilities, intended to achieve a regulatory objective not achieved by application of either or both of the bases of accounting set out in Rule \u200e5.3.2;\nb.\tassets and liabilities that are not dealt with in either or both of the bases of accounting set out in Rule \u200e5.3.2; and\nc.\tthe overriding power of the Regulator, set out in Rule \u200e5.1.6, to require an Insurer to adopt a particular measurement for a specific asset or liability.',
    'DocumentID: 1 | PassageID: 14.4.1.Guidance.1. | Passage: Relevant Persons are reminded that in accordance with Federal AML Legislation, Relevant Persons or any of their Employees must not tip off any Person, that is, inform any Person that he is being scrutinised, or investigated by any other competent authority, for possible involvement in suspicious Transactions or activity related to money laundering or terrorist financing.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
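For semantic search, the similarity scores are typically used to rank candidate passages against a query. The sketch below shows that ranking step with plain NumPy on toy 3-dimensional vectors; `rank_passages` is a hypothetical helper, and the toy vectors are illustrative rather than real model outputs.

```python
import numpy as np

def rank_passages(query_embedding, passage_embeddings):
    """Return passage indices ordered from most to least similar, with scores."""
    q = np.asarray(query_embedding, dtype=np.float64)
    p = np.asarray(passage_embeddings, dtype=np.float64)
    # Normalize so the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    scores = p @ q
    order = np.argsort(-scores)  # descending by similarity
    return order, scores[order]

# Toy vectors (assumption): passage 1 points almost in the query's direction.
query = np.array([1.0, 0.0, 0.0])
passages = np.array([
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.5, 0.5, 0.0],
])
order, scores = rank_passages(query, passages)
print(order.tolist())  # [1, 2, 0]
```

With the model loaded as in the snippet above, the same helper could be called as `rank_passages(embeddings[0], embeddings[1:])`, treating the first sentence as the query and the rest as candidate passages.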

Training Details

Training Dataset

csv

  • Dataset: csv
  • Size: 29,545 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    |      | anchor       | positive     |
    | type | string       | string       |
    | min  | 16 tokens    | 35 tokens    |
    | mean | 34.95 tokens | 132.0 tokens |
    | max  | 68 tokens    | 512 tokens   |
  • Samples:
    anchor positive
    If a financial institution offers Money Remittance as one of its services, under what circumstances is it deemed to be holding Relevant Money and therefore subject to regulatory compliance (a)? DocumentID: 13
    What are the consequences for a Recognised Body or Authorised Person if they fail to comply with ADGM's requirements regarding severance payments? DocumentID: 7
    If a Public Fund is structured as an Investment Trust, to whom should the Fund Manager report the review findings regarding delegated Regulated Activities or outsourced functions? DocumentID: 6
  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
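To make the loss parameters concrete, here is a minimal NumPy sketch of a symmetric in-batch ranking loss. It assumes the loss is the average of two cross-entropy terms (anchor to positive and positive to anchor) over cosine similarities multiplied by `scale`, with every other pair in the batch acting as a negative. It is an illustration under that assumption, not the library's implementation, and all function names are hypothetical.

```python
import numpy as np

def _log_softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable row-wise log-softmax.
    shifted = x - x.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def _cross_entropy_diagonal(scores: np.ndarray) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    return float(-np.mean(np.diag(_log_softmax(scores))))

def symmetric_ranking_loss(anchors, positives, scale=20.0):
    # Cosine similarity ("similarity_fct": "cos_sim") between every anchor
    # and every positive in the batch, multiplied by "scale".
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)
    forward = _cross_entropy_diagonal(scores)     # anchor -> positive
    backward = _cross_entropy_diagonal(scores.T)  # positive -> anchor
    return (forward + backward) / 2.0

# Toy batch (assumption): four orthogonal anchors with matching positives,
# then the same positives shifted so every pair is mismatched.
eye = np.eye(4)
aligned_loss = symmetric_ranking_loss(eye, eye)
shuffled_loss = symmetric_ranking_loss(eye, np.roll(eye, 1, axis=0))
print(aligned_loss < shuffled_loss)  # True
```

Under this formulation, larger batches supply more in-batch negatives per pair, which is one reason the `no_duplicates` batch sampler matters: a duplicated text inside a batch would be scored as a negative for its own match.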
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • learning_rate: 2e-05
  • warmup_ratio: 0.1
  • batch_sampler: no_duplicates
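The interaction between `learning_rate` and `warmup_ratio` can be sketched as follows, assuming the linear scheduler listed under All Hyperparameters ramps the rate from zero during warmup and then decays it linearly to zero. The total of 1386 steps is an estimate inferred from the training logs (step 100 falls at epoch 0.2165, so roughly 462 steps per epoch over 3 epochs); it is not a logged value, and `linear_warmup_lr` is a hypothetical helper.

```python
import math

def linear_warmup_lr(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Assumption: about 1386 optimizer steps in total, estimated from the logs.
ESTIMATED_TOTAL_STEPS = 1386
for step in (0, 139, 700, 1386):
    print(step, linear_warmup_lr(step, ESTIMATED_TOTAL_STEPS))
```

Under these assumptions the peak rate of 2e-05 is reached after about 139 steps, which is before the second logged row at step 200.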

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.2165 100 1.4357
0.4329 200 0.9589
0.6494 300 0.9193
0.8658 400 0.8542
1.0823 500 0.8643
1.2987 600 0.8135
1.5152 700 0.7658
1.7316 800 0.7454
1.9481 900 0.7477
2.1645 1000 0.7586
2.3810 1100 0.6978
2.5974 1200 0.7152
2.8139 1300 0.6866
0.2165 100 0.7049
0.4329 200 0.6651
0.6494 300 0.6942
0.8658 400 0.6695
1.0823 500 0.7048
1.2987 600 0.636
1.5152 700 0.5984
1.7316 800 0.6001
1.9481 900 0.6096
2.1645 1000 0.6313
2.3810 1100 0.5437
2.5974 1200 0.5716
2.8139 1300 0.5634

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.4.0
  • Accelerate: 0.34.2
  • Datasets: 3.0.1
  • Tokenizers: 0.20.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}