base_model: SQAI/bge-embedding-model
datasets: []
language:
  - en
library_name: sentence-transformers
license: apache-2.0
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:1865
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: threshold.highLuxThreshold
    sentences:
      - >-
        "Can you provide the timestamp of the last update to the threshold
        settings, and detail any faults in the lux module related to light level
        sensing and control for the streetlight on this specific street name? I
        also want to know the longitude of the streetlight. And also, can you
        tell me what type of dimming schedule is applied to the streetlight, the
        type of port used for its dimming controls, and the total energy it has
        consumed, recorded in kilowatt-hours. Lastly, could you also provide the
        timestamp of the recorded streetlighting error, and confirm the status
        of the relay responsible for turning this streetlight on and off, as I
        am suspecting it might be sticking?"
      - >-
        "Can you provide me with the unique streetlight identifier, upper lux
        level for managing light intensity, a brief description, and the delta
        or height of the grid area occupied by a group of streetlights? Also,
        can you note the AC voltage supply for these streetlights, any issues
        with communication related to their lux sensors, and the count of how
        many times each streetlight has been switched on? Please ensure that the
        data is constrained to just those that can be determined with the unique
        streetlight identifier I provided."
      - >-
        "What was the last recorded data or action timestamp of the streetlight
        located at the specific longitude, and in which time zone is it
        situated? Could you also provide information on its default dimming
        level and the maximum power usage threshold above which indicates
        potential faults? Are there any identified faults in the lux module
        impacting light level sensing and control? Additionally, what are the
        minimum longitude and delta or height for the grid area occupied by this
        group of streetlights and could you specify the network time received
        from the central control for synchronization purposes?"
  - source_sentence: asset.geoZone
    sentences:
      - >-
        "Could you check the status of the streetlight with the unique
        identifier, located on the named street, specifically looking at any
        records of complete loss of power which could indicate supply issues or
        damage? Also, could you provide details on the instances where the
        voltage under load is lower than expected, as well as instances of lower
        than expected power consumption, which could signal potential electrical
        or hardware issues? I'm also interested in understanding if there are
        any faults in our link control mechanism managing multiple streetlights.
        Additionally, could you tell me the current drawn by this specific
        streetlight when it was lower than expected and the current dimming
        level of the streetlight in operation? Lastly, could you specify the
        maximum safe voltage under load conditions for this light and verify
        whether its broadcast subscription used for receiving control signals is
        doing fine?"
      - >-
        "Can you provide me with the details regarding a specific streetlight on
        Main Street, particularly the minimum current level below which it's
        considered abnormal, its power factor indicating efficient power usage,
        total operational hours logged, any incidences where power consumption
        was higher than expected possibly due to potential faults, its geoZone,
        X-coordinate in the grid layout, minimum operational voltage under load
        conditions, minimum load current that indicates suboptimal performance,
        and the timestamp of the last update made to the threshold settings?"
      - >-
        "What is the width and height of the grid area occupied by the group of
        streetlights, type of port used for dimming controls, power consumption
        levels, and what is the safety of the current exceeded on the
        streetlight? Besides, could you explain the high power factor indicating
        potential overloads or capacitive imbalances?"
  - source_sentence: errors.deviceId
    sentences:
      - >-
        "Can you show me a report of all the streetlights with a unique
        identifier, which have an internal temperature indicating abnormal
        operating conditions such as voltage supplied being below the safe
        level, and operating temperature below expected limit possibly due to
        environmental conditions? Can this report also include instances of
        faults in link control mechanism managing multiple streetlights and
        cases of open circuit in the relay preventing normal operation?"
      - >-
        "Could you provide information about the streetlight on 'specific street
        name', specifically concerning its current drawn which appears to be
        lower than expected, potential issues in the link control mechanism that
        manages multiple streetlights, whether its operating temperature exceeds
        safe limits thus risking damage, and if its power output is lower than
        expected? Also, could you let me know at what interval this streetlight
        sends data reports and inform about any other issues detected,
        particularly when the current is below the expected range?"
      - >-
        "What is the minimum power usage level below which it is considered
        abnormal for our 'Main Street Lamps' group of streetlights, which are
        described as a series of LED lamps installed along the main town
        stretch, and what could be the reasons if the power consumption is lower
        than expected, possibly due to hardware issues? Also, could you give me
        the description on what means when intermittent flashing of the
        streetlight occurs, indicating instability and tell me about the
        strength of the wireless signal received by the streetlight's
        communication module. Could you confirm what control mode switch
        identifier we should use for changing streetlight settings and the
        highest power factor that is considered optimal for streetlight
        efficiency? Additionally, we discovered issues with group management of
        streetlights via our central control system, and we would like to know
        the time taken for the streetlight to activate or light up from the
        command."
  - source_sentence: threshold.lowLoadVoltage
    sentences:
      - >
        "Could you please show me the latest data recorded or action performed
        by the streetlight, specifically highlighting the control mode switch
        identifier used for changing its settings, the type of DALI dimming
        protocol it uses, and the type of port used for its dimming controls?
        Furthermore, has there been any intermittent flashing indicating
        instability? Also, could you provide data on its minimum operational
        voltage under load conditions, and let me know if its power consumption
        is lower than expected due to potential hardware issues?"
      - >-
        "Can the operator managing the streetlight provide the timestamp of the
        latest data recorded or action performed by the streetlight, details on
        the minimum operational voltage under load conditions, the current
        issues with the driver that powers and controls the streetlight, why the
        power output is lower than expected for the streetlight, and what is the
        maximum latitude of the geographic area covered by this group of
        streetlights?"
      - >-
        "Can you provide a report that shows all the streetlights in a grid
        layout with Y-coordinate information, indicating whether their control
        mode setting is on automated or manual, their minimum current level, and
        instances of communication issues between the streetlight's driver and
        the control system, as well as instances when the operating temperature
        fell below expected limits, possibly due to environmental conditions?"
  - source_sentence: errors.controllerFault.lowLoadCurrent
    sentences:
      - >-
        "Can you provide me with the current status of the streetlight on
        'street name', specifically in relation to its voltage under load,
        whether it's lower than expected and how that might be indicating
        potential electrical issues? Could you also give me insight into the
        current drawn by the streetlight, whether or not the relay is currently
        on or off, and if there are any faults in the lux module that may affect
        light level sensing and control? Moreover, could you tell me the type of
        dimming schedule applied, the ambient light level detected in lux, the
        total energy consumed so far recorded in kilowatt-hours, and the lower
        voltage threshold for this streetlight's efficient operation?"
      - >-
        "Can you provide a detailed report for the streetlight on [Name of the
        street for the streetlight in error]? The report should include the
        timestamp of the last recorded error, synchronization time received from
        the central control, the dimming schedule type we're currently using,
        and both minimum operational and maximum safe voltage under load
        conditions. Also, indicate the time of the last action was recorded and
        if there are any reported faults in the metering components affecting
        data reporting. Can you also specify the port type used for dimming
        controls and whether the power consumption has been unusually low due to
        potential hardware issues?"
      - >-
        "Can you show me the current status of the relay in the streetlights
        located at the X-coordinate grid, highlighting any faults in the lux
        module that might be affecting light level sensing and control? Also,
        could you provide information on the current dimming level of these
        streetlights in operation, the type of dimming schedule applied, and
        whether the voltage is within the upper limit considered safe and
        efficient for their operation?"
model-index:
  - name: BGE base Financial Matryoshka
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.014423076923076924
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.0014423076923076926
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.014423076923076924
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.004284253930989665
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.001549145299145299
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.005857063109582476
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 0
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.014423076923076924
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.0014423076923076926
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.014423076923076924
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.004284253930989665
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.001549145299145299
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.005857063109582476
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 256
          type: dim_256
        metrics:
          - type: cosine_accuracy@1
            value: 0
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.014423076923076924
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.0014423076923076926
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.014423076923076924
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.0043536523979211435
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.0016159188034188035
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.005708010488423065
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.009615384615384616
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.0009615384615384616
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.009615384615384616
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.0030498236971024735
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.001221001221001221
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.005185692544152747
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 0
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.019230769230769232
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.0019230769230769232
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.019230769230769232
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.005956216500485246
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.0023027319902319903
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.0051874402718147935
            name: Cosine Map@100

BGE base Financial Matryoshka

This is a sentence-transformers model fine-tuned from SQAI/bge-embedding-model. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: SQAI/bge-embedding-model
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
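
The architecture above ends with CLS-token pooling (`pooling_mode_cls_token: True`) followed by `Normalize()`. As a minimal sketch of what those two steps do, the snippet below uses random NumPy values as a stand-in for the transformer's token outputs; the helper name `cls_pool_and_normalize` is hypothetical and is not part of the library.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Keep the first ([CLS]) token vector of each sequence, then L2-normalize it."""
    cls_vectors = token_embeddings[:, 0, :]  # shape: (batch, hidden)
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)  # guard against zero vectors

# Assumption: random values standing in for BertModel output of shape
# (batch=2, sequence_length=5, hidden=384).
rng = np.random.default_rng(0)
fake_token_embeddings = rng.normal(size=(2, 5, 384))

sentence_embeddings = cls_pool_and_normalize(fake_token_embeddings)
print(sentence_embeddings.shape)
```

Because the output vectors have unit length, their dot product equals their cosine similarity.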

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("SQAI/bge-embedding-model2")
# Run inference
sentences = [
    'errors.controllerFault.lowLoadCurrent',
    '"Can you provide me with the current status of the streetlight on \'street name\', specifically in relation to its voltage under load, whether it\'s lower than expected and how that might be indicating potential electrical issues? Could you also give me insight into the current drawn by the streetlight, whether or not the relay is currently on or off, and if there are any faults in the lux module that may affect light level sensing and control? Moreover, could you tell me the type of dimming schedule applied, the ambient light level detected in lux, the total energy consumed so far recorded in kilowatt-hours, and the lower voltage threshold for this streetlight\'s efficient operation?"',
    '"Can you show me the current status of the relay in the streetlights located at the X-coordinate grid, highlighting any faults in the lux module that might be affecting light level sensing and control? Also, could you provide information on the current dimming level of these streetlights in operation, the type of dimming schedule applied, and whether the voltage is within the upper limit considered safe and efficient for their operation?"',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
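
The training details list `matryoshka_dims` of 384, 256, 128, and 64, so the leading components of each embedding are meant to be usable on their own. The following is a minimal sketch of truncating and re-normalizing embeddings before computing cosine similarity. It assumes the embeddings are a NumPy array like the one returned by `model.encode`, and it uses random values as a stand-in so that it runs without downloading the model; `truncate_and_normalize` is a hypothetical helper name.

```python
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row and rescale rows to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

# Assumption: random stand-in for `model.encode(sentences)` with shape (3, 384).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 384))

for dim in (384, 256, 128, 64):
    reduced = truncate_and_normalize(embeddings, dim)
    similarities = reduced @ reduced.T  # cosine similarity, since rows are unit length
    print(dim, reduced.shape, similarities.shape)
```

Recent releases of Sentence Transformers also accept a `truncate_dim` argument when constructing `SentenceTransformer`, which may be more convenient than truncating by hand.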

Evaluation

Metrics

Information Retrieval (dataset: dim_768)

Metric Value
cosine_accuracy@1 0.0
cosine_accuracy@3 0.0
cosine_accuracy@5 0.0
cosine_accuracy@10 0.0144
cosine_precision@1 0.0
cosine_precision@3 0.0
cosine_precision@5 0.0
cosine_precision@10 0.0014
cosine_recall@1 0.0
cosine_recall@3 0.0
cosine_recall@5 0.0
cosine_recall@10 0.0144
cosine_ndcg@10 0.0043
cosine_mrr@10 0.0015
cosine_map@100 0.0059

Information Retrieval (dataset: dim_512)

Metric Value
cosine_accuracy@1 0.0
cosine_accuracy@3 0.0
cosine_accuracy@5 0.0
cosine_accuracy@10 0.0144
cosine_precision@1 0.0
cosine_precision@3 0.0
cosine_precision@5 0.0
cosine_precision@10 0.0014
cosine_recall@1 0.0
cosine_recall@3 0.0
cosine_recall@5 0.0
cosine_recall@10 0.0144
cosine_ndcg@10 0.0043
cosine_mrr@10 0.0015
cosine_map@100 0.0059

Information Retrieval (dataset: dim_256)

Metric Value
cosine_accuracy@1 0.0
cosine_accuracy@3 0.0
cosine_accuracy@5 0.0
cosine_accuracy@10 0.0144
cosine_precision@1 0.0
cosine_precision@3 0.0
cosine_precision@5 0.0
cosine_precision@10 0.0014
cosine_recall@1 0.0
cosine_recall@3 0.0
cosine_recall@5 0.0
cosine_recall@10 0.0144
cosine_ndcg@10 0.0044
cosine_mrr@10 0.0016
cosine_map@100 0.0057

Information Retrieval (dataset: dim_128)

Metric Value
cosine_accuracy@1 0.0
cosine_accuracy@3 0.0
cosine_accuracy@5 0.0
cosine_accuracy@10 0.0096
cosine_precision@1 0.0
cosine_precision@3 0.0
cosine_precision@5 0.0
cosine_precision@10 0.001
cosine_recall@1 0.0
cosine_recall@3 0.0
cosine_recall@5 0.0
cosine_recall@10 0.0096
cosine_ndcg@10 0.003
cosine_mrr@10 0.0012
cosine_map@100 0.0052

Information Retrieval (dataset: dim_64)

Metric Value
cosine_accuracy@1 0.0
cosine_accuracy@3 0.0
cosine_accuracy@5 0.0
cosine_accuracy@10 0.0192
cosine_precision@1 0.0
cosine_precision@3 0.0
cosine_precision@5 0.0
cosine_precision@10 0.0019
cosine_recall@1 0.0
cosine_recall@3 0.0
cosine_recall@5 0.0
cosine_recall@10 0.0192
cosine_ndcg@10 0.006
cosine_mrr@10 0.0023
cosine_map@100 0.0052
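
To make the metric names above concrete, here is a minimal sketch of how accuracy@k, precision@k, recall@k, MRR@k, and NDCG@k can be computed for a single query with binary relevance. It is an illustration under that assumption, not the evaluator's actual code; the function name and the toy document IDs are hypothetical. The reported values are consistent with one relevant item per query: precision@10 is one tenth of recall@10, and 0.0144 is approximately 3/208, matching the 208 evaluation samples.

```python
import math

def retrieval_metrics(ranked_ids, relevant_ids, k):
    """Compute binary-relevance retrieval metrics for one query at cutoff k."""
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in ranked_ids[:k]]

    accuracy = 1.0 if any(hits) else 0.0       # at least one relevant item in the top k
    precision = sum(hits) / k                  # fraction of the top k that is relevant
    recall = sum(hits) / len(relevant_ids)     # fraction of relevant items retrieved

    # Reciprocal rank of the first relevant item, or 0 if none appears in the top k.
    mrr = next((1.0 / rank for rank, hit in enumerate(hits, start=1) if hit), 0.0)

    # Discounted cumulative gain, normalized by the best achievable ordering.
    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    ndcg = dcg / idcg if idcg > 0 else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "mrr": mrr, "ndcg": ndcg}

# Toy query: the single relevant document "d9" is ranked third.
metrics = retrieval_metrics(["d7", "d2", "d9", "d4"], {"d9"}, k=3)
print(metrics)
```

The reported figures are these per-query values averaged over all evaluation queries.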

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,865 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    Column    Type    Min        Mean          Max
    positive  string  5 tokens   7.68 tokens   14 tokens
    anchor    string  17 tokens  89.79 tokens  187 tokens
  • Samples:
    positive anchor
    threshold.lowLoadVoltage "What is the maximum current level above which it is considered unsafe for a specific streetlight in my area, what is the minimum longitude of the geographic area this streetlight covers, is this streetlight's control mode automated or manually controlled, also, can you provide the delta or width of the grid area occupied by this group of streetlights, what is the level of AC voltage supply to this streetlight, what's the lower voltage threshold below which this streetlight may not operate efficiently, how many times has this streetlight been switched on, what is the minimum operational voltage under load conditions, and finally, what is the latitude of this streetlight?"
    asset.id "Could you please tell me the scheduled dimming settings for the string stored streetlights, troubleshoot why these streetlights remain on during daylight hours, and confirm if this could be due to sensor faults? Also, I'd like to know the identifier for the parent group to which this group of streetlights belongs, and the IMEI number of the streetlight device."
    errors.controllerFault.highPower "Can you provide an analysis of the efficiency of power usage by examining the power factor of the streetlights, especially in areas of the grid with high Y-coordinates, highlight instances where power consumption is significantly higher than expected which may indicate faults, identify situations where voltage under load is above safe levels, and assess if there are any problems with our central control system's ability to manage streetlight groups?"
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            384,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
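
The configuration above applies an in-batch-negatives ranking loss at several truncated embedding sizes and sums the results with the given weights. The NumPy sketch below illustrates that idea only; it is not the library's implementation. The function names are hypothetical, and the similarity scale of 20 is an assumption.

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """Rescale each row to unit length."""
    return x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)

def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy where positives[i] is the match for anchors[i] and every
    other positive in the batch serves as a negative. `scale` is an assumption."""
    scores = scale * normalize(anchors) @ normalize(positives).T  # (batch, batch)
    scores = scores - scores.max(axis=1, keepdims=True)           # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def matryoshka_loss(anchors, positives, dims=(384, 256, 128, 64), weights=(1, 1, 1, 1)):
    """Weighted sum of the ranking loss over truncated views of the embeddings."""
    return sum(w * in_batch_ranking_loss(anchors[:, :d], positives[:, :d])
               for d, w in zip(dims, weights))

# Toy batch: well-aligned pairs should give a much lower loss than mismatched pairs.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 384))
aligned = anchors + 0.01 * rng.normal(size=(4, 384))
mismatched = np.roll(aligned, shift=1, axis=0)

print(matryoshka_loss(anchors, aligned))
print(matryoshka_loss(anchors, mismatched))
```

In this sketch, all four dimensions are evaluated on every call, which corresponds to `n_dims_per_step` being -1 in the configuration.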
    

Evaluation Dataset

Unnamed Dataset

  • Size: 208 evaluation samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    Column    Type    Min        Mean          Max
    positive  string  5 tokens   7.55 tokens   14 tokens
    anchor    string  19 tokens  90.69 tokens  187 tokens
  • Samples:
    positive anchor
    log.controlModeSwitch "Can you provide the control mode switch identifier used for changing the default dimming level set for a specific group of streetlights, identified by their unique identifier, considering the time taken for the streetlight to activate or light up from the command, and possibly troubleshoot why the power consumption is lower than expected which could be due to hardware issues, quite possibly due to the relay responsible for turning the streetlight on and off sticking?"
    errors.controllerFault.luxModuleFault "Can you provide the timestamp of the last update to the threshold settings, and detail any faults in the lux module related to light level sensing and control for the streetlight on this specific street name? I also want to know the longitude of the streetlight. And also, can you tell me what type of dimming schedule is applied to the streetlight, the type of port used for its dimming controls, and the total energy it has consumed, recorded in kilowatt-hours. Lastly, could you also provide the timestamp of the recorded streetlighting error, and confirm the status of the relay responsible for turning this streetlight on and off, as I am suspecting it might be sticking?"
    threshold.lowLoadCurrent "What is the maximum safe voltage under load conditions for the city's streetlights, and do we possess the necessary rights to link these streetlights for synchronized control? Could you provide me with the timestamp of the latest data or action performed by our streetlights, and tell me the lower lux level threshold at which we would need to consider additional lighting? How often does each streetlight send a data report in normal operation, and what is the minimum load current level where we might start seeing suboptimal functioning? Have we been experiencing any problems with managing groups of streetlights via the central control system? Also, has there been any instances where the current under load was excessively high, indicating possible overloads, or situations where the operation temperature was belo normal limits due to environmental conditions? Lastly, have there been any noted communication issues between the streetlight's driver and the control system?"
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            384,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-06
  • weight_decay: 0.03
  • num_train_epochs: 200
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.2
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
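
The batch size and gradient accumulation settings above determine how many samples contribute to each optimizer step. The arithmetic below is a small sketch that assumes training on a single device (the device count is not stated in this card) and that the last partial batch is kept. Under those assumptions, each optimizer step covers about 0.2712 of an epoch, which agrees with the epoch value logged for step 1 in the training logs.

```python
import math

train_samples = 1865               # from the training dataset section
per_device_train_batch_size = 32
gradient_accumulation_steps = 16
num_devices = 1                    # assumption: not stated in the card

# Number of mini-batches per epoch when the last partial batch is kept.
batches_per_epoch = math.ceil(train_samples / (per_device_train_batch_size * num_devices))

# Samples that contribute to one optimizer step.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# Fraction of an epoch consumed by one optimizer step.
epoch_fraction_per_step = gradient_accumulation_steps / batches_per_epoch

print(batches_per_epoch)                  # 59
print(effective_batch_size)               # 512
print(round(epoch_fraction_per_step, 4))  # 0.2712
```

With only 1,865 training samples, an effective batch of 512 yields fewer than four optimizer steps per epoch.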

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-06
  • weight_decay: 0.03
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 200
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
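
The batch-size, epoch, and scheduler settings above interact, so a rough sketch of the arithmetic may help. The values are copied from the list; the single-device setup and the 1865-sample training set (taken from the card's `dataset_size` tag) are assumptions, and the step-count rule is assumed to mirror how the Hugging Face Trainer derives the total when `max_steps` is -1. Under these assumptions each optimizer step covers about 0.2712 of an epoch and the run lasts 600 steps, which agrees with the first and last rows of the training log. The helper names are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH_SIZE = 32
GRADIENT_ACCUMULATION_STEPS = 16
NUM_TRAIN_EPOCHS = 200
LEARNING_RATE = 2e-06
WARMUP_RATIO = 0.2

# Assumptions, not stated in this section: one training device, and 1865
# training samples (the card's dataset_size tag).
NUM_DEVICES = 1
NUM_TRAIN_SAMPLES = 1865


def effective_batch_size() -> int:
    """Samples contributing to one optimizer update."""
    return PER_DEVICE_TRAIN_BATCH_SIZE * NUM_DEVICES * GRADIENT_ACCUMULATION_STEPS


def batches_per_epoch() -> int:
    """Dataloader batches per epoch (the last batch may be partial)."""
    return math.ceil(NUM_TRAIN_SAMPLES / (PER_DEVICE_TRAIN_BATCH_SIZE * NUM_DEVICES))


def epochs_per_step() -> float:
    """Fraction of an epoch consumed by one optimizer step."""
    return GRADIENT_ACCUMULATION_STEPS / batches_per_epoch()


def total_optimizer_steps() -> int:
    """Total updates, assuming whole accumulation groups are counted per epoch."""
    steps_per_epoch = max(batches_per_epoch() // GRADIENT_ACCUMULATION_STEPS, 1)
    return math.ceil(NUM_TRAIN_EPOCHS * steps_per_epoch)


def cosine_lr(step: int) -> float:
    """Linear warmup followed by half-cosine decay to zero (a common formulation)."""
    total = total_optimizer_steps()
    warmup = math.ceil(total * WARMUP_RATIO)
    if step < warmup:
        return LEARNING_RATE * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    print("batches per epoch:", batches_per_epoch())
    print("epochs per optimizer step:", round(epochs_per_step(), 4))
    print("total optimizer steps:", total_optimizer_steps())
    for step in (0, 60, 120, 360, 600):
        print(f"lr at step {step}: {cosine_lr(step):.3e}")
```

With a 20% warmup, the learning rate in this sketch climbs for roughly the first fifth of the run and only then starts to decay, so early log rows are produced at well below the nominal 2e-06.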

Training Logs

Epoch Step Training Loss Validation Loss dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.2712 1 13.2713 - - - - - -
0.5424 2 13.2895 - - - - - -
0.8136 3 9.9139 - - - - - -
1.0847 4 5.6117 - - - - - -
1.3559 5 4.7571 - - - - - -
1.6271 6 5.5215 - - - - - -
1.8983 7 5.7945 - - - - - -
2.1695 8 5.7064 - - - - - -
2.4407 9 5.6794 - - - - - -
2.7119 10 5.7384 - - - - - -
2.9831 11 5.6081 - - - - - -
3.2542 12 5.5278 - - - - - -
3.5254 13 5.149 - - - - - -
3.7966 14 5.5904 5.6043 0.0081 0.0072 0.0079 0.0055 0.0079
1.0169 15 3.9458 - - - - - -
1.2881 16 13.3653 - - - - - -
1.5593 17 13.4413 - - - - - -
1.8305 18 9.4188 - - - - - -
2.1017 19 5.717 - - - - - -
2.3729 20 5.2455 - - - - - -
2.6441 21 5.2117 - - - - - -
2.9153 22 5.5217 - - - - - -
3.1864 23 5.6725 - - - - - -
3.4576 24 5.786 - - - - - -
3.7288 25 5.6507 - - - - - -
4.0 26 5.7215 - - - - - -
4.2712 27 5.3999 - - - - - -
4.5424 28 5.4275 - - - - - -
4.8136 29 5.7143 5.5718 0.0082 0.0071 0.0077 0.0052 0.0077
2.0339 30 4.478 - - - - - -
2.3051 31 13.1821 - - - - - -
2.5763 32 13.2473 - - - - - -
2.8475 33 8.8654 - - - - - -
3.1186 34 5.3181 - - - - - -
3.3898 35 5.2091 - - - - - -
3.6610 36 5.6027 - - - - - -
3.9322 37 5.6839 - - - - - -
4.2034 38 5.5955 - - - - - -
4.4746 39 5.5786 - - - - - -
4.7458 40 5.4509 - - - - - -
5.0169 41 5.3361 - - - - - -
5.2881 42 5.1608 - - - - - -
5.5593 43 5.4896 - - - - - -
5.8305 44 5.6466 5.5241 0.0062 0.0070 0.0076 0.0095 0.0076
3.0508 45 4.5617 - - - - - -
3.3220 46 13.0665 - - - - - -
3.5932 47 13.1848 - - - - - -
3.8644 48 8.4053 - - - - - -
4.1356 49 5.2706 - - - - - -
4.4068 50 5.4269 - - - - - -
4.6780 51 5.3645 - - - - - -
4.9492 52 5.3587 - - - - - -
5.2203 53 5.1047 - - - - - -
5.4915 54 5.743 - - - - - -
5.7627 55 5.3754 - - - - - -
6.0339 56 5.3021 - - - - - -
6.3051 57 5.6983 - - - - - -
6.5763 58 5.302 - - - - - -
6.8475 59 5.4545 5.4638 0.0060 0.0070 0.0077 0.0094 0.0077
4.0678 60 5.2213 - - - - - -
4.3390 61 12.9854 - - - - - -
4.6102 62 13.207 - - - - - -
4.8814 63 7.7493 - - - - - -
5.1525 64 5.3787 - - - - - -
5.4237 65 4.9406 - - - - - -
5.6949 66 5.3963 - - - - - -
5.9661 67 5.3429 - - - - - -
6.2373 68 5.292 - - - - - -
6.5085 69 5.6738 - - - - - -
6.7797 70 5.5927 - - - - - -
7.0508 71 5.5245 - - - - - -
7.3220 72 4.8334 - - - - - -
7.5932 73 5.2015 - - - - - -
7.8644 74 5.5393 5.3954 0.0060 0.0071 0.0078 0.0094 0.0078
5.0847 75 5.6168 - - - - - -
5.3559 76 12.8678 - - - - - -
5.6271 77 13.2377 - - - - - -
5.8983 78 7.1882 - - - - - -
6.1695 79 5.1293 - - - - - -
6.4407 80 4.9413 - - - - - -
6.7119 81 5.1763 - - - - - -
6.9831 82 4.9512 - - - - - -
7.2542 83 5.2744 - - - - - -
7.5254 84 5.0573 - - - - - -
7.7966 85 5.1938 - - - - - -
8.0678 86 5.1514 - - - - - -
8.3390 87 4.9808 - - - - - -
8.6102 88 4.9983 - - - - - -
8.8814 89 5.3211 5.3268 0.0062 0.0067 0.0075 0.0095 0.0075
6.1017 90 6.1513 - - - - - -
6.3729 91 12.7972 - - - - - -
6.6441 92 13.0051 - - - - - -
6.9153 93 6.551 - - - - - -
7.1864 94 4.6644 - - - - - -
7.4576 95 4.8619 - - - - - -
7.7288 96 5.0812 - - - - - -
8.0 97 4.758 - - - - - -
8.2712 98 5.1362 - - - - - -
8.5424 99 5.5405 - - - - - -
8.8136 100 5.228 - - - - - -
9.0847 101 5.1084 - - - - - -
9.3559 102 5.1574 - - - - - -
9.6271 103 5.3326 - - - - - -
9.8983 104 5.34 5.2658 0.0060 0.0066 0.0076 0.0052 0.0076
7.1186 105 6.5789 - - - - - -
7.3898 106 12.7557 - - - - - -
7.6610 107 13.0203 - - - - - -
7.9322 108 5.7148 - - - - - -
8.2034 109 4.7945 - - - - - -
8.4746 110 4.5926 - - - - - -
8.7458 111 4.6727 - - - - - -
9.0169 112 5.0886 - - - - - -
9.2881 113 5.0562 - - - - - -
9.5593 114 5.2167 - - - - - -
9.8305 115 5.048 - - - - - -
10.1017 116 4.7765 - - - - - -
10.3729 117 4.9875 - - - - - -
10.6441 118 4.9501 - - - - - -
10.9153 119 4.756 5.2124 0.0057 0.0065 0.0075 0.0054 0.0075
8.1356 120 6.9381 - - - - - -
8.4068 121 12.7916 - - - - - -
8.6780 122 12.8517 - - - - - -
8.9492 123 5.51 - - - - - -
9.2203 124 4.686 - - - - - -
9.4915 125 4.6611 - - - - - -
9.7627 126 5.2767 - - - - - -
10.0339 127 4.6103 - - - - - -
10.3051 128 4.957 - - - - - -
10.5763 129 5.0236 - - - - - -
10.8475 130 5.0894 - - - - - -
11.1186 131 4.7025 - - - - - -
11.3898 132 5.0765 - - - - - -
11.6610 133 4.6601 - - - - - -
11.9322 134 4.9064 5.1731 0.0056 0.0060 0.0070 0.0054 0.0070
9.1525 135 7.5884 - - - - - -
9.4237 136 12.679 - - - - - -
9.6949 137 12.417 - - - - - -
9.9661 138 5.1632 - - - - - -
10.2373 139 4.9486 - - - - - -
10.5085 140 4.6341 - - - - - -
10.7797 141 4.9664 - - - - - -
11.0508 142 4.9567 - - - - - -
11.3220 143 4.7532 - - - - - -
11.5932 144 5.2556 - - - - - -
11.8644 145 4.9652 - - - - - -
12.1356 146 4.8118 - - - - - -
12.4068 147 4.704 - - - - - -
12.6780 148 4.8922 - - - - - -
12.9492 149 4.6571 5.1441 0.0061 0.0055 0.0064 0.0053 0.0064
10.1695 150 8.1284 - - - - - -
10.4407 151 12.5703 - - - - - -
10.7119 152 11.8696 - - - - - -
10.9831 153 4.8543 - - - - - -
11.2542 154 4.8099 - - - - - -
11.5254 155 4.7009 - - - - - -
11.7966 156 4.7986 - - - - - -
12.0678 157 4.7973 - - - - - -
12.3390 158 4.5529 - - - - - -
12.6102 159 5.0275 - - - - - -
12.8814 160 4.6675 - - - - - -
13.1525 161 4.6538 - - - - - -
13.4237 162 4.8355 - - - - - -
13.6949 163 4.6304 - - - - - -
13.9661 164 4.7047 5.1242 0.0064 0.0054 0.0064 0.0095 0.0064
11.1864 165 8.6549 - - - - - -
11.4576 166 12.4788 - - - - - -
11.7288 167 11.6425 - - - - - -
12.0 168 4.5654 - - - - - -
12.2712 169 4.7016 - - - - - -
12.5424 170 4.3306 - - - - - -
12.8136 171 4.9692 - - - - - -
13.0847 172 4.7557 - - - - - -
13.3559 173 4.8665 - - - - - -
13.6271 174 4.8338 - - - - - -
13.8983 175 4.9221 - - - - - -
14.1695 176 4.4968 - - - - - -
14.4407 177 4.6104 - - - - - -
14.7119 178 4.8449 - - - - - -
14.9831 179 4.2392 5.1123 0.0059 0.0055 0.0065 0.0094 0.0065
12.2034 180 9.4893 - - - - - -
12.4746 181 12.4241 - - - - - -
12.7458 182 11.0389 - - - - - -
13.0169 183 4.7595 - - - - - -
13.2881 184 4.5408 - - - - - -
13.5593 185 4.6108 - - - - - -
13.8305 186 4.5832 - - - - - -
14.1017 187 4.6741 - - - - - -
14.3729 188 4.9353 - - - - - -
14.6441 189 5.0511 - - - - - -
14.9153 190 4.6575 - - - - - -
15.1864 191 4.648 - - - - - -
15.4576 192 4.6224 - - - - - -
15.7288 193 4.9292 - - - - - -
16.0 194 3.7805 5.1058 0.0063 0.0057 0.0062 0.0094 0.0062
13.2203 195 10.2695 - - - - - -
13.4915 196 12.5043 - - - - - -
13.7627 197 10.4795 - - - - - -
14.0339 198 4.6901 - - - - - -
14.3051 199 4.6538 - - - - - -
14.5763 200 4.4736 - - - - - -
14.8475 201 4.4383 - - - - - -
15.1186 202 5.0382 - - - - - -
15.3898 203 4.5636 - - - - - -
15.6610 204 4.8089 - - - - - -
15.9322 205 4.4746 - - - - - -
16.2034 206 4.5876 - - - - - -
16.4746 207 4.4972 - - - - - -
16.7458 208 4.8569 - - - - - -
17.0169 209 3.5883 5.1031 0.0059 0.0057 0.0061 0.0095 0.0061
14.2373 210 10.8988 - - - - - -
14.5085 211 12.4944 - - - - - -
14.7797 212 10.1041 - - - - - -
15.0508 213 4.8811 - - - - - -
15.3220 214 4.6292 - - - - - -
15.5932 215 4.4828 - - - - - -
15.8644 216 4.7588 - - - - - -
16.1356 217 4.26 - - - - - -
16.4068 218 4.9124 - - - - - -
16.6780 219 4.8098 - - - - - -
16.9492 220 4.4439 - - - - - -
17.2203 221 4.4824 - - - - - -
17.4915 222 4.7771 - - - - - -
17.7627 223 4.5966 - - - - - -
18.0339 224 3.1409 5.1009 0.0055 0.0057 0.0062 0.0052 0.0062
15.2542 225 11.657 - - - - - -
15.5254 226 12.5032 - - - - - -
15.7966 227 9.4495 - - - - - -
16.0678 228 4.7099 - - - - - -
16.3390 229 4.6049 - - - - - -
16.6102 230 4.6311 - - - - - -
16.8814 231 4.7562 - - - - - -
17.1525 232 4.7195 - - - - - -
17.4237 233 4.8557 - - - - - -
17.6949 234 4.8423 - - - - - -
17.9661 235 4.5764 - - - - - -
18.2373 236 4.5081 - - - - - -
18.5085 237 4.7974 - - - - - -
18.7797 238 4.871 - - - - - -
19.0508 239 2.8558 5.1020 0.0054 0.0057 0.0061 0.0054 0.0061
16.2712 240 12.4297 - - - - - -
16.5424 241 12.5186 - - - - - -
16.8136 242 8.8827 - - - - - -
17.0847 243 4.8406 - - - - - -
17.3559 244 4.4367 - - - - - -
17.6271 245 4.5996 - - - - - -
17.8983 246 4.6692 - - - - - -
18.1695 247 4.6498 - - - - - -
18.4407 248 4.7211 - - - - - -
18.7119 249 4.7675 - - - - - -
18.9831 250 4.4405 - - - - - -
19.2542 251 4.6778 - - - - - -
19.5254 252 4.6674 - - - - - -
19.7966 253 4.735 5.1036 0.0054 0.0056 0.0060 0.0054 0.0060
17.0169 254 3.6188 - - - - - -
17.2881 255 12.4112 - - - - - -
17.5593 256 12.5261 - - - - - -
17.8305 257 8.3408 - - - - - -
18.1017 258 4.6496 - - - - - -
18.3729 259 4.7177 - - - - - -
18.6441 260 4.7956 - - - - - -
18.9153 261 4.7228 - - - - - -
19.1864 262 4.6083 - - - - - -
19.4576 263 4.7985 - - - - - -
19.7288 264 4.6675 - - - - - -
20.0 265 4.6353 - - - - - -
20.2712 266 4.5717 - - - - - -
20.5424 267 4.4358 - - - - - -
20.8136 268 4.8288 5.1030 0.0056 0.0057 0.0062 0.0053 0.0062
18.0339 269 3.7877 - - - - - -
18.3051 270 12.4042 - - - - - -
18.5763 271 12.4793 - - - - - -
18.8475 272 7.9475 - - - - - -
19.1186 273 4.5502 - - - - - -
19.3898 274 4.5565 - - - - - -
19.6610 275 4.4172 - - - - - -
19.9322 276 4.5319 - - - - - -
20.2034 277 4.5635 - - - - - -
20.4746 278 4.5233 - - - - - -
20.7458 279 4.6766 - - - - - -
21.0169 280 4.5863 - - - - - -
21.2881 281 4.5784 - - - - - -
21.5593 282 4.7198 - - - - - -
21.8305 283 4.7383 5.1065 0.0054 0.0056 0.0061 0.0050 0.0061
19.0508 284 4.4257 - - - - - -
19.3220 285 12.3475 - - - - - -
19.5932 286 12.5168 - - - - - -
19.8644 287 7.3671 - - - - - -
20.1356 288 4.3771 - - - - - -
20.4068 289 4.542 - - - - - -
20.6780 290 4.3629 - - - - - -
20.9492 291 4.5474 - - - - - -
21.2203 292 4.7436 - - - - - -
21.4915 293 4.5915 - - - - - -
21.7627 294 4.5522 - - - - - -
22.0339 295 4.6591 - - - - - -
22.3051 296 4.6179 - - - - - -
22.5763 297 4.6086 - - - - - -
22.8475 298 4.8808 5.1083 0.0054 0.0057 0.0062 0.0055 0.0062
20.0678 299 4.7358 - - - - - -
20.3390 300 12.3209 - - - - - -
20.6102 301 12.6406 - - - - - -
20.8814 302 6.7971 - - - - - -
21.1525 303 4.3723 - - - - - -
21.4237 304 4.61 - - - - - -
21.6949 305 4.4624 - - - - - -
21.9661 306 4.6145 - - - - - -
22.2373 307 4.5794 - - - - - -
22.5085 308 4.6625 - - - - - -
22.7797 309 4.5499 - - - - - -
23.0508 310 4.5657 - - - - - -
23.3220 311 4.5896 - - - - - -
23.5932 312 4.5692 - - - - - -
23.8644 313 4.93 5.1119 0.0055 0.0057 0.0061 0.0056 0.0061
21.0847 314 5.3829 - - - - - -
21.3559 315 12.3422 - - - - - -
21.6271 316 12.601 - - - - - -
21.8983 317 6.5062 - - - - - -
22.1695 318 4.4681 - - - - - -
22.4407 319 4.4244 - - - - - -
22.7119 320 4.4514 - - - - - -
22.9831 321 4.5469 - - - - - -
23.2542 322 4.6924 - - - - - -
23.5254 323 4.682 - - - - - -
23.7966 324 4.6403 - - - - - -
24.0678 325 4.6272 - - - - - -
24.3390 326 4.3605 - - - - - -
24.6102 327 4.5992 - - - - - -
24.8814 328 4.6776 5.1126 0.0053 0.0057 0.0061 0.0056 0.0061
22.1017 329 5.8504 - - - - - -
22.3729 330 12.335 - - - - - -
22.6441 331 12.5779 - - - - - -
22.9153 332 5.7261 - - - - - -
23.1864 333 4.5411 - - - - - -
23.4576 334 4.4783 - - - - - -
23.7288 335 4.5589 - - - - - -
24.0 336 4.6305 - - - - - -
24.2712 337 4.674 - - - - - -
24.5424 338 4.7455 - - - - - -
24.8136 339 4.6011 - - - - - -
25.0847 340 4.5899 - - - - - -
25.3559 341 4.3981 - - - - - -
25.6271 342 4.7031 - - - - - -
25.8983 343 4.68 5.1182 0.0054 0.0057 0.0059 0.0056 0.0059
23.1186 344 6.3521 - - - - - -
23.3898 345 12.2283 - - - - - -
23.6610 346 12.533 - - - - - -
23.9322 347 5.2654 - - - - - -
24.2034 348 4.3667 - - - - - -
24.4746 349 4.4718 - - - - - -
24.7458 350 4.6212 - - - - - -
25.0169 351 4.447 - - - - - -
25.2881 352 4.6247 - - - - - -
25.5593 353 5.0093 - - - - - -
25.8305 354 4.6316 - - - - - -
26.1017 355 4.6655 - - - - - -
26.3729 356 4.5964 - - - - - -
26.6441 357 4.682 - - - - - -
26.9153 358 4.6375 5.1205 0.0051 0.0056 0.0059 0.0055 0.0059
24.1356 359 6.727 - - - - - -
24.4068 360 12.3706 - - - - - -
24.6780 361 12.4755 - - - - - -
24.9492 362 4.623 - - - - - -
25.2203 363 4.2947 - - - - - -
25.4915 364 4.3993 - - - - - -
25.7627 365 4.4148 - - - - - -
26.0339 366 4.2376 - - - - - -
26.3051 367 4.6334 - - - - - -
26.5763 368 4.7007 - - - - - -
26.8475 369 4.3542 - - - - - -
27.1186 370 4.7036 - - - - - -
27.3898 371 4.2382 - - - - - -
27.6610 372 4.5011 - - - - - -
27.9322 373 4.6292 5.1241 0.0051 0.0056 0.0059 0.0056 0.0059
25.1525 374 7.3562 - - - - - -
25.4237 375 12.2926 - - - - - -
25.6949 376 12.1694 - - - - - -
25.9661 377 4.7183 - - - - - -
26.2373 378 4.4099 - - - - - -
26.5085 379 4.3366 - - - - - -
26.7797 380 4.4848 - - - - - -
27.0508 381 4.6947 - - - - - -
27.3220 382 4.5683 - - - - - -
27.5932 383 4.7691 - - - - - -
27.8644 384 4.3879 - - - - - -
28.1356 385 4.3461 - - - - - -
28.4068 386 4.4756 - - - - - -
28.6780 387 4.5355 - - - - - -
28.9492 388 4.4837 5.1278 0.0052 0.0056 0.0059 0.0054 0.0059
26.1695 389 7.9407 - - - - - -
26.4407 390 12.3054 - - - - - -
26.7119 391 11.6158 - - - - - -
26.9831 392 4.5724 - - - - - -
27.2542 393 4.467 - - - - - -
27.5254 394 4.4395 - - - - - -
27.7966 395 4.4111 - - - - - -
28.0678 396 4.5565 - - - - - -
28.3390 397 4.6063 - - - - - -
28.6102 398 4.5312 - - - - - -
28.8814 399 4.5436 - - - - - -
29.1525 400 4.5366 - - - - - -
29.4237 401 4.4488 - - - - - -
29.6949 402 4.5641 - - - - - -
29.9661 403 4.2491 5.1303 0.0053 0.0057 0.0060 0.0055 0.0060
27.1864 404 8.574 - - - - - -
27.4576 405 12.2836 - - - - - -
27.7288 406 11.1935 - - - - - -
28.0 407 4.5464 - - - - - -
28.2712 408 4.3132 - - - - - -
28.5424 409 4.3553 - - - - - -
28.8136 410 4.4679 - - - - - -
29.0847 411 4.7705 - - - - - -
29.3559 412 4.5667 - - - - - -
29.6271 413 4.6547 - - - - - -
29.8983 414 4.6709 - - - - - -
30.1695 415 4.784 - - - - - -
30.4407 416 4.4368 - - - - - -
30.7119 417 4.6159 - - - - - -
30.9831 418 4.0117 5.1322 0.0050 0.0057 0.0059 0.0054 0.0059
28.2034 419 9.2905 - - - - - -
28.4746 420 12.2439 - - - - - -
28.7458 421 10.722 - - - - - -
29.0169 422 4.6608 - - - - - -
29.2881 423 4.5196 - - - - - -
29.5593 424 4.4313 - - - - - -
29.8305 425 4.513 - - - - - -
30.1017 426 4.5812 - - - - - -
30.3729 427 4.5275 - - - - - -
30.6441 428 4.8022 - - - - - -
30.9153 429 4.5171 - - - - - -
31.1864 430 4.5968 - - - - - -
31.4576 431 4.2145 - - - - - -
31.7288 432 4.7041 - - - - - -
32.0 433 3.6187 5.1356 0.0051 0.0057 0.0059 0.0055 0.0059
29.2203 434 10.0897 - - - - - -
29.4915 435 12.2909 - - - - - -
29.7627 436 10.1362 - - - - - -
30.0339 437 4.5172 - - - - - -
30.3051 438 4.3273 - - - - - -
30.5763 439 4.5272 - - - - - -
30.8475 440 4.376 - - - - - -
31.1186 441 4.5803 - - - - - -
31.3898 442 4.5654 - - - - - -
31.6610 443 4.5024 - - - - - -
31.9322 444 4.5889 - - - - - -
32.2034 445 4.6489 - - - - - -
32.4746 446 4.4505 - - - - - -
32.7458 447 4.7026 - - - - - -
33.0169 448 3.4719 5.1368 0.0050 0.0056 0.0059 0.0052 0.0059
30.2373 449 10.7633 - - - - - -
30.5085 450 12.3203 - - - - - -
30.7797 451 9.7535 - - - - - -
31.0508 452 4.7462 - - - - - -
31.3220 453 4.4271 - - - - - -
31.5932 454 4.4347 - - - - - -
31.8644 455 4.6443 - - - - - -
32.1356 456 4.6344 - - - - - -
32.4068 457 4.6518 - - - - - -
32.6780 458 4.6437 - - - - - -
32.9492 459 4.6168 - - - - - -
33.2203 460 4.4948 - - - - - -
33.4915 461 4.5268 - - - - - -
33.7627 462 4.4844 - - - - - -
34.0339 463 3.276 5.1384 0.0051 0.0057 0.0060 0.0053 0.0060
31.2542 464 11.5311 - - - - - -
31.5254 465 12.3812 - - - - - -
31.7966 466 9.1499 - - - - - -
32.0678 467 4.7032 - - - - - -
32.3390 468 4.2429 - - - - - -
32.6102 469 4.549 - - - - - -
32.8814 470 4.7083 - - - - - -
33.1525 471 4.5348 - - - - - -
33.4237 472 4.472 - - - - - -
33.6949 473 4.5818 - - - - - -
33.9661 474 4.5534 - - - - - -
34.2373 475 4.5743 - - - - - -
34.5085 476 4.54 - - - - - -
34.7797 477 4.681 - - - - - -
35.0508 478 2.9902 5.1397 0.0052 0.0057 0.0059 0.0053 0.0059
32.2712 479 12.3174 - - - - - -
32.5424 480 12.2996 - - - - - -
32.8136 481 8.7153 - - - - - -
33.0847 482 4.5692 - - - - - -
33.3559 483 4.3255 - - - - - -
33.6271 484 4.4515 - - - - - -
33.8983 485 4.6708 - - - - - -
34.1695 486 4.2648 - - - - - -
34.4407 487 4.6268 - - - - - -
34.7119 488 4.703 - - - - - -
34.9831 489 4.6269 - - - - - -
35.2542 490 4.6464 - - - - - -
35.5254 491 4.4952 - - - - - -
35.7966 492 4.6097 5.1406 0.0052 0.0058 0.0058 0.0054 0.0058
33.0169 493 3.2718 - - - - - -
33.2881 494 12.3329 - - - - - -
33.5593 495 12.3503 - - - - - -
33.8305 496 8.1544 - - - - - -
34.1017 497 4.4684 - - - - - -
34.3729 498 4.4062 - - - - - -
34.6441 499 4.2644 - - - - - -
34.9153 500 4.5294 - - - - - -
35.1864 501 4.673 - - - - - -
35.4576 502 4.4884 - - - - - -
35.7288 503 4.5989 - - - - - -
36.0 504 4.6182 - - - - - -
36.2712 505 4.6487 - - - - - -
36.5424 506 4.6436 - - - - - -
36.8136 507 4.6059 5.1417 0.0051 0.0057 0.0059 0.0052 0.0059
34.0339 508 3.7589 - - - - - -
34.3051 509 12.2815 - - - - - -
34.5763 510 12.5481 - - - - - -
34.8475 511 7.6339 - - - - - -
35.1186 512 4.5528 - - - - - -
35.3898 513 4.3266 - - - - - -
35.6610 514 4.3093 - - - - - -
35.9322 515 4.7401 - - - - - -
36.2034 516 4.523 - - - - - -
36.4746 517 4.5255 - - - - - -
36.7458 518 4.5058 - - - - - -
37.0169 519 4.5614 - - - - - -
37.2881 520 4.5323 - - - - - -
37.5593 521 4.5739 - - - - - -
37.8305 522 4.6501 5.1427 0.0052 0.0058 0.0059 0.0053 0.0059
35.0508 523 4.2083 - - - - - -
35.3220 524 12.2888 - - - - - -
35.5932 525 12.4709 - - - - - -
35.8644 526 7.3926 - - - - - -
36.1356 527 4.4719 - - - - - -
36.4068 528 4.5033 - - - - - -
36.6780 529 4.388 - - - - - -
36.9492 530 4.5606 - - - - - -
37.2203 531 4.6936 - - - - - -
37.4915 532 4.6008 - - - - - -
37.7627 533 4.6973 - - - - - -
38.0339 534 4.4194 - - - - - -
38.3051 535 4.5616 - - - - - -
38.5763 536 4.6307 - - - - - -
38.8475 537 4.8322 5.1442 0.0051 0.0057 0.0059 0.0053 0.0059
36.0678 538 4.8388 - - - - - -
36.3390 539 12.2334 - - - - - -
36.6102 540 12.4205 - - - - - -
36.8814 541 6.9051 - - - - - -
37.1525 542 4.6011 - - - - - -
37.4237 543 4.4701 - - - - - -
37.6949 544 4.421 - - - - - -
37.9661 545 4.6877 - - - - - -
38.2373 546 4.6348 - - - - - -
38.5085 547 4.5822 - - - - - -
38.7797 548 4.5697 - - - - - -
39.0508 549 4.3118 - - - - - -
39.3220 550 4.5131 - - - - - -
39.5932 551 4.4879 - - - - - -
39.8644 552 4.5945 5.1429 0.0052 0.0056 0.0059 0.0054 0.0059
37.0847 553 5.4083 - - - - - -
37.3559 554 12.2092 - - - - - -
37.6271 555 12.5043 - - - - - -
37.8983 556 6.1239 - - - - - -
38.1695 557 4.2932 - - - - - -
38.4407 558 4.3845 - - - - - -
38.7119 559 4.5619 - - - - - -
38.9831 560 4.6936 - - - - - -
39.2542 561 4.6636 - - - - - -
39.5254 562 4.7964 - - - - - -
39.7966 563 4.613 - - - - - -
40.0678 564 4.5856 - - - - - -
40.3390 565 4.4605 - - - - - -
40.6102 566 4.5461 - - - - - -
40.8814 567 4.7145 5.1454 0.0052 0.0056 0.0059 0.0052 0.0059
38.1017 568 5.8311 - - - - - -
38.3729 569 12.2142 - - - - - -
38.6441 570 12.4489 - - - - - -
38.9153 571 5.7328 - - - - - -
39.1864 572 4.4402 - - - - - -
39.4576 573 4.1806 - - - - - -
39.7288 574 4.6327 - - - - - -
40.0 575 4.2768 - - - - - -
40.2712 576 4.4669 - - - - - -
40.5424 577 4.8094 - - - - - -
40.8136 578 4.5773 - - - - - -
41.0847 579 4.439 - - - - - -
41.3559 580 4.5718 - - - - - -
41.6271 581 4.5955 - - - - - -
41.8983 582 4.5043 5.1443 0.0051 0.0056 0.0059 0.0054 0.0059
39.1186 583 6.359 - - - - - -
39.3898 584 12.212 - - - - - -
39.6610 585 12.538 - - - - - -
39.9322 586 5.0971 - - - - - -
40.2034 587 4.4783 - - - - - -
40.4746 588 4.394 - - - - - -
40.7458 589 4.4847 - - - - - -
41.0169 590 4.4116 - - - - - -
41.2881 591 4.3979 - - - - - -
41.5593 592 4.6652 - - - - - -
41.8305 593 4.3939 - - - - - -
42.1017 594 4.5555 - - - - - -
42.3729 595 4.4966 - - - - - -
42.6441 596 4.6267 - - - - - -
42.9153 597 4.5834 5.1446 0.0051 0.0057 0.0058 0.0052 0.0058
40.1356 598 6.7009 - - - - - -
40.4068 599 12.2755 - - - - - -
40.6780 600 12.4465 5.1447 0.0052 0.0057 0.0059 0.0052 0.0059
  • The bold row denotes the saved checkpoint.
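
Because the log is a flat, whitespace-separated listing, it can be easier to inspect programmatically. The following is a minimal sketch, assuming each row has nine fields in the column order of the header and that `-` marks a value that was not recorded at that step. The sample rows are copied from the log above; the field names and helper functions are hypothetical.

```python
from typing import Dict, List, Optional

# A few rows copied verbatim from the training log above.
SAMPLE_ROWS = """\
3.5254 13 5.149 - - - - - -
3.7966 14 5.5904 5.6043 0.0081 0.0072 0.0079 0.0055 0.0079
18.0339 224 3.1409 5.1009 0.0055 0.0057 0.0062 0.0052 0.0062
40.6780 600 12.4465 5.1447 0.0052 0.0057 0.0059 0.0052 0.0059
"""

# Hypothetical short names for the nine columns, in header order.
COLUMNS = (
    "epoch", "step", "training_loss", "eval_loss",
    "map_128", "map_256", "map_512", "map_64", "map_768",
)

Row = Dict[str, Optional[float]]


def parse_row(line: str) -> Row:
    """Turn one log line into a dict; '-' becomes None."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}: {line!r}")
    values = [None if field == "-" else float(field) for field in fields]
    return dict(zip(COLUMNS, values))


def evaluation_rows(text: str) -> List[Row]:
    """Keep only the rows where evaluation metrics were recorded."""
    rows = [parse_row(line) for line in text.splitlines() if line.strip()]
    return [row for row in rows if row["eval_loss"] is not None]


def lowest_eval_loss(rows: List[Row]) -> Row:
    """Return the evaluation row with the smallest evaluation loss."""
    return min(rows, key=lambda row: row["eval_loss"])


if __name__ == "__main__":
    evaluated = evaluation_rows(SAMPLE_ROWS)
    best = lowest_eval_loss(evaluated)
    print(f"{len(evaluated)} evaluation rows in the sample")
    print(f"lowest evaluation loss {best['eval_loss']} at step {int(best['step'])}")
```

Running the same helpers over the full log would show how the evaluation loss and the per-dimension MAP@100 values move across evaluations; the sketch makes no claim about which row was saved as the checkpoint.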

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1
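
To compare a local environment against the versions listed above, a small standard-library check can be used. This is a sketch only: the distribution names are the usual PyPI names for these libraries (an assumption, since the card lists display names), and a locally installed PyTorch build may carry a different local tag than `+cu121`.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions copied from the list above, keyed by assumed PyPI distribution name.
PINNED = {
    "sentence-transformers": "3.0.1",
    "transformers": "4.41.2",
    "torch": "2.1.2+cu121",
    "accelerate": "0.31.0",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_report(pinned: Dict[str, str] = PINNED) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Map each package to (listed version, installed version, exact match?)."""
    report = {}
    for name, wanted in pinned.items():
        found = installed_version(name)
        report[name] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, matches) in version_report().items():
        status = "ok" if matches else "differs"
        print(f"{name}: listed {wanted}, installed {found} ({status})")
```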

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}