SentenceTransformer

This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
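
The pooling module mean-pools the ModernBERT token embeddings into a single 768-dimensional sentence vector. A minimal sketch for inspecting these settings after loading the model:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("cahya/last-sts")

# Settings reported in the architecture above
print(model.max_seq_length)                      # 8192
print(model.get_sentence_embedding_dimension())  # 768

# The second module is the mean-pooling layer
print(model[1])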

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("cahya/last-sts")
# Run inference
sentences = [
    'While Queen may refer to both Queen regent (sovereign) or Queen consort, the King has always been the sovereign.',
    'There is a very good reason not to refer to the Queen\'s spouse as "King" - because they aren\'t the King.',
    'A man plays the guitar.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
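
Beyond pairwise similarity, the embeddings can be used for semantic search. A minimal sketch with a made-up corpus and query (both are illustrative only):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("cahya/last-sts")

# Hypothetical corpus and query, for illustration only
corpus = [
    "A man is playing a guitar on stage.",
    "The queen consort is not the sovereign.",
    "A chef is preparing a meal in the kitchen.",
]
query = "Who is the sovereign, the queen or her spouse?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Retrieve the top-2 most similar corpus sentences for the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 4))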

Evaluation

Metrics

Semantic Similarity

Metric           sts-dev   sts-test
pearson_cosine   0.7982    0.7554
spearman_cosine  0.8130    0.7644
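
Scores like these are typically produced with the EmbeddingSimilarityEvaluator from Sentence Transformers. A minimal sketch of re-running such an evaluation, assuming sts-dev corresponds to the validation split of the stsb dataset listed under Evaluation Dataset below:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator, SimilarityFunction

model = SentenceTransformer("cahya/last-sts")

# Assumption: sts-dev is the validation split of sentence-transformers/stsb
stsb_dev = load_dataset("sentence-transformers/stsb", split="validation")

dev_evaluator = EmbeddingSimilarityEvaluator(
    sentences1=stsb_dev["sentence1"],
    sentences2=stsb_dev["sentence2"],
    scores=stsb_dev["score"],
    main_similarity=SimilarityFunction.COSINE,
    name="sts-dev",
)
print(dev_evaluator(model))  # includes 'sts-dev_spearman_cosine'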

Training Details

Training Dataset

Unnamed Dataset

  • Size: 353,831 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:

              sentence1            sentence2            score
    type      string               string               float
    details   min: 5 tokens        min: 5 tokens        min: 0.0
              mean: 38.53 tokens   mean: 38.78 tokens   mean: 0.91
              max: 151 tokens      max: 145 tokens      max: 1.0
  • Samples:
      sentence1: A long-term researcher into diabetes, he achieved significant notability with his 1988 Banting Lecture (organized annually by the American Diabetes Association in memory of Frederick Banting).
      sentence2: A renowned expert on diabetes, he gained widespread acclaim for his 1988 Banting Lecture, which is presented annually by the American Diabetes Association to commemorate Frederick Banting.
      score: 0.926345705986023

      sentence1: investigators claim the british company was a cia cover.
      sentence2: russian investigators stated that the british company was a cia cover.
      score: 0.88

      sentence1: Albert Weber (21 November 1888, in Berlin – 17 September 1940) was a German amateur football (soccer) player who competed in the 1912 Summer Olympics.
      sentence2: Albert Weber (21 November 1888, in Berlin – 17 September 1940) was a German amateur footballer who participated in the 1912 Summer Olympics.
      score: 0.904914379119873
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
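
CosineSimilarityLoss computes the cosine similarity of each pair's embeddings and regresses it against the gold score with the configured loss_fct (MSELoss here). A minimal sketch of constructing this loss:

import torch
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("cahya/last-sts")

# MSE between cosine(embedding1, embedding2) and the gold score in [0, 1]
train_loss = losses.CosineSimilarityLoss(model, loss_fct=torch.nn.MSELoss())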
    

Evaluation Dataset

stsb

  • Dataset: stsb at ab7a5ac
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:

              sentence1            sentence2            score
    type      string               string               float
    details   min: 5 tokens        min: 6 tokens        min: 0.0
              mean: 15.44 tokens   mean: 15.43 tokens   mean: 0.42
              max: 44 tokens       max: 58 tokens       max: 1.0
  • Samples:
      sentence1: A man with a hard hat is dancing.
      sentence2: A man wearing a hard hat is dancing.
      score: 1.0

      sentence1: A young child is riding a horse.
      sentence2: A child is riding a horse.
      score: 0.95

      sentence1: A man is feeding a mouse to a snake.
      sentence2: The man is feeding a mouse to the snake.
      score: 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • bf16: True
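
These values map directly onto the Sentence Transformers trainer. A minimal, self-contained sketch with placeholder data (the base checkpoint, training examples, evaluation dataset, and output path below are illustrative assumptions, not the ones actually used):

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

# Placeholders: neither the base checkpoint nor the training data are named in this card
model = SentenceTransformer("cahya/last-sts")
train_dataset = Dataset.from_dict({
    "sentence1": ["A child is riding a horse.", "A man plays the guitar."],
    "sentence2": ["A young child is riding a horse.", "A man is playing a guitar."],
    "score": [0.95, 1.0],
})
loss = losses.CosineSimilarityLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="models/last-sts",  # placeholder output path
    eval_strategy="steps",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=10,
    warmup_ratio=0.1,
    bf16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # placeholder; the card evaluates on the stsb validation split
    loss=loss,
)
trainer.train()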

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss sts-dev_spearman_cosine sts-test_spearman_cosine
0.0362 100 0.0019 0.1114 0.8115 -
0.0724 200 0.0021 0.0882 0.8177 -
0.1085 300 0.0015 0.0748 0.8125 -
0.1447 400 0.0012 0.0679 0.8086 -
0.1809 500 0.0012 0.0608 0.8069 -
0.2171 600 0.001 0.0596 0.7986 -
0.2533 700 0.0011 0.0547 0.7946 -
0.2894 800 0.0011 0.0492 0.7870 -
0.3256 900 0.0009 0.0522 0.7862 -
0.3618 1000 0.0008 0.0519 0.7880 -
0.3980 1100 0.0009 0.0529 0.7962 -
0.4342 1200 0.0008 0.0469 0.7954 -
0.4703 1300 0.0009 0.0506 0.7928 -
0.5065 1400 0.0009 0.0466 0.7873 -
0.5427 1500 0.001 0.0495 0.7999 -
0.5789 1600 0.0008 0.0506 0.7861 -
0.6151 1700 0.0008 0.0522 0.7873 -
0.6512 1800 0.0009 0.0582 0.7843 -
0.6874 1900 0.0009 0.0585 0.7888 -
0.7236 2000 0.001 0.0508 0.8040 -
0.7598 2100 0.001 0.0483 0.8018 -
0.7959 2200 0.0008 0.0520 0.7841 -
0.8321 2300 0.0009 0.0519 0.7896 -
0.8683 2400 0.001 0.0514 0.7906 -
0.9045 2500 0.0009 0.0521 0.7946 -
0.9407 2600 0.0009 0.0496 0.7920 -
0.9768 2700 0.001 0.0566 0.7956 -
1.0130 2800 0.0009 0.0511 0.8044 -
1.0492 2900 0.0009 0.0622 0.8197 -
1.0854 3000 0.001 0.0504 0.8113 -
1.1216 3100 0.001 0.0550 0.8005 -
1.1577 3200 0.001 0.0549 0.7821 -
1.1939 3300 0.0009 0.0578 0.7758 -
1.2301 3400 0.0009 0.0543 0.7860 -
1.2663 3500 0.0008 0.0575 0.7891 -
1.3025 3600 0.0009 0.0567 0.7995 -
1.3386 3700 0.001 0.0488 0.7985 -
1.3748 3800 0.0009 0.0514 0.7789 -
1.4110 3900 0.001 0.0584 0.7765 -
1.4472 4000 0.001 0.0554 0.7888 -
1.4834 4100 0.001 0.0659 0.7959 -
1.5195 4200 0.0009 0.0511 0.7816 -
1.5557 4300 0.0009 0.0555 0.7826 -
1.5919 4400 0.001 0.0525 0.7944 -
1.6281 4500 0.0009 0.0553 0.7941 -
1.6643 4600 0.001 0.0588 0.7984 -
1.7004 4700 0.001 0.0579 0.8004 -
1.7366 4800 0.0009 0.0540 0.7916 -
1.7728 4900 0.0009 0.0557 0.7963 -
1.8090 5000 0.0008 0.0536 0.8044 -
1.8452 5100 0.0009 0.0541 0.7870 -
1.8813 5200 0.0009 0.0594 0.7989 -
1.9175 5300 0.001 0.0558 0.8000 -
1.9537 5400 0.0009 0.0538 0.7905 -
1.9899 5500 0.0008 0.0555 0.7944 -
2.0260 5600 0.0009 0.0557 0.8127 -
2.0622 5700 0.0007 0.0542 0.8146 -
2.0984 5800 0.0008 0.0517 0.7990 -
2.1346 5900 0.0009 0.0500 0.8051 -
2.1708 6000 0.0009 0.0521 0.8019 -
2.2069 6100 0.0009 0.0511 0.8101 -
2.2431 6200 0.0008 0.0578 0.8087 -
2.2793 6300 0.0008 0.0585 0.8012 -
2.3155 6400 0.0008 0.0566 0.8083 -
2.3517 6500 0.0007 0.0535 0.8036 -
2.3878 6600 0.0008 0.0531 0.7988 -
2.4240 6700 0.0007 0.0574 0.8102 -
2.4602 6800 0.0007 0.0566 0.7944 -
2.4964 6900 0.0008 0.0528 0.8058 -
2.5326 7000 0.0007 0.0528 0.8056 -
2.5687 7100 0.0007 0.0506 0.8002 -
2.6049 7200 0.0007 0.0526 0.8038 -
2.6411 7300 0.0007 0.0554 0.8054 -
2.6773 7400 0.0007 0.0505 0.7928 -
2.7135 7500 0.0007 0.0505 0.8070 -
2.7496 7600 0.0007 0.0535 0.7977 -
2.7858 7700 0.0007 0.0536 0.8019 -
2.8220 7800 0.0006 0.0546 0.7989 -
2.8582 7900 0.0007 0.0543 0.8042 -
2.8944 8000 0.0007 0.0542 0.8105 -
2.9305 8100 0.0007 0.0541 0.8053 -
2.9667 8200 0.0007 0.0545 0.8135 -
3.0029 8300 0.0007 0.0598 0.8201 -
3.0391 8400 0.0008 0.0558 0.8050 -
3.0753 8500 0.0007 0.0510 0.7965 -
3.1114 8600 0.0006 0.0564 0.8042 -
3.1476 8700 0.0006 0.0559 0.7932 -
3.1838 8800 0.0006 0.0529 0.8028 -
3.2200 8900 0.0006 0.0542 0.8142 -
3.2562 9000 0.0006 0.0532 0.8055 -
3.2923 9100 0.0006 0.0506 0.7930 -
3.3285 9200 0.0007 0.0542 0.7927 -
3.3647 9300 0.0006 0.0523 0.8033 -
3.4009 9400 0.0006 0.0530 0.8079 -
3.4370 9500 0.0006 0.0544 0.7977 -
3.4732 9600 0.0005 0.0515 0.8019 -
3.5094 9700 0.0006 0.0481 0.8037 -
3.5456 9800 0.0005 0.0557 0.8007 -
3.5818 9900 0.0006 0.0495 0.8087 -
3.6179 10000 0.0006 0.0555 0.7991 -
3.6541 10100 0.0005 0.0560 0.7973 -
3.6903 10200 0.0007 0.0581 0.7945 -
3.7265 10300 0.0006 0.0546 0.8098 -
3.7627 10400 0.0006 0.0539 0.8074 -
3.7988 10500 0.0005 0.0501 0.8051 -
3.8350 10600 0.0005 0.0531 0.8032 -
3.8712 10700 0.0005 0.0502 0.8077 -
3.9074 10800 0.0006 0.0537 0.8131 -
3.9436 10900 0.0005 0.0510 0.8115 -
3.9797 11000 0.0006 0.0525 0.8173 -
4.0159 11100 0.0005 0.0513 0.8106 -
4.0521 11200 0.0006 0.0594 0.8061 -
4.0883 11300 0.0005 0.0514 0.8150 -
4.1245 11400 0.0005 0.0537 0.8168 -
4.1606 11500 0.0005 0.0571 0.8176 -
4.1968 11600 0.0005 0.0546 0.8159 -
4.2330 11700 0.0005 0.0496 0.8115 -
4.2692 11800 0.0005 0.0526 0.8072 -
4.3054 11900 0.0005 0.0512 0.8081 -
4.3415 12000 0.0005 0.0517 0.8025 -
4.3777 12100 0.0005 0.0533 0.8128 -
4.4139 12200 0.0005 0.0501 0.8121 -
4.4501 12300 0.0005 0.0507 0.8079 -
4.4863 12400 0.0005 0.0501 0.8070 -
4.5224 12500 0.0004 0.0537 0.8019 -
4.5586 12600 0.0004 0.0541 0.8005 -
4.5948 12700 0.0005 0.0525 0.8117 -
4.6310 12800 0.0004 0.0523 0.8070 -
4.6671 12900 0.0005 0.0526 0.8099 -
4.7033 13000 0.0004 0.0518 0.8166 -
4.7395 13100 0.0004 0.0547 0.8129 -
4.7757 13200 0.0005 0.0523 0.8130 -
4.8119 13300 0.0004 0.0504 0.8129 -
4.8480 13400 0.0005 0.0539 0.8113 -
4.8842 13500 0.0004 0.0523 0.8169 -
4.9204 13600 0.0005 0.0521 0.8164 -
4.9566 13700 0.0004 0.0575 0.8115 -
4.9928 13800 0.0004 0.0538 0.8186 -
5.0289 13900 0.0004 0.0530 0.8095 -
5.0651 14000 0.0003 0.0537 0.8162 -
5.1013 14100 0.0004 0.0560 0.8112 -
5.1375 14200 0.0004 0.0528 0.8125 -
5.1737 14300 0.0004 0.0533 0.8137 -
5.2098 14400 0.0003 0.0537 0.8198 -
5.2460 14500 0.0004 0.0530 0.8102 -
5.2822 14600 0.0004 0.0562 0.8099 -
5.3184 14700 0.0004 0.0522 0.8084 -
5.3546 14800 0.0004 0.0515 0.8128 -
5.3907 14900 0.0004 0.0555 0.8107 -
5.4269 15000 0.0004 0.0533 0.8113 -
5.4631 15100 0.0003 0.0538 0.8135 -
5.4993 15200 0.0004 0.0552 0.8139 -
5.5355 15300 0.0003 0.0513 0.8102 -
5.5716 15400 0.0004 0.0542 0.8108 -
5.6078 15500 0.0003 0.0541 0.8041 -
5.6440 15600 0.0004 0.0512 0.8074 -
5.6802 15700 0.0003 0.0553 0.8100 -
5.7164 15800 0.0003 0.0539 0.8088 -
5.7525 15900 0.0004 0.0527 0.8094 -
5.7887 16000 0.0004 0.0524 0.8080 -
5.8249 16100 0.0003 0.0525 0.8112 -
5.8611 16200 0.0003 0.0537 0.8109 -
5.8973 16300 0.0003 0.0539 0.8129 -
5.9334 16400 0.0003 0.0543 0.8052 -
5.9696 16500 0.0003 0.0544 0.8093 -
6.0058 16600 0.0004 0.0532 0.8109 -
6.0420 16700 0.0002 0.0558 0.8108 -
6.0781 16800 0.0002 0.0529 0.8089 -
6.1143 16900 0.0003 0.0539 0.8074 -
6.1505 17000 0.0003 0.0534 0.8118 -
6.1867 17100 0.0003 0.0539 0.8048 -
6.2229 17200 0.0003 0.0537 0.8049 -
6.2590 17300 0.0003 0.0553 0.8102 -
6.2952 17400 0.0002 0.0533 0.8053 -
6.3314 17500 0.0003 0.0550 0.8071 -
6.3676 17600 0.0002 0.0530 0.8128 -
6.4038 17700 0.0003 0.0547 0.8159 -
6.4399 17800 0.0002 0.0539 0.8120 -
6.4761 17900 0.0003 0.0540 0.8107 -
6.5123 18000 0.0003 0.0535 0.8069 -
6.5485 18100 0.0003 0.0541 0.8129 -
6.5847 18200 0.0003 0.0522 0.8132 -
6.6208 18300 0.0002 0.0539 0.8135 -
6.6570 18400 0.0002 0.0542 0.8142 -
6.6932 18500 0.0003 0.0529 0.8101 -
6.7294 18600 0.0003 0.0533 0.8073 -
6.7656 18700 0.0003 0.0525 0.8095 -
6.8017 18800 0.0003 0.0534 0.8089 -
6.8379 18900 0.0002 0.0519 0.8134 -
6.8741 19000 0.0002 0.0536 0.8141 -
6.9103 19100 0.0002 0.0535 0.8115 -
6.9465 19200 0.0002 0.0519 0.8107 -
6.9826 19300 0.0002 0.0546 0.8093 -
7.0188 19400 0.0002 0.0532 0.8112 -
7.0550 19500 0.0002 0.0526 0.8145 -
7.0912 19600 0.0002 0.0529 0.8111 -
7.1274 19700 0.0002 0.0540 0.8090 -
7.1635 19800 0.0002 0.0525 0.8116 -
7.1997 19900 0.0002 0.0534 0.8115 -
7.2359 20000 0.0002 0.0526 0.8123 -
7.2721 20100 0.0002 0.0524 0.8143 -
7.3082 20200 0.0002 0.0526 0.8059 -
7.3444 20300 0.0002 0.0535 0.8091 -
7.3806 20400 0.0002 0.0532 0.8094 -
7.4168 20500 0.0002 0.0529 0.8108 -
7.4530 20600 0.0002 0.0542 0.8108 -
7.4891 20700 0.0002 0.0525 0.8102 -
7.5253 20800 0.0002 0.0541 0.8106 -
7.5615 20900 0.0002 0.0538 0.8095 -
7.5977 21000 0.0003 0.0523 0.8136 -
7.6339 21100 0.0002 0.0544 0.8108 -
7.6700 21200 0.0002 0.0525 0.8090 -
7.7062 21300 0.0002 0.0528 0.8108 -
7.7424 21400 0.0002 0.0531 0.8115 -
7.7786 21500 0.0002 0.0541 0.8107 -
7.8148 21600 0.0001 0.0525 0.8117 -
7.8509 21700 0.0002 0.0534 0.8115 -
7.8871 21800 0.0002 0.0541 0.8105 -
7.9233 21900 0.0002 0.0538 0.8094 -
7.9595 22000 0.0002 0.0530 0.8106 -
7.9957 22100 0.0002 0.0527 0.8104 -
8.0318 22200 0.0001 0.0534 0.8098 -
8.0680 22300 0.0002 0.0537 0.8090 -
8.1042 22400 0.0001 0.0533 0.8103 -
8.1404 22500 0.0002 0.0528 0.8099 -
8.1766 22600 0.0001 0.0531 0.8106 -
8.2127 22700 0.0001 0.0534 0.8116 -
8.2489 22800 0.0001 0.0538 0.8102 -
8.2851 22900 0.0001 0.0530 0.8108 -
8.3213 23000 0.0002 0.0529 0.8112 -
8.3575 23100 0.0001 0.0533 0.8099 -
8.3936 23200 0.0001 0.0534 0.8107 -
8.4298 23300 0.0002 0.0535 0.8110 -
8.4660 23400 0.0001 0.0543 0.8108 -
8.5022 23500 0.0001 0.0530 0.8119 -
8.5384 23600 0.0001 0.0530 0.8132 -
8.5745 23700 0.0001 0.0531 0.8128 -
8.6107 23800 0.0002 0.0532 0.8119 -
8.6469 23900 0.0002 0.0531 0.8120 -
8.6831 24000 0.0001 0.0531 0.8121 -
8.7192 24100 0.0001 0.0525 0.8134 -
8.7554 24200 0.0002 0.0524 0.8133 -
8.7916 24300 0.0001 0.0535 0.8141 -
8.8278 24400 0.0002 0.0529 0.8118 -
8.8640 24500 0.0001 0.0529 0.8115 -
8.9001 24600 0.0001 0.0528 0.8127 -
8.9363 24700 0.0002 0.0527 0.8111 -
8.9725 24800 0.0001 0.0536 0.8114 -
9.0087 24900 0.0001 0.0531 0.8124 -
9.0449 25000 0.0001 0.0532 0.8123 -
9.0810 25100 0.0001 0.0534 0.8130 -
9.1172 25200 0.0001 0.0533 0.8121 -
9.1534 25300 0.0002 0.0534 0.8119 -
9.1896 25400 0.0001 0.0532 0.8118 -
9.2258 25500 0.0001 0.0532 0.8112 -
9.2619 25600 0.0001 0.0532 0.8121 -
9.2981 25700 0.0002 0.0537 0.8120 -
9.3343 25800 0.0001 0.0535 0.8127 -
9.3705 25900 0.0001 0.0529 0.8133 -
9.4067 26000 0.0001 0.0529 0.8138 -
9.4428 26100 0.0001 0.0534 0.8131 -
9.4790 26200 0.0001 0.0529 0.8137 -
9.5152 26300 0.0002 0.0529 0.8135 -
9.5514 26400 0.0001 0.0528 0.8129 -
9.5876 26500 0.0001 0.0530 0.8124 -
9.6237 26600 0.0001 0.0529 0.8132 -
9.6599 26700 0.0001 0.0530 0.8128 -
9.6961 26800 0.0001 0.0530 0.8132 -
9.7323 26900 0.0001 0.0529 0.8129 -
9.7685 27000 0.0002 0.0528 0.8131 -
9.8046 27100 0.0001 0.0529 0.8131 -
9.8408 27200 0.0002 0.0531 0.8128 -
9.8770 27300 0.0001 0.0532 0.8130 -
9.9132 27400 0.0001 0.0531 0.8129 -
9.9493 27500 0.0001 0.0531 0.8129 -
9.9855 27600 0.0001 0.0531 0.8130 -
-1 -1 - - - 0.7644

Framework Versions

  • Python: 3.10.16
  • Sentence Transformers: 3.4.1
  • Transformers: 4.49.0
  • PyTorch: 2.5.1+cu124
  • Accelerate: 0.34.2
  • Datasets: 2.19.2
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}