SentenceTransformer based on thenlper/gte-base

This is a sentence-transformers model fine-tuned from thenlper/gte-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: thenlper/gte-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
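
The three modules above amount to: encode the tokens with BERT, average the token vectors that are not padding (pooling_mode_mean_tokens), then scale the result to unit length (Normalize). The sketch below illustrates the last two steps in plain Python on made-up 3-dimensional token vectors. The real model works on 768-dimensional tensors, and the function names here are illustrative, not part of the library.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy input: three tokens, the last one is padding (mask = 0).
tokens = [[1.0, 2.0, 2.0], [3.0, 2.0, 4.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)   # [2.0, 2.0, 3.0]
embedding = l2_normalize(pooled)   # unit-length sentence embedding
print(pooled, embedding)
```

Because every embedding leaves the model with length 1, the cosine similarity between two embeddings equals their dot product.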

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("neel2306/gte-cp-base")
# Run inference
sentences = [
    'Mineral Fuels, Lubricants Etc.',
    'Crude oil',
    'Coal',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
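
For semantic search, the same two calls can be used to rank candidate texts against a query. The following is a minimal sketch, not a reference implementation: rank_candidates and demo are hypothetical helper names, and the candidate strings are chosen only for illustration. No scores are shown because they depend on the downloaded weights.

```python
def rank_candidates(scores, candidates, top_k=3):
    """Return up to top_k (candidate, score) pairs, highest score first."""
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

def demo():
    # Imported lazily so the helper above can be used without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("neel2306/gte-cp-base")
    query = "Mineral Fuels, Lubricants Etc."
    candidates = ["Crude oil", "Coal", "Painting supplies"]  # illustrative only

    query_embedding = model.encode([query])
    candidate_embeddings = model.encode(candidates)
    # similarity() returns a (1, len(candidates)) tensor of cosine similarities.
    scores = model.similarity(query_embedding, candidate_embeddings)[0].tolist()

    for text, score in rank_candidates(scores, candidates):
        print(f"{score:.3f}  {text}")
```

Calling demo() downloads the model from the Hub and prints the candidates in descending order of similarity to the query.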

Training Details

Training Dataset

Unnamed Dataset

  • Size: 10,932 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    column     type     min        mean          max
    anchor     string   3 tokens   9.91 tokens   48 tokens
    positive   string   3 tokens   6.05 tokens   17 tokens
    negative   string   3 tokens   5.08 tokens   14 tokens
  • Samples:
    • anchor: Clay Floor And Wall Tile, Glazed And Unglazed (Including Quarry Tile And Ceramic Mosaic Tile)
      positive: Ceramic mosaic tiles
      negative: Natural stone tiles
    • anchor: Electrical Relay/Conductor
      positive: Relay switches
      negative: Electrical insulators
    • anchor: Plasterer (Kelowna, British Columbia 5 13) (Union Rate)
      positive: Labor costs for plasterers
      negative: Painting supplies
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
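
As a rough illustration of what these parameters mean: for each anchor, MultipleNegativesRankingLoss scores every positive and negative in the batch with cosine similarity, multiplies the scores by scale, and applies cross-entropy so that the anchor's own positive should receive the highest score. The sketch below restates that idea in plain Python on invented 2-dimensional embeddings. It is a simplified reading of the loss, not the library's code, and mnr_loss is a hypothetical name.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i must pick positive i among all candidates."""
    candidates = positives + negatives  # all positives in the batch, then all negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_norm - logits[i]  # equals -log softmax(logits)[i]
    return total / len(anchors)

# Toy embeddings, invented for illustration.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

well_separated = mnr_loss(anchors, positives, negatives)
swapped = mnr_loss(anchors, negatives, positives)  # positives and negatives exchanged
print(well_separated, swapped)
```

The loss is close to zero when each anchor is nearest to its own positive and grows large when the roles are swapped. A larger scale sharpens the softmax, so small differences in cosine similarity are penalized more strongly.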
    

Evaluation Dataset

Unnamed Dataset

  • Size: 2,733 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    column     type     min        mean          max
    anchor     string   3 tokens   10.09 tokens  53 tokens
    positive   string   3 tokens   6.06 tokens   21 tokens
    negative   string   3 tokens   4.95 tokens   14 tokens
  • Samples:
    • anchor: Asphalt Paving Mixture and Block Manufacturing
      positive: Recycled asphalt pavement (RAP)
      negative: Asphalt shingles
    • anchor: Air Conditioning Plant
      positive: Refrigerant gases
      negative: Heating elements
    • anchor: Oak Lumber
      positive: Oak plywood
      negative: Pine lumber
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 6e-05
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • optim: adamw_hf
  • batch_sampler: no_duplicates
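
The settings above could be passed to the trainer roughly as follows. This is a sketch under assumptions: output_dir is a placeholder, build_training_args is a hypothetical helper, and the argument names follow the library versions listed under Framework Versions (newer releases may not accept every value, for example optim="adamw_hf"). The arithmetic at the end assumes a single device and the 10,932 training samples reported above.

```python
import math

# Values copied from the card; "output_dir" is a placeholder chosen for this sketch.
NON_DEFAULT_ARGS = {
    "output_dir": "output/gte-cp-base",
    "eval_strategy": "steps",
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "learning_rate": 6e-05,
    "num_train_epochs": 10,
    "warmup_ratio": 0.1,
    "optim": "adamw_hf",
}

def build_training_args():
    # Imported lazily so the arithmetic below runs without the library installed.
    from sentence_transformers import SentenceTransformerTrainingArguments
    from sentence_transformers.training_args import BatchSamplers

    return SentenceTransformerTrainingArguments(
        batch_sampler=BatchSamplers.NO_DUPLICATES,
        **NON_DEFAULT_ARGS,
    )

# Schedule implied by 10,932 training samples on a single device.
train_samples = 10_932
steps_per_epoch = math.ceil(train_samples / NON_DEFAULT_ARGS["per_device_train_batch_size"])
total_steps = steps_per_epoch * NON_DEFAULT_ARGS["num_train_epochs"]
warmup_steps = math.ceil(total_steps * NON_DEFAULT_ARGS["warmup_ratio"])
print(steps_per_epoch, total_steps, warmup_steps)  # 684 6840 684
```

With 684 steps per epoch, step 50 corresponds to epoch 50 / 684 ≈ 0.0731, which matches the first row of the training logs.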

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 6e-05
  • weight_decay: 0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_hf
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch  Step  Training Loss  Validation Loss
0.0731 50 1.9026 1.5169
0.1462 100 1.5479 1.0813
0.2193 150 1.0239 0.7291
0.2924 200 0.6914 0.6372
0.3655 250 0.653 0.5887
0.4386 300 0.5469 0.5605
0.5117 350 0.5312 0.5408
0.5848 400 0.4996 0.5100
0.6579 450 0.4445 0.4830
0.7310 500 0.5092 0.4734
0.8041 550 0.532 0.4476
0.8772 600 0.4147 0.4714
0.9503 650 0.477 0.4400
1.0234 700 0.4243 0.4466
1.0965 750 0.485 0.4172
1.1696 800 0.3717 0.4271
1.2427 850 0.3716 0.4369
1.3158 900 0.3742 0.4104
1.3889 950 0.3157 0.4436
1.4620 1000 0.3035 0.4444
1.5351 1050 0.2797 0.4558
1.6082 1100 0.2639 0.4248
1.6813 1150 0.2286 0.4308
1.7544 1200 0.2753 0.4098
1.8275 1250 0.1904 0.4415
1.9006 1300 0.2175 0.4503
1.9737 1350 0.1806 0.4245
2.0468 1400 0.1826 0.4418
2.1199 1450 0.1952 0.4138
2.1930 1500 0.1612 0.4061
2.2661 1550 0.1604 0.3910
2.3392 1600 0.1199 0.3852
2.4123 1650 0.1439 0.4082
2.4854 1700 0.1402 0.4352
2.5585 1750 0.1116 0.4338
2.6316 1800 0.1113 0.4189
2.7047 1850 0.1159 0.4013
2.7778 1900 0.1241 0.3853
2.8509 1950 0.0977 0.3919
2.9240 2000 0.0953 0.4022
2.9971 2050 0.1159 0.4073
3.0702 2100 0.0923 0.3903
3.1433 2150 0.0958 0.3833
3.2164 2200 0.0787 0.3875
3.2895 2250 0.083 0.3807
3.3626 2300 0.0714 0.3806
3.4357 2350 0.0748 0.3997
3.5088 2400 0.0779 0.4027
3.5819 2450 0.0709 0.3921
3.6550 2500 0.0482 0.3905
3.7281 2550 0.0784 0.3760
3.8012 2600 0.0694 0.3809
3.8743 2650 0.0725 0.3957
3.9474 2700 0.0718 0.3897
4.0205 2750 0.05 0.3894
4.0936 2800 0.0597 0.4014
4.1667 2850 0.0445 0.3929
4.2398 2900 0.039 0.3856
4.3129 2950 0.0405 0.3723
4.3860 3000 0.0456 0.3764
4.4591 3050 0.0493 0.3876
4.5322 3100 0.036 0.3866
4.6053 3150 0.0517 0.3791
4.6784 3200 0.0383 0.3724
4.7515 3250 0.0453 0.3886
4.8246 3300 0.0469 0.3897
4.8977 3350 0.0385 0.3940
4.9708 3400 0.0427 0.3877
5.0439 3450 0.0212 0.3914
5.1170 3500 0.0452 0.3899
5.1901 3550 0.0252 0.3925
5.2632 3600 0.0228 0.3895
5.3363 3650 0.0219 0.3792
5.4094 3700 0.0275 0.3882
5.4825 3750 0.0246 0.3892
5.5556 3800 0.0226 0.3895
5.6287 3850 0.0219 0.3912
5.7018 3900 0.027 0.3800
5.7749 3950 0.0268 0.3667
5.8480 4000 0.0313 0.3687
5.9211 4050 0.0233 0.3675
5.9942 4100 0.0201 0.3649
6.0673 4150 0.0207 0.3727
6.1404 4200 0.0175 0.3802
6.2135 4250 0.0117 0.3760
6.2865 4300 0.0124 0.3731
6.3596 4350 0.0164 0.3713
6.4327 4400 0.0149 0.3782
6.5058 4450 0.0127 0.3747
6.5789 4500 0.013 0.3746
6.6520 4550 0.0078 0.3756
6.7251 4600 0.0171 0.3741
6.7982 4650 0.0211 0.3680
6.8713 4700 0.0186 0.3686
6.9444 4750 0.0213 0.3688
7.0175 4800 0.0107 0.3647
7.0906 4850 0.011 0.3677
7.1637 4900 0.0098 0.3671
7.2368 4950 0.0091 0.3708
7.3099 5000 0.0074 0.3673
7.3830 5050 0.0101 0.3672
7.4561 5100 0.0115 0.3676
7.5292 5150 0.0054 0.3656
7.6023 5200 0.0076 0.3657
7.6754 5250 0.0054 0.3639
7.7485 5300 0.0115 0.3600
7.8216 5350 0.0105 0.3657
7.8947 5400 0.0175 0.3649
7.9678 5450 0.0091 0.3634
8.0409 5500 0.0043 0.3646
8.1140 5550 0.0078 0.3650
8.1871 5600 0.004 0.3683
8.2602 5650 0.0045 0.3669
8.3333 5700 0.005 0.3661
8.4064 5750 0.0074 0.3652
8.4795 5800 0.0042 0.3662
8.5526 5850 0.0039 0.3696
8.6257 5900 0.004 0.3724
8.6988 5950 0.008 0.3714
8.7719 6000 0.0057 0.3711
8.8450 6050 0.0045 0.3702
8.9181 6100 0.0122 0.3715
8.9912 6150 0.0064 0.3703
9.0643 6200 0.0039 0.3689
9.1374 6250 0.0034 0.3680
9.2105 6300 0.0022 0.3680
9.2836 6350 0.0021 0.3684
9.3567 6400 0.0025 0.3685
9.4298 6450 0.0041 0.3679
9.5029 6500 0.0018 0.3679
9.5760 6550 0.0039 0.3686
9.6491 6600 0.0021 0.3691
9.7222 6650 0.0056 0.3689
9.7953 6700 0.0025 0.3691
9.8684 6750 0.0063 0.3692
9.9415 6800 0.0074 0.3692
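
To pick a checkpoint from a log like this one, the rows can be parsed and the step with the lowest evaluation loss selected. The sketch below assumes the last column is the evaluation loss and uses a handful of rows copied from the table; parse_log and best_checkpoint are hypothetical helper names.

```python
def parse_log(text):
    """Parse 'epoch step train_loss eval_loss' rows into tuples."""
    rows = []
    for line in text.strip().splitlines():
        epoch, step, train_loss, eval_loss = line.split()
        rows.append((float(epoch), int(step), float(train_loss), float(eval_loss)))
    return rows

def best_checkpoint(rows):
    """Return the row with the lowest evaluation loss."""
    return min(rows, key=lambda row: row[3])

# A few rows copied from the training log.
sample = """
0.0731 50 1.9026 1.5169
2.3392 1600 0.1199 0.3852
5.9942 4100 0.0201 0.3649
7.7485 5300 0.0115 0.3600
9.9415 6800 0.0074 0.3692
"""

rows = parse_log(sample)
best = best_checkpoint(rows)
print(best)  # (7.7485, 5300, 0.0115, 0.36)
```

Over the full table the lowest value in the last column is also 0.3600 at step 5300. The training loss keeps falling below 0.01 in the final epochs, while the last column stays between roughly 0.36 and 0.37 from about epoch 6 onward.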

Framework Versions

  • Python: 3.12.6
  • Sentence Transformers: 3.1.0
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cpu
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 109M params (Safetensors, tensor type F32)