SentenceTransformer based on sentence-transformers/LaBSE

This is a sentence-transformers model finetuned from sentence-transformers/LaBSE on Persian (Farsi) anchor/positive pairs. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/LaBSE
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
  (3): Normalize()
)
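
To make the module stack concrete, the sketch below pushes a single sentence through the four stages by hand. It mirrors what model.encode does internally (minus batching, device placement, and conversion to numpy); the example sentence is just a placeholder.

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("codersan/FaLaBSE-v7")

features = model.tokenize(["یک جمله نمونه"])  # "an example sentence"
with torch.no_grad():
    features = model[0](features)  # (0) Transformer: contextual token embeddings
    features = model[1](features)  # (1) Pooling: keep the [CLS] token (768 dims)
    features = model[2](features)  # (2) Dense: 768 -> 768 projection with Tanh
    features = model[3](features)  # (3) Normalize: L2-normalize the sentence vector
print(features["sentence_embedding"].shape)  # torch.Size([1, 768])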

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("codersan/FaLaBSE-v7")
# Run inference
sentences = [
    'مرزهای صفحه چیست؟برخی از انواع چیست؟',  # "What are plate boundaries? What are some of the types?"
    'مرزهای صفحه چیست؟',  # "What are plate boundaries?"
    'اتانول چند ایزومر دارد؟',  # "How many isomers does ethanol have?"
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
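
Beyond pairwise similarity, the same embeddings cover the semantic search and paraphrase mining use cases mentioned above. A minimal semantic search sketch with a made-up Persian query and corpus (any language covered by LaBSE should work):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("codersan/FaLaBSE-v7")

# Hypothetical corpus and query, for illustration only
corpus = [
    'مرزهای صفحه چیست؟',  # "What are plate boundaries?"
    'اتانول چند ایزومر دارد؟',  # "How many isomers does ethanol have?"
]
query = 'انواع مرزهای صفحه کدامند؟'  # "What are the types of plate boundaries?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# The embeddings are L2-normalized, so cosine similarity and dot product agree
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
print(hits[0])  # list of {'corpus_id': ..., 'score': ...} sorted by score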

Training Details

Training Dataset

Unnamed Dataset

  • Size: 142,964 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min 6 tokens, mean 15.36 tokens, max 82 tokens
    • positive: string; min 6 tokens, mean 14.69 tokens, max 50 tokens
  • Samples (anchor → positive, with English glosses):
    • گاو یونجه می خورد → گاو در حال چریدن است
      ("The cow eats hay" → "The cow is grazing")
    • ماشینی به شکلی خطرناک از روی دختری می‌پرد. → دختر با بی‌احتیاطی روی ماشین می‌پرد.
      ("A car jumps over a girl in a dangerous way." → "The girl carelessly jumps onto the car.")
    • چگونه می توانم کارتهای هدیه iTunes رایگان را در هند دریافت کنم؟ → چگونه می توانم کارتهای هدیه iTunes رایگان دریافت کنم؟
      ("How can I get free iTunes gift cards in India?" → "How can I get free iTunes gift cards?")
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
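
For context, MultipleNegativesRankingLoss uses in-batch negatives: each anchor is scored against every positive in the batch with the scaled cosine similarity above, and cross-entropy pushes the true pair to the top. This is also why the no_duplicates batch sampler listed under the hyperparameters matters, as duplicate questions in one batch would act as false negatives. A minimal sketch of constructing the loss with these parameters:

from sentence_transformers import SentenceTransformer, losses, util

# Sketch only: cosine similarity scaled by 20, cross-entropy over in-batch positives
model = SentenceTransformer("sentence-transformers/LaBSE")
loss = losses.MultipleNegativesRankingLoss(
    model, scale=20.0, similarity_fct=util.cos_sim
)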
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • learning_rate: 3e-05
  • weight_decay: 0.15
  • num_train_epochs: 4
  • warmup_ratio: 0.15
  • batch_sampler: no_duplicates
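
Putting the dataset format, loss, and the hyperparameters above together, a comparable run could be configured roughly as follows. This is a hedged reconstruction rather than the author's actual training script: the tiny in-memory Dataset stands in for the unpublished 142,964 anchor/positive pairs, and output_dir is arbitrary.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/LaBSE")

# Placeholder rows in the card's anchor/positive format
train_dataset = Dataset.from_dict({
    "anchor": ["گاو یونجه می خورد", "مرزهای صفحه چیست؟برخی از انواع چیست؟"],
    "positive": ["گاو در حال چریدن است", "مرزهای صفحه چیست؟"],
})

args = SentenceTransformerTrainingArguments(
    output_dir="FaLaBSE-v7",
    per_device_train_batch_size=32,
    learning_rate=3e-5,
    weight_decay=0.15,
    num_train_epochs=4,
    warmup_ratio=0.15,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no duplicate pairs within a batch
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()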

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.15
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.15
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0448 100 0.1819
0.0895 200 0.0985
0.1343 300 0.0879
0.1791 400 0.0601
0.2238 500 0.0644
0.2686 600 0.0586
0.3133 700 0.0731
0.3581 800 0.0636
0.4029 900 0.0622
0.4476 1000 0.0504
0.4924 1100 0.0603
0.5372 1200 0.0613
0.5819 1300 0.0546
0.6267 1400 0.0525
0.6714 1500 0.0606
0.7162 1600 0.0523
0.7610 1700 0.0581
0.8057 1800 0.0534
0.8505 1900 0.0531
0.8953 2000 0.0526
0.9400 2100 0.0498
0.9848 2200 0.0462
1.0295 2300 0.0555
1.0743 2400 0.0553
1.1191 2500 0.0505
1.1638 2600 0.0441
1.2086 2700 0.0365
1.2534 2800 0.0348
1.2981 2900 0.0406
1.3429 3000 0.0403
1.3876 3100 0.0409
1.4324 3200 0.0324
1.4772 3300 0.0285
1.5219 3400 0.0362
1.5667 3500 0.026
1.6115 3600 0.0271
1.6562 3700 0.0285
1.7010 3800 0.028
1.7457 3900 0.032
1.7905 4000 0.0324
1.8353 4100 0.0236
1.8800 4200 0.0267
1.9248 4300 0.0343
1.9696 4400 0.0234
2.0143 4500 0.0281
2.0591 4600 0.0272
2.1038 4700 0.0295
2.1486 4800 0.0251
2.1934 4900 0.0235
2.2381 5000 0.0219
2.2829 5100 0.0237
2.3277 5200 0.0283
2.3724 5300 0.0262
2.4172 5400 0.0218
2.4620 5500 0.0174
2.5067 5600 0.024
2.5515 5700 0.0185
2.5962 5800 0.019
2.6410 5900 0.0208
2.6858 6000 0.0188
2.7305 6100 0.0213
2.7753 6200 0.0251
2.8201 6300 0.0193
2.8648 6400 0.0175
2.9096 6500 0.0234
2.9543 6600 0.0172
2.9991 6700 0.0171
3.0439 6800 0.0215
3.0886 6900 0.0206
3.1334 7000 0.019
3.1782 7100 0.0166
3.2229 7200 0.0154
3.2677 7300 0.0178
3.3124 7400 0.0203
3.3572 7500 0.0174
3.4020 7600 0.0159
3.4467 7700 0.0149
3.4915 7800 0.0184
3.5363 7900 0.017
3.5810 8000 0.0133
3.6258 8100 0.0146
3.6705 8200 0.0148
3.7153 8300 0.0131
3.7601 8400 0.0184
3.8048 8500 0.0143
3.8496 8600 0.0137
3.8944 8700 0.0156
3.9391 8800 0.0171
3.9839 8900 0.0119

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.3.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}