metadata
base_model: distilbert/distilbert-base-multilingual-cased
datasets: []
language: []
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:654495
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: >-
      সম্পূৰ্ণৰূপে ভিন্ন ধৰণৰ পেৰাচুট আৰু এটা উড়ন্ত পক্ষীৰ মাজত, আহ্, শব্দৰ
      তিনিগুণ বেগত, ঘণ্টাৰ ২২, ০০০ মাইলত।
    sentences:
      - ঘণ্টাৰ ২০, ০০০ কিলোমিটাৰতকৈ অধিক গতিত উড়ে।
      - মোৰ ঘৰত দুটা কম্পিউটাৰ আছে।
      - >-
        সকলো ক্ৰীড়াৰ নাম ক্ৰীড়াত ব্যৱহাৰ কৰা এটা সঁজুলিৰ নামেৰে নামকৰণ কৰা
        হয়।
  - source_sentence: >-
      আৰু তাৰ পিছত মই তেওঁক যাবলৈ শুনিছিলোঁ, সেয়েহে মই এতিয়াও মোৰ কাম শেষ কৰি
      আছো।
    sentences:
      - মই আজি যিটো কৰিব লাগিব সেয়া কৰি আছো।
      - >-
        "Bato (বা" "vato" ") এটা স্পেনিছ শব্দ যাৰ অৰ্থ হৈছে" "পুৰুষ" "বা"
        "বন্ধু" "।"
      - পিতৃ-মাতৃয়ে ঘৰত থাকিল।
  - source_sentence: মই কেৱল বুজাবলৈ চেষ্টা কৰিছিলোঁ।
    sentences:
      - মই বুজিবলৈ চেষ্টা কৰিছিলোঁ।
      - মই আন কেইবাটাও প্ৰস্তাৱ দিবলৈ আহিছিলোঁ।
      - >-
        প্ৰেমিক নামৰ এজন খেতিয়কে নিজৰ হত্যাৰ আঁচনি তৈয়াৰ কৰোতে ঘাসপূৰ্ণ স্থানত
        লুকুৱাই থৈ যায়।
  - source_sentence: >-
      আৰু, উম, যদি এইটো বাঢ়ি আহিব আৰু কেৱল বাঢ়ি আহিব তেতিয়াহ 'লে' whish 'হ'
      ব, আৰু যেনেকৈ ই আপোনাৰ মূৰটো বন্ধ কৰি দিব।
    sentences:
      - >-
        প্ৰাৰম্ভিক শিক্ষা লাভ কৰা আৰু বয়সস্থ ল 'ৰা-ছোৱালীয়ে প্ৰায়ে ভৱিষ্যতৰ
        বিষয়ে সপোন দেখে।
      - তেওঁলোকে মোৰ ওচৰলৈ কিয় আহিছে বুলি প্ৰশ্ন কৰিলে।
      - যদি কোনো ধৰণৰ পৰিৱৰ্তন হয়, তেনেহ 'লে তাৰ লগত এক শব্দ বাঢ়িব পাৰে।
  - source_sentence: মই ভালদৰে জানিব নোৱাৰোঁ আপোনালোকৰ সৈতে কথা বতৰা আৰু এক ভাল সন্ধ্যা আছিল
    sentences:
      - >-
        মই নিশ্চিত নহয় কিন্তু মই অলপ ভাল, আজি ৰাতি আপোনালোকৰ সৈতে কথা পাতিবলৈ
        পাই ভাল লাগিল।
      - Shannon এ বাৰ্তা উপেক্ষা কৰিছে।
      - মানুহজনে ষ্টক এক্সচেঞ্জত লেনদেনৰ বিষয়ে জানিবলৈ চেষ্টা কৰিছিল।
model-index:
  - name: SentenceTransformer based on distilbert/distilbert-base-multilingual-cased
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: pritamdeka/stsb assamese translated dev
          type: pritamdeka/stsb-assamese-translated-dev
        metrics:
          - type: pearson_cosine
            value: 0.7169579983340281
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7220987460972806
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7380110422340219
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7452082040848071
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7386577662108481
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7458961406429292
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.6480820840127198
            name: Pearson Dot
          - type: spearman_dot
            value: 0.6478256799308721
            name: Spearman Dot
          - type: pearson_max
            value: 0.7386577662108481
            name: Pearson Max
          - type: spearman_max
            value: 0.7458961406429292
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: pritamdeka/stsb assamese translated test
          type: pritamdeka/stsb-assamese-translated-test
        metrics:
          - type: pearson_cosine
            value: 0.656822131496386
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.6621886312595516
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.6675496858061083
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.6722470705036974
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.6681862838868354
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.6727345795749732
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.5691955650489428
            name: Pearson Dot
          - type: spearman_dot
            value: 0.570867962692759
            name: Spearman Dot
          - type: pearson_max
            value: 0.6681862838868354
            name: Pearson Max
          - type: spearman_max
            value: 0.6727345795749732
            name: Spearman Max

SentenceTransformer based on distilbert/distilbert-base-multilingual-cased

This is a sentence-transformers model fine-tuned from distilbert/distilbert-base-multilingual-cased. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
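The Pooling module above has only pooling_mode_mean_tokens enabled, so the sentence vector is the average of the token vectors. The following is a minimal pure-Python sketch of that idea, not the library's implementation. The function name mean_pool and the toy 4-dimensional vectors are illustrative assumptions; the real model works on 768-dimensional tensors.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, so padding is ignored."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [s / count for s in summed]


# Toy input: 3 tokens (the last one is padding), 4 dimensions instead of 768.
tokens = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0, 6.0],
    [9.0, 9.0, 9.0, 9.0],  # padding token, masked out below
]
mask = [1, 1, 0]

sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 3.0, 4.0, 5.0]
```

Masking matters because sentences in a batch are padded to a common length, and averaging over padding positions would pull every embedding toward the padding vector.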

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("pritamdeka/distilbert-base-multilingual-cased-indicxnli-random-negatives-v1")
# Run inference
sentences = [
    'মই ভালদৰে জানিব নোৱাৰোঁ আপোনালোকৰ সৈতে কথা বতৰা আৰু এক ভাল সন্ধ্যা আছিল',
    'মই নিশ্চিত নহয় কিন্তু মই অলপ ভাল, আজি ৰাতি আপোনালোকৰ সৈতে কথা পাতিবলৈ পাই ভাল লাগিল।',
    'Shannon এ বাৰ্তা উপেক্ষা কৰিছে।',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
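To make the 3 x 3 similarity matrix less opaque, here is a small sketch of how pairwise cosine similarity can be computed by hand. It assumes the model uses cosine similarity, which is the Sentence Transformers default when no other similarity function is configured. The helper names and the 2-dimensional toy vectors are illustrative and stand in for the real 768-dimensional embeddings.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product of the two vectors divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings: List[List[float]]) -> List[List[float]]:
    """Score every embedding against every other one (n x n result)."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]


# Toy embeddings standing in for the output of model.encode(...).
toy = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
matrix = similarity_matrix(toy)

for row in matrix:
    print([round(score, 4) for score in row])
```

The diagonal is always 1 because each sentence is compared with itself, and the matrix is symmetric, so only the upper triangle carries new information.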

Evaluation

Metrics

Semantic Similarity

Dataset: pritamdeka/stsb-assamese-translated-dev

| Metric | Value |
|---|---|
| pearson_cosine | 0.717 |
| spearman_cosine | 0.7221 |
| pearson_manhattan | 0.738 |
| spearman_manhattan | 0.7452 |
| pearson_euclidean | 0.7387 |
| spearman_euclidean | 0.7459 |
| pearson_dot | 0.6481 |
| spearman_dot | 0.6478 |
| pearson_max | 0.7387 |
| spearman_max | 0.7459 |

Semantic Similarity

Dataset: pritamdeka/stsb-assamese-translated-test

| Metric | Value |
|---|---|
| pearson_cosine | 0.6568 |
| spearman_cosine | 0.6622 |
| pearson_manhattan | 0.6675 |
| spearman_manhattan | 0.6722 |
| pearson_euclidean | 0.6682 |
| spearman_euclidean | 0.6727 |
| pearson_dot | 0.5692 |
| spearman_dot | 0.5709 |
| pearson_max | 0.6682 |
| spearman_max | 0.6727 |
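Each metric above is a correlation between the model's similarity scores and human-annotated gold scores: Pearson measures linear agreement, Spearman measures agreement of the rankings. The pure-Python sketch below shows how such numbers can be computed. The gold and predicted values are made-up toy data, not taken from the evaluation sets, and the function names are illustrative. The *_max rows appear to be the highest value across the four similarity functions, which is consistent with the numbers reported here.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: gold STS labels (0 to 5) and hypothetical cosine similarities.
gold = [0.0, 1.0, 2.5, 4.0, 5.0]
predicted = [0.10, 0.35, 0.30, 0.80, 0.95]

print(round(pearson(gold, predicted), 4))
print(round(spearman(gold, predicted), 4))
```

Spearman is usually the headline number for STS benchmarks because it only asks that more similar pairs receive higher scores, not that the scores sit on the same scale as the labels.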

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
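Three of the settings above interact: learning_rate 5e-05, lr_scheduler_type linear, and warmup_ratio 0.1 (warmup_steps is 0, so the ratio applies). A rough sketch of the resulting schedule follows. It mirrors the usual linear warmup and linear decay behaviour of the Transformers scheduler but is not the library code. The function name is illustrative, and the total of 10227 steps is taken from the last row of the training logs.

```python
import math


def linear_schedule_lr(step: int, total_steps: int,
                       base_lr: float = 5e-05, warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 10227  # final step reported in the training logs

for step in (0, 500, 1023, 5000, 10227):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

Under these assumptions the peak learning rate is reached at step 1023, roughly a tenth of the way through the single epoch.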

Training Logs

| Epoch | Step | Training Loss | Validation Loss | pritamdeka/stsb-assamese-translated-dev_spearman_cosine | pritamdeka/stsb-assamese-translated-test_spearman_cosine |
|---|---|---|---|---|---|
| 0 | 0 | - | - | 0.5489 | - |
| 0.0489 | 500 | 1.9387 | 1.7308 | 0.6808 | - |
| 0.0978 | 1000 | 1.0503 | 1.7373 | 0.6689 | - |
| 0.1467 | 1500 | 0.92 | 1.5838 | 0.6761 | - |
| 0.1956 | 2000 | 0.8754 | 1.4807 | 0.6518 | - |
| 0.2445 | 2500 | 0.7988 | 1.3797 | 0.6853 | - |
| 0.2933 | 3000 | 0.7606 | 1.3713 | 0.7108 | - |
| 0.3422 | 3500 | 0.7228 | 1.2510 | 0.6677 | - |
| 0.3911 | 4000 | 0.688 | 1.2374 | 0.6734 | - |
| 0.4400 | 4500 | 0.6992 | 1.2173 | 0.6891 | - |
| 0.4889 | 5000 | 0.6108 | 1.1638 | 0.7017 | - |
| 0.5378 | 5500 | 0.612 | 1.0815 | 0.7102 | - |
| 0.5867 | 6000 | 0.6259 | 1.0664 | 0.7202 | - |
| 0.6356 | 6500 | 0.5863 | 1.0464 | 0.7047 | - |
| 0.6845 | 7000 | 0.5941 | 1.0111 | 0.7101 | - |
| 0.7334 | 7500 | 0.5436 | 1.0023 | 0.7171 | - |
| 0.7822 | 8000 | 0.555 | 0.9633 | 0.7202 | - |
| 0.8311 | 8500 | 0.5466 | 0.9651 | 0.7279 | - |
| 0.8800 | 9000 | 0.5326 | 0.9611 | 0.7262 | - |
| 0.9289 | 9500 | 0.5055 | 0.9313 | 0.7276 | - |
| 0.9778 | 10000 | 0.4828 | 0.9172 | 0.7221 | - |
| 1.0 | 10227 | - | - | - | 0.6622 |
  • The bold row denotes the saved checkpoint.
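The Epoch and Step columns can be cross-checked against the dataset_size tag (654495) and the batch size of 64. The short calculation below assumes a single training device and that the last partial batch is kept (dataloader_drop_last is False). The no_duplicates sampler could in principle shift the count slightly, so treat this as a sanity check rather than an exact reconstruction.

```python
import math

DATASET_SIZE = 654_495  # from the dataset_size tag in the metadata
BATCH_SIZE = 64         # per_device_train_batch_size, single device assumed

# One optimizer step per batch; the final partial batch still counts.
steps_per_epoch = math.ceil(DATASET_SIZE / BATCH_SIZE)
print(steps_per_epoch)  # 10227


def epoch_fraction(step: int) -> float:
    """Fraction of the epoch completed at a given step, rounded as in the log."""
    return round(step / steps_per_epoch, 4)


print(epoch_fraction(500))    # 0.0489
print(epoch_fraction(10000))  # 0.9778
```

The result matches the final logged step (10227) and the epoch fractions in the table, which suggests the evaluation every 500 steps ran on a single-device training setup.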

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}