---
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:3560698
  - loss:ModifiedMatryoshkaLoss
base_model: google-bert/bert-base-multilingual-cased
widget:
  - source_sentence: And then finally, turn it back to the real world.
    sentences:
      - Y luego, finalmente, devolver eso al mundo real.
      - Parece que el único rasgo que sobrevive a la decapitación es la vanidad.
      - y yo digo que no estoy seguro. Voy a pensarlo a groso modo.
  - source_sentence: Figure out some of the other options that are much better.
    sentences:
      - Piensen en otras de las opciones que son mucho mejores.
      - >-
        Éste solía ser un tema bipartidista, y sé que en este grupo realmente lo
        es.
      - >-
        El acuerdo general de paz para Sudán firmado en 2005 resultó ser menos
        amplio que lo previsto, y sus disposiciones aún podrían engendrar un
        retorno a gran escala de la guerra entre el norte y el sur.
  - source_sentence: >-
      The call to action I offer today -- my TED wish -- is this: Honor the
      treaties.
    sentences:
      - Esta es la intersección más directa, obvia, de las dos cosas.
      - >-
        El llamado a la acción que propongo hoy, mi TED Wish, es el siguiente:
        Honrar los tratados.
      - >-
        Los restaurantes del condado se pueden contar con los dedos de una
        mano... Barbacoa Bunn es mi favorito.
  - source_sentence: So for us, this was a graphic public campaign called Connect Bertie.
    sentences:
      - Para nosotros esto era una campaña gráfica llamada Conecta a Bertie.
      - >-
        En cambio, los líderes locales se comprometieron a revisarlos más
        adelante.
      - Con el tiempo, la gente hace lo que se le paga por hacer.
  - source_sentence: >-
      And in the audio world that's when the microphone gets too close to its
      sound source, and then it gets in this self-destructive loop that creates
      a very unpleasant sound.
    sentences:
      - Esta es una mina de Zimbabwe en este momento.
      - Estábamos en la I-40.
      - >-
        Y, en el mundo del audio, es cuando el micrófono se acerca demasiado a
        su fuente de sonido, y entra en este bucle autodestructivo que crea un
        sonido muy desagradable.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - negative_mse
model-index:
  - name: SentenceTransformer based on google-bert/bert-base-multilingual-cased
    results:
      - task:
          type: knowledge-distillation
          name: Knowledge Distillation
        dataset:
          name: MSE val en es
          type: MSE-val-en-es
        metrics:
          - type: negative_mse
            value: -29.5114666223526
            name: Negative Mse
      - task:
          type: knowledge-distillation
          name: Knowledge Distillation
        dataset:
          name: MSE val en pt
          type: MSE-val-en-pt
        metrics:
          - type: negative_mse
            value: -29.913604259490967
            name: Negative Mse
      - task:
          type: knowledge-distillation
          name: Knowledge Distillation
        dataset:
          name: MSE val en pt br
          type: MSE-val-en-pt-br
        metrics:
          - type: negative_mse
            value: -27.732226252555847
            name: Negative Mse
---

SentenceTransformer based on google-bert/bert-base-multilingual-cased

This is a sentence-transformers model finetuned from google-bert/bert-base-multilingual-cased. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-base-multilingual-cased
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
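
For reference, the same two-module layout can be assembled with the sentence-transformers modules API. The sketch below builds an untrained equivalent from the base model only; it is not how you load the finetuned weights, which is shown in the Usage section.

from sentence_transformers import SentenceTransformer, models

# Transformer module: mBERT backbone, max_seq_length=128, no lowercasing
word_embedding_model = models.Transformer("google-bert/bert-base-multilingual-cased", max_seq_length=128)
# Pooling module: mean pooling over token embeddings, 768-dimensional output
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode_mean_tokens=True,
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])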

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("luanafelbarros/TriLingual-BERT-Distil")
# Run inference
sentences = [
    "And in the audio world that's when the microphone gets too close to its sound source, and then it gets in this self-destructive loop that creates a very unpleasant sound.",
    'Y, en el mundo del audio, es cuando el micrófono se acerca demasiado a su fuente de sonido, y entra en este bucle autodestructivo que crea un sonido muy desagradable.',
    'Esta es una mina de Zimbabwe en este momento.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
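
Because training used a Matryoshka-style loss (see Training Details below), truncated embeddings may also remain usable. A hedged sketch, assuming the truncate_dim option available in recent sentence-transformers releases:

from sentence_transformers import SentenceTransformer

# Keep only the first 256 dimensions of each embedding; 256 is one of the
# matryoshka_dims used during training. Smaller vectors trade some accuracy
# for faster search and lower storage.
model = SentenceTransformer("luanafelbarros/TriLingual-BERT-Distil", truncate_dim=256)
embeddings = model.encode([
    "Honor the treaties.",
    "Honrar los tratados.",
])
print(embeddings.shape)
# (2, 256)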

Evaluation

Metrics

Knowledge Distillation

  • Datasets: MSE-val-en-es, MSE-val-en-pt and MSE-val-en-pt-br
  • Evaluated with MSEEvaluator
Metric MSE-val-en-es MSE-val-en-pt MSE-val-en-pt-br
negative_mse -29.5115 -29.9136 -27.7322
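
The negative_mse values are the mean squared error between this model's embeddings of the non-English sentences and the teacher's embeddings of the corresponding English sentences, scaled by 100 and negated so that higher is better (this mirrors how the sentence-transformers MSEEvaluator reports the metric). A minimal sketch of the computation, with the teacher model left as an assumption:

import numpy as np

def negative_mse(student_embeddings: np.ndarray, teacher_embeddings: np.ndarray) -> float:
    # MSE between student and teacher embeddings, scaled by 100 and negated
    mse = np.mean((teacher_embeddings - student_embeddings) ** 2)
    return -float(mse * 100)

# student = model.encode(non_english_sentences)      # this model
# teacher = teacher_model.encode(english_sentences)  # the (unspecified) distillation teacher
# print(negative_mse(student, teacher))              # e.g. roughly -29.5 on the en-es split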

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,560,698 training samples
  • Columns: english, non_english, and label
  • Approximate statistics based on the first 1000 samples:
    • english (string): min: 4 tokens, mean: 25.46 tokens, max: 128 tokens
    • non_english (string): min: 4 tokens, mean: 26.67 tokens, max: 128 tokens
    • label (list): size: 768 elements
  • Samples:
    • english: And then there are certain conceptual things that can also benefit from hand calculating, but I think they're relatively small in number.
      non_english: Y luego hay ciertas aspectos conceptuales que pueden beneficiarse del cálculo a mano pero creo que son relativamente pocos.
      label: [-0.04180986061692238, 0.12620249390602112, -0.14501447975635529, 0.09695684909820557, -0.10850819200277328, ...]
    • english: One thing I often ask about is ancient Greek and how this relates.
      non_english: Algo que pregunto a menudo es sobre el griego antiguo y cómo se relaciona.
      label: [0.0034368489868938923, -0.02741478756070137, -0.09426739811897278, 0.04873204976320267, -0.008266829885542393, ...]
    • english: See, the thing we're doing right now is we're forcing people to learn mathematics.
      non_english: Vean, lo que estamos haciendo ahora es forzar a la gente a aprender matemáticas.
      label: [-0.05048828944563866, 0.2713043689727783, 0.024581076577305794, -0.07316197454929352, -0.044288791716098785, ...]
  • Loss: main.ModifiedMatryoshkaLoss with these parameters:
    {
        "loss": "MSELoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
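
main.ModifiedMatryoshkaLoss is defined in the training script rather than in the sentence-transformers library, so its exact code is not part of this repository. Read together with the parameters above and the label column (teacher embeddings), a plausible interpretation is an MSE distillation loss applied at every Matryoshka dimension with equal weight. The following is a hedged sketch of that idea only (a plain PyTorch module over embeddings, not a drop-in sentence-transformers loss and not the author's implementation):

import torch
import torch.nn as nn

class MatryoshkaMSEDistillationLoss(nn.Module):
    """Hypothetical stand-in for main.ModifiedMatryoshkaLoss: MSE between the
    student embedding and the teacher label, repeated at each truncation."""

    def __init__(self, dims=(768, 512, 256, 128, 64), weights=(1, 1, 1, 1, 1)):
        super().__init__()
        self.dims = dims
        self.weights = weights
        self.mse = nn.MSELoss()

    def forward(self, student_embeddings: torch.Tensor, teacher_labels: torch.Tensor) -> torch.Tensor:
        loss = torch.zeros((), device=student_embeddings.device)
        for dim, weight in zip(self.dims, self.weights):
            # Truncate both student output and teacher label to the current dim
            loss = loss + weight * self.mse(student_embeddings[:, :dim], teacher_labels[:, :dim])
        return loss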
    

Evaluation Dataset

Unnamed Dataset

  • Size: 6,974 evaluation samples
  • Columns: english, non_english, and label
  • Approximate statistics based on the first 1000 samples:
    • english (string): min: 4 tokens, mean: 25.68 tokens, max: 128 tokens
    • non_english (string): min: 4 tokens, mean: 27.31 tokens, max: 128 tokens
    • label (list): size: 768 elements
  • Samples:
    • english: Thank you so much, Chris.
      non_english: Muchas gracias Chris.
      label: [-0.1432434469461441, -0.10335833579301834, -0.07549277693033218, -0.1542435735464096, 0.009247343055903912, ...]
    • english: And it's truly a great honor to have the opportunity to come to this stage twice; I'm extremely grateful.
      non_english: Y es en verdad un gran honor tener la oportunidad de venir a este escenario por segunda vez. Estoy extremadamente agradecido.
      label: [0.02740730345249176, -0.0601208470761776, -0.023767368867993355, 0.02245006151497364, 0.007412586361169815, ...]
    • english: I have been blown away by this conference, and I want to thank all of you for the many nice comments about what I had to say the other night.
      non_english: He quedado conmovido por esta conferencia, y deseo agradecer a todos ustedes sus amables comentarios acerca de lo que tenía que decir la otra noche.
      label: [-0.09117366373538971, 0.08627621084451675, -0.05912208557128906, -0.007647979073226452, 0.0008422975661233068, ...]
  • Loss: main.ModifiedMatryoshkaLoss with these parameters:
    {
        "loss": "MSELoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 200
  • per_device_eval_batch_size: 200
  • learning_rate: 2e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • fp16: True
  • label_names: ['label']
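
For orientation, these non-default values map directly onto SentenceTransformerTrainingArguments. Below is a hedged sketch of how a comparable run could be configured; the datasets and the custom loss are not reproduced here, so the Trainer call is left as a commented outline rather than runnable code.

from sentence_transformers.training_args import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="trilingual-bert-distil",   # hypothetical output directory
    eval_strategy="steps",
    per_device_train_batch_size=200,
    per_device_eval_batch_size=200,
    learning_rate=2e-5,
    num_train_epochs=2,
    warmup_ratio=0.1,
    fp16=True,
    label_names=["label"],
)

# from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
# trainer = SentenceTransformerTrainer(
#     model=SentenceTransformer("google-bert/bert-base-multilingual-cased"),
#     args=args,
#     train_dataset=train_dataset,   # columns: english, non_english, label (teacher embeddings)
#     eval_dataset=eval_dataset,
#     loss=loss,                     # the ModifiedMatryoshkaLoss described above
# )
# trainer.train()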

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 200
  • per_device_eval_batch_size: 200
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: ['label']
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss MSE-val-en-es_negative_mse MSE-val-en-pt_negative_mse MSE-val-en-pt-br_negative_mse
0.0562 1000 0.0626 0.0513 -21.2968 -20.7534 -24.2460
0.1123 2000 0.0478 0.0432 -22.1192 -21.8663 -23.2775
0.1685 3000 0.0423 0.0391 -21.6697 -21.5869 -21.6856
0.0562 1000 0.0396 0.0376 -21.7666 -21.7181 -21.6779
0.1123 2000 0.0381 0.0358 -23.4969 -23.5022 -22.9817
0.1685 3000 0.0362 0.0339 -24.7639 -24.8878 -23.8888
0.2247 4000 0.0347 0.0323 -26.5721 -26.7422 -25.4072
0.2808 5000 0.0332 0.0310 -27.6024 -27.8268 -26.4132
0.3370 6000 0.0321 0.0299 -27.7974 -28.0294 -26.6213
0.3932 7000 0.0312 0.0292 -28.2719 -28.4834 -27.0468
0.4493 8000 0.0305 0.0285 -28.2561 -28.5574 -26.8752
0.5055 9000 0.0299 0.0280 -28.6342 -28.9112 -27.2933
0.5617 10000 0.0294 0.0275 -28.5512 -28.8469 -27.1072
0.6178 11000 0.029 0.0271 -28.6788 -28.9608 -27.2056
0.6740 12000 0.0286 0.0267 -29.0159 -29.3281 -27.4770
0.7302 13000 0.0283 0.0264 -28.9224 -29.2461 -27.3500
0.7863 14000 0.028 0.0261 -29.1044 -29.4303 -27.4377
0.8425 15000 0.0277 0.0259 -29.2340 -29.5758 -27.6223
0.8987 16000 0.0275 0.0257 -29.1356 -29.4699 -27.4667
0.9548 17000 0.0273 0.0255 -29.3281 -29.6671 -27.7174
1.0110 18000 0.0271 0.0253 -29.2991 -29.6635 -27.6675
1.0672 19000 0.0268 0.0251 -29.3581 -29.7326 -27.6587
1.1233 20000 0.0266 0.0250 -29.4233 -29.7941 -27.7913
1.1795 21000 0.0265 0.0248 -29.3941 -29.7583 -27.6951
1.2357 22000 0.0264 0.0247 -29.5963 -29.9737 -27.9191
1.2918 23000 0.0262 0.0245 -29.4587 -29.8472 -27.7702
1.3480 24000 0.0262 0.0244 -29.4977 -29.8868 -27.8142
1.4042 25000 0.026 0.0244 -29.5356 -29.9184 -27.8426
1.4603 26000 0.0259 0.0243 -29.5614 -29.9388 -27.8360
1.5165 27000 0.0259 0.0242 -29.5362 -29.9353 -27.8223
1.5727 28000 0.0258 0.0241 -29.5088 -29.9043 -27.7884
1.6288 29000 0.0258 0.0241 -29.4550 -29.8543 -27.6788
1.6850 30000 0.0257 0.0240 -29.5373 -29.9282 -27.7855
1.7412 31000 0.0256 0.0239 -29.5195 -29.9096 -27.7866
1.7973 32000 0.0256 0.0239 -29.5292 -29.9266 -27.7579
1.8535 33000 0.0256 0.0239 -29.5202 -29.9196 -27.7408
1.9097 34000 0.0255 0.0239 -29.5090 -29.9126 -27.7311
1.9659 35000 0.0255 0.0238 -29.5115 -29.9136 -27.7322

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.46.3
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.1.1
  • Datasets: 3.2.0
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}