metadata
base_model: prajjwal1/bert-tiny
datasets: []
language: []
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:277277
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: Tall man being stopped by an officer.
    sentences:
      - The man is short.
      - There is a tall man.
      - >-
        Male in brown leather jacket and tight black slacks, looking down at his
        phone
  - source_sentence: Man relaxing on a bench at the bus stop.
    sentences:
      - The man stood next to the bench.
      - The man relaxes on a bench.
      - A dog running outside.
  - source_sentence: Police officer with riot shield stands in front of crowd.
    sentences:
      - A police officer teaches two children something.
      - The kid is at the beach.
      - A police officer stands in front of a crowd.
  - source_sentence: >-
      A woman in a red shirt and blue jeans is walking outside while a man in a
      khaki jacket is right behind her.
    sentences:
      - A man and a woman are walking outside.
      - A woman is outside.
      - A man in an army jacket is  following a woman in a pink dress.
  - source_sentence: >-
      A waitress with a pink shirt and black pants walking through a restaurant
      carrying bowls of soup.
    sentences:
      - Nobody has pants
      - A person with pants
      - a young kid jumps into the water
co2_eq_emissions:
  emissions: 1.9590621986924506
  energy_consumed: 0.005040010596015587
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.029
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
  - name: SentenceTransformer based on prajjwal1/bert-tiny
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.7526013757467193
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7614153421868329
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7622035611835871
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7597498090089608
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7632410201154781
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7614153421868329
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.7526013835604672
            name: Pearson Dot
          - type: spearman_dot
            value: 0.7614153421868329
            name: Spearman Dot
          - type: pearson_max
            value: 0.7632410201154781
            name: Pearson Max
          - type: spearman_max
            value: 0.7614153421868329
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.69132863091579
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.6775246001958918
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.6993315331718462
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.6760860789893309
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7005700491110102
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.6775246001958918
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.6913286275793098
            name: Pearson Dot
          - type: spearman_dot
            value: 0.6775246001958918
            name: Spearman Dot
          - type: pearson_max
            value: 0.7005700491110102
            name: Pearson Max
          - type: spearman_max
            value: 0.6775246001958918
            name: Spearman Max

SentenceTransformer based on prajjwal1/bert-tiny

This is a sentence-transformers model finetuned from prajjwal1/bert-tiny. It maps sentences & paragraphs to a 256-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: prajjwal1/bert-tiny
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 256 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 128, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 128, 'out_features': 256, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
  (3): Normalize()
)
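
The module list above can be read as a pipeline: token vectors from the transformer are mean-pooled, projected from 128 to 256 dimensions through a Tanh-activated dense layer, and L2-normalized. The following is a minimal pure-Python sketch of that data flow, not the library's implementation. The token vectors and dense-layer weights are random stand-ins (assumptions for illustration), and the helper names are hypothetical.

```python
import math
import random


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def dense_tanh(vector, weights, bias):
    """Apply y = tanh(W x + b), with W stored as a list of output rows."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, vector)) + b)
        for row, b in zip(weights, bias)
    ]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


rng = random.Random(0)
IN_DIM, OUT_DIM, NUM_TOKENS = 128, 256, 6

# Random stand-ins for the BertModel output: 6 token vectors, the last 2 are padding.
token_vectors = [[rng.gauss(0, 1) for _ in range(IN_DIM)] for _ in range(NUM_TOKENS)]
attention_mask = [1, 1, 1, 1, 0, 0]

# Random stand-ins for the Dense layer's learned parameters.
weights = [[rng.gauss(0, 0.05) for _ in range(IN_DIM)] for _ in range(OUT_DIM)]
bias = [0.0] * OUT_DIM

pooled = mean_pool(token_vectors, attention_mask)   # 128 values
projected = dense_tanh(pooled, weights, bias)       # 256 values
embedding = l2_normalize(projected)                 # 256 values, unit length

print(len(pooled), len(embedding))
```

Because the last module normalizes the output, every embedding has unit length, which is why cosine similarity and dot product give the same scores for this model.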

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence-transformers-testing/all-nli-bert-tiny-dense")
# Run inference
sentences = [
    'A waitress with a pink shirt and black pants walking through a restaurant carrying bowls of soup.',
    'A person with pants',
    'Nobody has pants',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 256]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
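
As a small follow-up, the similarity matrix can be used to find, for each sentence, its closest neighbour among the others. The sketch below works on a plain list of lists, assuming the tensor returned by model.similarity has been converted with .tolist(). The helper name and the scores are hypothetical and were not produced by this model.

```python
def most_similar(similarity_rows):
    """For each row, return the index of the highest-scoring *other* column."""
    best = []
    for i, row in enumerate(similarity_rows):
        # Skip the diagonal: a sentence is always most similar to itself.
        candidates = [(score, j) for j, score in enumerate(row) if j != i]
        best.append(max(candidates)[1])
    return best


# Hypothetical scores, for illustration only.
example = [
    [1.00, 0.62, 0.18],
    [0.62, 1.00, 0.25],
    [0.18, 0.25, 1.00],
]
print(most_similar(example))  # [1, 0, 1]
```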

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-dev

Metric Value
pearson_cosine 0.7526
spearman_cosine 0.7614
pearson_manhattan 0.7622
spearman_manhattan 0.7597
pearson_euclidean 0.7632
spearman_euclidean 0.7614
pearson_dot 0.7526
spearman_dot 0.7614
pearson_max 0.7632
spearman_max 0.7614
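
For readers unfamiliar with the metric names: each pearson_* and spearman_* value correlates the model's similarity scores with gold similarity labels. The following is a minimal pure-Python sketch of both correlations, not the evaluator used to produce the numbers above. The predicted and gold values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical model similarities and gold labels for five sentence pairs.
predicted = [0.91, 0.35, 0.70, 0.10, 0.55]
gold = [4.8, 1.5, 4.0, 0.2, 2.0]

print(round(pearson(predicted, gold), 4))
print(round(spearman(predicted, gold), 4))
```

In this toy data the two lists are ordered identically, so the Spearman value is 1 (up to floating-point error) while the Pearson value is slightly lower, since Spearman only looks at rank order.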

Semantic Similarity

  • Dataset: sts-test

Metric Value
pearson_cosine 0.6913
spearman_cosine 0.6775
pearson_manhattan 0.6993
spearman_manhattan 0.6761
pearson_euclidean 0.7006
spearman_euclidean 0.6775
pearson_dot 0.6913
spearman_dot 0.6775
pearson_max 0.7006
spearman_max 0.6775
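
In both tables, spearman_cosine, spearman_euclidean and spearman_dot share the same value. A likely explanation is the Normalize() module at the end of the architecture: for unit-length vectors the dot product equals the cosine similarity, and the squared Euclidean distance equals 2 - 2·cos, so all three produce the same ranking. The sketch below checks this on random unit vectors, which are stand-ins rather than embeddings from this model.

```python
import math
import random


def l2_normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(1)


def random_unit_vector(dim=8):
    return l2_normalize([rng.gauss(0, 1) for _ in range(dim)])


pairs = [(random_unit_vector(), random_unit_vector()) for _ in range(6)]
cos_scores = [cosine(a, b) for a, b in pairs]
dot_scores = [dot(a, b) for a, b in pairs]
euc_dists = [euclidean(a, b) for a, b in pairs]

# Rank pairs from most to least similar under each measure.
by_cosine = sorted(range(len(pairs)), key=lambda i: -cos_scores[i])
by_dot = sorted(range(len(pairs)), key=lambda i: -dot_scores[i])
by_euclidean = sorted(range(len(pairs)), key=lambda i: euc_dists[i])

print(by_cosine == by_dot == by_euclidean)
```

The Pearson values for Euclidean distance can still differ from the cosine ones, because distance is a nonlinear function of cosine similarity even though the ranking is preserved.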

Training Details

Training Dataset

Unnamed Dataset

  • Size: 277,277 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    column | type | min | mean | max
    anchor | string | 5 tokens | 15.84 tokens | 64 tokens
    positive | string | 4 tokens | 9.45 tokens | 23 tokens
    negative | string | 4 tokens | 10.23 tokens | 28 tokens
  • Samples:
    anchor | positive | negative
    A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse. | A person is at a diner, ordering an omelette.
    Children smiling and waving at camera | There are children present | The kids are frowning
    A boy is jumping on skateboard in the middle of a red bridge. | The boy does a skateboarding trick. | The boy skates down the sidewalk.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
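
To make the two parameters concrete: with (anchor, positive, negative) triplets, this loss treats every positive and negative in the batch as a candidate for each anchor, scores candidates with cosine similarity multiplied by scale, and applies cross-entropy so that anchor i prefers positive i. The following is a minimal pure-Python sketch under that reading, not the library's implementation. The 2-dimensional vectors are hypothetical.

```python
import math


def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def mnrl_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i must pick positive i among all candidates."""
    candidates = positives + negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, cand) for cand in candidates]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(v - top) for v in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Hypothetical toy embeddings.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

aligned = mnrl_loss(anchors, positives, negatives)
swapped = mnrl_loss(anchors, positives[::-1], negatives)
print(aligned < swapped)  # True
```

A larger scale sharpens the softmax, so small differences in cosine similarity translate into larger differences in loss.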
    

Evaluation Dataset

Unnamed Dataset

  • Size: 5,875 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    column | type | min | mean | max
    anchor | string | 6 tokens | 17.85 tokens | 63 tokens
    positive | string | 4 tokens | 9.68 tokens | 29 tokens
    negative | string | 5 tokens | 10.36 tokens | 26 tokens
  • Samples:
    anchor | positive | negative
    Two women are embracing while holding to go packages. | Two woman are holding packages. | The men are fighting outside a deli.
    Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink. | Two kids in numbered jerseys wash their hands. | Two kids in jackets walk to school.
    A man selling donuts to a customer during a world exhibition event held in the city of Angeles | A man selling donuts to a customer. | A woman drinks her coffee in a small cafe.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch | Step | Training Loss | loss | sts-dev_spearman_cosine | sts-test_spearman_cosine
0.0923 | 100 | 3.4021 | 2.1678 | 0.7247 | -
0.1845 | 200 | 2.3398 | 1.7482 | 0.7480 | -
0.2768 | 300 | 2.0893 | 1.6365 | 0.7537 | -
0.3690 | 400 | 2.0035 | 1.5782 | 0.7552 | -
0.4613 | 500 | 1.9023 | 1.5376 | 0.7587 | -
0.5535 | 600 | 1.8647 | 1.5059 | 0.7597 | -
0.6458 | 700 | 1.8511 | 1.4836 | 0.7605 | -
0.7380 | 800 | 1.8094 | 1.4698 | 0.7613 | -
0.8303 | 900 | 1.8338 | 1.4593 | 0.7609 | -
0.9225 | 1000 | 1.7951 | 1.4553 | 0.7614 | -
1.0 | 1084 | - | - | - | 0.6775
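
The Epoch and Step values in the logs can be cross-checked against the training set size (277,277 samples) and the per-device batch size (256). The short calculation below assumes a single device, no gradient accumulation, and that the last partial batch is kept, which matches the hyperparameters listed earlier.

```python
import math

TRAIN_SAMPLES = 277_277
BATCH_SIZE = 256

# The last, partially filled batch still counts as a step (dataloader_drop_last: False).
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
print(steps_per_epoch)  # 1084

# Fraction of the epoch completed at a few logged steps.
for step in (100, 500, 1000):
    print(step, round(step / steps_per_epoch, 4))
```

The results (0.0923, 0.4613 and 0.9225) agree with the Epoch column for steps 100, 500 and 1000.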

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Energy Consumed: 0.005 kWh
  • Carbon Emitted: 0.002 kg of CO2
  • Hours Used: 0.029 hours
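
The rounded figures above can be derived from the raw values in the card's metadata. The sketch below assumes the metadata's emissions field is in grams of CO2-equivalent and energy_consumed is in kWh. The average power is only a rough estimate, because hours_used is itself rounded.

```python
EMISSIONS_G = 1.9590621986924506     # metadata: emissions (assumed grams CO2-eq)
ENERGY_KWH = 0.005040010596015587    # metadata: energy_consumed (assumed kWh)
HOURS_USED = 0.029                   # metadata: hours_used (rounded)

emissions_kg = EMISSIONS_G / 1000
training_seconds = HOURS_USED * 3600
average_power_w = ENERGY_KWH * 1000 / HOURS_USED

print(round(emissions_kg, 3))      # 0.002
print(round(ENERGY_KWH, 3))        # 0.005
print(round(training_seconds))     # 104
print(round(average_power_w))      # 174
```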

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.1.0.dev0
  • Transformers: 4.43.4
  • PyTorch: 2.5.0.dev20240807+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}