SentenceTransformer based on indobenchmark/indobert-large-p2

This is a sentence-transformers model finetuned from indobenchmark/indobert-large-p2. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

STSB Test

Model Spearman Correlation
quarkss/indobert-large-stsb 0.8366
quarkss/indobert-base-stsb 0.8123
sentence-transformers/all-MiniLM-L6-v2 0.5952
indobenchmark/indobert-large-p2 0.5673
sentence-transformers/all-mpnet-base-v2 0.5531
sentence-transformers/stsb-bert-base 0.5349
indobenchmark/indobert-base-p2 0.5309

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: indobenchmark/indobert-large-p2
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
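
The Pooling module above enables only pooling_mode_mean_tokens, so each sentence embedding is the average of its token embeddings. A minimal pure-Python sketch of that averaging follows, assuming padding positions are excluded through an attention mask. The helper name mean_pool and the toy 2-dimensional vectors are illustrative and not part of the library.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1; padded positions are skipped."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                total[i] += value
            count += 1
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Toy input: three real tokens and one padding token, 2 dimensions instead of 1024.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)
```

The padding vector [9.0, 9.0] does not influence the result because its mask entry is 0.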

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("quarkss/indobert-large-stsb")
# Run inference
sentences = [
    'Seorang pria sedang berjalan dengan seekor kuda.',
    'Seorang pria sedang menuntun seekor kuda dengan tali kekang.',
    'Seorang pria sedang menembakkan pistol.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
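
With Cosine Similarity as the similarity function, each entry of the 3×3 matrix is the cosine of the angle between two embeddings. A rough pure-Python sketch of that computation on toy vectors is shown here. The names cosine_similarity and similarity_matrix are illustrative helpers, not the library's API, and the 2-dimensional vectors stand in for real 1024-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine similarities; entry [i][j] compares vectors i and j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy stand-ins for three sentence embeddings.
toy = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
matrix = similarity_matrix(toy)
for row in matrix:
    print([round(value, 4) for value in row])
```

The diagonal is 1 because every vector is compared with itself, and the matrix is symmetric.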

Evaluation

Metrics

Semantic Similarity (evaluated during training)

Metric Value
pearson_cosine 0.8692
spearman_cosine 0.8677
pearson_manhattan 0.8592
spearman_manhattan 0.8626
pearson_euclidean 0.8599
spearman_euclidean 0.8633
pearson_dot 0.8441
spearman_dot 0.8392
pearson_max 0.8692
spearman_max 0.8677

Semantic Similarity (STSB test)

Metric Value
pearson_cosine 0.8402
spearman_cosine 0.8366
pearson_manhattan 0.8276
spearman_manhattan 0.8316
pearson_euclidean 0.8278
spearman_euclidean 0.8316
pearson_dot 0.817
spearman_dot 0.8083
pearson_max 0.8402
spearman_max 0.8366
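
The spearman_* metrics above are rank correlations: the predicted similarities and the gold scores are each converted to ranks, and the Pearson correlation of those ranks is reported. A small pure-Python sketch of that definition follows, assuming tied values receive the average of their ranks. The gold and predicted values are made-up toy numbers, not results from this model.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: gold similarity scores and hypothetical model similarities.
gold = [1.0, 0.76, 0.2, 0.5]
predicted = [0.93, 0.81, 0.35, 0.30]
rho = spearman(gold, predicted)
print(round(rho, 4))
```

Because only the ordering matters, any monotonic rescaling of the predicted similarities leaves the Spearman value unchanged.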

Training Details

Training Dataset

Unnamed Dataset

  • Size: 5,749 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 6 tokens, mean 9.65 tokens, max 25 tokens
    • sentence2 (string): min 6 tokens, mean 9.59 tokens, max 24 tokens
    • score (float): min 0.0, mean 0.54, max 1.0
  • Samples:
    • sentence1: Sebuah pesawat sedang lepas landas.
      sentence2: Sebuah pesawat terbang sedang lepas landas.
      score: 1.0
    • sentence1: Seorang pria sedang memainkan seruling besar.
      sentence2: Seorang pria sedang memainkan seruling.
      score: 0.76
    • sentence1: Seorang pria sedang mengoleskan keju parut di atas pizza.
      sentence2: Seorang pria sedang mengoleskan keju parut di atas pizza yang belum matang.
      score: 0.76
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
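
CosineSimilarityLoss with an MSELoss loss_fct compares the cosine similarity of the two sentence embeddings with the gold score and penalizes the squared difference. A minimal pure-Python sketch of that objective on toy vectors is given below. The function names and the 2-dimensional embeddings are illustrative assumptions, and the actual loss operates on batched tensors.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_similarity_mse(pairs, scores):
    """Mean squared error between cos(u, v) and the gold score over a batch."""
    errors = [
        (cosine_similarity(u, v) - score) ** 2
        for (u, v), score in zip(pairs, scores)
    ]
    return sum(errors) / len(errors)


# Toy batch: one identical pair (cosine 1.0) and one orthogonal pair (cosine 0.0).
batch = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
labels = [1.0, 0.5]
loss = cosine_similarity_mse(batch, labels)
print(loss)
```

The first pair contributes no error, while the second contributes (0.0 - 0.5)² = 0.25, so the batch loss is 0.125.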
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • fp16: True
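
To illustrate how learning_rate and warmup_ratio interact, here is a sketch of a linear schedule with warmup, which is the lr_scheduler_type listed under All Hyperparameters. It assumes 1800 total optimizer steps (the final step in the training logs) and that the warmup length is the ceiling of total steps times warmup_ratio. The function name linear_schedule_lr is hypothetical.

```python
import math


def linear_schedule_lr(step, base_lr=2e-05, total_steps=1800, warmup_ratio=0.1):
    """Learning rate at a given step: linear ramp up during warmup, then linear decay to 0."""
    # Assumption: warmup length is ceil(total_steps * warmup_ratio).
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Sample the schedule at the start, mid-warmup, peak, mid-decay, and end.
lr_at = {step: linear_schedule_lr(step) for step in (0, 90, 180, 990, 1800)}
for step, lr in lr_at.items():
    print(step, lr)
```

Under these assumptions the rate peaks at 2e-05 after 180 steps and returns to 0 at step 1800.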

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss spearman_cosine spearman_max
0.2778 100 0.0867 - -
0.5556 200 0.0351 - -
0.8333 300 0.0303 - -
1.1111 400 0.0202 - -
1.3889 500 0.0154 0.8612 -
1.6667 600 0.0136 - -
1.9444 700 0.0145 - -
2.2222 800 0.0082 - -
2.5 900 0.0072 - -
2.7778 1000 0.0068 0.8660 -
3.0556 1100 0.0065 - -
3.3333 1200 0.0044 - -
3.6111 1300 0.0044 - -
3.8889 1400 0.0045 - -
4.1667 1500 0.0038 0.8677 -
4.4444 1600 0.0038 - -
4.7222 1700 0.0035 - -
5.0 1800 0.0034 - 0.8366
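
The Epoch column in the log can be reproduced from the dataset size and batch size. The sketch below assumes a single training device, together with the gradient_accumulation_steps of 1 and dataloader_drop_last of False listed above. The variable and function names are illustrative.

```python
import math

train_samples = 5749  # training set size from this card
batch_size = 16       # per_device_train_batch_size (single device assumed)
epochs = 5            # num_train_epochs

# The last partial batch still counts as a step because drop_last is False.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs


def epoch_at(step):
    """Fractional epoch reached after a given optimizer step, rounded as in the log."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch, total_steps)
print(epoch_at(100), epoch_at(1500), epoch_at(1800))
```

This gives 360 steps per epoch and 1800 steps in total, matching the final row of the log.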

Framework Versions

  • Python: 3.10.13
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.0.1+cu117
  • Accelerate: 0.32.1
  • Datasets: 2.17.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Model size: 335M params (Safetensors, tensor type F32)