SentenceTransformer based on indobenchmark/indobert-base-p2

This is a sentence-transformers model finetuned from indobenchmark/indobert-base-p2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: indobenchmark/indobert-base-p2
  • Maximum Sequence Length: 200 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 200, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
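
In other words: a BERT encoder whose token embeddings are mean-pooled into a single 768-dimensional sentence vector. As a minimal sketch of what the pooling module does (assuming only torch and transformers, and using the base checkpoint for illustration rather than this repository's fine-tuned weights):

import torch
from transformers import AutoModel, AutoTokenizer

# Base checkpoint used for illustration; the fine-tuned weights live in this repo.
tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indobert-base-p2")
encoder = AutoModel.from_pretrained("indobenchmark/indobert-base-p2")

encoded = tokenizer(
    ["Penduduk kabupaten Raja Ampat mayoritas memeluk agama Kristen."],
    padding=True, truncation=True, max_length=200, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = encoder(**encoded).last_hidden_state  # (batch, seq_len, 768)

# Mean pooling: average the token embeddings, masking out padding positions.
mask = encoded["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(sentence_embedding.shape)  # torch.Size([1, 768])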

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("cassador/indobert-t4")
# Run inference
sentences = [
    'Penduduk kabupaten Raja Ampat mayoritas memeluk agama Kristen.',  # "The majority of Raja Ampat regency's residents are Christian."
    'Masyarakat kabupaten Raja Ampat mayoritas memeluk agama Islam.',  # "The majority of Raja Ampat regency's community is Muslim."
    'Gereja Baptis biasanya cenderung membentuk kelompok sendiri.',    # "Baptist churches usually tend to form their own groups."
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
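
The same embeddings also plug into the library's retrieval helpers. A small semantic-search sketch on top of the objects above (the query string is a hypothetical example, not from this card):

from sentence_transformers import util

# Rank the three sentences against a hypothetical Indonesian query
# ("What is the majority religion of Raja Ampat's population?").
query_embedding = model.encode('Apa agama mayoritas penduduk Raja Ampat?')
hits = util.semantic_search(query_embedding, embeddings, top_k=2)
for hit in hits[0]:
    print(sentences[hit['corpus_id']], round(hit['score'], 4))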

Evaluation

Metrics

Semantic Similarity

| Metric             | Value   |
|:-------------------|:--------|
| pearson_cosine     | -0.0979 |
| spearman_cosine    | -0.1037 |
| pearson_manhattan  | -0.0987 |
| spearman_manhattan | -0.1005 |
| pearson_euclidean  | -0.0981 |
| spearman_euclidean | -0.0998 |
| pearson_dot        | -0.0822 |
| spearman_dot       | -0.0821 |
| pearson_max        | -0.0822 |
| spearman_max       | -0.0821 |

Semantic Similarity

| Metric             | Value   |
|:-------------------|:--------|
| pearson_cosine     | -0.0278 |
| spearman_cosine    | -0.035  |
| pearson_manhattan  | -0.0355 |
| spearman_manhattan | -0.0387 |
| pearson_euclidean  | -0.0356 |
| spearman_euclidean | -0.0389 |
| pearson_dot        | -0.0092 |
| spearman_dot       | -0.0066 |
| pearson_max        | -0.0092 |
| spearman_max       | -0.0066 |
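
Both tables report Pearson and Spearman correlations between the model's similarity scores and gold labels under several distance functions (note that all values are negative, i.e. the scores anti-correlate with the gold labels on this benchmark). A minimal sketch of how such numbers are produced with the library's EmbeddingSimilarityEvaluator, using hypothetical placeholder pairs and scores:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("cassador/indobert-t4")
# Hypothetical STS-style data: sentence pairs with gold similarity scores in [0, 1].
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=['kalimat pertama', 'kalimat kedua', 'kalimat ketiga'],
    sentences2=['pasangan pertama', 'pasangan kedua', 'pasangan ketiga'],
    scores=[0.9, 0.5, 0.1],
    name='sts-dev',
)
print(evaluator(model))  # dict of pearson_*/spearman_* metrics, as tabulated above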

Training Details

Training Dataset

Unnamed Dataset

  • Size: 10,330 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:

    |         | sentence_0 | sentence_1 | label |
    |:--------|:-----------|:-----------|:------|
    | type    | string | string | int |
    | details | min: 10 tokens, mean: 30.59 tokens, max: 128 tokens | min: 6 tokens, mean: 11.93 tokens, max: 37 tokens | 0: ~33.50%, 1: ~32.70%, 2: ~33.80% |
  • Samples:
    | sentence_0 | sentence_1 | label |
    |:-----------|:-----------|:------|
    | Ini adalah coup de grâce dan dorongan yang dibutuhkan oleh para pendatang untuk mendapatkan kemerdekaan mereka. | Pendatang tidak mendapatkan kemerdekaan. | 2 |
    | Dua bayi almarhum Raja, Diana dan Suharna, diculik. | Jumlah bayi raja yang diculik sudah mencapai 2 bayi. | 1 |
    | Sebuah penelitian menunjukkan bahwa mengkonsumsi makanan yang tinggi kadar gulanya bisa meningkatkan rasa haus. | Tidak ada penelitian yang bertopik makanan yang kadar gulanya tinggi. | 2 |
  • Loss: MultipleNegativesRankingLoss with these parameters (a construction sketch follows this list):
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
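
A sketch of how this loss is typically constructed with the library (the original training script is not part of this card, so the snippet is illustrative):

from sentence_transformers import SentenceTransformer, util
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("indobenchmark/indobert-base-p2")
# scale and similarity_fct mirror the parameters listed above.
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)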
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • num_train_epochs: 20
  • multi_dataset_batch_sampler: round_robin
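
A hedged sketch of how these values map onto SentenceTransformerTrainingArguments in Sentence Transformers 3.x (output_dir is a hypothetical placeholder):

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir='output/indobert-t4',  # hypothetical path
    eval_strategy='steps',
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=20,
    multi_dataset_batch_sampler='round_robin',
)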

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 20
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss sts-dev_spearman_max
0.0998 129 - -0.0821
0.0999 258 - -0.0541
0.1936 500 0.0322 -
0.1998 516 - -0.0474
0.2997 774 - -0.0369
0.3871 1000 0.0157 -
0.3995 1032 - -0.0371
0.4994 1290 - -0.0388
0.5807 1500 0.0109 -
0.5993 1548 - -0.0284
0.6992 1806 - -0.0293
0.7743 2000 0.0112 -
0.7991 2064 - -0.0176
0.8990 2322 - -0.0290
0.9679 2500 0.0104 -
0.9988 2580 - -0.0128
1.0 2583 - -0.0123
1.0987 2838 - -0.0200
1.1614 3000 0.0091 -
1.1986 3096 - -0.0202
1.2985 3354 - -0.0204
1.3550 3500 0.0052 -
1.3984 3612 - -0.0231
1.4983 3870 - -0.0312
1.5486 4000 0.0017 -
1.5981 4128 - -0.0277
1.6980 4386 - -0.0366
1.7422 4500 0.0054 -
1.7979 4644 - -0.0192
1.8978 4902 - -0.0224
1.9357 5000 0.0048 -
1.9977 5160 - -0.0240
2.0 5166 - -0.0248
2.0976 5418 - -0.0374
2.1293 5500 0.0045 -
2.1974 5676 - -0.0215
2.2973 5934 - -0.0329
2.3229 6000 0.0047 -
2.3972 6192 - -0.0284
2.4971 6450 - -0.0370
2.5165 6500 0.0037 -
2.5970 6708 - -0.0390
2.6969 6966 - -0.0681
2.7100 7000 0.0128 -
2.7967 7224 - -0.0343
2.8966 7482 - -0.0413
2.9036 7500 0.0055 -
2.9965 7740 - -0.0416
3.0 7749 - -0.0373
3.0964 7998 - -0.0630
3.0972 8000 0.0016 -
3.1963 8256 - -0.0401
3.2907 8500 0.0018 -
3.2962 8514 - -0.0303
3.3961 8772 - -0.0484
3.4843 9000 0.0017 -
3.4959 9030 - -0.0619
3.5958 9288 - -0.0411
3.6779 9500 0.007 -
3.6957 9546 - -0.0408
3.7956 9804 - -0.0368
3.8715 10000 0.0029 -
3.8955 10062 - -0.0429
3.9954 10320 - -0.0526
4.0 10332 - -0.0494
4.0650 10500 0.0004 -
4.0952 10578 - -0.0385
4.1951 10836 - -0.0467
4.2586 11000 0.0004 -
4.2950 11094 - -0.0500
4.3949 11352 - -0.0458
4.4522 11500 0.0011 -
4.4948 11610 - -0.0389
4.5947 11868 - -0.0401
4.6458 12000 0.0046 -
4.6945 12126 - -0.0370
4.7944 12384 - -0.0495
4.8393 12500 0.0104 -
4.8943 12642 - -0.0504
4.9942 12900 - -0.0377
5.0 12915 - -0.0379
5.0329 13000 0.0005 -
5.0941 13158 - -0.0617
5.1940 13416 - -0.0354
5.2265 13500 0.0006 -
5.2938 13674 - -0.0514
5.3937 13932 - -0.0615
5.4201 14000 0.0014 -
5.4936 14190 - -0.0574
5.5935 14448 - -0.0503
5.6136 14500 0.0025 -
5.6934 14706 - -0.0512
5.7933 14964 - -0.0316
5.8072 15000 0.0029 -
5.8931 15222 - -0.0475
5.9930 15480 - -0.0429
6.0 15498 - -0.0377
6.0008 15500 0.0003 -
6.0929 15738 - -0.0486
6.1928 15996 - -0.0512
6.1943 16000 0.0002 -
6.2927 16254 - -0.0383
6.3879 16500 0.0017 -
6.3926 16512 - -0.0460
6.4925 16770 - -0.0439
6.5815 17000 0.0046 -
6.5923 17028 - -0.0378
6.6922 17286 - -0.0289
6.7751 17500 0.0081 -
6.7921 17544 - -0.0415
6.8920 17802 - -0.0451
6.9686 18000 0.0021 -
6.9919 18060 - -0.0386
7.0 18081 - -0.0390
7.0918 18318 - -0.0460
7.1622 18500 0.0001 -
7.1916 18576 - -0.0510
7.2915 18834 - -0.0566
7.3558 19000 0.0009 -
7.3914 19092 - -0.0479
7.4913 19350 - -0.0456
7.5494 19500 0.0019 -
7.5912 19608 - -0.0371
7.6911 19866 - -0.0184
7.7429 20000 0.003 -
7.7909 20124 - -0.0312
7.8908 20382 - -0.0307
7.9365 20500 0.0008 -
7.9907 20640 - -0.0291
8.0 20664 - -0.0298
8.0906 20898 - -0.0452
8.1301 21000 0.0001 -
8.1905 21156 - -0.0405
8.2904 21414 - -0.0417
8.3237 21500 0.0007 -
8.3902 21672 - -0.0430
8.4901 21930 - -0.0487
8.5172 22000 0.0 -
8.5900 22188 - -0.0471
8.6899 22446 - -0.0361
8.7108 22500 0.0037 -
8.7898 22704 - -0.0443
8.8897 22962 - -0.0404
8.9044 23000 0.0009 -
8.9895 23220 - -0.0421
9.0 23247 - -0.0425
9.0894 23478 - -0.0451
9.0979 23500 0.0001 -
9.1893 23736 - -0.0458
9.2892 23994 - -0.0479
9.2915 24000 0.0 -
9.3891 24252 - -0.0400
9.4851 24500 0.0014 -
9.4890 24510 - -0.0374
9.5889 24768 - -0.0454
9.6787 25000 0.0075 -
9.6887 25026 - -0.0230
9.7886 25284 - -0.0345
9.8722 25500 0.0007 -
9.8885 25542 - -0.0301
9.9884 25800 - -0.0363
10.0 25830 - -0.0375
10.0658 26000 0.0001 -
10.0883 26058 - -0.0381
10.1882 26316 - -0.0386
10.2594 26500 0.0 -
10.2880 26574 - -0.0390
10.3879 26832 - -0.0366
10.4530 27000 0.0007 -
10.4878 27090 - -0.0464
10.5877 27348 - -0.0509
10.6465 27500 0.0021 -
10.6876 27606 - -0.0292
10.7875 27864 - -0.0514
10.8401 28000 0.0017 -
10.8873 28122 - -0.0485
10.9872 28380 - -0.0471
11.0 28413 - -0.0468
11.0337 28500 0.0 -
11.0871 28638 - -0.0460
11.1870 28896 - -0.0450
11.2273 29000 0.0 -
11.2869 29154 - -0.0457
11.3868 29412 - -0.0450
11.4208 29500 0.0008 -
11.4866 29670 - -0.0440
11.5865 29928 - -0.0384
11.6144 30000 0.0028 -
11.6864 30186 - -0.0066

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.19.2
  • Tokenizers: 0.19.1
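
To approximate this environment, the listed versions can be pinned at install time (exact PyTorch/CUDA builds vary by platform):

pip install sentence-transformers==3.0.1 transformers==4.41.2 accelerate==0.31.0 datasets==2.19.2 tokenizers==0.19.1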

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}