SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
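
The Pooling module uses the CLS token, and the final Normalize() step L2-normalizes the output, so the cosine similarity scores this model reports reduce to plain dot products. The following is a hypothetical sketch of assembling the same three-module pipeline from the base model; for actual use, load the published checkpoint as shown under Usage.

from sentence_transformers import SentenceTransformer, models

# Sketch only: rebuild the Transformer -> CLS Pooling -> Normalize pipeline by hand.
word_embedding = models.Transformer("BAAI/bge-m3", max_seq_length=1024)
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),  # 1024
    pooling_mode_cls_token=True,
    pooling_mode_mean_tokens=False,
)
normalize = models.Normalize()  # L2-normalizes embeddings, so cosine similarity equals dot product
model = SentenceTransformer(modules=[word_embedding, pooling, normalize])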

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("seongil-dn/bge-m3-kor-retrieval-451949-bs128-finance-book-science-215")
# Run inference
sentences = [
    '1970년대 경제위기 상황으로 사회복지가 위기를 맞으며 사회적 경제 운동이 일어나1990년대 후반부터 사회적 기업이 시작된 나라는 어디야?',
    '제2차 세계대전 이후 1950년대와 1960년대 거치면서 경제성장을 기반으로 정부지출의 지속적인 증가와 복지에 대한 사회적 합의는 다수 산업의 국유화와 그에 따른 공공부분의 확대, 사회복지의 확대를 가능하게 하였다. 그러나 1970년대 경제위기 상황은 사회복지의 위기를 가져왔고 1980년대의 경기침체는 더 이상 복지지출의 확대를 허락하지 않는 ‘외부충격’이 있었다. 현대적 의미에서 사회적 기업은 1970년대부터의 노동자 협동조합, 신용조합, 지역사회 상점(community shop), 개발신탁, 지역사회 비즈니스 운동, 노동통합(work integration) 운동 등 ‘사회적 경제’ 운동에서 시작하였다고 한다. 영국 사회에 나타난 이와 같은 일련의 사건들은 복지국가 위기로 인식되었다. 한편으로는 이러한 사건들이 이전 18세기부터 발달해 왔던 협동조합, 상호공제조합, 자선단체와 같은 활동의 역할이 더욱 중요하게 부각되는 계기가 되기도 하였다. 영국에서는 1990년대 후반부터 이루어진 노동당의 집권이 현대적인 의미의 사회적 경제와 사회적 기업의 발전, 나아가 제도화에 큰 영향을 주었다.',
    'Ⅰ. 서론\n최근 일부 국가에서 2008년 글로벌 금융위기를 겪으면서 사회적경제의 역할과 기능에 대하여 전반적인 관심이 높아지면서 사회적경제의 활성화가 여러 국가들이 직면한 사회적・경제적 문제의 해결에 기여할 것이라는 사회적 공감대가 형성되었다 (권재열, 2015). 이에 스페인, 멕시코, 에콰도르, 포르투갈, 프랑스와 캐나다의 퀘벡주 등에서 사회적경제기본법이 제정되어 시행되고 있다. 각국의 사회적경제기본법은 사회적경제의 정체성 규정을 위한 법적 틀을 제공하고, 사회적경제에 대한 포괄적인 지원 및 촉진 정책을 제공하고 있다.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 128
  • learning_rate: 3e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.05
  • fp16: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.05
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0047 1 1.2972
0.0094 2 1.7591
0.0142 3 1.5857
0.0189 4 1.3732
0.0236 5 1.4174
0.0283 6 1.4117
0.0330 7 1.2482
0.0377 8 1.4429
0.0425 9 1.1965
0.0472 10 0.9934
0.0519 11 0.8505
0.0566 12 0.7532
0.0613 13 0.7257
0.0660 14 0.5238
0.0708 15 0.4538
0.0755 16 0.4524
0.0802 17 0.4026
0.0849 18 0.4288
0.0896 19 0.3547
0.0943 20 0.3552
0.0991 21 0.2845
0.1038 22 0.3171
0.1085 23 0.2699
0.1132 24 0.2905
0.1179 25 0.2627
0.1226 26 0.268
0.1274 27 0.2205
0.1321 28 0.2374
0.1368 29 0.2653
0.1415 30 0.2517
0.1462 31 0.2145
0.1509 32 0.1949
0.1557 33 0.1515
0.1604 34 0.214
0.1651 35 0.213
0.1698 36 0.1739
0.1745 37 0.1588
0.1792 38 0.184
0.1840 39 0.1921
0.1887 40 0.1662
0.1934 41 0.1844
0.1981 42 0.1891
0.2028 43 0.1456
0.2075 44 0.1564
0.2123 45 0.131
0.2170 46 0.1636
0.2217 47 0.1528
0.2264 48 0.1491
0.2311 49 0.1432
0.2358 50 0.1399
0.2406 51 0.1683
0.2453 52 0.1757
0.25 53 0.1622
0.2547 54 0.1649
0.2594 55 0.1184
0.2642 56 0.1472
0.2689 57 0.146
0.2736 58 0.1387
0.2783 59 0.1527
0.2830 60 0.1333
0.2877 61 0.1349
0.2925 62 0.2007
0.2972 63 0.1548
0.3019 64 0.165
0.3066 65 0.1239
0.3113 66 0.1164
0.3160 67 0.1734
0.3208 68 0.1281
0.3255 69 0.1195
0.3302 70 0.1461
0.3349 71 0.1363
0.3396 72 0.1081
0.3443 73 0.1532
0.3491 74 0.1549
0.3538 75 0.1409
0.3585 76 0.1396
0.3632 77 0.0858
0.3679 78 0.121
0.3726 79 0.138
0.3774 80 0.1334
0.3821 81 0.1235
0.3868 82 0.1167
0.3915 83 0.1745
0.3962 84 0.1201
0.4009 85 0.1277
0.4057 86 0.1089
0.4104 87 0.1117
0.4151 88 0.11
0.4198 89 0.1604
0.4245 90 0.1312
0.4292 91 0.1368
0.4340 92 0.1338
0.4387 93 0.1464
0.4434 94 0.1442
0.4481 95 0.1281
0.4528 96 0.1296
0.4575 97 0.151
0.4623 98 0.1297
0.4670 99 0.1142
0.4717 100 0.119
0.4764 101 0.0956
0.4811 102 0.1049
0.4858 103 0.1294
0.4906 104 0.1102
0.4953 105 0.1172
0.5 106 0.1523
0.5047 107 0.0919
0.5094 108 0.1101
0.5142 109 0.1191
0.5189 110 0.1104
0.5236 111 0.0942
0.5283 112 0.1058
0.5330 113 0.1328
0.5377 114 0.1122
0.5425 115 0.1156
0.5472 116 0.1123
0.5519 117 0.0909
0.5566 118 0.1083
0.5613 119 0.1142
0.5660 120 0.1192
0.5708 121 0.1088
0.5755 122 0.1289
0.5802 123 0.1407
0.5849 124 0.1065
0.5896 125 0.1016
0.5943 126 0.1389
0.5991 127 0.1212
0.6038 128 0.1139
0.6085 129 0.1055
0.6132 130 0.0921
0.6179 131 0.0958
0.6226 132 0.1019
0.6274 133 0.0967
0.6321 134 0.1041
0.6368 135 0.1007
0.6415 136 0.1662
0.6462 137 0.0853
0.6509 138 0.1189
0.6557 139 0.1077
0.6604 140 0.12
0.6651 141 0.1352
0.6698 142 0.0953
0.6745 143 0.1173
0.6792 144 0.1082
0.6840 145 0.1283
0.6887 146 0.0978
0.6934 147 0.1187
0.6981 148 0.1247
0.7028 149 0.126
0.7075 150 0.0955
0.7123 151 0.1085
0.7170 152 0.0883
0.7217 153 0.1042
0.7264 154 0.1241
0.7311 155 0.0797
0.7358 156 0.1305
0.7406 157 0.1022
0.7453 158 0.097
0.75 159 0.108
0.7547 160 0.1111
0.7594 161 0.13
0.7642 162 0.1048
0.7689 163 0.1109
0.7736 164 0.0777
0.7783 165 0.081
0.7830 166 0.1077
0.7877 167 0.1025
0.7925 168 0.137
0.7972 169 0.0822
0.8019 170 0.0976
0.8066 171 0.1229
0.8113 172 0.1434
0.8160 173 0.1146
0.8208 174 0.1186
0.8255 175 0.1261
0.8302 176 0.0798
0.8349 177 0.0911
0.8396 178 0.1376
0.8443 179 0.104
0.8491 180 0.1152
0.8538 181 0.139
0.8585 182 0.0994
0.8632 183 0.0982
0.8679 184 0.1182
0.8726 185 0.086
0.8774 186 0.0968
0.8821 187 0.1048
0.8868 188 0.1447
0.8915 189 0.1069
0.8962 190 0.1402
0.9009 191 0.1004
0.9057 192 0.1
0.9104 193 0.0829
0.9151 194 0.102
0.9198 195 0.1025
0.9245 196 0.107
0.9292 197 0.0918
0.9340 198 0.0875
0.9387 199 0.1056
0.9434 200 0.0833
0.9481 201 0.1141
0.9528 202 0.0882
0.9575 203 0.0938
0.9623 204 0.1121
0.9670 205 0.1146
0.9717 206 0.0994
0.9764 207 0.0884
0.9811 208 0.0895
0.9858 209 0.1013
0.9906 210 0.0885
0.9953 211 0.142
1.0 212 0.0918
1.0047 213 0.0989
1.0094 214 0.1417
1.0142 215 0.1095

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.2.1
  • Transformers: 4.44.2
  • PyTorch: 2.3.1+cu121
  • Accelerate: 1.1.1
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}