SentenceTransformer

This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
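
As a rough illustration of what the two modules above do (not the library's internal code), the snippet below runs the underlying BertModel and applies CLS-token pooling (pooling_mode_cls_token: True) to obtain one 768-dimensional vector per sentence. It assumes the transformer weights are stored at the repository root, as is typical for Sentence Transformers checkpoints.

import torch
from transformers import AutoTokenizer, AutoModel

model_id = "Detomo/cl-nagoya-sup-simcse-ja-nss-v0_9_13"
tokenizer = AutoTokenizer.from_pretrained(model_id)
bert = AutoModel.from_pretrained(model_id)

# Tokenize with the same maximum sequence length as the Transformer module above
inputs = tokenizer(["科目:建具。名称:AW-#窓。"], padding=True, truncation=True,
                   max_length=512, return_tensors="pt")
with torch.no_grad():
    outputs = bert(**inputs)

# CLS pooling: the first token's hidden state is used as the sentence embedding
embedding = outputs.last_hidden_state[:, 0]
print(embedding.shape)
# torch.Size([1, 768])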

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v0_9_13")
# Run inference
sentences = [
    '科目:建具。名称:GCW-#窓。',
    '科目:建具。名称:AW-#窓。',
    '科目:建具。名称:STW-#窓。',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
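
Because the model's similarity function is cosine similarity, the same embeddings can be used directly for semantic search. A minimal follow-up sketch (the corpus reuses the sentences above; the query string is illustrative):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Detomo/cl-nagoya-sup-simcse-ja-nss-v0_9_13")

corpus = [
    '科目:建具。名称:GCW-#窓。',
    '科目:建具。名称:AW-#窓。',
    '科目:建具。名称:STW-#窓。',
]
query = '科目:建具。名称:AW-#窓。'  # illustrative query in the same format

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank the corpus by cosine similarity to the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 4))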

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,546 training samples
  • Columns: sentence and label
  • Approximate statistics based on the first 1000 samples:
    • sentence: type: string; min: 11 tokens, mean: 17.07 tokens, max: 27 tokens
    • label: type: int; approximate class distribution:
    • 0: ~0.30%
    • 1: ~0.30%
    • 2: ~0.30%
    • 3: ~0.30%
    • 4: ~0.30%
    • 5: ~0.30%
    • 6: ~0.30%
    • 7: ~0.30%
    • 8: ~0.30%
    • 9: ~0.30%
    • 10: ~0.30%
    • 11: ~0.40%
    • 12: ~0.30%
    • 13: ~0.30%
    • 14: ~0.30%
    • 15: ~0.30%
    • 16: ~0.30%
    • 17: ~0.30%
    • 18: ~0.50%
    • 19: ~0.30%
    • 20: ~0.30%
    • 21: ~0.30%
    • 22: ~0.30%
    • 23: ~0.30%
    • 24: ~0.30%
    • 25: ~0.30%
    • 26: ~0.30%
    • 27: ~0.30%
    • 28: ~0.30%
    • 29: ~0.30%
    • 30: ~0.30%
    • 31: ~0.30%
    • 32: ~0.30%
    • 33: ~0.30%
    • 34: ~0.30%
    • 35: ~0.30%
    • 36: ~0.30%
    • 37: ~0.30%
    • 38: ~0.30%
    • 39: ~0.30%
    • 40: ~0.40%
    • 41: ~0.30%
    • 42: ~0.30%
    • 43: ~0.30%
    • 44: ~0.60%
    • 45: ~0.70%
    • 46: ~0.30%
    • 47: ~0.30%
    • 48: ~0.30%
    • 49: ~0.30%
    • 50: ~0.30%
    • 51: ~0.30%
    • 52: ~0.30%
    • 53: ~0.30%
    • 54: ~0.30%
    • 55: ~0.30%
    • 56: ~0.30%
    • 57: ~0.80%
    • 58: ~0.30%
    • 59: ~0.30%
    • 60: ~0.30%
    • 61: ~0.30%
    • 62: ~0.30%
    • 63: ~0.30%
    • 64: ~0.30%
    • 65: ~0.30%
    • 66: ~0.50%
    • 67: ~0.30%
    • 68: ~0.30%
    • 69: ~0.30%
    • 70: ~0.30%
    • 71: ~0.30%
    • 72: ~0.60%
    • 73: ~0.30%
    • 74: ~0.30%
    • 75: ~0.30%
    • 76: ~0.30%
    • 77: ~0.30%
    • 78: ~0.30%
    • 79: ~0.30%
    • 80: ~0.30%
    • 81: ~0.30%
    • 82: ~0.30%
    • 83: ~0.30%
    • 84: ~0.30%
    • 85: ~0.30%
    • 86: ~0.80%
    • 87: ~0.60%
    • 88: ~0.50%
    • 89: ~0.30%
    • 90: ~0.30%
    • 91: ~0.60%
    • 92: ~8.00%
    • 93: ~1.70%
    • 94: ~0.30%
    • 95: ~0.30%
    • 96: ~0.60%
    • 97: ~0.30%
    • 98: ~0.30%
    • 99: ~0.30%
    • 100: ~0.30%
    • 101: ~1.20%
    • 102: ~0.30%
    • 103: ~0.30%
    • 104: ~0.30%
    • 105: ~0.30%
    • 106: ~0.30%
    • 107: ~0.30%
    • 108: ~0.30%
    • 109: ~0.30%
    • 110: ~0.30%
    • 111: ~0.30%
    • 112: ~0.30%
    • 113: ~0.30%
    • 114: ~0.30%
    • 115: ~0.30%
    • 116: ~0.30%
    • 117: ~0.30%
    • 118: ~0.30%
    • 119: ~0.30%
    • 120: ~0.30%
    • 121: ~0.50%
    • 122: ~0.30%
    • 123: ~0.30%
    • 124: ~0.30%
    • 125: ~0.30%
    • 126: ~0.30%
    • 127: ~0.30%
    • 128: ~0.30%
    • 129: ~0.40%
    • 130: ~0.70%
    • 131: ~0.30%
    • 132: ~3.10%
    • 133: ~0.30%
    • 134: ~2.30%
    • 135: ~0.30%
    • 136: ~0.30%
    • 137: ~0.50%
    • 138: ~0.50%
    • 139: ~0.50%
    • 140: ~0.30%
    • 141: ~0.30%
    • 142: ~0.30%
    • 143: ~0.30%
    • 144: ~0.80%
    • 145: ~0.30%
    • 146: ~0.30%
    • 147: ~0.30%
    • 148: ~0.30%
    • 149: ~0.30%
    • 150: ~0.30%
    • 151: ~0.30%
    • 152: ~0.30%
    • 153: ~0.30%
    • 154: ~0.30%
    • 155: ~0.30%
    • 156: ~0.30%
    • 157: ~0.30%
    • 158: ~0.30%
    • 159: ~0.30%
    • 160: ~0.30%
    • 161: ~0.30%
    • 162: ~0.30%
    • 163: ~0.30%
    • 164: ~0.30%
    • 165: ~0.30%
    • 166: ~0.30%
    • 167: ~0.30%
    • 168: ~0.60%
    • 169: ~0.30%
    • 170: ~0.30%
    • 171: ~0.30%
    • 172: ~0.30%
    • 173: ~0.30%
    • 174: ~0.70%
    • 175: ~0.30%
    • 176: ~0.30%
    • 177: ~0.30%
    • 178: ~1.30%
    • 179: ~0.30%
    • 180: ~0.30%
    • 181: ~0.30%
    • 182: ~0.30%
    • 183: ~0.30%
    • 184: ~0.30%
    • 185: ~1.10%
    • 186: ~0.30%
    • 187: ~0.30%
    • 188: ~0.30%
    • 189: ~0.30%
    • 190: ~0.30%
    • 191: ~0.30%
    • 192: ~0.30%
    • 193: ~0.30%
    • 194: ~1.50%
    • 195: ~0.30%
    • 196: ~0.30%
    • 197: ~0.30%
    • 198: ~0.30%
    • 199: ~1.00%
    • 200: ~0.30%
    • 201: ~0.30%
    • 202: ~0.30%
    • 203: ~1.80%
    • 204: ~0.30%
    • 205: ~0.50%
    • 206: ~0.70%
    • 207: ~0.30%
    • 208: ~0.30%
    • 209: ~0.30%
    • 210: ~0.30%
    • 211: ~0.30%
    • 212: ~0.30%
    • 213: ~0.30%
    • 214: ~0.30%
    • 215: ~4.00%
    • 216: ~0.30%
    • 217: ~0.30%
    • 218: ~0.30%
    • 219: ~0.60%
    • 220: ~0.30%
    • 221: ~0.30%
    • 222: ~0.70%
    • 223: ~0.30%
    • 224: ~0.30%
    • 225: ~0.30%
    • 226: ~0.60%
    • 227: ~0.30%
    • 228: ~0.10%
  • Samples (sentence, label):
    • 科目:コンクリート。名称:免震基礎天端グラウト注入。 (label: 0)
    • 科目:コンクリート。名称:免震基礎天端グラウト注入。 (label: 0)
    • 科目:コンクリート。名称:免震基礎天端グラウト注入。 (label: 0)
  • Loss: sentence_transformer_lib.custom_batch_all_trip_loss.CustomBatchAllTripletLoss

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 512
  • per_device_eval_batch_size: 512
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • num_train_epochs: 250
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: group_by_label
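
A minimal sketch of how a comparable run could be set up with the public Sentence Transformers v3 training API, using the hyperparameters listed above. The built-in losses.BatchAllTripletLoss stands in for the repository's CustomBatchAllTripletLoss, and the dataset, base-model id, and output path are placeholders, not values taken from this card:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import BatchSamplers

# Placeholder dataset with the same columns as above ("sentence", "label");
# group_by_label batching needs at least two samples per label.
train_dataset = Dataset.from_dict({
    "sentence": [
        "科目:コンクリート。名称:免震基礎天端グラウト注入。",
        "科目:コンクリート。名称:免震基礎天端グラウト注入。",
        "...",  # further sentences per label
        "...",
    ],
    "label": [0, 0, 1, 1],
})

model = SentenceTransformer("base-model-id")  # placeholder: the base checkpoint is not stated in this card

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    num_train_epochs=250,
    per_device_train_batch_size=512,
    per_device_eval_batch_size=512,
    learning_rate=1e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.GROUP_BY_LABEL,
)

# Built-in batch-all triplet loss as the closest public stand-in for CustomBatchAllTripletLoss
loss = losses.BatchAllTripletLoss(model)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()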

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 512
  • per_device_eval_batch_size: 512
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 250
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: group_by_label
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
2.5 10 34.4458
5.0 20 9.5341
7.5 30 2.0511
10.0 40 1.5025
12.5 50 1.4347
15.0 60 1.1549
17.5 70 1.2308
20.0 80 1.0908
22.5 90 1.1238
25.0 100 0.9793
2.5 10 1.1269
5.0 20 0.8895
7.5 30 0.8496
10.0 40 0.6124
12.5 50 0.5591
15.0 60 0.4262
17.5 70 0.3892
20.0 80 0.3309
22.5 90 0.3195
25.0 100 0.0781
7.5455 200 0.072
11.4242 300 0.073
15.3030 400 0.0715
19.1818 500 0.069
23.0606 600 0.0682
26.7273 700 0.0659
30.6061 800 0.0628
34.4848 900 0.0618
38.3636 1000 0.0639
42.2424 1100 0.0635
46.1212 1200 0.0635
49.7879 1300 0.0627
53.6667 1400 0.0593
57.5455 1500 0.0605
61.4242 1600 0.055
65.3030 1700 0.0556
69.1818 1800 0.0589
73.0606 1900 0.0585
76.7273 2000 0.0568
80.6061 2100 0.0521
84.4848 2200 0.0559
88.3636 2300 0.0508
92.2424 2400 0.051
96.1212 2500 0.0532
99.7879 2600 0.0545
103.6667 2700 0.0532
107.5455 2800 0.0542
111.4242 2900 0.052
115.3030 3000 0.0497
119.1818 3100 0.0486
123.0606 3200 0.0562
126.7273 3300 0.0544
130.6061 3400 0.0516
134.4848 3500 0.0491
138.3636 3600 0.0578
142.2424 3700 0.0508
146.1212 3800 0.0533
149.7879 3900 0.0487
153.6667 4000 0.045
157.5455 4100 0.0454
161.4242 4200 0.0497
165.3030 4300 0.0466
169.1818 4400 0.045
173.0606 4500 0.0477
176.7273 4600 0.0421
180.6061 4700 0.051
184.4848 4800 0.0389
188.3636 4900 0.0449
192.2424 5000 0.0425
196.1212 5100 0.0456
199.7879 5200 0.0465
203.6667 5300 0.0435
207.5455 5400 0.04
211.4242 5500 0.0405
215.3030 5600 0.0432
219.1818 5700 0.0394
223.0606 5800 0.0511
226.7273 5900 0.0462
230.6061 6000 0.0397
234.4848 6100 0.0413
238.3636 6200 0.0443
242.2424 6300 0.0377
246.1212 6400 0.0437
249.7879 6500 0.0407

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CustomBatchAllTripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}