SentenceTransformer based on klue/roberta-base
This is a sentence-transformers model fine-tuned from klue/roberta-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: klue/roberta-base
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
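Cosine similarity scores two embeddings by the cosine of the angle between them, u·v / (‖u‖‖v‖). For this model, that is the function `model.similarity` applies. The pure-Python sketch below only illustrates the formula. It uses toy 3-dimensional vectors, whereas the model's real embeddings have 768 dimensions, and the name `cosine_similarity` is our own, not a library function.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between vectors u and v (ranges from -1 to 1)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Same direction gives (approximately) 1.0; orthogonal vectors give 0.0
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
```

Because cosine similarity ignores vector length, two sentences are judged similar when their embeddings point the same way, regardless of magnitude.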
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
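The Pooling module enables only `pooling_mode_mean_tokens`. As a result, the 768-dimensional sentence embedding is the average of the RoBERTa token embeddings, taken over the positions that the attention mask marks as real tokens, so padding is excluded. The following is a minimal pure-Python sketch of that masked mean on hypothetical 2-dimensional token vectors. The library performs the same computation with PyTorch tensors, and the function name `mean_pool` is illustrative.

```python
def mean_pool(token_embeddings: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:  # skip padding positions
            for i, value in enumerate(vector):
                summed[i] += value
            count += 1
    count = max(count, 1)  # avoid division by zero for an all-padding input
    return [s / count for s in summed]


tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]  # last row is padding
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```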
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
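```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
# "sentence_transformers_model_id" is a placeholder: replace it with this model's Hub repository ID
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    # "It was the best accommodation I have had on my trips to Italy so far."
    '지금까지 이탈리아 여행중에 가장 좋은 숙소였습니다',
    # "It was a better place to stay than any hotel I have been to so far."
    '지금까지 가본 호텔보다 더 좋은 숙소였습니다.',
    # "The 'COVID-19 ASEAN Response Fund' and the 'essential medical supplies stockpiling system' are meaningful results that ASEAN+3 achieved together."
    '‘코로나 아세안 대응기금’, ‘필수의료물품 비축제도’는 아세안+3가 함께 만들어낸 의미 있는 결과입니다.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```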
Evaluation
Metrics
Semantic Similarity (before fine-tuning, matching the step-0 row of the training logs)
- Evaluated with EmbeddingSimilarityEvaluator
Metric | Value |
---|---|
pearson_cosine | 0.3477 |
spearman_cosine | 0.3556 |
pearson_manhattan | 0.3674 |
spearman_manhattan | 0.3646 |
pearson_euclidean | 0.3607 |
spearman_euclidean | 0.3548 |
pearson_dot | 0.2125 |
spearman_dot | 0.2006 |
pearson_max | 0.3674 |
spearman_max | 0.3646 |
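The `pearson_max` and `spearman_max` rows do not correspond to a separate similarity function. Each is the largest value among the cosine, Manhattan, Euclidean and dot-product rows for that correlation type. A quick check against the table above:

```python
# Values copied from the table above
pearson = {"cosine": 0.3477, "manhattan": 0.3674, "euclidean": 0.3607, "dot": 0.2125}
spearman = {"cosine": 0.3556, "manhattan": 0.3646, "euclidean": 0.3548, "dot": 0.2006}

pearson_max = max(pearson.values())
spearman_max = max(spearman.values())
print(pearson_max, spearman_max)  # 0.3674 0.3646
```

Here the Manhattan scores are highest. The same check on the second table returns the cosine values (0.9592 and 0.9206).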
Semantic Similarity (after fine-tuning, matching the final step 2628 of the training logs)
- Evaluated with EmbeddingSimilarityEvaluator
Metric | Value |
---|---|
pearson_cosine | 0.9592 |
spearman_cosine | 0.9206 |
pearson_manhattan | 0.9531 |
spearman_manhattan | 0.9204 |
pearson_euclidean | 0.9533 |
spearman_euclidean | 0.9202 |
pearson_dot | 0.9482 |
spearman_dot | 0.9016 |
pearson_max | 0.9592 |
spearman_max | 0.9206 |
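`EmbeddingSimilarityEvaluator` encodes both sentences of each evaluation pair, scores every pair with a similarity function, and correlates those scores with the gold labels. Pearson correlation uses the raw values, while Spearman correlation uses their ranks. The library uses SciPy for this step. The stdlib-only sketch below reproduces the two coefficients, giving tied values their average rank, to make the table's meaning concrete. The function names are our own, and the scores and labels in the example are made up.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical cosine scores for four sentence pairs and their gold labels
scores = [0.91, 0.15, 0.78, 0.40]
labels = [0.74, 0.02, 0.72, 0.50]
print(round(pearson(scores, labels), 4))
print(spearman(scores, labels))  # 1.0, because both lists have the same ordering
```

Spearman depends only on ordering, so it rewards a model that ranks pairs correctly even when its raw cosine values are not on the same scale as the labels.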
Training Details
Training Dataset
Unnamed Dataset
- Size: 10,501 training samples
- Columns: sentence_0, sentence_1, and label
- Approximate statistics based on the first 1000 samples:
Statistic | sentence_0 | sentence_1 | label |
---|---|---|---|
type | string | string | float |
details | min: 7 tokens, mean: 20.14 tokens, max: 59 tokens | min: 7 tokens, mean: 19.71 tokens, max: 68 tokens | min: 0.0, mean: 0.44, max: 1.0 |
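- Samples:
sentence_0 | sentence_1 | label |
---|---|---|
가스레인지 사용하지 않도록 유의해주세요 (Please take care not to use the gas stove.) | 가스레인지 사용은 삼가주세요 (Please refrain from using the gas stove.) | 0.74 |
이번주하고 다음주 중에 언제 동기 모임이 있어? (Between this week and next week, when is the cohort get-together?) | 언제 자연어처리 학회 논문 접수가 마감되나요? (When is the paper submission deadline for the NLP conference?) | 0.02 |
또한 각 부처는 생활방역 관련 업무를 종합·체계적으로 수행하기 위해 기관별로 생활방역 전담팀(TF)을 구성한다. (In addition, each ministry will set up a dedicated everyday-quarantine team (TF) in each agency to carry out everyday-quarantine work comprehensively and systematically.) | 또한 생활방지와 관련된 업무를 종합적이고 체계적으로 수행하기 위하여 각 부서별로 생활방역 전담 태스크포스(TF)를 구성하여야 합니다. (In addition, each department must form a dedicated everyday-quarantine task force (TF) to carry out work related to everyday prevention comprehensively and systematically.) | 0.72 |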
- Loss: CosineSimilarityLoss with these parameters: { "loss_fct": "torch.nn.modules.loss.MSELoss" }
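With these parameters, `CosineSimilarityLoss` embeds `sentence_0` and `sentence_1` and takes the cosine similarity of the two embeddings. It then penalizes the squared difference between that similarity and `label` via `MSELoss`, averaged over the batch. The stdlib-only sketch below shows that arithmetic for a single batch. The real loss operates on PyTorch tensors and backpropagates through the encoder. The embeddings here are hypothetical 2-dimensional stand-ins, and the function names are illustrative.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cosine_mse_loss(emb_0: list[list[float]], emb_1: list[list[float]], labels: list[float]) -> float:
    """Mean squared error between pairwise cosine similarities and gold labels."""
    errors = [(cosine(u, v) - y) ** 2 for u, v, y in zip(emb_0, emb_1, labels)]
    return sum(errors) / len(errors)


# Hypothetical batch of two pairs: identical directions, then orthogonal ones
emb_0 = [[1.0, 0.0], [1.0, 0.0]]
emb_1 = [[1.0, 0.0], [0.0, 1.0]]
labels = [0.74, 0.02]  # label values taken from the samples above
print(cosine_mse_loss(emb_0, emb_1, labels))
```

The loss is zero only when every pair's cosine similarity equals its label, which is why labels in the 0.0 to 1.0 range fit this objective directly.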
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 4
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 4
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
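With `lr_scheduler_type: linear`, `warmup_steps: 0` and `learning_rate: 5e-05`, the learning rate starts at 5e-05 and decays linearly to zero over the run. The sketch below models that schedule under the assumption of 2,628 total optimizer steps, which is the last step in the Training Logs. It follows the formula of the linear-with-warmup schedule in Transformers, but it is our own illustration, not the library's code, and the name `linear_lr` is hypothetical.

```python
def linear_lr(step: int, base_lr: float = 5e-05, total_steps: int = 2628, warmup_steps: int = 0) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start and at each epoch boundary
for step in (0, 657, 1314, 1971, 2628):
    print(step, linear_lr(step))
```

With no warmup, each quarter of the run (one epoch here) removes a quarter of the initial learning rate.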
Training Logs
Epoch | Step | Training Loss | spearman_max |
---|---|---|---|
0 | 0 | - | 0.3646 |
0.7610 | 500 | 0.0278 | - |
1.0 | 657 | - | 0.9187 |
1.5221 | 1000 | 0.0085 | 0.9117 |
2.0 | 1314 | - | 0.9201 |
2.2831 | 1500 | 0.0044 | - |
3.0 | 1971 | - | 0.9186 |
3.0441 | 2000 | 0.0034 | 0.9199 |
3.8052 | 2500 | 0.0027 | - |
4.0 | 2628 | - | 0.9206 |
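The dataset size and batch size tie the Step and Epoch columns together. With 10,501 training samples, `per_device_train_batch_size: 16` on the single GPU listed under Training Hardware, and `dataloader_drop_last: False`, each epoch has ceil(10,501 / 16) = 657 steps. Four epochs therefore end at step 2,628. The following check assumes one device and `gradient_accumulation_steps: 1`, both as listed in the hyperparameters.

```python
import math

num_samples = 10_501
batch_size = 16  # per_device_train_batch_size on one GPU
epochs = 4

steps_per_epoch = math.ceil(num_samples / batch_size)  # the last partial batch is kept
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 657 2628

# Fractional epoch reported at each logged training-loss step
for step in (500, 1000, 1500, 2000, 2500):
    print(step, round(step / steps_per_epoch, 4))
```

The loop reproduces the Epoch column above, for example 1000 / 657 ≈ 1.5221.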
Environmental Impact
Carbon emissions were measured using CodeCarbon.
- Energy Consumed: 0.031 kWh
- Carbon Emitted: 0.014 kg of CO2
- Hours Used: 0.154 hours
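These three figures imply an average combined power draw of about 0.031 kWh / 0.154 h ≈ 0.20 kW for the hardware CodeCarbon tracked. They also imply a carbon intensity of about 0.014 kg / 0.031 kWh ≈ 0.45 kg CO2 per kWh. The reported values are rounded, so treat these results as rough estimates. The arithmetic is:

```python
energy_kwh = 0.031
hours = 0.154
co2_kg = 0.014

avg_power_w = energy_kwh / hours * 1000  # kWh / h = kW, converted to watts
intensity = co2_kg / energy_kwh  # kg CO2 per kWh
print(f"{avg_power_w:.0f} W, {intensity:.2f} kg CO2/kWh")  # 201 W, 0.45 kg CO2/kWh
```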
Training Hardware
- On Cloud: No
- GPU Model: 1 x NVIDIA GeForce RTX 3060
- CPU Model: 12th Gen Intel(R) Core(TM) i5-12400
- RAM Size: 31.78 GB
Framework Versions
- Python: 3.12.4
- Sentence Transformers: 3.2.1
- Transformers: 4.45.2
- PyTorch: 2.4.0+cu121
- Accelerate: 0.29.3
- Datasets: 2.19.0
- Tokenizers: 0.20.1
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
Model tree for ktaek94/klue-roberta-base-klue-sts
- Base model: klue/roberta-base
Evaluation results
- Pearson Cosine on Unknown (self-reported): 0.348
- Spearman Cosine on Unknown (self-reported): 0.356
- Pearson Manhattan on Unknown (self-reported): 0.367
- Spearman Manhattan on Unknown (self-reported): 0.365
- Pearson Euclidean on Unknown (self-reported): 0.361
- Spearman Euclidean on Unknown (self-reported): 0.355
- Pearson Dot on Unknown (self-reported): 0.213
- Spearman Dot on Unknown (self-reported): 0.201
- Pearson Max on Unknown (self-reported): 0.367
- Spearman Max on Unknown (self-reported): 0.365