SentenceTransformer based on Geotrend/bert-base-sw-cased

This is a sentence-transformers model fine-tuned from Geotrend/bert-base-sw-cased. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, and clustering.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: Geotrend/bert-base-sw-cased
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
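
The similarity function listed above compares the direction of two embeddings and ignores their length. The following is a minimal NumPy sketch of that idea; the helper name `cosine_similarity` and the toy 4-dimensional vectors are illustrative assumptions standing in for real 768-dimensional embeddings.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two 1-D vectors (illustrative helper)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional vectors standing in for 768-dimensional embeddings.
u = np.array([1.0, 2.0, 0.0, 1.0])
v = np.array([2.0, 4.0, 0.0, 2.0])  # same direction as u, twice the length
w = np.array([0.0, 0.0, 3.0, 0.0])  # orthogonal to u

sim_uv = cosine_similarity(u, v)  # close to 1.0: direction matches, length is ignored
sim_uw = cosine_similarity(u, w)  # 0.0: no shared direction
```

Because length is ignored, scaling an embedding does not change its cosine score against any other embedding.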

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
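
The Pooling module above has only `pooling_mode_mean_tokens` enabled, so a sentence embedding is the average of its token vectors. The sketch below illustrates masked mean pooling in NumPy under the assumption that padded positions are excluded through an attention mask; the function name `mean_pool` and the toy 2-dimensional token vectors are hypothetical, not taken from the library.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padded positions.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # guard against all-padding rows
    return summed / counts

# Toy batch: 2 sentences, 3 token slots, 2-dim vectors (the real model uses 768).
tokens = np.array([
    [[1.0, 1.0], [3.0, 5.0], [9.0, 9.0]],  # third slot is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])

pooled = mean_pool(tokens, mask)  # rows: [2., 3.] and [4., 2.]
```

The padded vector `[9.0, 9.0]` does not affect the first row, which is why the mask matters when sentences in a batch have different lengths.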

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Mollel/swahili-bert-base-sw-cased-nli-matryoshka")
# Run inference
sentences = [
    'Mwanamume aliyevalia koti la bluu la kuzuia upepo, amelala uso chini kwenye benchi ya bustani, akiwa na chupa ya pombe iliyofungwa kwenye mojawapo ya miguu ya benchi.',
    'Mwanamume amelala uso chini kwenye benchi ya bustani.',
    'Mwanamume fulani anacheza dansi kwenye klabu hiyo akifungua chupa.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])

Evaluation

Metrics

Semantic Similarity

Dataset: sts-test-768
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.6869
spearman_cosine	0.6802
pearson_manhattan	0.6719
spearman_manhattan	0.6653
pearson_euclidean	0.6734
spearman_euclidean	0.6666
pearson_dot	0.554
spearman_dot	0.5399
pearson_max	0.6869
spearman_max	0.6802
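
The Pearson and Spearman rows in these tables measure how well the model's similarity scores agree with gold labels: Pearson checks linear agreement, and Spearman checks agreement in rank order. As a rough sketch, both can be computed with NumPy alone. The scores below are made-up numbers for illustration, not values from the STS test set, and the simple ranking used here assumes there are no tied scores.

```python
import numpy as np

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation: linear agreement between two score lists."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman correlation: Pearson on ranks (this simple ranking assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson(rank_x, rank_y)

# Made-up numbers for illustration only.
gold = np.array([0.0, 1.0, 2.5, 4.0, 5.0])            # human similarity labels
predicted = np.array([0.10, 0.05, 0.40, 0.70, 0.95])  # model cosine scores

r = pearson(predicted, gold)
rho = spearman(predicted, gold)  # 0.9: only the two lowest-scored pairs are out of order
```

Spearman depends only on ordering, so it is unchanged by any monotonic rescaling of the predicted scores.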

Semantic Similarity

Dataset: sts-test-512
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.6828
spearman_cosine	0.677
pearson_manhattan	0.6729
spearman_manhattan	0.6664
pearson_euclidean	0.6738
spearman_euclidean	0.6667
pearson_dot	0.5296
spearman_dot	0.5174
pearson_max	0.6828
spearman_max	0.677

Semantic Similarity

Dataset: sts-test-256
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.6758
spearman_cosine	0.6702
pearson_manhattan	0.6718
spearman_manhattan	0.6643
pearson_euclidean	0.673
spearman_euclidean	0.665
pearson_dot	0.4892
spearman_dot	0.4783
pearson_max	0.6758
spearman_max	0.6702

Semantic Similarity

Dataset: sts-test-128
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.67
spearman_cosine	0.6638
pearson_manhattan	0.6693
spearman_manhattan	0.6594
pearson_euclidean	0.671
spearman_euclidean	0.6601
pearson_dot	0.4509
spearman_dot	0.4402
pearson_max	0.671
spearman_max	0.6638

Semantic Similarity

Dataset: sts-test-64
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.6615
spearman_cosine	0.6556
pearson_manhattan	0.6653
spearman_manhattan	0.6533
pearson_euclidean	0.6672
spearman_euclidean	0.654
pearson_dot	0.3868
spearman_dot	0.3771
pearson_max	0.6672
spearman_max	0.6556
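
The five result tables cover embedding sizes of 768, 512, 256, 128, and 64, as the dataset names indicate. With a Matryoshka-trained model, a smaller embedding is typically obtained by keeping the leading components of the full vector and re-normalizing. The sketch below shows that step on random stand-in vectors rather than real model output; the helper name `truncate_and_normalize` is hypothetical.

```python
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row and rescale rows to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

# Random stand-ins for three 768-dimensional sentence embeddings.
rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768))

for dim in (768, 512, 256, 128, 64):
    small = truncate_and_normalize(full, dim)
    # Rows are unit length, so the dot product equals cosine similarity.
    sims = small @ small.T
    print(dim, small.shape, sims.shape)
```

Recent Sentence Transformers releases also accept a `truncate_dim` argument when loading a model, which performs the truncation inside `encode`; whether that is preferable to truncating afterwards depends on the storage and latency constraints of the application.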

Training Details

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 16
per_device_eval_batch_size: 16
learning_rate: 2e-05
num_train_epochs: 1
warmup_ratio: 0.1
bf16: True
batch_sampler: no_duplicates
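
To make `learning_rate` and `warmup_ratio` concrete, here is a small sketch of how the learning rate would evolve, assuming the standard linear-warmup-then-linear-decay shape implied by `lr_scheduler_type: linear`. The function `linear_schedule_lr` is a hypothetical helper, not a library call, and the total of 17433 steps is taken from the last row of the training log.

```python
import math

def linear_schedule_lr(step: int, total_steps: int, base_lr: float, warmup_ratio: float) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 17433  # final step in the training log
BASE_LR = 2e-05
WARMUP_RATIO = 0.1

warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 1744
start_lr = linear_schedule_lr(0, TOTAL_STEPS, BASE_LR, WARMUP_RATIO)            # 0.0
peak_lr = linear_schedule_lr(warmup_steps, TOTAL_STEPS, BASE_LR, WARMUP_RATIO)  # 2e-05
final_lr = linear_schedule_lr(TOTAL_STEPS, TOTAL_STEPS, BASE_LR, WARMUP_RATIO)  # 0.0
```

Under these assumptions the rate climbs to 2e-05 over roughly the first tenth of training and then decays linearly to zero at the final step.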

All Hyperparameters

overwrite_output_dir: False
do_predict: False
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	sts-test-128_spearman_cosine	sts-test-256_spearman_cosine	sts-test-512_spearman_cosine	sts-test-64_spearman_cosine	sts-test-768_spearman_cosine
0.0057	100	20.0932	-	-	-	-	-
0.0115	200	16.2641	-	-	-	-	-
0.0172	300	12.797	-	-	-	-	-
0.0229	400	12.1927	-	-	-	-	-
0.0287	500	11.0423	-	-	-	-	-
0.0344	600	9.676	-	-	-	-	-
0.0402	700	8.1545	-	-	-	-	-
0.0459	800	7.7822	-	-	-	-	-
0.0516	900	7.9352	-	-	-	-	-
0.0574	1000	7.9534	-	-	-	-	-
0.0631	1100	8.1006	-	-	-	-	-
0.0688	1200	7.4767	-	-	-	-	-
0.0746	1300	8.3747	-	-	-	-	-
0.0803	1400	7.7686	-	-	-	-	-
0.0860	1500	6.8076	-	-	-	-	-
0.0918	1600	6.9238	-	-	-	-	-
0.0975	1700	6.5503	-	-	-	-	-
0.1033	1800	6.74	-	-	-	-	-
0.1090	1900	7.7802	-	-	-	-	-
0.1147	2000	7.2594	-	-	-	-	-
0.1205	2100	7.091	-	-	-	-	-
0.1262	2200	6.8677	-	-	-	-	-
0.1319	2300	6.4249	-	-	-	-	-
0.1377	2400	6.1512	-	-	-	-	-
0.1434	2500	5.9714	-	-	-	-	-
0.1491	2600	5.4914	-	-	-	-	-
0.1549	2700	5.5825	-	-	-	-	-
0.1606	2800	5.9456	-	-	-	-	-
0.1664	2900	6.4012	-	-	-	-	-
0.1721	3000	7.1999	-	-	-	-	-
0.1778	3100	6.8254	-	-	-	-	-
0.1836	3200	6.541	-	-	-	-	-
0.1893	3300	6.5411	-	-	-	-	-
0.1950	3400	5.56	-	-	-	-	-
0.2008	3500	6.4692	-	-	-	-	-
0.2065	3600	5.9266	-	-	-	-	-
0.2122	3700	6.2055	-	-	-	-	-
0.2180	3800	6.0835	-	-	-	-	-
0.2237	3900	6.6112	-	-	-	-	-
0.2294	4000	6.3391	-	-	-	-	-
0.2352	4100	5.8379	-	-	-	-	-
0.2409	4200	5.8107	-	-	-	-	-
0.2467	4300	6.1473	-	-	-	-	-
0.2524	4400	6.2827	-	-	-	-	-
0.2581	4500	6.2299	-	-	-	-	-
0.2639	4600	6.1013	-	-	-	-	-
0.2696	4700	5.6491	-	-	-	-	-
0.2753	4800	5.8641	-	-	-	-	-
0.2811	4900	5.4278	-	-	-	-	-
0.2868	5000	5.7304	-	-	-	-	-
0.2925	5100	5.4652	-	-	-	-	-
0.2983	5200	5.9031	-	-	-	-	-
0.3040	5300	6.1014	-	-	-	-	-
0.3098	5400	5.9282	-	-	-	-	-
0.3155	5500	5.6618	-	-	-	-	-
0.3212	5600	5.3803	-	-	-	-	-
0.3270	5700	5.5759	-	-	-	-	-
0.3327	5800	5.6936	-	-	-	-	-
0.3384	5900	5.7249	-	-	-	-	-
0.3442	6000	5.5926	-	-	-	-	-
0.3499	6100	5.6329	-	-	-	-	-
0.3556	6200	5.7456	-	-	-	-	-
0.3614	6300	5.1638	-	-	-	-	-
0.3671	6400	5.3258	-	-	-	-	-
0.3729	6500	5.1216	-	-	-	-	-
0.3786	6600	5.7453	-	-	-	-	-
0.3843	6700	4.9906	-	-	-	-	-
0.3901	6800	5.1126	-	-	-	-	-
0.3958	6900	5.2389	-	-	-	-	-
0.4015	7000	5.1483	-	-	-	-	-
0.4073	7100	5.6072	-	-	-	-	-
0.4130	7200	5.2018	-	-	-	-	-
0.4187	7300	5.4083	-	-	-	-	-
0.4245	7400	5.1995	-	-	-	-	-
0.4302	7500	5.5787	-	-	-	-	-
0.4360	7600	4.9942	-	-	-	-	-
0.4417	7700	4.9196	-	-	-	-	-
0.4474	7800	5.3938	-	-	-	-	-
0.4532	7900	5.381	-	-	-	-	-
0.4589	8000	4.908	-	-	-	-	-
0.4646	8100	4.8871	-	-	-	-	-
0.4704	8200	5.2298	-	-	-	-	-
0.4761	8300	4.6157	-	-	-	-	-
0.4818	8400	5.0344	-	-	-	-	-
0.4876	8500	5.0713	-	-	-	-	-
0.4933	8600	5.1952	-	-	-	-	-
0.4991	8700	5.5352	-	-	-	-	-
0.5048	8800	5.1556	-	-	-	-	-
0.5105	8900	5.2318	-	-	-	-	-
0.5163	9000	4.7887	-	-	-	-	-
0.5220	9100	4.868	-	-	-	-	-
0.5277	9200	4.9544	-	-	-	-	-
0.5335	9300	4.816	-	-	-	-	-
0.5392	9400	4.8374	-	-	-	-	-
0.5449	9500	5.3242	-	-	-	-	-
0.5507	9600	4.9039	-	-	-	-	-
0.5564	9700	5.2907	-	-	-	-	-
0.5622	9800	5.4007	-	-	-	-	-
0.5679	9900	5.3016	-	-	-	-	-
0.5736	10000	5.3235	-	-	-	-	-
0.5794	10100	5.1566	-	-	-	-	-
0.5851	10200	5.1348	-	-	-	-	-
0.5908	10300	5.4583	-	-	-	-	-
0.5966	10400	4.9528	-	-	-	-	-
0.6023	10500	5.0073	-	-	-	-	-
0.6080	10600	5.0324	-	-	-	-	-
0.6138	10700	5.4107	-	-	-	-	-
0.6195	10800	5.3643	-	-	-	-	-
0.6253	10900	5.1267	-	-	-	-	-
0.6310	11000	5.0443	-	-	-	-	-
0.6367	11100	5.2001	-	-	-	-	-
0.6425	11200	4.8813	-	-	-	-	-
0.6482	11300	5.4734	-	-	-	-	-
0.6539	11400	5.0344	-	-	-	-	-
0.6597	11500	5.5043	-	-	-	-	-
0.6654	11600	4.6201	-	-	-	-	-
0.6711	11700	5.4626	-	-	-	-	-
0.6769	11800	5.3813	-	-	-	-	-
0.6826	11900	4.626	-	-	-	-	-
0.6883	12000	4.87	-	-	-	-	-
0.6941	12100	5.0015	-	-	-	-	-
0.6998	12200	4.962	-	-	-	-	-
0.7056	12300	5.1613	-	-	-	-	-
0.7113	12400	5.2074	-	-	-	-	-
0.7170	12500	4.958	-	-	-	-	-
0.7228	12600	4.4516	-	-	-	-	-
0.7285	12700	4.8421	-	-	-	-	-
0.7342	12800	4.9242	-	-	-	-	-
0.7400	12900	4.9256	-	-	-	-	-
0.7457	13000	4.8254	-	-	-	-	-
0.7514	13100	4.5114	-	-	-	-	-
0.7572	13200	7.7118	-	-	-	-	-
0.7629	13300	7.0822	-	-	-	-	-
0.7687	13400	6.8022	-	-	-	-	-
0.7744	13500	6.7295	-	-	-	-	-
0.7801	13600	6.0547	-	-	-	-	-
0.7859	13700	6.5285	-	-	-	-	-
0.7916	13800	6.2666	-	-	-	-	-
0.7973	13900	6.1031	-	-	-	-	-
0.8031	14000	5.9138	-	-	-	-	-
0.8088	14100	5.6636	-	-	-	-	-
0.8145	14200	5.7073	-	-	-	-	-
0.8203	14300	5.7963	-	-	-	-	-
0.8260	14400	5.7336	-	-	-	-	-
0.8318	14500	5.8113	-	-	-	-	-
0.8375	14600	5.6708	-	-	-	-	-
0.8432	14700	5.4565	-	-	-	-	-
0.8490	14800	5.4293	-	-	-	-	-
0.8547	14900	5.4166	-	-	-	-	-
0.8604	15000	5.3616	-	-	-	-	-
0.8662	15100	5.1579	-	-	-	-	-
0.8719	15200	5.3887	-	-	-	-	-
0.8776	15300	5.346	-	-	-	-	-
0.8834	15400	5.2762	-	-	-	-	-
0.8891	15500	5.3417	-	-	-	-	-
0.8949	15600	5.1607	-	-	-	-	-
0.9006	15700	5.4493	-	-	-	-	-
0.9063	15800	5.0268	-	-	-	-	-
0.9121	15900	5.0612	-	-	-	-	-
0.9178	16000	5.1471	-	-	-	-	-
0.9235	16100	4.8275	-	-	-	-	-
0.9293	16200	5.1464	-	-	-	-	-
0.9350	16300	4.958	-	-	-	-	-
0.9407	16400	5.1968	-	-	-	-	-
0.9465	16500	4.7783	-	-	-	-	-
0.9522	16600	5.0834	-	-	-	-	-
0.9580	16700	4.9839	-	-	-	-	-
0.9637	16800	5.0078	-	-	-	-	-
0.9694	16900	5.1624	-	-	-	-	-
0.9752	17000	5.2132	-	-	-	-	-
0.9809	17100	4.9741	-	-	-	-	-
0.9866	17200	4.96	-	-	-	-	-
0.9924	17300	5.1834	-	-	-	-	-
0.9981	17400	4.8955	-	-	-	-	-
1.0	17433	-	0.6638	0.6702	0.6770	0.6556	0.6802

Framework Versions

Python: 3.11.9
Sentence Transformers: 3.0.1
Transformers: 4.40.1
PyTorch: 2.3.0+cu121
Accelerate: 0.29.3
Datasets: 2.19.0
Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
