llmvetter committed on
Commit b39bdaf · verified · 1 Parent(s): 18fe3ae

Add new SentenceTransformer model.

0_SentenceTransformer/1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
0_SentenceTransformer/README.md ADDED
@@ -0,0 +1,177 @@
+ ---
+ language: en
+ license: apache-2.0
+ library_name: sentence-transformers
+ tags:
+ - sentence-transformers
+ - feature-extraction
+ - sentence-similarity
+ - transformers
+ datasets:
+ - s2orc
+ - flax-sentence-embeddings/stackexchange_xml
+ - ms_marco
+ - gooaq
+ - yahoo_answers_topics
+ - code_search_net
+ - search_qa
+ - eli5
+ - snli
+ - multi_nli
+ - wikihow
+ - natural_questions
+ - trivia_qa
+ - embedding-data/sentence-compression
+ - embedding-data/flickr30k-captions
+ - embedding-data/altlex
+ - embedding-data/simple-wiki
+ - embedding-data/QQP
+ - embedding-data/SPECTER
+ - embedding-data/PAQ_pairs
+ - embedding-data/WikiAnswers
+ pipeline_tag: sentence-similarity
+ ---
+
+
+ # all-mpnet-base-v2
+ This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.
+
+ ## Usage (Sentence-Transformers)
+ Using this model is easy once you have [sentence-transformers](https://www.SBERT.net) installed:
+
+ ```
+ pip install -U sentence-transformers
+ ```
+
+ Then you can use the model like this:
+ ```python
+ from sentence_transformers import SentenceTransformer
+ sentences = ["This is an example sentence", "Each sentence is converted"]
+
+ model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
+ embeddings = model.encode(sentences)
+ print(embeddings)
+ ```
+
+ ## Usage (HuggingFace Transformers)
+ Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first, pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+ import torch
+ import torch.nn.functional as F
+
+ # Mean pooling - take the attention mask into account for correct averaging
+ def mean_pooling(model_output, attention_mask):
+     token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
+     input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+     return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
+
+
+ # Sentences we want sentence embeddings for
+ sentences = ['This is an example sentence', 'Each sentence is converted']
+
+ # Load model from HuggingFace Hub
+ tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-mpnet-base-v2')
+ model = AutoModel.from_pretrained('sentence-transformers/all-mpnet-base-v2')
+
+ # Tokenize sentences
+ encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
+
+ # Compute token embeddings
+ with torch.no_grad():
+     model_output = model(**encoded_input)
+
+ # Perform pooling
+ sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
+
+ # Normalize embeddings
+ sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
+
+ print("Sentence embeddings:")
+ print(sentence_embeddings)
+ ```
+
+ ## Evaluation Results
+
+ For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=sentence-transformers/all-mpnet-base-v2)
+
+ ------
+
+ ## Background
+
+ The project aims to train sentence embedding models on very large sentence-level datasets using a self-supervised
+ contrastive learning objective. We used the pretrained [`microsoft/mpnet-base`](https://huggingface.co/microsoft/mpnet-base) model and fine-tuned it on a
+ dataset of 1B sentence pairs. We use a contrastive learning objective: given a sentence from the pair, the model should predict which sentence, out of a set of randomly sampled other sentences, was actually paired with it in our dataset.
+
+ We developed this model during the
+ [Community week using JAX/Flax for NLP & CV](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104),
+ organized by Hugging Face. We developed this model as part of the project:
+ [Train the Best Sentence Embedding Model Ever with 1B Training Pairs](https://discuss.huggingface.co/t/train-the-best-sentence-embedding-model-ever-with-1b-training-pairs/7354). We benefited from efficient hardware infrastructure to run the project: 7 TPU v3-8s, as well as input from members of Google's Flax, JAX, and Cloud teams on efficient deep learning frameworks.
+
+ ## Intended uses
+
+ Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector that captures
+ the semantic information. The sentence vector may be used for information retrieval, clustering, or sentence similarity tasks.
+
+ By default, input text longer than 384 word pieces is truncated.
+
+
+ ## Training procedure
+
+ ### Pre-training
+
+ We use the pretrained [`microsoft/mpnet-base`](https://huggingface.co/microsoft/mpnet-base) model. Please refer to the model card for more detailed information about the pre-training procedure.
+
+ ### Fine-tuning
+
+ We fine-tune the model using a contrastive objective. Formally, we compute the cosine similarity for each possible sentence pair in the batch.
+ We then apply the cross-entropy loss by comparing with the true pairs.
+
+ #### Hyperparameters
+
+ We trained our model on a TPU v3-8 for 100k steps using a batch size of 1024 (128 per TPU core).
+ We use a learning rate warm-up of 500 steps. The sequence length was limited to 128 tokens. We used the AdamW optimizer with
+ a 2e-5 learning rate. The full training script is available in this repository: `train_script.py`.
+
+ #### Training data
+
+ We use the concatenation of multiple datasets to fine-tune our model. The total number of sentence pairs exceeds 1 billion.
+ We sampled each dataset with a weighted probability; the configuration is detailed in the `data_config.json` file.
+
+
+ | Dataset | Paper | Number of training tuples |
+ |--------------------------------------------------------|:----------------------------------------:|:--------------------------:|
+ | [Reddit comments (2015-2018)](https://github.com/PolyAI-LDN/conversational-datasets/tree/master/reddit) | [paper](https://arxiv.org/abs/1904.06472) | 726,484,430 |
+ | [S2ORC](https://github.com/allenai/s2orc) Citation pairs (Abstracts) | [paper](https://aclanthology.org/2020.acl-main.447/) | 116,288,806 |
+ | [WikiAnswers](https://github.com/afader/oqa#wikianswers-corpus) Duplicate question pairs | [paper](https://doi.org/10.1145/2623330.2623677) | 77,427,422 |
+ | [PAQ](https://github.com/facebookresearch/PAQ) (Question, Answer) pairs | [paper](https://arxiv.org/abs/2102.07033) | 64,371,441 |
+ | [S2ORC](https://github.com/allenai/s2orc) Citation pairs (Titles) | [paper](https://aclanthology.org/2020.acl-main.447/) | 52,603,982 |
+ | [S2ORC](https://github.com/allenai/s2orc) (Title, Abstract) | [paper](https://aclanthology.org/2020.acl-main.447/) | 41,769,185 |
+ | [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) (Title, Body) pairs | - | 25,316,456 |
+ | [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) (Title+Body, Answer) pairs | - | 21,396,559 |
+ | [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) (Title, Answer) pairs | - | 21,396,559 |
+ | [MS MARCO](https://microsoft.github.io/msmarco/) triplets | [paper](https://doi.org/10.1145/3404835.3462804) | 9,144,553 |
+ | [GOOAQ: Open Question Answering with Diverse Answer Types](https://github.com/allenai/gooaq) | [paper](https://arxiv.org/pdf/2104.08727.pdf) | 3,012,496 |
+ | [Yahoo Answers](https://www.kaggle.com/soumikrakshit/yahoo-answers-dataset) (Title, Answer) | [paper](https://proceedings.neurips.cc/paper/2015/hash/250cf8b51c773f3f8dc8b4be867a9a02-Abstract.html) | 1,198,260 |
+ | [Code Search](https://huggingface.co/datasets/code_search_net) | - | 1,151,414 |
+ | [COCO](https://cocodataset.org/#home) Image captions | [paper](https://link.springer.com/chapter/10.1007%2F978-3-319-10602-1_48) | 828,395 |
+ | [SPECTER](https://github.com/allenai/specter) citation triplets | [paper](https://doi.org/10.18653/v1/2020.acl-main.207) | 684,100 |
+ | [Yahoo Answers](https://www.kaggle.com/soumikrakshit/yahoo-answers-dataset) (Question, Answer) | [paper](https://proceedings.neurips.cc/paper/2015/hash/250cf8b51c773f3f8dc8b4be867a9a02-Abstract.html) | 681,164 |
+ | [Yahoo Answers](https://www.kaggle.com/soumikrakshit/yahoo-answers-dataset) (Title, Question) | [paper](https://proceedings.neurips.cc/paper/2015/hash/250cf8b51c773f3f8dc8b4be867a9a02-Abstract.html) | 659,896 |
+ | [SearchQA](https://huggingface.co/datasets/search_qa) | [paper](https://arxiv.org/abs/1704.05179) | 582,261 |
+ | [Eli5](https://huggingface.co/datasets/eli5) | [paper](https://doi.org/10.18653/v1/p19-1346) | 325,475 |
+ | [Flickr 30k](https://shannon.cs.illinois.edu/DenotationGraph/) | [paper](https://transacl.org/ojs/index.php/tacl/article/view/229/33) | 317,695 |
+ | [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) Duplicate questions (titles) | - | 304,525 |
+ | AllNLI ([SNLI](https://nlp.stanford.edu/projects/snli/) and [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/)) | [paper SNLI](https://doi.org/10.18653/v1/d15-1075), [paper MultiNLI](https://doi.org/10.18653/v1/n18-1101) | 277,230 |
+ | [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) Duplicate questions (bodies) | - | 250,519 |
+ | [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) Duplicate questions (titles+bodies) | - | 250,460 |
+ | [Sentence Compression](https://github.com/google-research-datasets/sentence-compression) | [paper](https://www.aclweb.org/anthology/D13-1155/) | 180,000 |
+ | [Wikihow](https://github.com/pvl/wikihow_pairs_dataset) | [paper](https://arxiv.org/abs/1810.09305) | 128,542 |
+ | [Altlex](https://github.com/chridey/altlex/) | [paper](https://aclanthology.org/P16-1135.pdf) | 112,696 |
+ | [Quora Question Triplets](https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs) | - | 103,663 |
+ | [Simple Wikipedia](https://cs.pomona.edu/~dkauchak/simplification/) | [paper](https://www.aclweb.org/anthology/P11-2117/) | 102,225 |
+ | [Natural Questions (NQ)](https://ai.google.com/research/NaturalQuestions) | [paper](https://transacl.org/ojs/index.php/tacl/article/view/1455) | 100,231 |
+ | [SQuAD2.0](https://rajpurkar.github.io/SQuAD-explorer/) | [paper](https://aclanthology.org/P18-2124.pdf) | 87,599 |
+ | [TriviaQA](https://huggingface.co/datasets/trivia_qa) | - | 73,346 |
+ | **Total** | | **1,170,060,424** |
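
The fine-tuning objective described in the README above (cosine similarity over every pair in the batch, then cross-entropy against the true pairs) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the project's `train_script.py`: the function name `in_batch_contrastive_loss` and the `scale` factor of 20 are hypothetical choices made for the example.

```python
import numpy as np

def in_batch_contrastive_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine similarities; row i's true pair is column i."""
    # L2-normalize so the dot product equals cosine similarity
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                            # shape (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The diagonal holds the log-probability assigned to each true pair
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))
aligned_loss = in_batch_contrastive_loss(anchors, anchors)          # every anchor paired with itself
shuffled_loss = in_batch_contrastive_loss(anchors, anchors[::-1])   # pairs deliberately mismatched
print(aligned_loss < shuffled_loss)
```

Every other sentence in the batch acts as a negative for a given anchor, which is why a large batch size (1024 in the README) matters for this kind of objective.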
0_SentenceTransformer/config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "_name_or_path": "/models/0_SentenceTransformer",
+   "architectures": [
+     "MPNetModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "mpnet",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "relative_attention_num_buckets": 32,
+   "torch_dtype": "float32",
+   "transformers_version": "4.47.1",
+   "vocab_size": 30527
+ }
0_SentenceTransformer/config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.3.1",
+     "transformers": "4.47.1",
+     "pytorch": "2.5.1+cu124"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
0_SentenceTransformer/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0b3c8c717335c801abb15983036a6f1df4b6943fd6b93717969efd96d22eeec6
+ size 437967672
0_SentenceTransformer/modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
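
The `modules.json` above lists the pipeline stages in the order they are applied. As a rough sketch (plain JSON parsing, not the library's own loader), the order can be read off like this; the helper name `pipeline_order` is hypothetical and the JSON string is a compacted copy of the file shown in this diff.

```python
import json

# Compacted copy of 0_SentenceTransformer/modules.json from this commit
modules_json = """[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"},
  {"idx": 2, "name": "2", "path": "2_Normalize", "type": "sentence_transformers.models.Normalize"}
]"""

def pipeline_order(raw):
    """Return the short class name of each module, sorted by its idx field."""
    modules = sorted(json.loads(raw), key=lambda m: m["idx"])
    return [m["type"].rsplit(".", 1)[-1] for m in modules]

order = pipeline_order(modules_json)
print(" -> ".join(order))
```

Each entry's `path` names the subdirectory that holds that module's own config, with the empty path referring to the directory that contains `modules.json` itself.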
0_SentenceTransformer/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 384,
+   "do_lower_case": false
+ }
0_SentenceTransformer/special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
0_SentenceTransformer/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
0_SentenceTransformer/tokenizer_config.json ADDED
@@ -0,0 +1,73 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "104": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30526": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "<s>",
+   "do_lower_case": true,
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "mask_token": "<mask>",
+   "max_length": 128,
+   "model_max_length": 384,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "</s>",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "MPNetTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
0_SentenceTransformer/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
2_Dense/config.json ADDED
@@ -0,0 +1 @@
+ {"in_features": 768, "out_features": 512, "bias": true, "activation_function": "torch.nn.modules.activation.Tanh"}
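
The Dense config above describes a 768-to-512 linear projection with a bias, followed by Tanh. A minimal NumPy sketch of that forward pass follows; the weights here are random placeholders rather than the values stored in `2_Dense/model.safetensors`, and `dense_tanh` is a hypothetical helper name.

```python
import numpy as np

def dense_tanh(x, weight, bias):
    """Apply y = tanh(x @ W^T + b), with W shaped (out_features, in_features)."""
    return np.tanh(x @ weight.T + bias)

rng = np.random.default_rng(0)
in_features, out_features = 768, 512             # values from 2_Dense/config.json
weight = rng.normal(scale=0.02, size=(out_features, in_features))  # placeholder weights
bias = np.zeros(out_features)                                       # placeholder bias
pooled = rng.normal(size=(3, in_features))       # stand-in for three pooled sentence vectors
projected = dense_tanh(pooled, weight, bias)
print(projected.shape)
```

Because Tanh bounds every component to the interval from -1 to 1, the projected vectors are not unit length unless a normalization step follows.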
2_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2fa702c3aa3d3e152ce2eb57d0904dd43c42ba0023efd2c8bb8447a17d6000ab
+ size 1575072
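
As a sanity check, the pointer's `size` can be compared with what the Dense config implies. The sketch below assumes the weights are stored as float32 (4 bytes each), which the diff does not state for this module; the variable names are illustrative.

```python
in_features, out_features = 768, 512   # from 2_Dense/config.json
bytes_per_param = 4                    # assumption: float32 storage
file_size = 1575072                    # "size" field of the Git LFS pointer above

param_count = in_features * out_features + out_features   # weight matrix plus bias vector
tensor_bytes = param_count * bytes_per_param
overhead = file_size - tensor_bytes                        # bytes not explained by tensor data

print(param_count, tensor_bytes, overhead)
```

Under that assumption the tensors account for 1,574,912 bytes, leaving 160 bytes, which would be consistent with a safetensors header (an 8-byte length prefix followed by JSON metadata).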
README.md ADDED
@@ -0,0 +1,555 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:3820
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: samsung ms23h3125ak/ms23h3125ak
+   sentences:
+   - Canon EOS M50 + 15-45mm IS STM
+   - Bosch KIV32X23GB Integrated
+   - Indesit DIF04B1 Integrated
+   - Samsung MS23H3125AK Black
+   - Samsung RB29FWRNDBC Black
+   - Hisense RQ560N4WC1
+   - Samsung UE32M5520
+   - Nikon CoolPix A10
+   - Hotpoint RPD10457JKK
+   - HP Intel Xeon X5670 2.93GHz Socket 1366 3200MHz bus Upgrade Tray
+   - Indesit DFG15B1S Silver
+   - Samsung WW10M86DQOO
+   - Bosch SMV46MX00G Integrated
+   - LG 49SK8100PLA
+   - Nikon CoolPix W300
+   - AMD Ryzen 3 1300X 3.5GHz Box
+   - LG OLED65B8PLA
+   - Samsung Galaxy J5 SM-J530
+   - LG 65UK6500PLA
+   - Siemens WM14T391GB
+   - Apple iPhone SE 32GB
+ - source_sentence: lg oled65c8pla
+   sentences:
+   - Beko LCSM1545W White
+   - Bosch KAN90VI20G Stainless Steel
+   - Canon PowerShot SX60 HS
+   - Hotpoint WMAQF621P
+   - Apple iPhone 7 Plus 32GB
+   - Hotpoint FFU4DK Black
+   - Fujifilm Finepix XP130
+   - Bosch WAN24108GB
+   - LG OLED65E8PLA
+   - Intel Core i7-8700K 3.7GHz Box
+   - Fujifilm X-Pro2
+   - LG OLED65C8PLA
+   - Samsung UE55NU8000
+   - LG 49LK5900PLA
+   - Apple iPhone 8 64GB
+   - Samsung UE65NU7100
+   - AEG L6FBG942R
+   - AMD Ryzen 7 1700 3GHz Box
+   - Panasonic TX-49FX750B
+   - Bosch WKD28351GB
+   - Bosch GUD15A50GB Integrated
+ - source_sentence: 15.748 cm 6.2 2960 x 1440 samoled octa core 2.3ghz quad 1.7gh
+   sentences:
+   - Apple iPhone SE 32GB
+   - Apple iPhone X 64GB
+   - LG 55SK9500PLA
+   - Sony Cyber-shot DSC-WX500
+   - Samsung Galaxy A5 SM-A520F
+   - Apple iPhone 8 Plus 64GB
+   - Indesit IWDD7123
+   - Bosch SMS67MW01G White
+   - Bosch KGV33XW30G White
+   - Samsung WW80K5413UW
+   - AMD Ryzen 3 1300X 3.5GHz Box
+   - Bosch WAW28750GB
+   - Samsung Galaxy S8+ 64GB
+   - Bosch KGN39VW35G White
+   - Intel Core i7-7700K 4.2GHz Box
+   - Hotpoint RZAAV22P White
+   - Samsung UE49NU8000
+   - HP AMD Opteron 6276 2.3GHz Upgrade Tray
+   - Praktica Luxmedia Z250
+   - Hotpoint HFC2B19SV White
+   - Hisense RB385N4EW1 White
+ - source_sentence: boxed processor amd ryzen 3 1200 4 x 3.1 ghz quad
+   sentences:
+   - Bosch KGN36HI32 Stainless Steel
+   - Bosch SMS24AW01G White
+   - Hotpoint WDAL8640P
+   - Doro 6050
+   - Samsung QE55Q7FN
+   - AMD Ryzen 3 1200 3.1GHz Box
+   - Samsung UE55NU7500
+   - Huawei Honor 10 128GB Dual SIM
+   - Sony Xperia L1
+   - Hotpoint FFU4DK Black
+   - Hoover DXOC 68C3B
+   - Sony Xperia XA1
+   - Nikon D7200 + 18-105mm VR
+   - HP Intel Xeon DP E5640 2.66GHz Socket 1366 1066MHz bus Upgrade Tray
+   - Samsung UE49NU8000
+   - Panasonic Lumix DMC-FT30
+   - Hotpoint FDL 9640K UK
+   - Apple iPhone 6S Plus 128GB
+   - Nikon D5600 + AF-P 18-55mm VR
+   - HP AMD Opteron 6238 2.6GHz Upgrade Tray
+   - Apple iPhone SE 32GB
+ - source_sentence: lg 49uk6300plb/49uk6300plb
+   sentences:
+   - Bosch KIR24V20GB Integrated
+   - Bosch WAWH8660GB
+   - Intel Core i5-7600K 3.80GHz Box
+   - Sony Bravia KD-65AF8
+   - Samsung RL4362FBASL Stainless Steel
+   - Bosch SMI50C15GB Silver
+   - Apple iPhone XS Max 256GB
+   - Fujifilm X-T100 + XC 15-45/f3.5-5.6 OIS PZ
+   - Bosch KGN36VW35G White
+   - Samsung WW70K5410UW
+   - Samsung Galaxy J6
+   - LG 49UK6300PLB
+   - Doro Secure 580
+   - Sony Xperia XZ1 Compact
+   - Bosch SMV50C10GB Integrated
+   - Bosch KGN34VB35G Black
+   - Panasonic NN-E27JWMBPQ White
+   - Samsung WW10M86DQOA/EU
+   - LG 55SK9500PLA
+   - Samsung QE65Q8DN
+   - Canon EOS 80D
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: SentenceTransformer
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: Product Category Retrieval Test
+       type: Product-Category-Retrieval-Test
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.8085774058577406
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.9476987447698745
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.9644351464435147
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9769874476987448
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.8085774058577406
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3158995815899582
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.19288702928870294
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09769874476987449
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.8085774058577406
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.9476987447698745
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.9644351464435147
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9769874476987448
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9041917131034228
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.879607906621505
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.8805000617705705
+       name: Cosine Map@100
+ ---
+
+ # SentenceTransformer
+
+ This is a trained [sentence-transformers](https://www.SBERT.net) model. It maps sentences and paragraphs to a 512-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
+ - **Maximum Sequence Length:** 384 tokens
+ - **Output Dimensionality:** 512 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): SentenceTransformer(
+     (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
+     (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+     (2): Normalize()
+   )
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Dense({'in_features': 768, 'out_features': 512, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("llmvetter/embedding_finetune")
+ # Run inference
+ sentences = [
+     'lg 49uk6300plb/49uk6300plb',
+     'LG 49UK6300PLB',
+     'Samsung Galaxy J6',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 512)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Dataset: `Product-Category-Retrieval-Test`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.8086     |
+ | cosine_accuracy@3   | 0.9477     |
+ | cosine_accuracy@5   | 0.9644     |
+ | cosine_accuracy@10  | 0.9770     |
+ | cosine_precision@1  | 0.8086     |
+ | cosine_precision@3  | 0.3159     |
+ | cosine_precision@5  | 0.1929     |
+ | cosine_precision@10 | 0.0977     |
+ | cosine_recall@1     | 0.8086     |
+ | cosine_recall@3     | 0.9477     |
+ | cosine_recall@5     | 0.9644     |
+ | cosine_recall@10    | 0.9770     |
+ | **cosine_ndcg@10**  | **0.9042** |
+ | cosine_mrr@10       | 0.8796     |
+ | cosine_map@100      | 0.8805     |
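
In the table above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example, 0.9477 / 3 ≈ 0.3159). That pattern is what one would expect if every query has exactly one relevant document, although the card does not say so. Under that assumption, the metrics can be sketched as follows; `retrieval_metrics` and the toy ranks are hypothetical and do not reproduce `InformationRetrievalEvaluator`.

```python
def retrieval_metrics(ranks, k):
    """ranks: 1-based rank of the single relevant item per query, or None if not retrieved."""
    n = len(ranks)
    hits = sum(1 for r in ranks if r is not None and r <= k)
    accuracy = hits / n            # share of queries with the relevant item in the top k
    precision = hits / (n * k)     # one relevant item spread over k retrieved slots
    mrr = sum(1.0 / r for r in ranks if r is not None and r <= k) / n
    return accuracy, precision, mrr

toy_ranks = [1, 1, 2, 4, None]     # hypothetical ranks for five queries
acc3, prec3, mrr3 = retrieval_metrics(toy_ranks, k=3)
print(acc3, prec3, mrr3)
```

With a single relevant item per query, recall@k coincides with accuracy@k, so the sketch does not compute it separately.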
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+
+ * Size: 3,820 training samples
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, <code>sentence_2</code>, <code>sentence_3</code>, <code>sentence_4</code>, <code>sentence_5</code>, <code>sentence_6</code>, <code>sentence_7</code>, <code>sentence_8</code>, <code>sentence_9</code>, <code>sentence_10</code>, <code>sentence_11</code>, <code>sentence_12</code>, <code>sentence_13</code>, <code>sentence_14</code>, <code>sentence_15</code>, <code>sentence_16</code>, <code>sentence_17</code>, <code>sentence_18</code>, <code>sentence_19</code>, <code>sentence_20</code>, and <code>sentence_21</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | sentence_0 | sentence_1 | sentence_2 | sentence_3 | sentence_4 | sentence_5 | sentence_6 | sentence_7 | sentence_8 | sentence_9 | sentence_10 | sentence_11 | sentence_12 | sentence_13 | sentence_14 | sentence_15 | sentence_16 | sentence_17 | sentence_18 | sentence_19 | sentence_20 | sentence_21 |
+ |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+ | type | string | string | string | string | string | string | string | string | string | string | string | string | string | string | string | string | string | string | string | string | string | string |
+ | details | <ul><li>min: 4 tokens</li><li>mean: 18.41 tokens</li><li>max: 47 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 10.94 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 11.11 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 11.15 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 10.89 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 10.89 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 10.98 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 11.07 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 11.04 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 10.84 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 10.82 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 10.81 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 11.05 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 10.92 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 11.18 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 11.07 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 10.93 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 11.02 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 11.04 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 11.02 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 10.95 tokens</li><li>max: 30 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 10.86 tokens</li><li>max: 30 tokens</li></ul> |
+ * Samples:
+ | sentence_0 | sentence_1 | sentence_2 | sentence_3 | sentence_4 | sentence_5 | sentence_6 | sentence_7 | sentence_8 | sentence_9 | sentence_10 | sentence_11 | sentence_12 | sentence_13 | sentence_14 | sentence_15 | sentence_16 | sentence_17 | sentence_18 | sentence_19 | sentence_20 | sentence_21 |
+ |:---------------------------------------------------------------------|:----------------------------------------|:---------------------------------------------|:-------------------------------------|:-------------------------------------|:--------------------------------------|:----------------------------------------------|:----------------------------------|:---------------------------------|:----------------------------------------------|:-----------------------------------------------------------------------------|:---------------------------------------------|:------------------------------------|:--------------------------------------------|:---------------------------------------------|:----------------------------------------|:-------------------------------------------------|:-------------------------------|:------------------------------------------|:---------------------------------------|:--------------------------------------------------------------------------------|:--------------------------------------------------------------------------------|
+ | <code>sony kd49xf8505bu 49 4k ultra hd tv</code> | <code>Sony Bravia KD-49XF8505</code> | <code>Intel Core i7-8700K 3.7GHz Box</code> | <code>Bosch WAN24100GB</code> | <code>AMD FX-6300 3.5GHz Box</code> | <code>Bosch WIW28500GB</code> | <code>Bosch KGN36VL35G Stainless Steel</code> | <code>Indesit XWDE751480XS</code> | <code>CAT S41 Dual SIM</code> | <code>Sony Xperia XA1 Ultra 32GB</code> | <code>Samsung Galaxy J6</code> | <code>Samsung QE55Q7FN</code> | <code>Bosch KGN39VW35G White</code> | <code>Intel Core i5 7400 3.0GHz Box</code> | <code>Neff C17UR02N0B Stainless Steel</code> | <code>Samsung RR39M7340SA Silver</code> | <code>Samsung RB41J7255SR Stainless Steel</code> | <code>Hoover DXOC 68C3B</code> | <code>Canon PowerShot SX730 HS</code> | <code>Samsung RR39M7340BC Black</code> | <code>Praktica Luxmedia WP240</code> | <code>HP Intel Xeon DP E5506 2.13GHz Socket 1366 800MHz bus Upgrade Tray</code> |
+ | <code>doro 8040 4g sim free mobile phone black</code> | <code>Doro 8040</code> | <code>Bosch HMT75M551 Stainless Steel</code> | <code>Bosch SMI50C15GB Silver</code> | <code>Samsung WW90K5413UX</code> | <code>Panasonic Lumix DMC-TZ70</code> | <code>Sony KD-49XF7073</code> | <code>Nikon CoolPix W100</code> | <code>Samsung WD90J6A10AW</code> | <code>Bosch CFA634GS1B Stainless Steel</code> | <code>HP AMD Opteron 8425 HE 2.1GHz Socket F 4800MHz bus Upgrade Tray</code> | <code>Canon EOS 800D + 18-55mm IS STM</code> | <code>Samsung UE50NU7400</code> | <code>Apple iPhone 6S 128GB</code> | <code>Samsung RS52N3313SA/EU Graphite</code> | <code>Bosch WAW325H0GB</code> | <code>Sony Bravia KD-55AF8</code> | <code>Sony Alpha 6500</code> | <code>Doro 5030</code> | <code>LG GSL761WBXV Black</code> | <code>Bosch SMS67MW00G White</code> | <code>AEG L6FBG942R</code> |
+ | <code>fridgemaster muz4965 undercounter freezer white a rated</code> | <code>Fridgemaster MUZ4965 White</code> | <code>Samsung UE49NU7100</code> | <code>Nikon CoolPix A10</code> | <code>Samsung UE55NU7100</code> | <code>Samsung QE55Q7FN</code> | <code>Bosch KGN49XL30G Stainless Steel</code> | <code>Samsung UE49NU7500</code> | <code>LG 55UK6300PLB</code> | <code>Hoover DXOC 68C3B</code> | <code>Panasonic Lumix DMC-FZ2000</code> | <code>Panasonic Lumix DMC-TZ80</code> | <code>Bosch WKD28541GB</code> | <code>Apple iPhone 6 32GB</code> | <code>Sony Bravia KDL-32WE613</code> | <code>Lec TF50152W White</code> | <code>Bosch KGV36VW32G White</code> | <code>Bosch WAYH8790GB</code> | <code>Samsung RS68N8240B1/EU Black</code> | <code>Sony Xperia XZ1</code> | <code>HP Intel Xeon DP E5506 2.13GHz Socket 1366 800MHz bus Upgrade Tray</code> | <code>Sharp R372WM White</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+ ```json
+ {
+     "scale": 20.0,
+     "similarity_fct": "cos_sim"
+ }
+ ```
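
To make the two loss parameters concrete, here is a minimal pure-Python sketch of a multiple-negatives ranking objective: cosine similarities are multiplied by `scale` and fed to a softmax cross-entropy in which anchor `i` must pick `positives[i]` out of all candidates. The function names, the toy 2-D vectors, and the way explicit negatives are appended to the candidate pool are assumptions made for illustration; the library computes this on batched tensors and may differ in detail.

```python
import math


def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors, positives, negatives=(), scale=20.0):
    """Mean cross-entropy where anchor i should rank positives[i] first.

    Candidates are every positive in the batch followed by any explicit
    negatives (an assumption of this sketch).
    """
    candidates = list(positives) + list(negatives)
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(anchors)


# Toy embeddings (hypothetical values, not taken from the model).
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[-1.0, 0.0]]

loss_matched = mnr_loss(anchors, positives, negatives)
loss_swapped = mnr_loss(anchors, positives[::-1], negatives)
print(loss_matched, loss_swapped)
```

With `scale` at 20.0 the softmax is sharp, so correctly paired positives give a loss near zero while swapped pairs are penalized heavily.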
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 32
+ - `num_train_epochs`: 8
+ - `multi_dataset_batch_sampler`: round_robin
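
As a rough cross-check, the batch size and epoch count can be combined with the step counts in the Training Logs table (epoch 1.0 ends at step 120). The sketch below assumes a single training device, and relies on `gradient_accumulation_steps` being 1 and `dataloader_drop_last` being False as listed under All Hyperparameters; the variable names are illustrative only.

```python
import math

per_device_train_batch_size = 32
num_train_epochs = 8
steps_per_epoch = 120  # read off the Training Logs table (epoch 1.0 -> step 120)

# Planned number of optimizer steps over the whole run.
total_steps = steps_per_epoch * num_train_epochs

# With no dropped last batch, ceil(n / 32) == 120 bounds the dataset size n.
min_samples = (steps_per_epoch - 1) * per_device_train_batch_size + 1
max_samples = steps_per_epoch * per_device_train_batch_size

# The logged loss at step 500 falls partway through the fifth epoch.
epoch_at_step_500 = round(500 / steps_per_epoch, 4)

print(total_steps, min_samples, max_samples, epoch_at_step_500)
```

Under these assumptions the training set holds between 3,809 and 3,840 examples, and step 500 corresponds to epoch 4.1667, which matches the logged value.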
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 32
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 8
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
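
The combination of `lr_scheduler_type: linear`, `learning_rate: 5e-05`, and `warmup_steps: 0` describes a learning rate that decays in a straight line to zero over the run. The function below is a small sketch of that shape, not the trainer's own code; `linear_lr` is a hypothetical helper, and the default of 960 total steps is an assumption derived from 120 steps per epoch (see Training Logs) times 8 epochs.

```python
def linear_lr(step, base_lr=5e-05, warmup_steps=0, total_steps=960):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr (unused here since warmup_steps is 0).
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 240, 480, 720, 960):
    print(step, linear_lr(step))
```

Under these assumptions the rate is halved by the end of epoch 4 (step 480) and reaches zero at step 960.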
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | Product-Category-Retrieval-Test_cosine_ndcg@10 |
+ |:------:|:----:|:-------------:|:----------------------------------------------:|
+ | 1.0 | 120 | - | 0.7406 |
+ | 2.0 | 240 | - | 0.8437 |
+ | 3.0 | 360 | - | 0.8756 |
+ | 4.0 | 480 | - | 0.8875 |
+ | 4.1667 | 500 | 2.5302 | - |
+ | 5.0 | 600 | - | 0.8963 |
+ | 6.0 | 720 | - | 0.9015 |
+
+
+
+ ### Framework Versions
+ - Python: 3.11.10
+ - Sentence Transformers: 3.3.1
+ - Transformers: 4.47.1
+ - PyTorch: 2.5.1+cu124
+ - Accelerate: 1.2.1
+ - Datasets: 3.2.0
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.3.1",
+     "transformers": "4.47.1",
+     "pytorch": "2.5.1+cu124"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "0_SentenceTransformer",
+     "type": "sentence_transformers.SentenceTransformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Dense",
+     "type": "sentence_transformers.models.Dense"
+   }
+ ]
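
The `modules.json` file above lists the stages of the embedding pipeline in order. The snippet below is only an illustrative way to read that listing with the standard library and print the stage order; it is not how the library itself loads modules, and the inlined JSON string simply repeats the file contents shown in this diff.

```python
import json

# Contents of modules.json as added in this commit.
modules_json = """
[
  {"idx": 0, "name": "0", "path": "0_SentenceTransformer",
   "type": "sentence_transformers.SentenceTransformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling",
   "type": "sentence_transformers.models.Pooling"},
  {"idx": 2, "name": "2", "path": "2_Dense",
   "type": "sentence_transformers.models.Dense"}
]
"""

# Sort by idx to recover the order in which the stages are applied.
modules = sorted(json.loads(modules_json), key=lambda m: m["idx"])

# Keep only the class name at the end of each dotted type path.
pipeline = [m["type"].rsplit(".", 1)[-1] for m in modules]
paths = [m["path"] for m in modules]

print(" -> ".join(pipeline))
```

Read this way, the first stage is itself a nested SentenceTransformer stored under `0_SentenceTransformer`, followed by a pooling layer and a dense projection.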