girijesh committed on
Commit
e8373a3
1 Parent(s): 082b02c

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,815 @@
1
+ ---
2
+ language:
3
+ - en
4
+ license: apache-2.0
5
+ tags:
6
+ - sentence-transformers
7
+ - sentence-similarity
8
+ - feature-extraction
9
+ - generated_from_trainer
10
+ - dataset_size:6300
11
+ - loss:MatryoshkaLoss
12
+ - loss:MultipleNegativesRankingLoss
13
+ base_model: BAAI/bge-base-en-v1.5
14
+ widget:
15
+ - source_sentence: The net cash provided by operating activities during fiscal 2023
16
+ was related to net income of $208 million, adjusted for non-cash items including
17
+ $3.8 billion of depreciation and amortization and $3.3 billion related to stock-based
18
+ compensation expense.
19
+ sentences:
20
+ - What are the three key aspects encompassed in a company's internal control over
21
+ financial reporting?
22
+ - What was the net cash provided by operating activities for fiscal 2023?
23
+ - What are the two operating segments of NVIDIA as mentioned in the text?
24
+ - source_sentence: Intellectual Property To establish and protect our proprietary
25
+ rights, we rely on a combination of patents, trademarks, copyrights, trade secrets,
26
+ including know-how, license agreements, confidentiality procedures, non-disclosure
27
+ agreements with third parties, employee disclosure and invention assignment agreements,
28
+ and other contractual rights.
29
+ sentences:
30
+ - What condition does Synthroid treat and what type of drug is it formulated as?
31
+ - What legal tools does the company use to protect its intellectual property?
32
+ - In which item and part of a financial document would you find information on legal
33
+ proceedings?
34
+ - source_sentence: Cost of revenues is comprised of TAC and other costs of revenues.
35
+ TAC includes amounts paid to our distribution partners and Google Network partners
36
+ primarily for ads displayed on their properties. Other cost of revenues includes
37
+ compensation expense related to our data centers and operations, content acquisition
38
+ costs, depreciation expense related to technical infrastructure, and inventory
39
+ and other costs related to devices we sell.
40
+ sentences:
41
+ - What is included in the cost of revenues for Google?
42
+ - What was the total net uncertain tax positions as of December 31, 2023?
43
+ - What portion of the restructuring charges incurred in fiscal 2023 are expected
44
+ to be settled with cash?
45
+ - source_sentence: Comprehensive income (loss) | $ | (362) | | $ | 1,868 | $ | 4,775
46
+ sentences:
47
+ - What measures does the company take to ensure product quality?
48
+ - How many pages does Item 8, which includes Financial Statements and Supplementary
49
+ Data, span?
50
+ - What was the total comprehensive income for Airbnb, Inc. in 2023?
51
+ - source_sentence: We make our branded beverage products available to consumers throughout
52
+ the world through our network of independent bottling partners, distributors,
53
+ wholesalers and retailers as well as our consolidated bottling and distribution
54
+ operations.
55
+ sentences:
56
+ - How does The Coca-Cola Company distribute its beverage products globally?
57
+ - What accounting method is predominantly used to determine inventory costs in the
58
+ Company's supermarket divisions before LIFO adjustments?
59
+ - How are the company's inventories valued?
60
+ pipeline_tag: sentence-similarity
61
+ library_name: sentence-transformers
62
+ metrics:
63
+ - cosine_accuracy@1
64
+ - cosine_accuracy@3
65
+ - cosine_accuracy@5
66
+ - cosine_accuracy@10
67
+ - cosine_precision@1
68
+ - cosine_precision@3
69
+ - cosine_precision@5
70
+ - cosine_precision@10
71
+ - cosine_recall@1
72
+ - cosine_recall@3
73
+ - cosine_recall@5
74
+ - cosine_recall@10
75
+ - cosine_ndcg@10
76
+ - cosine_mrr@10
77
+ - cosine_map@100
78
+ model-index:
79
+ - name: BGE base Financial Matryoshka
80
+ results:
81
+ - task:
82
+ type: information-retrieval
83
+ name: Information Retrieval
84
+ dataset:
85
+ name: dim 768
86
+ type: dim_768
87
+ metrics:
88
+ - type: cosine_accuracy@1
89
+ value: 0.7142857142857143
90
+ name: Cosine Accuracy@1
91
+ - type: cosine_accuracy@3
92
+ value: 0.8485714285714285
93
+ name: Cosine Accuracy@3
94
+ - type: cosine_accuracy@5
95
+ value: 0.8814285714285715
96
+ name: Cosine Accuracy@5
97
+ - type: cosine_accuracy@10
98
+ value: 0.9171428571428571
99
+ name: Cosine Accuracy@10
100
+ - type: cosine_precision@1
101
+ value: 0.7142857142857143
102
+ name: Cosine Precision@1
103
+ - type: cosine_precision@3
104
+ value: 0.28285714285714286
105
+ name: Cosine Precision@3
106
+ - type: cosine_precision@5
107
+ value: 0.17628571428571424
108
+ name: Cosine Precision@5
109
+ - type: cosine_precision@10
110
+ value: 0.09171428571428569
111
+ name: Cosine Precision@10
112
+ - type: cosine_recall@1
113
+ value: 0.7142857142857143
114
+ name: Cosine Recall@1
115
+ - type: cosine_recall@3
116
+ value: 0.8485714285714285
117
+ name: Cosine Recall@3
118
+ - type: cosine_recall@5
119
+ value: 0.8814285714285715
120
+ name: Cosine Recall@5
121
+ - type: cosine_recall@10
122
+ value: 0.9171428571428571
123
+ name: Cosine Recall@10
124
+ - type: cosine_ndcg@10
125
+ value: 0.8195547708074192
126
+ name: Cosine Ndcg@10
127
+ - type: cosine_mrr@10
128
+ value: 0.7879784580498865
129
+ name: Cosine Mrr@10
130
+ - type: cosine_map@100
131
+ value: 0.791495828863575
132
+ name: Cosine Map@100
133
+ - task:
134
+ type: information-retrieval
135
+ name: Information Retrieval
136
+ dataset:
137
+ name: dim 512
138
+ type: dim_512
139
+ metrics:
140
+ - type: cosine_accuracy@1
141
+ value: 0.7157142857142857
142
+ name: Cosine Accuracy@1
143
+ - type: cosine_accuracy@3
144
+ value: 0.8457142857142858
145
+ name: Cosine Accuracy@3
146
+ - type: cosine_accuracy@5
147
+ value: 0.8814285714285715
148
+ name: Cosine Accuracy@5
149
+ - type: cosine_accuracy@10
150
+ value: 0.92
151
+ name: Cosine Accuracy@10
152
+ - type: cosine_precision@1
153
+ value: 0.7157142857142857
154
+ name: Cosine Precision@1
155
+ - type: cosine_precision@3
156
+ value: 0.2819047619047619
157
+ name: Cosine Precision@3
158
+ - type: cosine_precision@5
159
+ value: 0.17628571428571424
160
+ name: Cosine Precision@5
161
+ - type: cosine_precision@10
162
+ value: 0.09199999999999998
163
+ name: Cosine Precision@10
164
+ - type: cosine_recall@1
165
+ value: 0.7157142857142857
166
+ name: Cosine Recall@1
167
+ - type: cosine_recall@3
168
+ value: 0.8457142857142858
169
+ name: Cosine Recall@3
170
+ - type: cosine_recall@5
171
+ value: 0.8814285714285715
172
+ name: Cosine Recall@5
173
+ - type: cosine_recall@10
174
+ value: 0.92
175
+ name: Cosine Recall@10
176
+ - type: cosine_ndcg@10
177
+ value: 0.8200080507124731
178
+ name: Cosine Ndcg@10
179
+ - type: cosine_mrr@10
180
+ value: 0.7878299319727888
181
+ name: Cosine Mrr@10
182
+ - type: cosine_map@100
183
+ value: 0.7911645774121049
184
+ name: Cosine Map@100
185
+ - task:
186
+ type: information-retrieval
187
+ name: Information Retrieval
188
+ dataset:
189
+ name: dim 256
190
+ type: dim_256
191
+ metrics:
192
+ - type: cosine_accuracy@1
193
+ value: 0.6914285714285714
194
+ name: Cosine Accuracy@1
195
+ - type: cosine_accuracy@3
196
+ value: 0.8471428571428572
197
+ name: Cosine Accuracy@3
198
+ - type: cosine_accuracy@5
199
+ value: 0.88
200
+ name: Cosine Accuracy@5
201
+ - type: cosine_accuracy@10
202
+ value: 0.91
203
+ name: Cosine Accuracy@10
204
+ - type: cosine_precision@1
205
+ value: 0.6914285714285714
206
+ name: Cosine Precision@1
207
+ - type: cosine_precision@3
208
+ value: 0.28238095238095234
209
+ name: Cosine Precision@3
210
+ - type: cosine_precision@5
211
+ value: 0.176
212
+ name: Cosine Precision@5
213
+ - type: cosine_precision@10
214
+ value: 0.09099999999999998
215
+ name: Cosine Precision@10
216
+ - type: cosine_recall@1
217
+ value: 0.6914285714285714
218
+ name: Cosine Recall@1
219
+ - type: cosine_recall@3
220
+ value: 0.8471428571428572
221
+ name: Cosine Recall@3
222
+ - type: cosine_recall@5
223
+ value: 0.88
224
+ name: Cosine Recall@5
225
+ - type: cosine_recall@10
226
+ value: 0.91
227
+ name: Cosine Recall@10
228
+ - type: cosine_ndcg@10
229
+ value: 0.8087696033003087
230
+ name: Cosine Ndcg@10
231
+ - type: cosine_mrr@10
232
+ value: 0.7755997732426303
233
+ name: Cosine Mrr@10
234
+ - type: cosine_map@100
235
+ value: 0.7799208675704249
236
+ name: Cosine Map@100
237
+ - task:
238
+ type: information-retrieval
239
+ name: Information Retrieval
240
+ dataset:
241
+ name: dim 128
242
+ type: dim_128
243
+ metrics:
244
+ - type: cosine_accuracy@1
245
+ value: 0.6914285714285714
246
+ name: Cosine Accuracy@1
247
+ - type: cosine_accuracy@3
248
+ value: 0.83
249
+ name: Cosine Accuracy@3
250
+ - type: cosine_accuracy@5
251
+ value: 0.87
252
+ name: Cosine Accuracy@5
253
+ - type: cosine_accuracy@10
254
+ value: 0.9071428571428571
255
+ name: Cosine Accuracy@10
256
+ - type: cosine_precision@1
257
+ value: 0.6914285714285714
258
+ name: Cosine Precision@1
259
+ - type: cosine_precision@3
260
+ value: 0.27666666666666667
261
+ name: Cosine Precision@3
262
+ - type: cosine_precision@5
263
+ value: 0.174
264
+ name: Cosine Precision@5
265
+ - type: cosine_precision@10
266
+ value: 0.0907142857142857
267
+ name: Cosine Precision@10
268
+ - type: cosine_recall@1
269
+ value: 0.6914285714285714
270
+ name: Cosine Recall@1
271
+ - type: cosine_recall@3
272
+ value: 0.83
273
+ name: Cosine Recall@3
274
+ - type: cosine_recall@5
275
+ value: 0.87
276
+ name: Cosine Recall@5
277
+ - type: cosine_recall@10
278
+ value: 0.9071428571428571
279
+ name: Cosine Recall@10
280
+ - type: cosine_ndcg@10
281
+ value: 0.8024684596621504
282
+ name: Cosine Ndcg@10
283
+ - type: cosine_mrr@10
284
+ value: 0.7686116780045347
285
+ name: Cosine Mrr@10
286
+ - type: cosine_map@100
287
+ value: 0.7729258054107728
288
+ name: Cosine Map@100
289
+ - task:
290
+ type: information-retrieval
291
+ name: Information Retrieval
292
+ dataset:
293
+ name: dim 64
294
+ type: dim_64
295
+ metrics:
296
+ - type: cosine_accuracy@1
297
+ value: 0.6585714285714286
298
+ name: Cosine Accuracy@1
299
+ - type: cosine_accuracy@3
300
+ value: 0.8028571428571428
301
+ name: Cosine Accuracy@3
302
+ - type: cosine_accuracy@5
303
+ value: 0.8357142857142857
304
+ name: Cosine Accuracy@5
305
+ - type: cosine_accuracy@10
306
+ value: 0.8828571428571429
307
+ name: Cosine Accuracy@10
308
+ - type: cosine_precision@1
309
+ value: 0.6585714285714286
310
+ name: Cosine Precision@1
311
+ - type: cosine_precision@3
312
+ value: 0.2676190476190476
313
+ name: Cosine Precision@3
314
+ - type: cosine_precision@5
315
+ value: 0.1671428571428571
316
+ name: Cosine Precision@5
317
+ - type: cosine_precision@10
318
+ value: 0.08828571428571429
319
+ name: Cosine Precision@10
320
+ - type: cosine_recall@1
321
+ value: 0.6585714285714286
322
+ name: Cosine Recall@1
323
+ - type: cosine_recall@3
324
+ value: 0.8028571428571428
325
+ name: Cosine Recall@3
326
+ - type: cosine_recall@5
327
+ value: 0.8357142857142857
328
+ name: Cosine Recall@5
329
+ - type: cosine_recall@10
330
+ value: 0.8828571428571429
331
+ name: Cosine Recall@10
332
+ - type: cosine_ndcg@10
333
+ value: 0.7735846622621076
334
+ name: Cosine Ndcg@10
335
+ - type: cosine_mrr@10
336
+ value: 0.738378684807256
337
+ name: Cosine Mrr@10
338
+ - type: cosine_map@100
339
+ value: 0.7433829659777168
340
+ name: Cosine Map@100
341
+ ---
342
+
343
+ # BGE base Financial Matryoshka
344
+
345
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
346
+
347
+ ## Model Details
348
+
349
+ ### Model Description
350
+ - **Model Type:** Sentence Transformer
351
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
352
+ - **Maximum Sequence Length:** 512 tokens
353
+ - **Output Dimensionality:** 768 dimensions
354
+ - **Similarity Function:** Cosine Similarity
355
+ - **Training Dataset:**
356
+ - json
357
+ - **Language:** en
358
+ - **License:** apache-2.0
359
+
360
+ ### Model Sources
361
+
362
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
363
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
364
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
365
+
366
+ ### Full Model Architecture
367
+
368
+ ```
369
+ SentenceTransformer(
370
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
371
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
372
+ (2): Normalize()
373
+ )
374
+ ```
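The three modules listed above can be read as: run BERT, keep the vector at the first (`[CLS]`) position, then scale it to unit length. Below is a minimal pure-Python sketch of the last two steps. The 4-dimensional token vectors are made up and stand in for the real 768-dimensional BERT outputs, and `cls_pool` and `l2_normalize` are hypothetical helper names, not library functions.

```python
import math


def cls_pool(token_vectors):
    """CLS pooling: keep only the vector at position 0 and ignore the rest."""
    return list(token_vectors[0])


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


# Toy stand-in for the transformer output of one sentence (3 tokens, 4 dims).
token_vectors = [
    [3.0, 0.0, 4.0, 0.0],  # position 0: the [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
]

sentence_embedding = l2_normalize(cls_pool(token_vectors))
print(sentence_embedding)
```

Because the final module normalizes every embedding, the dot product of two embeddings equals their cosine similarity, which matches the similarity function named in the model description.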
375
+
376
+ ## Usage
377
+
378
+ ### Direct Usage (Sentence Transformers)
379
+
380
+ First install the Sentence Transformers library:
381
+
382
+ ```bash
383
+ pip install -U sentence-transformers
384
+ ```
385
+
386
+ Then you can load this model and run inference.
387
+ ```python
388
+ from sentence_transformers import SentenceTransformer
389
+
390
+ # Download from the 🤗 Hub
391
+ model = SentenceTransformer("girijesh/bge-base-financial-matryoshka")
392
+ # Run inference
393
+ sentences = [
394
+ 'We make our branded beverage products available to consumers throughout the world through our network of independent bottling partners, distributors, wholesalers and retailers as well as our consolidated bottling and distribution operations.',
395
+ 'How does The Coca-Cola Company distribute its beverage products globally?',
396
+ "What accounting method is predominantly used to determine inventory costs in the Company's supermarket divisions before LIFO adjustments?",
397
+ ]
398
+ embeddings = model.encode(sentences)
399
+ print(embeddings.shape)
400
+ # (3, 768)
401
+
402
+ # Get the similarity scores for the embeddings
403
+ similarities = model.similarity(embeddings, embeddings)
404
+ print(similarities.shape)
405
+ # torch.Size([3, 3])
406
+ ```
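Because the model was trained with `MatryoshkaLoss` at 768, 512, 256, 128 and 64 dimensions, the leading components of each embedding are meant to remain usable on their own. The following is a minimal sketch of that truncation step in plain Python. The helper `truncate_and_normalize` is hypothetical, and the 8-dimensional vectors are invented for illustration rather than produced by the model.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components, then rescale to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity for inputs that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


# Made-up vectors standing in for a query and a passage embedding.
query = [0.9, 0.1, 0.3, 0.2, 0.05, 0.1, 0.02, 0.01]
passage = [0.8, 0.2, 0.4, 0.1, 0.10, 0.0, 0.03, 0.02]

for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, round(cosine(q, p), 4))
```

Recent Sentence Transformers releases also accept a `truncate_dim` argument when constructing `SentenceTransformer`, which applies the truncation for you. Smaller dimensions reduce storage and search cost at some loss in retrieval quality, as the per-dimension evaluation tables in this card show.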
407
+
408
+ <!--
409
+ ### Direct Usage (Transformers)
410
+
411
+ <details><summary>Click to see the direct usage in Transformers</summary>
412
+
413
+ </details>
414
+ -->
415
+
416
+ <!--
417
+ ### Downstream Usage (Sentence Transformers)
418
+
419
+ You can finetune this model on your own dataset.
420
+
421
+ <details><summary>Click to expand</summary>
422
+
423
+ </details>
424
+ -->
425
+
426
+ <!--
427
+ ### Out-of-Scope Use
428
+
429
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
430
+ -->
431
+
432
+ ## Evaluation
433
+
434
+ ### Metrics
435
+
436
+ #### Information Retrieval
437
+ * Dataset: `dim_768`
438
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
439
+
440
+ | Metric | Value |
441
+ |:--------------------|:-----------|
442
+ | cosine_accuracy@1 | 0.7143 |
443
+ | cosine_accuracy@3 | 0.8486 |
444
+ | cosine_accuracy@5 | 0.8814 |
445
+ | cosine_accuracy@10 | 0.9171 |
446
+ | cosine_precision@1 | 0.7143 |
447
+ | cosine_precision@3 | 0.2829 |
448
+ | cosine_precision@5 | 0.1763 |
449
+ | cosine_precision@10 | 0.0917 |
450
+ | cosine_recall@1 | 0.7143 |
451
+ | cosine_recall@3 | 0.8486 |
452
+ | cosine_recall@5 | 0.8814 |
453
+ | cosine_recall@10 | 0.9171 |
454
+ | cosine_ndcg@10 | 0.8196 |
455
+ | cosine_mrr@10 | 0.788 |
456
+ | **cosine_map@100** | **0.7915** |
457
+
458
+ #### Information Retrieval
459
+ * Dataset: `dim_512`
460
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
461
+
462
+ | Metric | Value |
463
+ |:--------------------|:-----------|
464
+ | cosine_accuracy@1 | 0.7157 |
465
+ | cosine_accuracy@3 | 0.8457 |
466
+ | cosine_accuracy@5 | 0.8814 |
467
+ | cosine_accuracy@10 | 0.92 |
468
+ | cosine_precision@1 | 0.7157 |
469
+ | cosine_precision@3 | 0.2819 |
470
+ | cosine_precision@5 | 0.1763 |
471
+ | cosine_precision@10 | 0.092 |
472
+ | cosine_recall@1 | 0.7157 |
473
+ | cosine_recall@3 | 0.8457 |
474
+ | cosine_recall@5 | 0.8814 |
475
+ | cosine_recall@10 | 0.92 |
476
+ | cosine_ndcg@10 | 0.82 |
477
+ | cosine_mrr@10 | 0.7878 |
478
+ | **cosine_map@100** | **0.7912** |
479
+
480
+ #### Information Retrieval
481
+ * Dataset: `dim_256`
482
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
483
+
484
+ | Metric | Value |
485
+ |:--------------------|:-----------|
486
+ | cosine_accuracy@1 | 0.6914 |
487
+ | cosine_accuracy@3 | 0.8471 |
488
+ | cosine_accuracy@5 | 0.88 |
489
+ | cosine_accuracy@10 | 0.91 |
490
+ | cosine_precision@1 | 0.6914 |
491
+ | cosine_precision@3 | 0.2824 |
492
+ | cosine_precision@5 | 0.176 |
493
+ | cosine_precision@10 | 0.091 |
494
+ | cosine_recall@1 | 0.6914 |
495
+ | cosine_recall@3 | 0.8471 |
496
+ | cosine_recall@5 | 0.88 |
497
+ | cosine_recall@10 | 0.91 |
498
+ | cosine_ndcg@10 | 0.8088 |
499
+ | cosine_mrr@10 | 0.7756 |
500
+ | **cosine_map@100** | **0.7799** |
501
+
502
+ #### Information Retrieval
503
+ * Dataset: `dim_128`
504
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
505
+
506
+ | Metric | Value |
507
+ |:--------------------|:-----------|
508
+ | cosine_accuracy@1 | 0.6914 |
509
+ | cosine_accuracy@3 | 0.83 |
510
+ | cosine_accuracy@5 | 0.87 |
511
+ | cosine_accuracy@10 | 0.9071 |
512
+ | cosine_precision@1 | 0.6914 |
513
+ | cosine_precision@3 | 0.2767 |
514
+ | cosine_precision@5 | 0.174 |
515
+ | cosine_precision@10 | 0.0907 |
516
+ | cosine_recall@1 | 0.6914 |
517
+ | cosine_recall@3 | 0.83 |
518
+ | cosine_recall@5 | 0.87 |
519
+ | cosine_recall@10 | 0.9071 |
520
+ | cosine_ndcg@10 | 0.8025 |
521
+ | cosine_mrr@10 | 0.7686 |
522
+ | **cosine_map@100** | **0.7729** |
523
+
524
+ #### Information Retrieval
525
+ * Dataset: `dim_64`
526
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
527
+
528
+ | Metric | Value |
529
+ |:--------------------|:-----------|
530
+ | cosine_accuracy@1 | 0.6586 |
531
+ | cosine_accuracy@3 | 0.8029 |
532
+ | cosine_accuracy@5 | 0.8357 |
533
+ | cosine_accuracy@10 | 0.8829 |
534
+ | cosine_precision@1 | 0.6586 |
535
+ | cosine_precision@3 | 0.2676 |
536
+ | cosine_precision@5 | 0.1671 |
537
+ | cosine_precision@10 | 0.0883 |
538
+ | cosine_recall@1 | 0.6586 |
539
+ | cosine_recall@3 | 0.8029 |
540
+ | cosine_recall@5 | 0.8357 |
541
+ | cosine_recall@10 | 0.8829 |
542
+ | cosine_ndcg@10 | 0.7736 |
543
+ | cosine_mrr@10 | 0.7384 |
544
+ | **cosine_map@100** | **0.7434** |
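In every table above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k. That pattern is what one would expect if each query has exactly one relevant passage; this is an inference from the numbers, not something the card states. The sketch below computes the metrics under that assumption from a toy list of ranks. `retrieval_metrics` is a hypothetical helper, not the `InformationRetrievalEvaluator` implementation.

```python
def retrieval_metrics(ranks, k):
    """Compute accuracy, precision, recall and MRR at cutoff k.

    `ranks` holds the 1-based rank of the single relevant passage for each
    query, or None when it was not retrieved at all.
    """
    n = len(ranks)
    hit_ranks = [r for r in ranks if r is not None and r <= k]
    hits = len(hit_ranks)
    return {
        "accuracy": hits / n,          # share of queries with a hit in the top k
        "precision": hits / (k * n),   # at most one of the k results is relevant
        "recall": hits / n,            # one relevant passage, so same as accuracy
        "mrr": sum(1.0 / r for r in hit_ranks) / n,
    }


toy_ranks = [1, 1, 2, 4, None]  # made-up ranks for five queries
for k in (1, 3, 5):
    print(k, retrieval_metrics(toy_ranks, k))

# Consistency check against the dim_768 table: precision@3 vs accuracy@3 / 3
print(round(0.8486 / 3, 4))  # 0.2829
```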
545
+
546
+ <!--
547
+ ## Bias, Risks and Limitations
548
+
549
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
550
+ -->
551
+
552
+ <!--
553
+ ### Recommendations
554
+
555
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
556
+ -->
557
+
558
+ ## Training Details
559
+
560
+ ### Training Dataset
561
+
562
+ #### json
563
+
564
+ * Dataset: json
565
+ * Size: 6,300 training samples
566
+ * Columns: <code>positive</code> and <code>anchor</code>
567
+ * Approximate statistics based on the first 1000 samples:
568
+ | | positive | anchor |
569
+ |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
570
+ | type | string | string |
571
+ | details | <ul><li>min: 8 tokens</li><li>mean: 44.98 tokens</li><li>max: 439 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 20.31 tokens</li><li>max: 45 tokens</li></ul> |
572
+ * Samples:
573
+ | positive | anchor |
574
+ |:---------|:-------|
575
+ | <code>Change in control events potentially triggering benefits under the CIC Plan and Mr. Begor’s agreement would occur, subject to certain exceptions, if (1) any person acquires 20% or more of our voting stock; (2) upon a merger or other business combination, our shareholders receive less than two-thirds of the common stock and combined voting power of the new company; (3) members of the current Board of Directors ceasing to constitute a majority of the Board of Directors, except for new directors that are regularly elected; (4) we sell or otherwise dispose of all or substantially all of our assets; or (5) we liquidate or dissolve.</code> | <code>What events potentially trigger benefits under Mark W. Begor's change in control agreement and the CIC Plan?</code> |
576
+ | <code>The growth in marketplace revenue was primarily due to the impact of the pricing update to increase our seller transaction fee for the Etsy marketplace from 5% to 6.5% beginning on April 11, 2022, and an increase in foreign currency payments, which we earn an additional transaction fee on, in the year ended December 31, 2023.</code> | <code>What drove the growth in marketplace revenue for the year ended December 31, 2023?</code> |
577
+ | <code>We are focused on ensuring that we efficiently allocate our resources to the areas with the highest potential for profitable growth. ... The uncertain macroeconomic environment in many of these markets is expected to continue and we aim to ensure our investments in these international markets are appropriate relative to the size of the opportunity.</code> | <code>What are Hershey's goals for international expansion and how are they being approached?</code> |
578
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
579
+ ```json
580
+ {
581
+ "loss": "MultipleNegativesRankingLoss",
582
+ "matryoshka_dims": [
583
+ 768,
584
+ 512,
585
+ 256,
586
+ 128,
587
+ 64
588
+ ],
589
+ "matryoshka_weights": [
590
+ 1,
591
+ 1,
592
+ 1,
593
+ 1,
594
+ 1
595
+ ],
596
+ "n_dims_per_step": -1
597
+ }
598
+ ```
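The parameters above say that an in-batch ranking loss is evaluated at each of the listed dimensions and the results are summed with equal weights. Here is a rough pure-Python sketch of that idea, not the library implementation. It assumes cosine similarity with a scale factor of 20, which I believe is the `MultipleNegativesRankingLoss` default but is not stated in this card. All function names and the toy 4-dimensional vectors are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy where positive i is the target for anchor i and every
    other positive in the batch serves as a negative."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * sum(x * y for x, y in zip(a, p)) for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated embeddings."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        a = [normalize(v[:dim]) for v in anchors]
        p = [normalize(v[:dim]) for v in positives]
        total += weight * in_batch_ranking_loss(a, p)
    return total


# Toy batch of two (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.2, 0.1, 0.0], [0.1, 1.0, 0.0, 0.3]]
positives = [[0.9, 0.1, 0.2, 0.1], [0.2, 0.8, 0.1, 0.4]]
loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(loss)
```

Summing the loss over truncated prefixes is what pushes the most useful information into the leading components, so that shorter embeddings remain effective.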
599
+
600
+ ### Training Hyperparameters
601
+ #### Non-Default Hyperparameters
602
+
603
+ - `eval_strategy`: epoch
604
+ - `per_device_train_batch_size`: 32
605
+ - `per_device_eval_batch_size`: 16
606
+ - `gradient_accumulation_steps`: 16
607
+ - `learning_rate`: 2e-05
608
+ - `num_train_epochs`: 4
609
+ - `lr_scheduler_type`: cosine
610
+ - `warmup_ratio`: 0.1
611
+ - `bf16`: True
612
+ - `tf32`: True
613
+ - `load_best_model_at_end`: True
614
+ - `optim`: adamw_torch_fused
615
+ - `batch_sampler`: no_duplicates
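To make `lr_scheduler_type: cosine` and `warmup_ratio: 0.1` concrete, the sketch below traces a linear-warmup, cosine-decay schedule. It reflects my understanding of that schedule shape and is not copied from the trainer. The total of 24 optimizer steps is taken from the training log in this card, and `lr_at` is a hypothetical helper.

```python
import math


def lr_at(step, base_lr=2e-5, total_steps=24, warmup_ratio=0.1):
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the base learning rate.
        return base_lr * step / warmup_steps
    # Cosine decay from the base learning rate down to 0.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 1, 3, 12, 24):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the warmup lasts three steps, after which the rate falls smoothly to zero by step 24.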
616
+
617
+ #### All Hyperparameters
618
+ <details><summary>Click to expand</summary>
619
+
620
+ - `overwrite_output_dir`: False
621
+ - `do_predict`: False
622
+ - `eval_strategy`: epoch
623
+ - `prediction_loss_only`: True
624
+ - `per_device_train_batch_size`: 32
625
+ - `per_device_eval_batch_size`: 16
626
+ - `per_gpu_train_batch_size`: None
627
+ - `per_gpu_eval_batch_size`: None
628
+ - `gradient_accumulation_steps`: 16
629
+ - `eval_accumulation_steps`: None
630
+ - `learning_rate`: 2e-05
631
+ - `weight_decay`: 0.0
632
+ - `adam_beta1`: 0.9
633
+ - `adam_beta2`: 0.999
634
+ - `adam_epsilon`: 1e-08
635
+ - `max_grad_norm`: 1.0
636
+ - `num_train_epochs`: 4
637
+ - `max_steps`: -1
638
+ - `lr_scheduler_type`: cosine
639
+ - `lr_scheduler_kwargs`: {}
640
+ - `warmup_ratio`: 0.1
641
+ - `warmup_steps`: 0
642
+ - `log_level`: passive
643
+ - `log_level_replica`: warning
644
+ - `log_on_each_node`: True
645
+ - `logging_nan_inf_filter`: True
646
+ - `save_safetensors`: True
647
+ - `save_on_each_node`: False
648
+ - `save_only_model`: False
649
+ - `restore_callback_states_from_checkpoint`: False
650
+ - `no_cuda`: False
651
+ - `use_cpu`: False
652
+ - `use_mps_device`: False
653
+ - `seed`: 42
654
+ - `data_seed`: None
655
+ - `jit_mode_eval`: False
656
+ - `use_ipex`: False
657
+ - `bf16`: True
658
+ - `fp16`: False
659
+ - `fp16_opt_level`: O1
660
+ - `half_precision_backend`: auto
661
+ - `bf16_full_eval`: False
662
+ - `fp16_full_eval`: False
663
+ - `tf32`: True
664
+ - `local_rank`: 0
665
+ - `ddp_backend`: None
666
+ - `tpu_num_cores`: None
667
+ - `tpu_metrics_debug`: False
668
+ - `debug`: []
669
+ - `dataloader_drop_last`: False
670
+ - `dataloader_num_workers`: 0
671
+ - `dataloader_prefetch_factor`: None
672
+ - `past_index`: -1
673
+ - `disable_tqdm`: False
674
+ - `remove_unused_columns`: True
675
+ - `label_names`: None
676
+ - `load_best_model_at_end`: True
677
+ - `ignore_data_skip`: False
678
+ - `fsdp`: []
679
+ - `fsdp_min_num_params`: 0
680
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
681
+ - `fsdp_transformer_layer_cls_to_wrap`: None
682
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
683
+ - `deepspeed`: None
684
+ - `label_smoothing_factor`: 0.0
685
+ - `optim`: adamw_torch_fused
686
+ - `optim_args`: None
687
+ - `adafactor`: False
688
+ - `group_by_length`: False
689
+ - `length_column_name`: length
690
+ - `ddp_find_unused_parameters`: None
691
+ - `ddp_bucket_cap_mb`: None
692
+ - `ddp_broadcast_buffers`: False
693
+ - `dataloader_pin_memory`: True
694
+ - `dataloader_persistent_workers`: False
695
+ - `skip_memory_metrics`: True
696
+ - `use_legacy_prediction_loop`: False
697
+ - `push_to_hub`: False
698
+ - `resume_from_checkpoint`: None
699
+ - `hub_model_id`: None
700
+ - `hub_strategy`: every_save
701
+ - `hub_private_repo`: False
702
+ - `hub_always_push`: False
703
+ - `gradient_checkpointing`: False
704
+ - `gradient_checkpointing_kwargs`: None
705
+ - `include_inputs_for_metrics`: False
706
+ - `eval_do_concat_batches`: True
707
+ - `fp16_backend`: auto
708
+ - `push_to_hub_model_id`: None
709
+ - `push_to_hub_organization`: None
710
+ - `mp_parameters`:
711
+ - `auto_find_batch_size`: False
712
+ - `full_determinism`: False
713
+ - `torchdynamo`: None
714
+ - `ray_scope`: last
715
+ - `ddp_timeout`: 1800
716
+ - `torch_compile`: False
717
+ - `torch_compile_backend`: None
718
+ - `torch_compile_mode`: None
719
+ - `dispatch_batches`: None
720
+ - `split_batches`: None
721
+ - `include_tokens_per_second`: False
722
+ - `include_num_input_tokens_seen`: False
723
+ - `neftune_noise_alpha`: None
724
+ - `optim_target_modules`: None
725
+ - `batch_eval_metrics`: False
726
+ - `batch_sampler`: no_duplicates
727
+ - `multi_dataset_batch_sampler`: proportional
728
+
729
+ </details>
730
+
731
+ ### Training Logs
732
+ | Epoch | Step | Training Loss | dim_768_cosine_map@100 | dim_512_cosine_map@100 | dim_256_cosine_map@100 | dim_128_cosine_map@100 | dim_64_cosine_map@100 |
733
+ |:----------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
734
+ | 0.9697 | 6 | - | 0.7527 | 0.7516 | 0.7454 | 0.7253 | 0.6808 |
735
+ | 1.6162 | 10 | 2.3351 | - | - | - | - | - |
736
+ | 1.9394 | 12 | - | 0.7740 | 0.7699 | 0.7707 | 0.7474 | 0.7188 |
737
+ | 2.9091 | 18 | - | 0.7784 | 0.7790 | 0.7735 | 0.7575 | 0.7275 |
738
+ | 3.2323 | 20 | 1.0519 | - | - | - | - | - |
739
+ | **3.8788** | **24** | **-** | **0.7818** | **0.7784** | **0.7763** | **0.7581** | **0.7293** |
740
+ | 0.9697 | 6 | - | 0.7836 | 0.7826 | 0.7817 | 0.7664 | 0.7353 |
741
+ | 1.6162 | 10 | 0.8132 | - | - | - | - | - |
742
+ | 1.9394 | 12 | - | 0.7887 | 0.7887 | 0.7837 | 0.7714 | 0.7409 |
743
+ | 2.9091 | 18 | - | 0.7897 | 0.7902 | 0.7798 | 0.7721 | 0.7410 |
744
+ | 3.2323 | 20 | 0.6098 | - | - | - | - | - |
745
+ | **3.8788** | **24** | **-** | **0.7915** | **0.7912** | **0.7799** | **0.7729** | **0.7434** |
746
+
747
+ * The bold row denotes the saved checkpoint.
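The fractional epoch values in the log can be reproduced from the batch settings. The calculation below assumes two data-parallel devices. That device count is an inference that happens to match the logged numbers, not something the card states, and the `no_duplicates` batch sampler could in principle change the batch count.

```python
import math

train_samples = 6300
per_device_batch = 32
grad_accum = 16
num_devices = 2  # assumption, see the note above

# Dataloader batches per epoch across all devices.
batches_per_epoch = math.ceil(train_samples / (per_device_batch * num_devices))
# One optimizer step is taken every `grad_accum` batches.
steps_per_epoch = batches_per_epoch / grad_accum

print(batches_per_epoch, steps_per_epoch)
for step in (6, 10, 12, 18, 20, 24):
    print(step, round(step / steps_per_epoch, 4))
```

With these values there are 99 batches and about 6.19 optimizer steps per epoch, so step 6 lands at epoch 0.9697 and step 24 at 3.8788, in line with the table.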
748
+
749
+ ### Framework Versions
750
+ - Python: 3.10.12
751
+ - Sentence Transformers: 3.2.1
752
+ - Transformers: 4.41.2
753
+ - PyTorch: 2.1.2+cu121
754
+ - Accelerate: 1.0.1
755
+ - Datasets: 2.19.1
756
+ - Tokenizers: 0.19.1
757
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "BAAI/bge-base-en-v1.5",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.41.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.2.1",
+     "transformers": "4.41.2",
+     "pytorch": "2.1.2+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:01bf7e520ef93bfb6c39b341b76c78c8b410fef4265b3f8c3dfa2ba507ed8f0d
+ size 437951328
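
As a rough sanity check on the LFS pointer above, the file size can be compared with the parameter count implied by config.json. The sketch below is only an estimate under stated assumptions: it assumes the checkpoint holds the standard `BertModel` tensors (embeddings, 12 encoder layers, and the pooler) stored as float32 with no extra buffers, and the helper names (`linear`, `layer_norm`, `bert_parameter_count`) are made up for this illustration.

```python
# Sizes copied from config.json in this commit.
VOCAB_SIZE = 30522
MAX_POSITIONS = 512
TYPE_VOCAB_SIZE = 2
HIDDEN = 768
INTERMEDIATE = 3072
NUM_LAYERS = 12
BYTES_PER_FLOAT32 = 4

# Size recorded in the Git LFS pointer for model.safetensors.
FILE_SIZE = 437951328


def linear(n_in, n_out):
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(width):
    """Scale plus shift vectors of a LayerNorm."""
    return 2 * width


def bert_parameter_count():
    """Parameter count of a standard BERT encoder with pooler (assumed layout)."""
    embeddings = (
        VOCAB_SIZE * HIDDEN
        + MAX_POSITIONS * HIDDEN
        + TYPE_VOCAB_SIZE * HIDDEN
        + layer_norm(HIDDEN)
    )
    # Query, key, value and output projections, then the post-attention LayerNorm.
    attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)
    feed_forward = (
        linear(HIDDEN, INTERMEDIATE)
        + linear(INTERMEDIATE, HIDDEN)
        + layer_norm(HIDDEN)
    )
    pooler = linear(HIDDEN, HIDDEN)
    return embeddings + NUM_LAYERS * (attention + feed_forward) + pooler


param_count = bert_parameter_count()
weight_bytes = param_count * BYTES_PER_FLOAT32
remainder = FILE_SIZE - weight_bytes

print(f"parameters:   {param_count:,}")
print(f"weight bytes: {weight_bytes:,}")
print(f"remainder:    {remainder:,} bytes")
```

Under these assumptions the weights account for nearly all of the file. The small remainder is plausibly the safetensors metadata header, though that is a guess and not something this commit states.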
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
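
The three modules listed above run in order: the Transformer produces one vector per token, Pooling reduces them to a single vector, and Normalize rescales it to unit length. Since 1_Pooling/config.json in this commit sets `pooling_mode_cls_token` to true, pooling keeps only the first ([CLS]) token vector. The following is a minimal pure-Python sketch of the last two stages on a toy input. It is not the library's implementation; the function names and the 2-dimensional toy vectors are illustrative assumptions (the real model uses 768 dimensions).

```python
import math


def cls_pool(token_embeddings):
    """CLS pooling: keep the vector of the first token only."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def sentence_embedding(token_embeddings):
    """Pooling followed by Normalize, mirroring modules 1 and 2 above."""
    return l2_normalize(cls_pool(token_embeddings))


# Toy per-token outputs standing in for the Transformer module (hypothetical values).
toy_token_embeddings = [
    [3.0, 4.0],  # [CLS]
    [1.0, 1.0],
    [0.0, 2.0],
]

embedding = sentence_embedding(toy_token_embeddings)
print(embedding)
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.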
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
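
The special-token ids and `model_max_length` above determine how a single sequence is framed before it reaches the encoder. As a simplified sketch (not the actual `BertTokenizer` code, which also performs lowercasing and WordPiece splitting), the framing, truncation, and padding can be illustrated like this. The function name `build_inputs`, its `pad_to` parameter, and the example token ids are hypothetical.

```python
# Ids taken from added_tokens_decoder in tokenizer_config.json.
PAD_ID = 0
CLS_ID = 101
SEP_ID = 102
MODEL_MAX_LENGTH = 512


def build_inputs(wordpiece_ids, pad_to=None, max_length=MODEL_MAX_LENGTH):
    """Frame already-tokenized ids as [CLS] ... [SEP], truncating and padding.

    Returns (input_ids, attention_mask). Two positions are reserved for
    [CLS] and [SEP], so at most max_length - 2 content ids are kept.
    """
    body = list(wordpiece_ids)[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)

    # Optional right-padding with [PAD]; padded positions get mask 0.
    if pad_to is not None and pad_to > len(input_ids):
        padding = pad_to - len(input_ids)
        input_ids += [PAD_ID] * padding
        attention_mask += [0] * padding
    return input_ids, attention_mask


# Placeholder content ids; real ids would come from vocab.txt.
short_ids, short_mask = build_inputs([2001, 2002], pad_to=6)
long_ids, long_mask = build_inputs(range(1000, 2000))

print(short_ids, short_mask)
print(len(long_ids))
```

The attention mask lets the model ignore [PAD] positions, and the truncation keeps inputs within the 512 positions allowed by `max_position_embeddings` in config.json.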
vocab.txt ADDED
The diff for this file is too large to render. See raw diff