Hritikmore committed on
Commit 8bd47ae
1 Parent(s): bf97698

Add new SentenceTransformer model.
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,810 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ library_name: sentence-transformers
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:6300
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: BAAI/bge-base-en-v1.5
+ datasets: []
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ widget:
+ - source_sentence: 'Forward-looking statements may appear throughout this report,
+     including without limitation, the following sections: “Management''s Discussion
+     and Analysis,” “Risk Factors” and "Notes 4, 8 and 13 to the Consolidated Financial
+     Statements."'
+   sentences:
+   - How does a one-year adjustment in the 2023 expected retirement age for U.S. plans
+     affect income before income taxes?
+   - Which sections of the report might contain forward-looking statements according
+     to the text?
+   - What was the allowance for loan and lease losses at Bank of America as of December
+     31, 2022?
+ - source_sentence: Interest income | $ | 267 | | | $ | 29 | | $ | 238 | | 821 | %
+   sentences:
+   - What are the key risks and uncertainties mentioned that could impact the validity
+     of DaVita's forward-looking statements?
+   - How did the interest income change in fiscal year 2023 compared to the previous
+     year?
+   - What are some of the main competitive factors in the interactive entertainment
+     industry?
+ - source_sentence: Veklury received U.S. Food and Drug Administration (FDA) and European
+     Commission (EC) approval to treat COVID-19 in patients with mild to severe hepatic
+     impairment and those with severe renal impairment, including those on dialysis.
+   sentences:
+   - What significant regulatory approvals did Gilead's Veklury receive?
+   - What type of information is included under the caption "Legal Proceedings" in
+     an Annual Report on Form 10-K?
+   - What was the cash change related to changes in operating assets and liabilities,
+     including working capital, in 2022?
+ - source_sentence: The net value of property, plant, and equipment for the consolidated
+     group increased from $12,028 million in 2022 to $12,680 million in 2023.
+   sentences:
+   - What steps does the company plan to take next after discussing data with regulators
+     and key opinion leaders?
+   - How does the company manage fluctuations in foreign currency exchange rates?
+   - What was the increase in property, plant, and equipment net value from 2022 to
+     2023 for the consolidated group?
+ - source_sentence: The effective duration of our total AFS and HTM investments securities
+     as of December 31, 2023 is approximately 3.9 years.
+   sentences:
+   - What are the effective durations of the total Available-for-Sale (AFS) and Held-to-Maturity
+     (HTM) investment securities as of December 31, 2023?
+   - What was the net unit growth percentage for Hilton in the year ended December
+     31, 2023?
+   - What does goodwill represent in accounting?
+ pipeline_tag: sentence-similarity
+ model-index:
+ - name: BGE base Financial Matryoshka
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 768
+       type: dim_768
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.7285714285714285
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8485714285714285
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8885714285714286
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9214285714285714
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.7285714285714285
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.28285714285714286
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.17771428571428569
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09214285714285712
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.7285714285714285
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8485714285714285
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8885714285714286
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9214285714285714
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.8274202252845575
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7969903628117911
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7998523047098398
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 512
+       type: dim_512
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.72
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8442857142857143
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8785714285714286
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.92
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.72
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2814285714285714
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.17571428571428568
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09199999999999998
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.72
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8442857142857143
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8785714285714286
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.92
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.8213589464095679
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7896825396825394
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7926726035572866
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 256
+       type: dim_256
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.7214285714285714
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8385714285714285
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8742857142857143
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9128571428571428
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.7214285714285714
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.27952380952380956
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.17485714285714282
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09128571428571428
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.7214285714285714
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8385714285714285
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8742857142857143
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9128571428571428
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.8190844047519252
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7888673469387758
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7921199469128796
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 128
+       type: dim_128
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6971428571428572
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8328571428571429
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8671428571428571
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9057142857142857
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6971428571428572
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2776190476190476
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.1734285714285714
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09057142857142855
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6971428571428572
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8328571428571429
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8671428571428571
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9057142857142857
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.8054254319689889
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7729421768707481
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.776216648701894
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 64
+       type: dim_64
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6614285714285715
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.7985714285714286
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8442857142857143
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.8814285714285715
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6614285714285715
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.26619047619047614
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.16885714285714284
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.08814285714285712
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6614285714285715
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.7985714285714286
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8442857142857143
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.8814285714285715
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7728992637054746
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.737815759637188
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7417951294330247
+       name: Cosine Map@100
+ ---
+
+ # BGE base Financial Matryoshka
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ - **Language:** en
+ - **License:** apache-2.0
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
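+
+ The three modules act in sequence: the `Transformer` produces contextual token vectors, `Pooling` with `pooling_mode_cls_token: True` keeps only the [CLS] token's vector, and `Normalize()` scales it to unit length, so a plain dot product between two embeddings equals their cosine similarity. A conceptual sketch of that pipeline (illustrative tensors, not library internals):
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ # (0) Transformer: contextual token vectors for a dummy batch of 2 texts, 16 tokens each
+ token_embeddings = torch.randn(2, 16, 768)
+
+ # (1) Pooling: CLS pooling keeps the first token's hidden state as the sentence vector
+ cls_embeddings = token_embeddings[:, 0]
+
+ # (2) Normalize: unit-length vectors, so dot product == cosine similarity
+ sentence_embeddings = F.normalize(cls_embeddings, dim=-1)
+ print(sentence_embeddings.shape)  # torch.Size([2, 768])
+ ```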
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("Hritikmore/bge-base-financial-matryoshka")
+ # Run inference
+ sentences = [
+     'The effective duration of our total AFS and HTM investments securities as of December 31, 2023 is approximately 3.9 years.',
+     'What are the effective durations of the total Available-for-Sale (AFS) and Held-to-Maturity (HTM) investment securities as of December 31, 2023?',
+     'What was the net unit growth percentage for Hilton in the year ended December 31, 2023?',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
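+
+ Because the model was trained with `MatryoshkaLoss` over dimensions 768/512/256/128/64, its embeddings can be truncated to a smaller prefix for cheaper storage and search, at the modest accuracy cost shown in the Evaluation section below. A minimal sketch using the library's `truncate_dim` option:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Load the same model, but keep only the first 256 embedding dimensions
+ model = SentenceTransformer("Hritikmore/bge-base-financial-matryoshka", truncate_dim=256)
+
+ embeddings = model.encode(["What does goodwill represent in accounting?"])
+ print(embeddings.shape)
+ # (1, 256)
+ ```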
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+ * Dataset: `dim_768`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.7286     |
+ | cosine_accuracy@3   | 0.8486     |
+ | cosine_accuracy@5   | 0.8886     |
+ | cosine_accuracy@10  | 0.9214     |
+ | cosine_precision@1  | 0.7286     |
+ | cosine_precision@3  | 0.2829     |
+ | cosine_precision@5  | 0.1777     |
+ | cosine_precision@10 | 0.0921     |
+ | cosine_recall@1     | 0.7286     |
+ | cosine_recall@3     | 0.8486     |
+ | cosine_recall@5     | 0.8886     |
+ | cosine_recall@10    | 0.9214     |
+ | cosine_ndcg@10      | 0.8274     |
+ | cosine_mrr@10       | 0.797      |
+ | **cosine_map@100**  | **0.7999** |
+
+ #### Information Retrieval
+ * Dataset: `dim_512`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.72       |
+ | cosine_accuracy@3   | 0.8443     |
+ | cosine_accuracy@5   | 0.8786     |
+ | cosine_accuracy@10  | 0.92       |
+ | cosine_precision@1  | 0.72       |
+ | cosine_precision@3  | 0.2814     |
+ | cosine_precision@5  | 0.1757     |
+ | cosine_precision@10 | 0.092      |
+ | cosine_recall@1     | 0.72       |
+ | cosine_recall@3     | 0.8443     |
+ | cosine_recall@5     | 0.8786     |
+ | cosine_recall@10    | 0.92       |
+ | cosine_ndcg@10      | 0.8214     |
+ | cosine_mrr@10       | 0.7897     |
+ | **cosine_map@100**  | **0.7927** |
+
+ #### Information Retrieval
+ * Dataset: `dim_256`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.7214     |
+ | cosine_accuracy@3   | 0.8386     |
+ | cosine_accuracy@5   | 0.8743     |
+ | cosine_accuracy@10  | 0.9129     |
+ | cosine_precision@1  | 0.7214     |
+ | cosine_precision@3  | 0.2795     |
+ | cosine_precision@5  | 0.1749     |
+ | cosine_precision@10 | 0.0913     |
+ | cosine_recall@1     | 0.7214     |
+ | cosine_recall@3     | 0.8386     |
+ | cosine_recall@5     | 0.8743     |
+ | cosine_recall@10    | 0.9129     |
+ | cosine_ndcg@10      | 0.8191     |
+ | cosine_mrr@10       | 0.7889     |
+ | **cosine_map@100**  | **0.7921** |
+
+ #### Information Retrieval
+ * Dataset: `dim_128`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.6971     |
+ | cosine_accuracy@3   | 0.8329     |
+ | cosine_accuracy@5   | 0.8671     |
+ | cosine_accuracy@10  | 0.9057     |
+ | cosine_precision@1  | 0.6971     |
+ | cosine_precision@3  | 0.2776     |
+ | cosine_precision@5  | 0.1734     |
+ | cosine_precision@10 | 0.0906     |
+ | cosine_recall@1     | 0.6971     |
+ | cosine_recall@3     | 0.8329     |
+ | cosine_recall@5     | 0.8671     |
+ | cosine_recall@10    | 0.9057     |
+ | cosine_ndcg@10      | 0.8054     |
+ | cosine_mrr@10       | 0.7729     |
+ | **cosine_map@100**  | **0.7762** |
+
+ #### Information Retrieval
+ * Dataset: `dim_64`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.6614     |
+ | cosine_accuracy@3   | 0.7986     |
+ | cosine_accuracy@5   | 0.8443     |
+ | cosine_accuracy@10  | 0.8814     |
+ | cosine_precision@1  | 0.6614     |
+ | cosine_precision@3  | 0.2662     |
+ | cosine_precision@5  | 0.1689     |
+ | cosine_precision@10 | 0.0881     |
+ | cosine_recall@1     | 0.6614     |
+ | cosine_recall@3     | 0.7986     |
+ | cosine_recall@5     | 0.8443     |
+ | cosine_recall@10    | 0.8814     |
+ | cosine_ndcg@10      | 0.7729     |
+ | cosine_mrr@10       | 0.7378     |
+ | **cosine_map@100**  | **0.7418** |
+
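+ The tables above are produced by `InformationRetrievalEvaluator`, which embeds a set of queries and a corpus and checks whether each query retrieves its relevant documents. A minimal sketch of how such an evaluation can be reproduced (the toy `queries`/`corpus`/`relevant_docs` below are hypothetical, not the actual evaluation set):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import InformationRetrievalEvaluator
+
+ model = SentenceTransformer("Hritikmore/bge-base-financial-matryoshka")
+
+ corpus = {"d1": "OPSUMIT is used for the treatment of pediatric pulmonary arterial hypertension."}
+ queries = {"q1": "What medical condition does OPSUMIT treat?"}
+ relevant_docs = {"q1": {"d1"}}  # query id -> set of relevant corpus ids
+
+ evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="dim_768")
+ results = evaluator(model)  # dict of accuracy@k, precision@k, recall@k, NDCG, MRR, MAP
+ ```
+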
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 6,300 training samples
+ * Columns: <code>positive</code> and <code>anchor</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | positive                                                                            | anchor                                                                             |
+   |:--------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
+   | type    | string                                                                              | string                                                                             |
+   | details | <ul><li>min: 2 tokens</li><li>mean: 45.87 tokens</li><li>max: 272 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 20.43 tokens</li><li>max: 41 tokens</li></ul>  |
+ * Samples:
+   | positive | anchor |
+   |:---------|:-------|
+   | <code>Significant judgment is required in evaluating our tax positions and during the ordinary course of business, there are many transactions and calculations for which the ultimate tax settlement is uncertain. As a result, we recognize the effect of this uncertainty on our tax attributes or taxes payable based on our estimates of the eventual outcome.</code> | <code>Why might the company's tax settlements vary?</code> |
+   | <code>OPSUMIT is used for the treatment of pediatric pulmonary arterial hypertension.</code> | <code>What medical condition does OPSUMIT treat?</code> |
+   | <code>Tangible equity ratios and tangible book value per share of common stock are non-GAAP financial measures. For more information on these ratios and corresponding reconciliations to GAAP financial measures, see Supplemental Financial Data and Non-GAAP Reconciliations.</code> | <code>What is the tangible equity ratio considered according to standard financial measures?</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
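+
+ In code, these parameters correspond to wrapping the in-batch-negatives loss in `MatryoshkaLoss`, which applies it at every listed dimension with weight 1. A sketch of the typical construction (variable names are illustrative):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
+
+ model = SentenceTransformer("BAAI/bge-base-en-v1.5")
+
+ # Rank each anchor's positive above the other in-batch positives...
+ inner_loss = MultipleNegativesRankingLoss(model)
+ # ...and enforce that ranking on every truncated prefix of the embedding
+ loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])
+ ```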
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: epoch
+ - `gradient_accumulation_steps`: 16
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 2
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.1
+ - `tf32`: False
+ - `load_best_model_at_end`: True
+ - `optim`: adamw_torch_fused
+ - `batch_sampler`: no_duplicates
+
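+ Expressed in code, the non-default values above map onto `SentenceTransformerTrainingArguments` roughly as follows (a sketch; the `output_dir` is hypothetical, and `save_strategy` is added because `load_best_model_at_end=True` requires it to match `eval_strategy`):
+
+ ```python
+ from sentence_transformers import SentenceTransformerTrainingArguments
+ from sentence_transformers.training_args import BatchSamplers
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="bge-base-financial-matryoshka",  # hypothetical output path
+     num_train_epochs=2,
+     per_device_train_batch_size=8,
+     gradient_accumulation_steps=16,              # effective batch size 8 * 16 = 128
+     learning_rate=2e-5,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.1,
+     tf32=False,
+     eval_strategy="epoch",
+     save_strategy="epoch",                       # must match eval_strategy for load_best_model_at_end
+     load_best_model_at_end=True,
+     optim="adamw_torch_fused",
+     batch_sampler=BatchSamplers.NO_DUPLICATES,   # avoid duplicate texts acting as false negatives
+ )
+ ```
+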
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 8
+ - `per_device_eval_batch_size`: 8
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 16
+ - `eval_accumulation_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 2
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: False
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
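+
+ Putting the pieces together, training with these settings follows the standard Sentence Transformers v3 trainer flow. A sketch that reuses the `model`, `loss`, `args`, and `evaluator` objects from the snippets above (`train_dataset` is assumed to be the 6,300-pair dataset with `anchor` and `positive` columns):
+
+ ```python
+ from sentence_transformers import SentenceTransformerTrainer
+
+ trainer = SentenceTransformerTrainer(
+     model=model,                  # the BAAI/bge-base-en-v1.5 base model
+     args=args,                    # training arguments sketched above
+     train_dataset=train_dataset,  # assumed: 6,300 (anchor, positive) pairs
+     loss=loss,                    # MatryoshkaLoss wrapping MultipleNegativesRankingLoss
+     evaluator=evaluator,          # e.g. an InformationRetrievalEvaluator per dimension
+ )
+ trainer.train()
+ ```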
+
+ ### Training Logs
+ | Epoch      | Step   | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
+ |:----------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
+ | 0.2030     | 10     | 0.7168        | -                      | -                      | -                      | -                     | -                      |
+ | 0.4061     | 20     | 0.3345        | -                      | -                      | -                      | -                     | -                      |
+ | 0.6091     | 30     | 0.2234        | -                      | -                      | -                      | -                     | -                      |
+ | 0.8122     | 40     | 0.2126        | -                      | -                      | -                      | -                     | -                      |
+ | **0.9949** | **49** | **-**         | **0.7796**             | **0.7844**             | **0.7905**             | **0.7293**            | **0.7973**             |
+ | 1.0152     | 50     | 0.2301        | -                      | -                      | -                      | -                     | -                      |
+ | 1.2183     | 60     | 0.1595        | -                      | -                      | -                      | -                     | -                      |
+ | 1.4213     | 70     | 0.1082        | -                      | -                      | -                      | -                     | -                      |
+ | 1.6244     | 80     | 0.0911        | -                      | -                      | -                      | -                     | -                      |
+ | 1.8274     | 90     | 0.1068        | -                      | -                      | -                      | -                     | -                      |
+ | 1.9898     | 98     | -             | 0.7762                 | 0.7921                 | 0.7927                 | 0.7418                | 0.7999                 |
+
+ * The bold row denotes the saved checkpoint.
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.41.2
+ - PyTorch: 2.1.2+cu121
+ - Accelerate: 0.31.0
+ - Datasets: 2.19.1
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "BAAI/bge-base-en-v1.5",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.41.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.41.2",
+     "pytorch": "2.1.2+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:20f6c740f335c3a84ccc61f6095792231cda14ed2697ba1cc486cd07299e077b
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
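
`modules.json` wires the three modules together in order. The same pipeline can be assembled by hand, which may help when customizing pooling; a sketch under the assumption that the standard `sentence_transformers.models` classes are used:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Transformer, Pooling, Normalize

# Rebuild the Transformer -> Pooling (CLS) -> Normalize stack described in modules.json
word = Transformer("BAAI/bge-base-en-v1.5", max_seq_length=512)
pool = Pooling(word.get_word_embedding_dimension(), pooling_mode="cls")
model = SentenceTransformer(modules=[word, pool, Normalize()])
```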
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff