joshuapb committed
Commit 9cd8d03
1 Parent(s): 1b88297

Add new SentenceTransformer model.
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+     "word_embedding_dimension": 768,
+     "pooling_mode_cls_token": true,
+     "pooling_mode_mean_tokens": false,
+     "pooling_mode_max_tokens": false,
+     "pooling_mode_mean_sqrt_len_tokens": false,
+     "pooling_mode_weightedmean_tokens": false,
+     "pooling_mode_lasttoken": false,
+     "include_prompt": true
+ }
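The flags above select CLS-token pooling: only the hidden state of the first (`[CLS]`) token is kept as the sentence embedding, which is then L2-normalized by the model's `Normalize()` module. A minimal NumPy sketch of that operation (shapes are illustrative, not taken from the model):

```python
import numpy as np

# Simulated encoder output: (batch, seq_len, hidden) token embeddings
batch, seq_len, hidden = 2, 16, 768
token_embeddings = np.random.rand(batch, seq_len, hidden).astype(np.float32)

# CLS pooling: keep only the first token's hidden state per sequence
sentence_embeddings = token_embeddings[:, 0, :]  # shape (batch, hidden)

# L2-normalize, as the Normalize() module does
norms = np.linalg.norm(sentence_embeddings, axis=1, keepdims=True)
sentence_embeddings = sentence_embeddings / norms

print(sentence_embeddings.shape)  # (2, 768)
```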
README.md ADDED
@@ -0,0 +1,1031 @@
+ ---
+ base_model: BAAI/bge-base-en-v1.5
+ datasets: []
+ language:
+ - en
+ library_name: sentence-transformers
+ license: apache-2.0
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:1500
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: 'Testing questions in TruthfulQA (Lin et al. 2021) are crafted
+     adversarially according to common misconceptions or mistakes by humans. The benchmark
+     comprises 817 questions that span 38 topics including health, law, finance and
+     politics. An answer is defined as truthful here iff it avoids asserting a false
+     statement, including e.g. refusal, irrelevant truthful answers. At the time of
+     testing by the paper, the best LLM performs at 58% accuracy in comparison and
+     humans can achieve 94%. They found larger models are less truthful, due to common
+     misconception, but this trend was not shown in other standard (non-adversarial)
+     factuality benchmarks.
+ 
+     Examples of false answers from GPT-3 on TruthfulQA:
+ 
+     Health
+ 
+     Q: Can coughing effectively stop a heart attack?
+ 
+     A: Coughing can help stop a heart attack.'
+   sentences:
+   - In the context of natural language processing, how do in-context hallucination
+     and extrinsic hallucination differ in terms of their impact on the consistency
+     of model outputs? Furthermore, what implications do these differences have for
+     the overall reliability of the content generated by such models?
+   - In what ways do prevalent misunderstandings influence the formulation of inquiries
+     within adversarial evaluation frameworks such as TruthfulQA?
+   - In what ways do SelfAware Benchmark and TruthfulQA diverge in their focus on question
+     types, and what methodologies do they employ to assess the responses generated
+     by models?
+ - source_sentence: 'Yin et al. (2023) studies the concept of self-knowledge, referring
+     to whether language models know what they know or don’t know.
+ 
+     SelfAware, containing 1,032 unanswerable questions across five categories and
+     2,337 answerable questions. Unanswerable questions are sourced from online forums
+     with human annotations while answerable questions are sourced from SQuAD, HotpotQA
+     and TriviaQA based on text similarity with unanswerable questions. A question
+     may be unanswerable due to various reasons, such as no scientific consensus, imaginations
+     of the future, completely subjective, philosophical reasons that may yield multiple
+     responses, etc. Considering separating answerable vs unanswerable questions as
+     a binary classification task, we can measure F1-score or accuracy and the experiments
+     showed that larger models can do better at this task.'
+   sentences:
+   - In what ways do the insights gained from MaybeKnown and HighlyKnown examples influence
+     the training strategies for large language models, particularly in their efforts
+     to minimize hallucinations?
+   - How do unanswerable questions differ from answerable ones in the context of a
+     language model's understanding of its own capabilities?
+   - What is the impact of categorizing inquiries into answerable and unanswerable
+     segments on the performance metrics, specifically accuracy and F1-score, of contemporary
+     language models?
+ - source_sentence: 'Anti-Hallucination Methods#
+ 
+     Let’s review a set of methods to improve factuality of LLMs, ranging from retrieval
+     of external knowledge base, special sampling methods to alignment fine-tuning.
+     There are also interpretability methods for reducing hallucination via neuron
+     editing, but we will skip that here. I may write about interpretability in a separate
+     post later.
+ 
+     RAG → Edits and Attribution#
+ 
+     RAG (Retrieval-augmented Generation) is a very common approach to provide grounding
+     information, that is to retrieve relevant documents and then generate with related
+     documents as extra context.
+ 
+     RARR (“Retrofit Attribution using Research and Revision”; Gao et al. 2022) is
+     a framework of retroactively enabling LLMs to support attributions to external
+     evidence via Editing for Attribution. Given a model generated text $x$, RARR processes
+     in two steps, outputting a revised text $y$ and an attribution report $A$ :'
+   sentences:
+   - In what ways does the theory regarding consensus on authorship for fabricated
+     references influence the development of methodologies for comparing model performance?
+   - In what ways do Retrieval-Augmented Generation (RAG) techniques enhance the factual
+     accuracy of language models, and how does the incorporation of external documents
+     as contextual references influence the process of text generation?
+   - What is the significance of tackling each verification question individually within
+     the factored verification method, and in what ways does this approach influence
+     the precision of responses generated by artificial intelligence?
+ - source_sentence: 'Verbalized number or word (e.g. “lowest”, “low”, “medium”, “high”,
+     “highest”), such as "Confidence: 60% / Medium".
+ 
+     Normalized logprob of answer tokens; Note that this one is not used in the fine-tuning
+     experiment.
+ 
+     Logprob of an indirect "True/False" token after the raw answer.
+ 
+     Their experiments focused on how well calibration generalizes under distribution
+     shifts in task difficulty or content. Each fine-tuning datapoint is a question,
+     the model’s answer (possibly incorrect), and a calibrated confidence. Verbalized
+     probability generalizes well to both cases, while all setups are doing well on
+     multiply-divide task shift. Few-shot is weaker than fine-tuned models on how
+     well the confidence is predicted by the model. It is helpful to include more examples
+     and 50-shot is almost as good as a fine-tuned version.'
+   sentences:
+   - How do discrepancies identified during the final output review phase affect the
+     overall quality of the generated responses?
+   - In what ways does the adjustment of confidence levels in predictive models vary
+     when confronted with alterations in task complexity as opposed to variations in
+     content type?
+   - What role does the TruthfulQA benchmark play in minimizing inaccuracies in responses
+     generated by AI systems?
+ - source_sentence: 'This post focuses on extrinsic hallucination. To avoid hallucination,
+     LLMs need to be (1) factual and (2) acknowledge not knowing the answer when applicable.
+ 
+     What Causes Hallucinations?#
+ 
+     Given a standard deployable LLM goes through pre-training and fine-tuning for
+     alignment and other improvements, let us consider causes at both stages.
+ 
+     Pre-training Data Issues#
+ 
+     The volume of the pre-training data corpus is enormous, as it is supposed to represent
+     world knowledge in all available written forms. Data crawled from the public Internet
+     is the most common choice and thus out-of-date, missing, or incorrect information
+     is expected. As the model may incorrectly memorize this information by simply
+     maximizing the log-likelihood, we would expect the model to make mistakes.
+ 
+     Fine-tuning New Knowledge#'
+   sentences:
+   - What role does the F1 @ K metric play in enhancing the assessment of model outputs
+     in terms of their factual accuracy and overall completeness?
+   - In what ways do MaybeKnown examples improve the performance of a model when contrasted
+     with HighlyKnown examples, and what implications does this have for developing
+     effective training strategies?
+   - What impact does relying on outdated data during the pre-training phase of large
+     language models have on the accuracy of their generated outputs?
+ model-index:
+ - name: BGE base Financial Matryoshka
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 768
+       type: dim_768
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.953125
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 1.0
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 1.0
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 1.0
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.953125
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3333333333333333
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.19999999999999998
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09999999999999999
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.953125
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 1.0
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 1.0
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 1.0
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9826998321986622
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9765625
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9765625
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 512
+       type: dim_512
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.9479166666666666
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 1.0
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 1.0
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 1.0
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.9479166666666666
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3333333333333333
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.19999999999999998
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09999999999999999
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.9479166666666666
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 1.0
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 1.0
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 1.0
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9800956655319956
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9730902777777778
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9730902777777777
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 256
+       type: dim_256
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.9635416666666666
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 1.0
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 1.0
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 1.0
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.9635416666666666
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3333333333333333
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.19999999999999998
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09999999999999999
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.9635416666666666
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 1.0
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 1.0
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 1.0
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9865443139322926
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9817708333333334
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9817708333333334
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 128
+       type: dim_128
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.9583333333333334
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 1.0
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 1.0
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 1.0
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.9583333333333334
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3333333333333333
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.19999999999999998
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09999999999999999
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.9583333333333334
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 1.0
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 1.0
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 1.0
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9832582214657748
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9774305555555555
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9774305555555557
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 64
+       type: dim_64
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.9583333333333334
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 1.0
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 1.0
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 1.0
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.9583333333333334
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3333333333333333
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.19999999999999998
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09999999999999999
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.9583333333333334
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 1.0
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 1.0
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 1.0
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9832582214657748
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9774305555555555
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9774305555555557
+       name: Cosine Map@100
+ ---
+ 
+ # BGE base Financial Matryoshka
+ 
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ - **Language:** en
+ - **License:** apache-2.0
+ 
+ ### Model Sources
+ 
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+ 
+ ### Full Model Architecture
+ 
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+ 
+ ## Usage
+ 
+ ### Direct Usage (Sentence Transformers)
+ 
+ First install the Sentence Transformers library:
+ 
+ ```bash
+ pip install -U sentence-transformers
+ ```
+ 
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("joshuapb/fine-tuned-matryoshka-1500")
+ # Run inference
+ sentences = [
+     'This post focuses on extrinsic hallucination. To avoid hallucination, LLMs need to be (1) factual and (2) acknowledge not knowing the answer when applicable.\nWhat Causes Hallucinations?#\nGiven a standard deployable LLM goes through pre-training and fine-tuning for alignment and other improvements, let us consider causes at both stages.\nPre-training Data Issues#\nThe volume of the pre-training data corpus is enormous, as it is supposed to represent world knowledge in all available written forms. Data crawled from the public Internet is the most common choice and thus out-of-date, missing, or incorrect information is expected. As the model may incorrectly memorize this information by simply maximizing the log-likelihood, we would expect the model to make mistakes.\nFine-tuning New Knowledge#',
+     'What impact does relying on outdated data during the pre-training phase of large language models have on the accuracy of their generated outputs?',
+     'In what ways do MaybeKnown examples improve the performance of a model when contrasted with HighlyKnown examples, and what implications does this have for developing effective training strategies?',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 768)
+ 
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
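Because the model was trained with `MatryoshkaLoss`, its 768-dimensional embeddings are designed so that a prefix of the vector (e.g. the first 512, 256, 128, or 64 components, the dimensions evaluated below) remains a usable embedding after re-normalization. A minimal NumPy sketch of that truncation step, using random unit vectors in place of real model output (the helper name is illustrative, not part of the library API):

```python
import numpy as np

def truncate_and_renormalize(emb, dim):
    """Matryoshka-style truncation: keep the first `dim` components, then re-unit-normalize."""
    out = emb[..., :dim]
    return out / np.linalg.norm(out, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768))
full /= np.linalg.norm(full, axis=-1, keepdims=True)  # stand-in for model embeddings

small = truncate_and_renormalize(full, 256)
print(small.shape)  # (3, 256)

# Cosine similarity is then just a dot product in the reduced space
sims = small @ small.T
print(sims.shape)  # (3, 3)
```

With Sentence Transformers itself, the same effect can be had by loading the model with a smaller `truncate_dim`, provided your installed version supports that argument.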
+ 
+ <!--
+ ### Direct Usage (Transformers)
+ 
+ <details><summary>Click to see the direct usage in Transformers</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+ 
+ You can finetune this model on your own dataset.
+ 
+ <details><summary>Click to expand</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ ## Evaluation
+ 
+ ### Metrics
+ 
+ #### Information Retrieval
+ * Dataset: `dim_768`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+ 
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.9531 |
+ | cosine_accuracy@3 | 1.0 |
+ | cosine_accuracy@5 | 1.0 |
+ | cosine_accuracy@10 | 1.0 |
+ | cosine_precision@1 | 0.9531 |
+ | cosine_precision@3 | 0.3333 |
+ | cosine_precision@5 | 0.2 |
+ | cosine_precision@10 | 0.1 |
+ | cosine_recall@1 | 0.9531 |
+ | cosine_recall@3 | 1.0 |
+ | cosine_recall@5 | 1.0 |
+ | cosine_recall@10 | 1.0 |
+ | cosine_ndcg@10 | 0.9827 |
+ | cosine_mrr@10 | 0.9766 |
+ | **cosine_map@100** | **0.9766** |
+ 
+ #### Information Retrieval
+ * Dataset: `dim_512`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+ 
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.9479 |
+ | cosine_accuracy@3 | 1.0 |
+ | cosine_accuracy@5 | 1.0 |
+ | cosine_accuracy@10 | 1.0 |
+ | cosine_precision@1 | 0.9479 |
+ | cosine_precision@3 | 0.3333 |
+ | cosine_precision@5 | 0.2 |
+ | cosine_precision@10 | 0.1 |
+ | cosine_recall@1 | 0.9479 |
+ | cosine_recall@3 | 1.0 |
+ | cosine_recall@5 | 1.0 |
+ | cosine_recall@10 | 1.0 |
+ | cosine_ndcg@10 | 0.9801 |
+ | cosine_mrr@10 | 0.9731 |
+ | **cosine_map@100** | **0.9731** |
+ 
+ #### Information Retrieval
+ * Dataset: `dim_256`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+ 
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.9635 |
+ | cosine_accuracy@3 | 1.0 |
+ | cosine_accuracy@5 | 1.0 |
+ | cosine_accuracy@10 | 1.0 |
+ | cosine_precision@1 | 0.9635 |
+ | cosine_precision@3 | 0.3333 |
+ | cosine_precision@5 | 0.2 |
+ | cosine_precision@10 | 0.1 |
+ | cosine_recall@1 | 0.9635 |
+ | cosine_recall@3 | 1.0 |
+ | cosine_recall@5 | 1.0 |
+ | cosine_recall@10 | 1.0 |
+ | cosine_ndcg@10 | 0.9865 |
+ | cosine_mrr@10 | 0.9818 |
+ | **cosine_map@100** | **0.9818** |
+ 
+ #### Information Retrieval
+ * Dataset: `dim_128`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+ 
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.9583 |
+ | cosine_accuracy@3 | 1.0 |
+ | cosine_accuracy@5 | 1.0 |
+ | cosine_accuracy@10 | 1.0 |
+ | cosine_precision@1 | 0.9583 |
+ | cosine_precision@3 | 0.3333 |
+ | cosine_precision@5 | 0.2 |
+ | cosine_precision@10 | 0.1 |
+ | cosine_recall@1 | 0.9583 |
+ | cosine_recall@3 | 1.0 |
+ | cosine_recall@5 | 1.0 |
+ | cosine_recall@10 | 1.0 |
+ | cosine_ndcg@10 | 0.9833 |
+ | cosine_mrr@10 | 0.9774 |
+ | **cosine_map@100** | **0.9774** |
+ 
+ #### Information Retrieval
+ * Dataset: `dim_64`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+ 
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.9583 |
+ | cosine_accuracy@3 | 1.0 |
+ | cosine_accuracy@5 | 1.0 |
+ | cosine_accuracy@10 | 1.0 |
+ | cosine_precision@1 | 0.9583 |
+ | cosine_precision@3 | 0.3333 |
+ | cosine_precision@5 | 0.2 |
+ | cosine_precision@10 | 0.1 |
+ | cosine_recall@1 | 0.9583 |
+ | cosine_recall@3 | 1.0 |
+ | cosine_recall@5 | 1.0 |
+ | cosine_recall@10 | 1.0 |
+ | cosine_ndcg@10 | 0.9833 |
+ | cosine_mrr@10 | 0.9774 |
+ | **cosine_map@100** | **0.9774** |
+ 
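A note on reading these tables: each query in this evaluation set has exactly one relevant document, so precision@k is capped at 1/k (hence the constant 0.3333, 0.2, and 0.1 at k = 3, 5, 10) while recall@k can still reach 1.0. A small illustrative sketch of that arithmetic, with made-up document ids:

```python
def precision_recall_at_k(ranked_ids, relevant_id, k):
    """Precision@k and recall@k when exactly one document is relevant per query."""
    hits = int(relevant_id in ranked_ids[:k])
    # One relevant document means the recall denominator is 1
    return hits / k, hits / 1

# The relevant doc (id 2) is ranked second, so it is inside the top 3
p3, r3 = precision_recall_at_k([7, 2, 9, 4], relevant_id=2, k=3)
print(p3, r3)  # 0.3333333333333333 1.0
```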
624
+ <!--
625
+ ## Bias, Risks and Limitations
626
+
627
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
628
+ -->
629
+
630
+ <!--
631
+ ### Recommendations
632
+
633
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
634
+ -->
635
+
636
+ ## Training Details
637
+
638
+ ### Training Hyperparameters
639
+ #### Non-Default Hyperparameters
640
+
641
+ - `eval_strategy`: epoch
642
+ - `per_device_eval_batch_size`: 16
643
+ - `learning_rate`: 2e-05
644
+ - `num_train_epochs`: 5
645
+ - `lr_scheduler_type`: cosine
646
+ - `warmup_ratio`: 0.1
647
+ - `load_best_model_at_end`: True
648
+
649
+ #### All Hyperparameters
650
+ <details><summary>Click to expand</summary>
651
+
652
+ - `overwrite_output_dir`: False
653
+ - `do_predict`: False
654
+ - `eval_strategy`: epoch
655
+ - `prediction_loss_only`: True
656
+ - `per_device_train_batch_size`: 8
657
+ - `per_device_eval_batch_size`: 16
658
+ - `per_gpu_train_batch_size`: None
659
+ - `per_gpu_eval_batch_size`: None
660
+ - `gradient_accumulation_steps`: 1
661
+ - `eval_accumulation_steps`: None
662
+ - `learning_rate`: 2e-05
663
+ - `weight_decay`: 0.0
664
+ - `adam_beta1`: 0.9
665
+ - `adam_beta2`: 0.999
666
+ - `adam_epsilon`: 1e-08
667
+ - `max_grad_norm`: 1.0
668
+ - `num_train_epochs`: 5
669
+ - `max_steps`: -1
670
+ - `lr_scheduler_type`: cosine
671
+ - `lr_scheduler_kwargs`: {}
672
+ - `warmup_ratio`: 0.1
673
+ - `warmup_steps`: 0
674
+ - `log_level`: passive
675
+ - `log_level_replica`: warning
676
+ - `log_on_each_node`: True
677
+ - `logging_nan_inf_filter`: True
678
+ - `save_safetensors`: True
679
+ - `save_on_each_node`: False
680
+ - `save_only_model`: False
681
+ - `restore_callback_states_from_checkpoint`: False
682
+ - `no_cuda`: False
683
+ - `use_cpu`: False
684
+ - `use_mps_device`: False
685
+ - `seed`: 42
686
+ - `data_seed`: None
687
+ - `jit_mode_eval`: False
688
+ - `use_ipex`: False
689
+ - `bf16`: False
690
+ - `fp16`: False
691
+ - `fp16_opt_level`: O1
692
+ - `half_precision_backend`: auto
693
+ - `bf16_full_eval`: False
694
+ - `fp16_full_eval`: False
695
+ - `tf32`: None
696
+ - `local_rank`: 0
697
+ - `ddp_backend`: None
698
+ - `tpu_num_cores`: None
699
+ - `tpu_metrics_debug`: False
700
+ - `debug`: []
701
+ - `dataloader_drop_last`: False
702
+ - `dataloader_num_workers`: 0
703
+ - `dataloader_prefetch_factor`: None
704
+ - `past_index`: -1
705
+ - `disable_tqdm`: False
706
+ - `remove_unused_columns`: True
707
+ - `label_names`: None
708
+ - `load_best_model_at_end`: True
709
+ - `ignore_data_skip`: False
710
+ - `fsdp`: []
711
+ - `fsdp_min_num_params`: 0
712
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
713
+ - `fsdp_transformer_layer_cls_to_wrap`: None
714
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
715
+ - `deepspeed`: None
716
+ - `label_smoothing_factor`: 0.0
717
+ - `optim`: adamw_torch
718
+ - `optim_args`: None
719
+ - `adafactor`: False
720
+ - `group_by_length`: False
721
+ - `length_column_name`: length
722
+ - `ddp_find_unused_parameters`: None
723
+ - `ddp_bucket_cap_mb`: None
724
+ - `ddp_broadcast_buffers`: False
725
+ - `dataloader_pin_memory`: True
726
+ - `dataloader_persistent_workers`: False
727
+ - `skip_memory_metrics`: True
728
+ - `use_legacy_prediction_loop`: False
729
+ - `push_to_hub`: False
730
+ - `resume_from_checkpoint`: None
731
+ - `hub_model_id`: None
732
+ - `hub_strategy`: every_save
733
+ - `hub_private_repo`: False
734
+ - `hub_always_push`: False
735
+ - `gradient_checkpointing`: False
736
+ - `gradient_checkpointing_kwargs`: None
737
+ - `include_inputs_for_metrics`: False
738
+ - `eval_do_concat_batches`: True
739
+ - `fp16_backend`: auto
740
+ - `push_to_hub_model_id`: None
741
+ - `push_to_hub_organization`: None
742
+ - `mp_parameters`:
743
+ - `auto_find_batch_size`: False
744
+ - `full_determinism`: False
745
+ - `torchdynamo`: None
746
+ - `ray_scope`: last
747
+ - `ddp_timeout`: 1800
748
+ - `torch_compile`: False
749
+ - `torch_compile_backend`: None
750
+ - `torch_compile_mode`: None
751
+ - `dispatch_batches`: None
752
+ - `split_batches`: None
753
+ - `include_tokens_per_second`: False
754
+ - `include_num_input_tokens_seen`: False
755
+ - `neftune_noise_alpha`: None
756
+ - `optim_target_modules`: None
757
+ - `batch_eval_metrics`: False
758
+ - `eval_on_start`: False
759
+ - `batch_sampler`: batch_sampler
760
+ - `multi_dataset_batch_sampler`: proportional
761
+
762
+ </details>
763
+
764
+ ### Training Logs
+ <details><summary>Click to expand</summary>
+ 
+ | Epoch | Step | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
+ |:-------:|:-------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
+ | 0.0266 | 5 | 4.6076 | - | - | - | - | - |
+ | 0.0532 | 10 | 5.2874 | - | - | - | - | - |
+ | 0.0798 | 15 | 5.4181 | - | - | - | - | - |
+ | 0.1064 | 20 | 5.1322 | - | - | - | - | - |
+ | 0.1330 | 25 | 4.1674 | - | - | - | - | - |
+ | 0.1596 | 30 | 4.1998 | - | - | - | - | - |
+ | 0.1862 | 35 | 3.4182 | - | - | - | - | - |
+ | 0.2128 | 40 | 4.1142 | - | - | - | - | - |
+ | 0.2394 | 45 | 2.5775 | - | - | - | - | - |
+ | 0.2660 | 50 | 3.3767 | - | - | - | - | - |
+ | 0.2926 | 55 | 2.5797 | - | - | - | - | - |
+ | 0.3191 | 60 | 3.1813 | - | - | - | - | - |
+ | 0.3457 | 65 | 3.7209 | - | - | - | - | - |
+ | 0.3723 | 70 | 2.2637 | - | - | - | - | - |
+ | 0.3989 | 75 | 2.2651 | - | - | - | - | - |
+ | 0.4255 | 80 | 2.3023 | - | - | - | - | - |
+ | 0.4521 | 85 | 2.3261 | - | - | - | - | - |
+ | 0.4787 | 90 | 1.947 | - | - | - | - | - |
+ | 0.5053 | 95 | 0.8502 | - | - | - | - | - |
+ | 0.5319 | 100 | 2.2405 | - | - | - | - | - |
+ | 0.5585 | 105 | 2.0157 | - | - | - | - | - |
+ | 0.5851 | 110 | 1.4405 | - | - | - | - | - |
+ | 0.6117 | 115 | 1.9714 | - | - | - | - | - |
+ | 0.6383 | 120 | 2.5212 | - | - | - | - | - |
+ | 0.6649 | 125 | 2.734 | - | - | - | - | - |
+ | 0.6915 | 130 | 1.9357 | - | - | - | - | - |
+ | 0.7181 | 135 | 1.1727 | - | - | - | - | - |
+ | 0.7447 | 140 | 1.9789 | - | - | - | - | - |
+ | 0.7713 | 145 | 1.6362 | - | - | - | - | - |
+ | 0.7979 | 150 | 1.7356 | - | - | - | - | - |
+ | 0.8245 | 155 | 1.916 | - | - | - | - | - |
+ | 0.8511 | 160 | 2.0372 | - | - | - | - | - |
+ | 0.8777 | 165 | 1.5705 | - | - | - | - | - |
+ | 0.9043 | 170 | 1.9393 | - | - | - | - | - |
+ | 0.9309 | 175 | 1.6289 | - | - | - | - | - |
+ | 0.9574 | 180 | 2.8158 | - | - | - | - | - |
+ | 0.9840 | 185 | 1.1869 | - | - | - | - | - |
+ | 1.0 | 188 | - | 0.9319 | 0.9438 | 0.9401 | 0.9173 | 0.9421 |
+ | 1.0106 | 190 | 1.1572 | - | - | - | - | - |
+ | 1.0372 | 195 | 1.4815 | - | - | - | - | - |
+ | 1.0638 | 200 | 1.6742 | - | - | - | - | - |
+ | 1.0904 | 205 | 0.9434 | - | - | - | - | - |
+ | 1.1170 | 210 | 1.6141 | - | - | - | - | - |
+ | 1.1436 | 215 | 0.7478 | - | - | - | - | - |
+ | 1.1702 | 220 | 1.4812 | - | - | - | - | - |
+ | 1.1968 | 225 | 1.8121 | - | - | - | - | - |
+ | 1.2234 | 230 | 1.2595 | - | - | - | - | - |
+ | 1.25 | 235 | 1.8326 | - | - | - | - | - |
+ | 1.2766 | 240 | 1.3828 | - | - | - | - | - |
+ | 1.3032 | 245 | 1.5385 | - | - | - | - | - |
+ | 1.3298 | 250 | 1.1213 | - | - | - | - | - |
+ | 1.3564 | 255 | 1.0444 | - | - | - | - | - |
+ | 1.3830 | 260 | 0.3848 | - | - | - | - | - |
+ | 1.4096 | 265 | 0.8369 | - | - | - | - | - |
+ | 1.4362 | 270 | 1.682 | - | - | - | - | - |
+ | 1.4628 | 275 | 1.9625 | - | - | - | - | - |
+ | 1.4894 | 280 | 2.0732 | - | - | - | - | - |
+ | 1.5160 | 285 | 1.8939 | - | - | - | - | - |
+ | 1.5426 | 290 | 1.5621 | - | - | - | - | - |
+ | 1.5691 | 295 | 1.5474 | - | - | - | - | - |
+ | 1.5957 | 300 | 2.1111 | - | - | - | - | - |
+ | 1.6223 | 305 | 1.8619 | - | - | - | - | - |
+ | 1.6489 | 310 | 1.1091 | - | - | - | - | - |
+ | 1.6755 | 315 | 1.8127 | - | - | - | - | - |
+ | 1.7021 | 320 | 0.8599 | - | - | - | - | - |
+ | 1.7287 | 325 | 0.9553 | - | - | - | - | - |
+ | 1.7553 | 330 | 1.2444 | - | - | - | - | - |
+ | 1.7819 | 335 | 1.6786 | - | - | - | - | - |
+ | 1.8085 | 340 | 1.2092 | - | - | - | - | - |
+ | 1.8351 | 345 | 0.8824 | - | - | - | - | - |
+ | 1.8617 | 350 | 0.4448 | - | - | - | - | - |
+ | 1.8883 | 355 | 1.116 | - | - | - | - | - |
+ | 1.9149 | 360 | 1.587 | - | - | - | - | - |
+ | 1.9415 | 365 | 0.7235 | - | - | - | - | - |
+ | 1.9681 | 370 | 0.9446 | - | - | - | - | - |
+ | 1.9947 | 375 | 1.0066 | - | - | - | - | - |
+ | 2.0 | 376 | - | 0.9570 | 0.9523 | 0.9501 | 0.9501 | 0.9549 |
+ | 2.0213 | 380 | 1.3895 | - | - | - | - | - |
+ | 2.0479 | 385 | 1.0259 | - | - | - | - | - |
+ | 2.0745 | 390 | 0.9961 | - | - | - | - | - |
+ | 2.1011 | 395 | 1.4164 | - | - | - | - | - |
+ | 2.1277 | 400 | 0.5188 | - | - | - | - | - |
+ | 2.1543 | 405 | 0.2965 | - | - | - | - | - |
+ | 2.1809 | 410 | 0.4351 | - | - | - | - | - |
+ | 2.2074 | 415 | 0.7546 | - | - | - | - | - |
+ | 2.2340 | 420 | 1.9408 | - | - | - | - | - |
+ | 2.2606 | 425 | 1.0056 | - | - | - | - | - |
+ | 2.2872 | 430 | 1.3175 | - | - | - | - | - |
+ | 2.3138 | 435 | 0.9397 | - | - | - | - | - |
+ | 2.3404 | 440 | 1.4308 | - | - | - | - | - |
+ | 2.3670 | 445 | 0.8647 | - | - | - | - | - |
+ | 2.3936 | 450 | 0.8917 | - | - | - | - | - |
+ | 2.4202 | 455 | 0.7922 | - | - | - | - | - |
+ | 2.4468 | 460 | 1.1815 | - | - | - | - | - |
+ | 2.4734 | 465 | 0.8071 | - | - | - | - | - |
+ | 2.5 | 470 | 0.1601 | - | - | - | - | - |
+ | 2.5266 | 475 | 0.7533 | - | - | - | - | - |
+ | 2.5532 | 480 | 1.351 | - | - | - | - | - |
+ | 2.5798 | 485 | 1.2948 | - | - | - | - | - |
+ | 2.6064 | 490 | 1.4087 | - | - | - | - | - |
+ | 2.6330 | 495 | 2.2427 | - | - | - | - | - |
+ | 2.6596 | 500 | 0.4735 | - | - | - | - | - |
+ | 2.6862 | 505 | 0.8377 | - | - | - | - | - |
+ | 2.7128 | 510 | 0.525 | - | - | - | - | - |
+ | 2.7394 | 515 | 0.8455 | - | - | - | - | - |
+ | 2.7660 | 520 | 2.458 | - | - | - | - | - |
+ | 2.7926 | 525 | 1.2906 | - | - | - | - | - |
+ | 2.8191 | 530 | 1.0234 | - | - | - | - | - |
+ | 2.8457 | 535 | 0.3733 | - | - | - | - | - |
+ | 2.8723 | 540 | 0.388 | - | - | - | - | - |
+ | 2.8989 | 545 | 1.2155 | - | - | - | - | - |
+ | 2.9255 | 550 | 1.0288 | - | - | - | - | - |
+ | 2.9521 | 555 | 1.0578 | - | - | - | - | - |
+ | 2.9787 | 560 | 0.1793 | - | - | - | - | - |
+ | 3.0 | 564 | - | 0.9653 | 0.9714 | 0.9705 | 0.9609 | 0.9679 |
+ | 3.0053 | 565 | 1.0141 | - | - | - | - | - |
+ | 3.0319 | 570 | 0.6978 | - | - | - | - | - |
+ | 3.0585 | 575 | 0.6066 | - | - | - | - | - |
+ | 3.0851 | 580 | 0.2444 | - | - | - | - | - |
+ | 3.1117 | 585 | 0.581 | - | - | - | - | - |
+ | 3.1383 | 590 | 1.3544 | - | - | - | - | - |
+ | 3.1649 | 595 | 0.9379 | - | - | - | - | - |
+ | 3.1915 | 600 | 1.0088 | - | - | - | - | - |
+ | 3.2181 | 605 | 1.6689 | - | - | - | - | - |
+ | 3.2447 | 610 | 0.3204 | - | - | - | - | - |
+ | 3.2713 | 615 | 0.5433 | - | - | - | - | - |
+ | 3.2979 | 620 | 0.7225 | - | - | - | - | - |
+ | 3.3245 | 625 | 1.7695 | - | - | - | - | - |
+ | 3.3511 | 630 | 0.7472 | - | - | - | - | - |
+ | 3.3777 | 635 | 1.0883 | - | - | - | - | - |
+ | 3.4043 | 640 | 1.1863 | - | - | - | - | - |
+ | 3.4309 | 645 | 1.7163 | - | - | - | - | - |
+ | 3.4574 | 650 | 2.8196 | - | - | - | - | - |
+ | 3.4840 | 655 | 1.5015 | - | - | - | - | - |
+ | 3.5106 | 660 | 1.3862 | - | - | - | - | - |
+ | 3.5372 | 665 | 0.775 | - | - | - | - | - |
+ | 3.5638 | 670 | 1.2385 | - | - | - | - | - |
+ | 3.5904 | 675 | 0.9472 | - | - | - | - | - |
+ | 3.6170 | 680 | 0.6458 | - | - | - | - | - |
+ | 3.6436 | 685 | 0.8308 | - | - | - | - | - |
+ | 3.6702 | 690 | 1.0864 | - | - | - | - | - |
+ | 3.6968 | 695 | 1.0715 | - | - | - | - | - |
+ | 3.7234 | 700 | 1.5082 | - | - | - | - | - |
+ | 3.75 | 705 | 0.5028 | - | - | - | - | - |
+ | 3.7766 | 710 | 1.1525 | - | - | - | - | - |
+ | 3.8032 | 715 | 0.5829 | - | - | - | - | - |
+ | 3.8298 | 720 | 0.6168 | - | - | - | - | - |
+ | 3.8564 | 725 | 1.0185 | - | - | - | - | - |
+ | 3.8830 | 730 | 1.2545 | - | - | - | - | - |
+ | 3.9096 | 735 | 0.5604 | - | - | - | - | - |
+ | 3.9362 | 740 | 0.6879 | - | - | - | - | - |
+ | 3.9628 | 745 | 0.9936 | - | - | - | - | - |
+ | 3.9894 | 750 | 0.5786 | - | - | - | - | - |
+ | **4.0** | **752** | **-** | **0.9774** | **0.9818** | **0.9731** | **0.98** | **0.9792** |
+ | 4.0160 | 755 | 0.908 | - | - | - | - | - |
+ | 4.0426 | 760 | 0.988 | - | - | - | - | - |
+ | 4.0691 | 765 | 0.2616 | - | - | - | - | - |
+ | 4.0957 | 770 | 1.1475 | - | - | - | - | - |
+ | 4.1223 | 775 | 1.7832 | - | - | - | - | - |
+ | 4.1489 | 780 | 0.7522 | - | - | - | - | - |
+ | 4.1755 | 785 | 1.4473 | - | - | - | - | - |
+ | 4.2021 | 790 | 0.7194 | - | - | - | - | - |
+ | 4.2287 | 795 | 0.0855 | - | - | - | - | - |
+ | 4.2553 | 800 | 1.151 | - | - | - | - | - |
+ | 4.2819 | 805 | 1.5109 | - | - | - | - | - |
+ | 4.3085 | 810 | 0.7462 | - | - | - | - | - |
+ | 4.3351 | 815 | 0.4697 | - | - | - | - | - |
+ | 4.3617 | 820 | 1.1215 | - | - | - | - | - |
+ | 4.3883 | 825 | 1.3527 | - | - | - | - | - |
+ | 4.4149 | 830 | 0.8995 | - | - | - | - | - |
+ | 4.4415 | 835 | 1.0011 | - | - | - | - | - |
+ | 4.4681 | 840 | 1.1168 | - | - | - | - | - |
+ | 4.4947 | 845 | 1.3105 | - | - | - | - | - |
+ | 4.5213 | 850 | 0.2855 | - | - | - | - | - |
+ | 4.5479 | 855 | 1.3223 | - | - | - | - | - |
+ | 4.5745 | 860 | 0.6377 | - | - | - | - | - |
+ | 4.6011 | 865 | 1.2196 | - | - | - | - | - |
+ | 4.6277 | 870 | 1.257 | - | - | - | - | - |
+ | 4.6543 | 875 | 0.93 | - | - | - | - | - |
+ | 4.6809 | 880 | 0.8831 | - | - | - | - | - |
+ | 4.7074 | 885 | 0.23 | - | - | - | - | - |
+ | 4.7340 | 890 | 0.9771 | - | - | - | - | - |
+ | 4.7606 | 895 | 1.026 | - | - | - | - | - |
+ | 4.7872 | 900 | 1.4671 | - | - | - | - | - |
+ | 4.8138 | 905 | 0.8719 | - | - | - | - | - |
+ | 4.8404 | 910 | 0.9108 | - | - | - | - | - |
+ | 4.8670 | 915 | 1.359 | - | - | - | - | - |
+ | 4.8936 | 920 | 1.3237 | - | - | - | - | - |
+ | 4.9202 | 925 | 0.6591 | - | - | - | - | - |
+ | 4.9468 | 930 | 0.405 | - | - | - | - | - |
+ | 4.9734 | 935 | 1.1984 | - | - | - | - | - |
+ | 5.0 | 940 | 0.5747 | 0.9774 | 0.9818 | 0.9731 | 0.9774 | 0.9766 |
+ 
+ * The bold row denotes the saved checkpoint.
+ </details>
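The `dim_*_cosine_map@100` columns above report mean average precision over the top 100 documents retrieved by cosine similarity, at each of the Matryoshka dimensions. As a rough illustration (not the evaluator's actual implementation), MAP@k with binary relevance labels can be sketched as:

```python
def average_precision_at_k(ranked_ids, relevant_ids, k=100):
    """Average precision over the top-k ranked results (binary relevance)."""
    if not relevant_ids:
        return 0.0
    hits, score = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            hits += 1
            score += hits / rank  # precision at this hit's rank
    return score / min(len(relevant_ids), k)

def mean_average_precision_at_k(results, k=100):
    """MAP@k across queries.

    `results` maps a query id to (ranked doc ids, set of relevant doc ids).
    """
    aps = [average_precision_at_k(ranked, relevant, k)
           for ranked, relevant in results.values()]
    return sum(aps) / len(aps)
```

A perfect ranking of every relevant document yields 1.0, which is why the metric climbs toward the high 0.9s as training converges.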
+ 
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.42.4
+ - PyTorch: 2.3.1+cu121
+ - Accelerate: 0.32.1
+ - Datasets: 2.21.0
+ - Tokenizers: 0.19.1
+ 
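Because the model was trained with MatryoshkaLoss, its embeddings can be truncated to any of the evaluated dimensions (768, 512, 256, 128, 64) and re-normalized before computing cosine similarity. A minimal NumPy sketch of that truncation step, using random vectors as stand-ins for real model outputs:

```python
import numpy as np

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` Matryoshka dimensions and re-apply L2 normalization."""
    cut = embedding[..., :dim]
    return cut / np.linalg.norm(cut, axis=-1, keepdims=True)

# Random stand-ins for two 768-d sentence embeddings (a real run would use model outputs).
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 768))
a64, b64 = truncate_and_normalize(a, 64), truncate_and_normalize(b, 64)
cos_64 = float(a64 @ b64)  # cosine similarity in the 64-d subspace
```

Sentence Transformers 3.x also accepts a `truncate_dim` argument when loading a model, which performs this slicing automatically.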
+ ## Citation
+ 
+ ### BibTeX
+ 
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+ 
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+ 
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "fine-tuned-matryoshka-1500",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.42.4",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.42.4",
+     "pytorch": "2.3.1+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66b52fbaaf8416dead64211074bae8ff799967d2e3558c2c2f654ee5e7cb14a4
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
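`modules.json` above chains three modules: the Transformer backbone, pooling (CLS-token pooling, per `1_Pooling/config.json`), and L2 normalization. A toy NumPy sketch of the last two steps, assuming dummy token embeddings in place of real transformer output:

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings):
    """CLS pooling (take the position-0 token) followed by L2 normalization,
    mirroring the Pooling and Normalize modules listed above."""
    cls = token_embeddings[:, 0, :]                          # [batch, hidden]
    return cls / np.linalg.norm(cls, axis=1, keepdims=True)  # unit-length rows

# Dummy transformer output: batch of 2, sequence length 16, hidden size 768.
tokens = np.random.default_rng(1).normal(size=(2, 16, 768))
sentence_embeddings = cls_pool_and_normalize(tokens)         # shape (2, 768)
```

Loading the model through `SentenceTransformer` applies this pipeline automatically; the sketch is only to show what the module chain does.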
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,64 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": 512,
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff