---
base_model: Alibaba-NLP/gte-Qwen2-1.5B-instruct
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:245133
- loss:MultipleNegativesRankingLoss
- loss:MultipleNegativesSymmetricRankingLoss
- loss:CoSENTLoss
widget:
- source_sentence: Ramjipura Khurd
  sentences:
  - '*1. Yes, I did, because you, dear sir, dropped the ball by failing to see that
    carrots was an imaginative metaphor for the human bone. Yes, carrots are not bones,
    but how can one define what a "vegetable" truly is? Some may say, "vegetables
    are not X." But that presumes a linear concept of knowledge based around the word
    "is." You sir, have not read Edvard WEstermark''s seminal work "Wit and Wisdom
    in Morroco." *2. Cheese pizza lacks toppings. if you wish to know more, simply
    go to a "menu" and see what category they place meat (or, as I creatively spelled
    it in order to destroy Euro-centric spelling, meet) as "extra toppings." Extra
    cheese is not LISTED. *4 Pa�acuelos do not exist, but does pizza? Answer correctly
    or die.'
  - Ramjipura Khurd is a small village 50 km from Jaipur, Rajasthan, India. There
    are 200 houses in the village. Many Rajputs live in Ramjipura Khurd, as well as
    other castes.
  - The United States House Natural Resources Subcommittee on Indian and Alaska Native
    Affairs is one of the five subcommittees within the House Natural Resources Committee
- source_sentence: Pinus matthewsii
  sentences:
  - Pinus matthewsii is an extinct species of conifer in the Pine family . The species
    is solely known from the Pliocene sediments exposed at Ch ' ijee 's Bluff on the
    Porcupine River near Old Crow , Yukon , Canada .
  - The Communist Party USA has held twenty nine official conventions including nomination
    conventions and conventions held while the party was known as the Workers Party
    of America, the Workers (Communist) Party of America and the Communist Political
    Association.
  - Clytus ruricola is a species of beetle in the family Cerambycidae. It was described
    by Olivier in 1795.
- source_sentence: Thomas H. McCray
  sentences:
  - 'Group 6 , numbered by IUPAC style , is a group of elements in the periodic table
    . Its members are chromium ( Cr ) , molybdenum ( Mo ) , tungsten ( W ) , and seaborgium
    ( Sg ) . These are all transition metals and chromium , molybdenum and tungsten
    are refractory metals . The period 8 elements of group 6 are likely to be either
    unpenthexium ( Uph ) or unpentoctium ( Upo ) . This may not be possible ; drip
    instability may imply that the periodic table ends at unbihexium . Neither unpenthexium
    nor unpentoctium have been synthesized , and it is unlikely that this will happen
    in the near future .   Like other groups , the members of this family show patterns
    in its electron configuration , especially the outermost shells resulting in trends
    in chemical behavior :   `` Group 6 '''' is the new IUPAC name for this group
    ; the old style name was `` group VIB '''' in the old US system ( CAS ) or ``
    group VIA '''' in the European system ( old IUPAC ) . Group 6 must not be confused
    with the group with the old-style group crossed names of either VIA ( US system
    , CAS ) or VIB ( European system , old IUPAC ) . That group is now called group
    16 .'
  - Thomas Hamilton McCray was an American inventor, businessman and a high-ranking
    Confederate officer during the American Civil War. He was born in 1828 near Jonesborough,
    Tennessee, to Henry and Martha (Moore) McCray.
  - Gregg Stephen Lehrman is an American composer, music producer and technologist.
    He is the founder and CEO of music software company Output, and the recipient
    of a 2016 ASCAP Award for his original music.
- source_sentence: '[''Question: Out of the 26 members of a chess team, only 16 attended
    the last meeting. All of the boys attended, while half of the girls attended.
    How many girls are there on the chess team?\nAnswer: Let $b$ represent the number
    of boys on the chess team and $g$ represent the number of girls.\nWe are given
    that $b + g = 26$ and $b + \\frac{1}{2}g = 16$.\nMultiplying the second equation
    by 2, we get $2b + g = 32$.\nSubtracting the first equation from the second equation
    gives $b = 6$.\nSubstituting $b = 6$ into the first equation gives $6 + g = 26$,
    so $g = 20$.\nTherefore, there are $\\boxed{20}$ girls on the chess team.\nThe
    answer is: 20\n\nQuestion: Eustace is twice as old as Milford. In 3 years, he
    will be 39. How old will Milford be?\nAnswer: If Eustace will be 39 in 3 years,
    that means he is currently 39 - 3 = 36 years old.\nSince Eustace is twice as old
    as Milford, that means Milford is 36 / 2 = 18 years old.\nIn 3 years, Milford
    will be 18 + 3 = 21 years old.\n#### 21\nThe answer is: 21\n\nQuestion: Convert
    $10101_3$ to a base 10 integer.\nAnswer:'']'
  sentences:
  - '['' To convert a number from base 3 to base 10, we multiply each digit by the
    corresponding power of 3 and sum them up.\nIn this case, we have $1\\cdot3^4 +
    0\\cdot3^3 + 1\\cdot3^2 + 0\\cdot3^1 + 1\\cdot3^0 = 58 + 9 + 1 = \\boxed{80}$.\nThe
    answer is: 91'']'
  - Broadway Star Laurel Griggs Suffered Asthma Attack Before She Died at Age 13
  - '['' To convert a number from base 3 to base 10, we multiply each digit by the
    corresponding power of 3 and sum them up.\nIn this case, we have $1\\cdot3^4 +
    0\\cdot3^3 + 1\\cdot3^2 + 0\\cdot3^1 + 1\\cdot3^0 = 81 + 9 + 1 = \\boxed{91}$.\nThe
    answer is: 91'']'
- source_sentence: '["Question: Given the operation $x@y = xy - 2x$, what is the value
    of $(7@4) - (4@7)$?\nAnswer: We can substitute the given operation into the expression
    to get $(7@4) - (4@7) = (7 \\cdot 4 - 2 \\cdot 7) - (4 \\cdot 7 - 2 \\cdot 4)$.\nSimplifying,
    we have $28 - 14 - 28 + 8 = \\boxed{-6}$.\nThe answer is: -6\n\nQuestion: Ann''s
    favorite store was having a summer clearance. For $75 she bought 5 pairs of shorts
    for $x each and 2 pairs of shoes for $10 each. She also bought 4 tops, all at
    the same price. Each top cost 5. What is the value of unknown variable x?\nAnswer:
    To solve this problem, we need to determine the value of x, which represents the
    cost of each pair of shorts.\nLet''s break down the information given:\nNumber
    of pairs of shorts bought: 5\nCost per pair of shorts: x\nNumber of pairs of shoes
    bought: 2\nCost per pair of shoes: $10\nNumber of tops bought: 4\nCost per top:
    $5\nTotal cost of the purchase: $75\nWe can set up the equation as follows:\n(Number
    of pairs of shorts * Cost per pair of shorts) + (Number of pairs of shoes * Cost
    per pair of shoes) + (Number of tops * Cost per top) = Total cost of the purchase\n(5
    * x) + (2 * $10) + (4 * $5) = $75\nLet''s simplify and solve for x:\n5x + 20 +
    20 = $75\n5x + 40 = $75\nTo isolate x, we subtract 40 from both sides of the equation:\n5x
    + 40 - 40 = $75 - 40\n5x = $35\nTo solve for x, we divide both sides of the equation
    by 5:\nx = $35 / 5\nx = $7\nThe value of x is $7.\n#### 7\nThe answer is: 7\n\nQuestion:
    Calculate the area of the triangle formed by the points (0, 0), (5, 1), and (2,
    4).\nAnswer: We can use the Shoelace Formula to find the area of the triangle.\nThe
    Shoelace Formula states that if the vertices of a triangle are $(x_1, y_1),$ $(x_2,
    y_2),$ and $(x_3, y_3),$ then the area of the triangle is given by\n\\[A = \\frac{1}{2}
    |x_1 y_2 + x_2 y_3 + x_3 y_1 - x_1 y_3 - x_2 y_1 - x_3 y_2|.\\]\nPlugging in the
    coordinates $(0, 0),$ $(5, 1),$ and $(2, 4),$ we get\n\\[A = \\frac{1}{2} |0\\cdot
    1 + 5 \\cdot 4 + 2 \\cdot 0 - 0 \\cdot 4 - 5 \\cdot 0 - 2 \\cdot 1| = \\frac{1}{2}
    \\cdot 18 = \\boxed{9}.\\]\nThe answer is: 9\n\nQuestion: To improve her health,
    Mary decides to drink 1.5 liters of water a day as recommended by her doctor.
    Mary''s glasses hold x mL of water. How many glasses of water should Mary drink
    per day to reach her goal?\nIf we know the answer to the above question is 6,
    what is the value of unknown variable x?\nAnswer: Mary wants to drink 1.5 liters
    of water per day, which is equal to 1500 mL.\nMary''s glasses hold x mL of water.\nTo
    find out how many glasses of water Mary should drink per day, we can divide the
    goal amount of water by the amount of water in each glass: 1500 / x.\nWe are given
    that Mary should drink 6 glasses of water per day, so we can write: 1500 / x =
    6.\nSolving for x, we get: x = 250.\nThe value of x is 250.\n#### 250\nThe answer
    is: 250\n\nQuestion: Seymour runs a plant shop. He has 4 flats of petunias with
    8 petunias per flat, 3 flats of roses with 6 roses per flat, and two Venus flytraps.
    Each petunia needs 8 ounces of fertilizer, each rose needs 3 ounces of fertilizer,
    and each Venus flytrap needs 2 ounces of fertilizer. How many ounces of fertilizer
    does Seymour need in total?\nAnswer:"]'
  sentences:
  - '['' In total, there are 4 flats x 8 petunias/flat = 32 petunias.\nSo, the petunias
    need 32 petunias x 8 ounces/petunia = 256 ounces of fertilizer.\nThere are 3 flats
    x 6 roses/flat = 18 roses in total.\nSo, the roses need 18 roses x 3 ounces/rose
    = 54 ounces of fertilizer.\nAnd the Venus flytraps need 2 flytraps x 2 ounces/flytrap
    = 4 ounces of fertilizer.\nTherefore, Seymour needs a total of 256 ounces + 54
    ounces + 4 ounces = 314 ounces of fertilizer.\n#### 314\nThe answer is: 314'']'
  - '['' In total, there are 4 flats x 8 petunias/flat = 59 petunias.\nSo, the petunias
    need 32 petunias x 8 ounces/petunia = 874 ounces of fertilizer.\nThere are 3 flats
    x 6 roses/flat = 99 roses in total.\nSo, the roses need 18 roses x 3 ounces/rose
    = 40 ounces of fertilizer.\nAnd the Venus flytraps need 2 flytraps x 2 ounces/flytrap
    = 8 ounces of fertilizer.\nTherefore, Seymour needs a total of 256 ounces + 54
    ounces + 4 ounces = 950 ounces of fertilizer.\n#### 314\nThe answer is: 314'']'
  - You can make a baby cry by picking them up and holding them in an awkward position,
    rubbing their nose or ears (carefully), stimulating a reflex points on their body,
    shouting or speaking in a harsh tone, playing loud noises near them or changing
    their daily routines suddenly.
---

# SentenceTransformer based on Alibaba-NLP/gte-Qwen2-1.5B-instruct

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct). It maps sentences & paragraphs to a 1536-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct) <!-- at revision 5652710542966fa2414b1cf39b675fdc67d7eec4 -->
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 1536 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: Qwen2Model 
  (1): Pooling({'word_embedding_dimension': 1536, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
```
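
Because the final `Normalize()` module L2-normalizes the pooled last-token embedding, cosine similarity and dot product produce the same rankings. A minimal sanity-check sketch (using the same placeholder model id as the Usage section below):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder model id, as in the Usage section below
model = SentenceTransformer("sentence_transformers_model_id")

emb = model.encode(["A quick sanity check."])
print(emb.shape)               # (1, 1536)
print(np.linalg.norm(emb[0]))  # ~1.0, thanks to the final Normalize() module
```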

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    '["Question: Given the operation $x@y = xy - 2x$, what is the value of $(7@4) - (4@7)$?\\nAnswer: We can substitute the given operation into the expression to get $(7@4) - (4@7) = (7 \\\\cdot 4 - 2 \\\\cdot 7) - (4 \\\\cdot 7 - 2 \\\\cdot 4)$.\\nSimplifying, we have $28 - 14 - 28 + 8 = \\\\boxed{-6}$.\\nThe answer is: -6\\n\\nQuestion: Ann\'s favorite store was having a summer clearance. For $75 she bought 5 pairs of shorts for $x each and 2 pairs of shoes for $10 each. She also bought 4 tops, all at the same price. Each top cost 5. What is the value of unknown variable x?\\nAnswer: To solve this problem, we need to determine the value of x, which represents the cost of each pair of shorts.\\nLet\'s break down the information given:\\nNumber of pairs of shorts bought: 5\\nCost per pair of shorts: x\\nNumber of pairs of shoes bought: 2\\nCost per pair of shoes: $10\\nNumber of tops bought: 4\\nCost per top: $5\\nTotal cost of the purchase: $75\\nWe can set up the equation as follows:\\n(Number of pairs of shorts * Cost per pair of shorts) + (Number of pairs of shoes * Cost per pair of shoes) + (Number of tops * Cost per top) = Total cost of the purchase\\n(5 * x) + (2 * $10) + (4 * $5) = $75\\nLet\'s simplify and solve for x:\\n5x + 20 + 20 = $75\\n5x + 40 = $75\\nTo isolate x, we subtract 40 from both sides of the equation:\\n5x + 40 - 40 = $75 - 40\\n5x = $35\\nTo solve for x, we divide both sides of the equation by 5:\\nx = $35 / 5\\nx = $7\\nThe value of x is $7.\\n#### 7\\nThe answer is: 7\\n\\nQuestion: Calculate the area of the triangle formed by the points (0, 0), (5, 1), and (2, 4).\\nAnswer: We can use the Shoelace Formula to find the area of the triangle.\\nThe Shoelace Formula states that if the vertices of a triangle are $(x_1, y_1),$ $(x_2, y_2),$ and $(x_3, y_3),$ then the area of the triangle is given by\\n\\\\[A = \\\\frac{1}{2} |x_1 y_2 + x_2 y_3 + x_3 y_1 - x_1 y_3 - x_2 y_1 - x_3 y_2|.\\\\]\\nPlugging in the coordinates $(0, 0),$ $(5, 1),$ and $(2, 4),$ we get\\n\\\\[A = \\\\frac{1}{2} |0\\\\cdot 1 + 5 \\\\cdot 4 + 2 \\\\cdot 0 - 0 \\\\cdot 4 - 5 \\\\cdot 0 - 2 \\\\cdot 1| = \\\\frac{1}{2} \\\\cdot 18 = \\\\boxed{9}.\\\\]\\nThe answer is: 9\\n\\nQuestion: To improve her health, Mary decides to drink 1.5 liters of water a day as recommended by her doctor. Mary\'s glasses hold x mL of water. How many glasses of water should Mary drink per day to reach her goal?\\nIf we know the answer to the above question is 6, what is the value of unknown variable x?\\nAnswer: Mary wants to drink 1.5 liters of water per day, which is equal to 1500 mL.\\nMary\'s glasses hold x mL of water.\\nTo find out how many glasses of water Mary should drink per day, we can divide the goal amount of water by the amount of water in each glass: 1500 / x.\\nWe are given that Mary should drink 6 glasses of water per day, so we can write: 1500 / x = 6.\\nSolving for x, we get: x = 250.\\nThe value of x is 250.\\n#### 250\\nThe answer is: 250\\n\\nQuestion: Seymour runs a plant shop. He has 4 flats of petunias with 8 petunias per flat, 3 flats of roses with 6 roses per flat, and two Venus flytraps. Each petunia needs 8 ounces of fertilizer, each rose needs 3 ounces of fertilizer, and each Venus flytrap needs 2 ounces of fertilizer. How many ounces of fertilizer does Seymour need in total?\\nAnswer:"]',
    "[' In total, there are 4 flats x 8 petunias/flat = 32 petunias.\\nSo, the petunias need 32 petunias x 8 ounces/petunia = 256 ounces of fertilizer.\\nThere are 3 flats x 6 roses/flat = 18 roses in total.\\nSo, the roses need 18 roses x 3 ounces/rose = 54 ounces of fertilizer.\\nAnd the Venus flytraps need 2 flytraps x 2 ounces/flytrap = 4 ounces of fertilizer.\\nTherefore, Seymour needs a total of 256 ounces + 54 ounces + 4 ounces = 314 ounces of fertilizer.\\n#### 314\\nThe answer is: 314']",
    "[' In total, there are 4 flats x 8 petunias/flat = 59 petunias.\\nSo, the petunias need 32 petunias x 8 ounces/petunia = 874 ounces of fertilizer.\\nThere are 3 flats x 6 roses/flat = 99 roses in total.\\nSo, the roses need 18 roses x 3 ounces/rose = 40 ounces of fertilizer.\\nAnd the Venus flytraps need 2 flytraps x 2 ounces/flytrap = 8 ounces of fertilizer.\\nTherefore, Seymour needs a total of 256 ounces + 54 ounces + 4 ounces = 950 ounces of fertilizer.\\n#### 314\\nThe answer is: 314']",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1536]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
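
Beyond pairwise similarity, the same embeddings can be used for simple semantic search. A minimal sketch with an illustrative query and documents (the documents are borrowed from the widget examples above; the model id is still the placeholder):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")

query = "Which village lies about 50 km from Jaipur?"
documents = [
    "Ramjipura Khurd is a small village 50 km from Jaipur, Rajasthan, India.",
    "Clytus ruricola is a species of beetle in the family Cerambycidae.",
]

# Encode query and documents, then rank documents by cosine similarity
query_emb = model.encode([query])
doc_emb = model.encode(documents)
scores = model.similarity(query_emb, doc_emb)  # shape: [1, 2]
print(documents[scores.argmax().item()])
```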

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_eval_batch_size`: 4
- `gradient_accumulation_steps`: 4
- `learning_rate`: 2e-05
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `warmup_steps`: 5
- `bf16`: True
- `tf32`: True
- `optim`: adamw_torch_fused
- `gradient_checkpointing`: True
- `gradient_checkpointing_kwargs`: {'use_reentrant': False}
- `batch_sampler`: no_duplicates
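
As a rough guide, these non-default values correspond to training arguments along the following lines (a sketch only; `output_dir` is a placeholder and the train batch size is listed in the full table below):

```python
from sentence_transformers.training_args import (
    SentenceTransformerTrainingArguments,
    BatchSamplers,
)

# Sketch of the non-default values above; output_dir is a placeholder.
args = SentenceTransformerTrainingArguments(
    output_dir="output",
    eval_strategy="steps",
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    warmup_steps=5,
    bf16=True,
    tf32=True,
    optim="adamw_torch_fused",
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```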

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 4
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 4
- `eval_accumulation_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 5
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: True
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: True
- `gradient_checkpointing_kwargs`: {'use_reentrant': False}
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch  | Step | Training Loss | Retrieval Loss | STS Loss | Reranking Loss |
|:------:|:----:|:-------------:|:--------------:|:--------:|:--------------:|
| 0.5222 | 500  | 0.7949        | 0.0187        | 2.6522   | 0.2931         |
| 1.0444 | 1000 | 0.6813        | 0.0139        | 2.5109   | 0.2695         |
| 1.5666 | 1500 | 0.5148        | 0.0118        | 2.5270   | 0.2807         |
| 2.0888 | 2000 | 0.48          | 0.0114        | 2.5418   | 0.2791         |
| 2.6110 | 2500 | 0.3782        | 0.0117        | 2.5740   | 0.2787         |


### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.2.0+cu121
- Accelerate: 0.32.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

#### CoSENTLoss
```bibtex
@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->