FINGU-AI committed on
Commit 06860ac
1 Parent(s): 570fc62

Upload folder using huggingface_hub

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1536,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": true,
+   "include_prompt": true
+ }
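
This configuration enables last-token pooling (`pooling_mode_lasttoken: true`): each text's embedding is the hidden state of its final non-padding token, the usual choice for decoder-style backbones such as Qwen2. A minimal sketch of the idea, assuming right-padded batches (hypothetical tensors, not this repository's code):

```python
import torch

def last_token_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Select the hidden state of the last non-padding token per sequence (assumes right padding)."""
    last_idx = attention_mask.sum(dim=1) - 1          # position of the final real token
    batch_idx = torch.arange(hidden_states.size(0))
    return hidden_states[batch_idx, last_idx]         # shape: (batch, hidden_dim)

# Toy check with this model's 1536-dim hidden size.
h = torch.randn(2, 4, 1536)
mask = torch.tensor([[1, 1, 1, 0], [1, 1, 1, 1]])    # first sequence ends in one pad token
print(last_token_pool(h, mask).shape)                # torch.Size([2, 1536])
```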
README.md CHANGED
@@ -1,3 +1,459 @@
- ---
- license: apache-2.0
- ---
+ ---
+ base_model: Alibaba-NLP/gte-Qwen2-1.5B-instruct
+ datasets: []
+ language: []
+ library_name: sentence-transformers
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:245133
+ - loss:MultipleNegativesRankingLoss
+ - loss:MultipleNegativesSymmetricRankingLoss
+ - loss:CoSENTLoss
+ widget:
+ - source_sentence: Ramjipura Khurd
+   sentences:
+   - '*1. Yes, I did, because you, dear sir, dropped the ball by failing to see that
+     carrots was an imaginative metaphor for the human bone. Yes, carrots are not bones,
+     but how can one define what a "vegetable" truly is? Some may say, "vegetables
+     are not X." But that presumes a linear concept of knowledge based around the word
+     "is." You sir, have not read Edvard WEstermark''s seminal work "Wit and Wisdom
+     in Morroco." *2. Cheese pizza lacks toppings. if you wish to know more, simply
+     go to a "menu" and see what category they place meat (or, as I creatively spelled
+     it in order to destroy Euro-centric spelling, meet) as "extra toppings." Extra
+     cheese is not LISTED. *4 Pañacuelos do not exist, but does pizza? Answer correctly
+     or die.'
+   - Ramjipura Khurd is a small village 50 km from Jaipur, Rajasthan, India. There
+     are 200 houses in the village. Many Rajputs live in Ramjipura Khurd, as well as
+     other castes.
+   - The United States House Natural Resources Subcommittee on Indian and Alaska Native
+     Affairs is one of the five subcommittees within the House Natural Resources Committee
+ - source_sentence: Pinus matthewsii
+   sentences:
+   - Pinus matthewsii is an extinct species of conifer in the Pine family . The species
+     is solely known from the Pliocene sediments exposed at Ch ' ijee 's Bluff on the
+     Porcupine River near Old Crow , Yukon , Canada .
+   - The Communist Party USA has held twenty nine official conventions including nomination
+     conventions and conventions held while the party was known as the Workers Party
+     of America, the Workers (Communist) Party of America and the Communist Political
+     Association.
+   - Clytus ruricola is a species of beetle in the family Cerambycidae. It was described
+     by Olivier in 1795.
+ - source_sentence: Thomas H. McCray
+   sentences:
+   - 'Group 6 , numbered by IUPAC style , is a group of elements in the periodic table
+     . Its members are chromium ( Cr ) , molybdenum ( Mo ) , tungsten ( W ) , and seaborgium
+     ( Sg ) . These are all transition metals and chromium , molybdenum and tungsten
+     are refractory metals . The period 8 elements of group 6 are likely to be either
+     unpenthexium ( Uph ) or unpentoctium ( Upo ) . This may not be possible ; drip
+     instability may imply that the periodic table ends at unbihexium . Neither unpenthexium
+     nor unpentoctium have been synthesized , and it is unlikely that this will happen
+     in the near future . Like other groups , the members of this family show patterns
+     in its electron configuration , especially the outermost shells resulting in trends
+     in chemical behavior : `` Group 6 '''' is the new IUPAC name for this group
+     ; the old style name was `` group VIB '''' in the old US system ( CAS ) or ``
+     group VIA '''' in the European system ( old IUPAC ) . Group 6 must not be confused
+     with the group with the old-style group crossed names of either VIA ( US system
+     , CAS ) or VIB ( European system , old IUPAC ) . That group is now called group
+     16 .'
+   - Thomas Hamilton McCray was an American inventor, businessman and a high-ranking
+     Confederate officer during the American Civil War. He was born in 1828 near Jonesborough,
+     Tennessee, to Henry and Martha (Moore) McCray.
+   - Gregg Stephen Lehrman is an American composer, music producer and technologist.
+     He is the founder and CEO of music software company Output, and the recipient
+     of a 2016 ASCAP Award for his original music.
+ - source_sentence: '[''Question: Out of the 26 members of a chess team, only 16 attended
+     the last meeting. All of the boys attended, while half of the girls attended.
+     How many girls are there on the chess team?\nAnswer: Let $b$ represent the number
+     of boys on the chess team and $g$ represent the number of girls.\nWe are given
+     that $b + g = 26$ and $b + \\frac{1}{2}g = 16$.\nMultiplying the second equation
+     by 2, we get $2b + g = 32$.\nSubtracting the first equation from the second equation
+     gives $b = 6$.\nSubstituting $b = 6$ into the first equation gives $6 + g = 26$,
+     so $g = 20$.\nTherefore, there are $\\boxed{20}$ girls on the chess team.\nThe
+     answer is: 20\n\nQuestion: Eustace is twice as old as Milford. In 3 years, he
+     will be 39. How old will Milford be?\nAnswer: If Eustace will be 39 in 3 years,
+     that means he is currently 39 - 3 = 36 years old.\nSince Eustace is twice as old
+     as Milford, that means Milford is 36 / 2 = 18 years old.\nIn 3 years, Milford
+     will be 18 + 3 = 21 years old.\n#### 21\nThe answer is: 21\n\nQuestion: Convert
+     $10101_3$ to a base 10 integer.\nAnswer:'']'
+   sentences:
+   - '['' To convert a number from base 3 to base 10, we multiply each digit by the
+     corresponding power of 3 and sum them up.\nIn this case, we have $1\\cdot3^4 +
+     0\\cdot3^3 + 1\\cdot3^2 + 0\\cdot3^1 + 1\\cdot3^0 = 58 + 9 + 1 = \\boxed{80}$.\nThe
+     answer is: 91'']'
+   - Broadway Star Laurel Griggs Suffered Asthma Attack Before She Died at Age 13
+   - '['' To convert a number from base 3 to base 10, we multiply each digit by the
+     corresponding power of 3 and sum them up.\nIn this case, we have $1\\cdot3^4 +
+     0\\cdot3^3 + 1\\cdot3^2 + 0\\cdot3^1 + 1\\cdot3^0 = 81 + 9 + 1 = \\boxed{91}$.\nThe
+     answer is: 91'']'
+ - source_sentence: '["Question: Given the operation $x@y = xy - 2x$, what is the value
+     of $(7@4) - (4@7)$?\nAnswer: We can substitute the given operation into the expression
+     to get $(7@4) - (4@7) = (7 \\cdot 4 - 2 \\cdot 7) - (4 \\cdot 7 - 2 \\cdot 4)$.\nSimplifying,
+     we have $28 - 14 - 28 + 8 = \\boxed{-6}$.\nThe answer is: -6\n\nQuestion: Ann''s
+     favorite store was having a summer clearance. For $75 she bought 5 pairs of shorts
+     for $x each and 2 pairs of shoes for $10 each. She also bought 4 tops, all at
+     the same price. Each top cost 5. What is the value of unknown variable x?\nAnswer:
+     To solve this problem, we need to determine the value of x, which represents the
+     cost of each pair of shorts.\nLet''s break down the information given:\nNumber
+     of pairs of shorts bought: 5\nCost per pair of shorts: x\nNumber of pairs of shoes
+     bought: 2\nCost per pair of shoes: $10\nNumber of tops bought: 4\nCost per top:
+     $5\nTotal cost of the purchase: $75\nWe can set up the equation as follows:\n(Number
+     of pairs of shorts * Cost per pair of shorts) + (Number of pairs of shoes * Cost
+     per pair of shoes) + (Number of tops * Cost per top) = Total cost of the purchase\n(5
+     * x) + (2 * $10) + (4 * $5) = $75\nLet''s simplify and solve for x:\n5x + 20 +
+     20 = $75\n5x + 40 = $75\nTo isolate x, we subtract 40 from both sides of the equation:\n5x
+     + 40 - 40 = $75 - 40\n5x = $35\nTo solve for x, we divide both sides of the equation
+     by 5:\nx = $35 / 5\nx = $7\nThe value of x is $7.\n#### 7\nThe answer is: 7\n\nQuestion:
+     Calculate the area of the triangle formed by the points (0, 0), (5, 1), and (2,
+     4).\nAnswer: We can use the Shoelace Formula to find the area of the triangle.\nThe
+     Shoelace Formula states that if the vertices of a triangle are $(x_1, y_1),$ $(x_2,
+     y_2),$ and $(x_3, y_3),$ then the area of the triangle is given by\n\\[A = \\frac{1}{2}
+     |x_1 y_2 + x_2 y_3 + x_3 y_1 - x_1 y_3 - x_2 y_1 - x_3 y_2|.\\]\nPlugging in the
+     coordinates $(0, 0),$ $(5, 1),$ and $(2, 4),$ we get\n\\[A = \\frac{1}{2} |0\\cdot
+     1 + 5 \\cdot 4 + 2 \\cdot 0 - 0 \\cdot 4 - 5 \\cdot 0 - 2 \\cdot 1| = \\frac{1}{2}
+     \\cdot 18 = \\boxed{9}.\\]\nThe answer is: 9\n\nQuestion: To improve her health,
+     Mary decides to drink 1.5 liters of water a day as recommended by her doctor.
+     Mary''s glasses hold x mL of water. How many glasses of water should Mary drink
+     per day to reach her goal?\nIf we know the answer to the above question is 6,
+     what is the value of unknown variable x?\nAnswer: Mary wants to drink 1.5 liters
+     of water per day, which is equal to 1500 mL.\nMary''s glasses hold x mL of water.\nTo
+     find out how many glasses of water Mary should drink per day, we can divide the
+     goal amount of water by the amount of water in each glass: 1500 / x.\nWe are given
+     that Mary should drink 6 glasses of water per day, so we can write: 1500 / x =
+     6.\nSolving for x, we get: x = 250.\nThe value of x is 250.\n#### 250\nThe answer
+     is: 250\n\nQuestion: Seymour runs a plant shop. He has 4 flats of petunias with
+     8 petunias per flat, 3 flats of roses with 6 roses per flat, and two Venus flytraps.
+     Each petunia needs 8 ounces of fertilizer, each rose needs 3 ounces of fertilizer,
+     and each Venus flytrap needs 2 ounces of fertilizer. How many ounces of fertilizer
+     does Seymour need in total?\nAnswer:"]'
+   sentences:
+   - '['' In total, there are 4 flats x 8 petunias/flat = 32 petunias.\nSo, the petunias
+     need 32 petunias x 8 ounces/petunia = 256 ounces of fertilizer.\nThere are 3 flats
+     x 6 roses/flat = 18 roses in total.\nSo, the roses need 18 roses x 3 ounces/rose
+     = 54 ounces of fertilizer.\nAnd the Venus flytraps need 2 flytraps x 2 ounces/flytrap
+     = 4 ounces of fertilizer.\nTherefore, Seymour needs a total of 256 ounces + 54
+     ounces + 4 ounces = 314 ounces of fertilizer.\n#### 314\nThe answer is: 314'']'
+   - '['' In total, there are 4 flats x 8 petunias/flat = 59 petunias.\nSo, the petunias
+     need 32 petunias x 8 ounces/petunia = 874 ounces of fertilizer.\nThere are 3 flats
+     x 6 roses/flat = 99 roses in total.\nSo, the roses need 18 roses x 3 ounces/rose
+     = 40 ounces of fertilizer.\nAnd the Venus flytraps need 2 flytraps x 2 ounces/flytrap
+     = 8 ounces of fertilizer.\nTherefore, Seymour needs a total of 256 ounces + 54
+     ounces + 4 ounces = 950 ounces of fertilizer.\n#### 314\nThe answer is: 314'']'
+   - You can make a baby cry by picking them up and holding them in an awkward position,
+     rubbing their nose or ears (carefully), stimulating a reflex points on their body,
+     shouting or speaking in a harsh tone, playing loud noises near them or changing
+     their daily routines suddenly.
+ ---
+ 
+ # SentenceTransformer based on Alibaba-NLP/gte-Qwen2-1.5B-instruct
+ 
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct). It maps sentences & paragraphs to a 1536-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct) <!-- at revision 5652710542966fa2414b1cf39b675fdc67d7eec4 -->
+ - **Maximum Sequence Length:** 8192 tokens
+ - **Output Dimensionality:** 1536 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+ 
+ ### Model Sources
+ 
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+ 
+ ### Full Model Architecture
+ 
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: Qwen2Model
+   (1): Pooling({'word_embedding_dimension': 1536, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
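+ 
+ The trailing `Normalize()` module makes every embedding unit-length, so cosine similarity and dot product coincide. A minimal NumPy sketch of that equivalence (illustrative only, not this repository's code):
+ 
+ ```python
+ import numpy as np
+ 
+ rng = np.random.default_rng(0)
+ e = rng.standard_normal(1536); e /= np.linalg.norm(e)   # unit-length, like this model's outputs
+ f = rng.standard_normal(1536); f /= np.linalg.norm(f)
+ 
+ cosine = (e @ f) / (np.linalg.norm(e) * np.linalg.norm(f))
+ assert np.isclose(e @ f, cosine)  # dot product equals cosine after normalization
+ ```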
+ 
+ ## Usage
+ 
+ ### Direct Usage (Sentence Transformers)
+ 
+ First install the Sentence Transformers library:
+ 
+ ```bash
+ pip install -U sentence-transformers
+ ```
+ 
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ sentences = [
+     '["Question: Given the operation $x@y = xy - 2x$, what is the value of $(7@4) - (4@7)$?\\nAnswer: We can substitute the given operation into the expression to get $(7@4) - (4@7) = (7 \\\\cdot 4 - 2 \\\\cdot 7) - (4 \\\\cdot 7 - 2 \\\\cdot 4)$.\\nSimplifying, we have $28 - 14 - 28 + 8 = \\\\boxed{-6}$.\\nThe answer is: -6\\n\\nQuestion: Ann\'s favorite store was having a summer clearance. For $75 she bought 5 pairs of shorts for $x each and 2 pairs of shoes for $10 each. She also bought 4 tops, all at the same price. Each top cost 5. What is the value of unknown variable x?\\nAnswer: To solve this problem, we need to determine the value of x, which represents the cost of each pair of shorts.\\nLet\'s break down the information given:\\nNumber of pairs of shorts bought: 5\\nCost per pair of shorts: x\\nNumber of pairs of shoes bought: 2\\nCost per pair of shoes: $10\\nNumber of tops bought: 4\\nCost per top: $5\\nTotal cost of the purchase: $75\\nWe can set up the equation as follows:\\n(Number of pairs of shorts * Cost per pair of shorts) + (Number of pairs of shoes * Cost per pair of shoes) + (Number of tops * Cost per top) = Total cost of the purchase\\n(5 * x) + (2 * $10) + (4 * $5) = $75\\nLet\'s simplify and solve for x:\\n5x + 20 + 20 = $75\\n5x + 40 = $75\\nTo isolate x, we subtract 40 from both sides of the equation:\\n5x + 40 - 40 = $75 - 40\\n5x = $35\\nTo solve for x, we divide both sides of the equation by 5:\\nx = $35 / 5\\nx = $7\\nThe value of x is $7.\\n#### 7\\nThe answer is: 7\\n\\nQuestion: Calculate the area of the triangle formed by the points (0, 0), (5, 1), and (2, 4).\\nAnswer: We can use the Shoelace Formula to find the area of the triangle.\\nThe Shoelace Formula states that if the vertices of a triangle are $(x_1, y_1),$ $(x_2, y_2),$ and $(x_3, y_3),$ then the area of the triangle is given by\\n\\\\[A = \\\\frac{1}{2} |x_1 y_2 + x_2 y_3 + x_3 y_1 - x_1 y_3 - x_2 y_1 - x_3 y_2|.\\\\]\\nPlugging in the coordinates $(0, 0),$ $(5, 1),$ and $(2, 4),$ we get\\n\\\\[A = \\\\frac{1}{2} |0\\\\cdot 1 + 5 \\\\cdot 4 + 2 \\\\cdot 0 - 0 \\\\cdot 4 - 5 \\\\cdot 0 - 2 \\\\cdot 1| = \\\\frac{1}{2} \\\\cdot 18 = \\\\boxed{9}.\\\\]\\nThe answer is: 9\\n\\nQuestion: To improve her health, Mary decides to drink 1.5 liters of water a day as recommended by her doctor. Mary\'s glasses hold x mL of water. How many glasses of water should Mary drink per day to reach her goal?\\nIf we know the answer to the above question is 6, what is the value of unknown variable x?\\nAnswer: Mary wants to drink 1.5 liters of water per day, which is equal to 1500 mL.\\nMary\'s glasses hold x mL of water.\\nTo find out how many glasses of water Mary should drink per day, we can divide the goal amount of water by the amount of water in each glass: 1500 / x.\\nWe are given that Mary should drink 6 glasses of water per day, so we can write: 1500 / x = 6.\\nSolving for x, we get: x = 250.\\nThe value of x is 250.\\n#### 250\\nThe answer is: 250\\n\\nQuestion: Seymour runs a plant shop. He has 4 flats of petunias with 8 petunias per flat, 3 flats of roses with 6 roses per flat, and two Venus flytraps. Each petunia needs 8 ounces of fertilizer, each rose needs 3 ounces of fertilizer, and each Venus flytrap needs 2 ounces of fertilizer. How many ounces of fertilizer does Seymour need in total?\\nAnswer:"]',
+     "[' In total, there are 4 flats x 8 petunias/flat = 32 petunias.\\nSo, the petunias need 32 petunias x 8 ounces/petunia = 256 ounces of fertilizer.\\nThere are 3 flats x 6 roses/flat = 18 roses in total.\\nSo, the roses need 18 roses x 3 ounces/rose = 54 ounces of fertilizer.\\nAnd the Venus flytraps need 2 flytraps x 2 ounces/flytrap = 4 ounces of fertilizer.\\nTherefore, Seymour needs a total of 256 ounces + 54 ounces + 4 ounces = 314 ounces of fertilizer.\\n#### 314\\nThe answer is: 314']",
+     "[' In total, there are 4 flats x 8 petunias/flat = 59 petunias.\\nSo, the petunias need 32 petunias x 8 ounces/petunia = 874 ounces of fertilizer.\\nThere are 3 flats x 6 roses/flat = 99 roses in total.\\nSo, the roses need 18 roses x 3 ounces/rose = 40 ounces of fertilizer.\\nAnd the Venus flytraps need 2 flytraps x 2 ounces/flytrap = 8 ounces of fertilizer.\\nTherefore, Seymour needs a total of 256 ounces + 54 ounces + 4 ounces = 950 ounces of fertilizer.\\n#### 314\\nThe answer is: 314']",
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 1536]
+ 
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
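+ 
+ The accompanying `config_sentence_transformers.json` defines a `query` prompt for retrieval, so for asymmetric search you would encode queries with `prompt_name="query"` and documents as-is. A minimal sketch reusing the `model` loaded above (the example texts are hypothetical):
+ 
+ ```python
+ # Queries receive the instruction prefix; documents are embedded without it.
+ query_emb = model.encode(["when was Clytus ruricola described?"], prompt_name="query")
+ doc_emb = model.encode(["Clytus ruricola is a species of beetle in the family Cerambycidae. It was described by Olivier in 1795."])
+ print(model.similarity(query_emb, doc_emb))  # 1x1 matrix of cosine similarities
+ ```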
+ 
+ <!--
+ ### Direct Usage (Transformers)
+ 
+ <details><summary>Click to see the direct usage in Transformers</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+ 
+ You can finetune this model on your own dataset.
+ 
+ <details><summary>Click to expand</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ <!--
+ ## Bias, Risks and Limitations
+ 
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+ 
+ <!--
+ ### Recommendations
+ 
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+ 
+ ## Training Details
+ 
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+ 
+ - `eval_strategy`: steps
+ - `per_device_eval_batch_size`: 4
+ - `gradient_accumulation_steps`: 4
+ - `learning_rate`: 2e-05
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 5
+ - `bf16`: True
+ - `tf32`: True
+ - `optim`: adamw_torch_fused
+ - `gradient_checkpointing`: True
+ - `gradient_checkpointing_kwargs`: {'use_reentrant': False}
+ - `batch_sampler`: no_duplicates
+ 
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+ 
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 8
+ - `per_device_eval_batch_size`: 4
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 4
+ - `eval_accumulation_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 3
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 5
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: True
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: True
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: True
+ - `gradient_checkpointing_kwargs`: {'use_reentrant': False}
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+ 
+ </details>
+ 
+ ### Training Logs
+ | Epoch  | Step | Training Loss | retrival loss | sts loss | reranking loss |
+ |:------:|:----:|:-------------:|:-------------:|:--------:|:--------------:|
+ | 0.5222 | 500  | 0.7949        | 0.0187        | 2.6522   | 0.2931         |
+ | 1.0444 | 1000 | 0.6813        | 0.0139        | 2.5109   | 0.2695         |
+ | 1.5666 | 1500 | 0.5148        | 0.0118        | 2.5270   | 0.2807         |
+ | 2.0888 | 2000 | 0.48          | 0.0114        | 2.5418   | 0.2791         |
+ | 2.6110 | 2500 | 0.3782        | 0.0117        | 2.5740   | 0.2787         |
+ 
+ 
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.41.2
+ - PyTorch: 2.2.0+cu121
+ - Accelerate: 0.32.1
+ - Datasets: 2.20.0
+ - Tokenizers: 0.19.1
+ 
+ ## Citation
+ 
+ ### BibTeX
+ 
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+ 
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+ 
+ #### CoSENTLoss
+ ```bibtex
+ @online{kexuefm-8847,
+     title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
+     author={Su Jianlin},
+     year={2022},
+     month={Jan},
+     url={https://kexue.fm/archives/8847},
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
added_tokens.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "<|endoftext|>": 151643,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644
+ }
config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "_name_or_path": "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
+   "architectures": [
+     "Qwen2Model"
+   ],
+   "attention_dropout": 0.0,
+   "auto_map": {
+     "AutoModel": "Alibaba-NLP/gte-Qwen2-1.5B-instruct--modeling_qwen.Qwen2Model",
+     "AutoModelForCausalLM": "Alibaba-NLP/gte-Qwen2-1.5B-instruct--modeling_qwen.Qwen2ForCausalLM",
+     "AutoModelForSequenceClassification": "Alibaba-NLP/gte-Qwen2-1.5B-instruct--modeling_qwen.Qwen2ForSequenceClassification"
+   },
+   "bos_token_id": 151643,
+   "eos_token_id": 151643,
+   "hidden_act": "silu",
+   "hidden_size": 1536,
+   "initializer_range": 0.02,
+   "intermediate_size": 8960,
+   "max_position_embeddings": 131072,
+   "max_window_layers": 21,
+   "model_type": "qwen2",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 28,
+   "num_key_value_heads": 2,
+   "rms_norm_eps": 1e-06,
+   "rope_theta": 1000000.0,
+   "sliding_window": 131072,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.41.2",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 151646
+ }
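
The backbone config describes a grouped-query attention layout: 12 query heads share 2 key/value heads, each 128-dimensional. A quick arithmetic check of what those numbers imply (a sketch derived from the config values above, not code from this repo):

```python
# Attention geometry implied by config.json (grouped-query attention).
hidden_size = 1536
num_attention_heads = 12
num_key_value_heads = 2

head_dim = hidden_size // num_attention_heads                # 1536 / 12 = 128
queries_per_kv = num_attention_heads // num_key_value_heads  # 6 query heads share each KV head
kv_proj_width = num_key_value_heads * head_dim               # 256-dim K and V projections per layer

print(head_dim, queries_per_kv, kv_proj_width)               # 128 6 256
```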
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.41.2",
+     "pytorch": "2.2.0+cu121"
+   },
+   "prompts": {
+     "query": "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: "
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:28e8283716f4dcbdb48606b3238cee3248aac8b961cdf2d3a503661a34ee6093
+ size 3086574240
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:97d3111c2ec9011ca9388d223c199160b2698a6ca0f01c41ccdd7a0289017209
+ size 6173370172
rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e40ca396f67d336f8a3452a1a0b2df9d47d13751bebde9f84aa7f445bfb8ee6b
+ size 15920
rng_state_1.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:70c90427a7115a9c8686ab846c01372b81d2a5040a5ba4eee5a11182751d8865
+ size 15920
rng_state_2.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b55caa4389bed9a4118f07b140441840799a1a394b83171d89c462c2e80d2fea
+ size 15920
rng_state_3.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6c096e135b6fb9f39af063f5e8393eeb753d614afee998a2e97d6cbeb393d354
+ size 15920
rng_state_4.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8d7a476dc1a0c464b1af62661884ec33d0ea671e6fa898534c6b8bbb4c9c0a02
+ size 15920
rng_state_5.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:04f3eea8a89d3fd7b5cdbbde21d8f23a6398917e863de989887d445155bcbc0f
+ size 15920
rng_state_6.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f0f16d728a10054b9eca155e4a05d514c871c292d15978b726f15e756f93434d
+ size 15920
rng_state_7.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f326143081583fa7ae4cc4c57914274fc2fd91a646d5c0e44a5a49682bd82dd0
+ size 15920
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1b2a8f2294d626cc190e54a07185c0727e5df77a55ff35725227f728bc395bf8
+ size 1064
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 8192,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>"
+   ],
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,50 @@
+ {
+   "add_eos_token": true,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>"
+   ],
+   "auto_map": {
+     "AutoTokenizer": [
+       "Alibaba-NLP/gte-Qwen2-1.5B-instruct--tokenization_qwen.Qwen2Tokenizer",
+       "Alibaba-NLP/gte-Qwen2-1.5B-instruct--tokenization_qwen.Qwen2TokenizerFast"
+     ]
+   },
+   "bos_token": null,
+   "chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "model_max_length": 32768,
+   "pad_token": "<|endoftext|>",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
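
Note the `"add_eos_token": true` setting: every encoded input gets `<|endoftext|>` (id 151643) appended, which is exactly the position that the last-token pooling above reads. A sketch of how one might verify this, assuming the commit's files are available locally (the repo id is not shown on this page, so the path is a placeholder):

```python
from transformers import AutoTokenizer

# Placeholder path; trust_remote_code is needed because auto_map points at
# the Alibaba-NLP tokenizer implementation.
tok = AutoTokenizer.from_pretrained("path/to/this/repo", trust_remote_code=True)
ids = tok("hello world")["input_ids"]
print(ids[-1])  # expected: 151643, the appended <|endoftext|> token
```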
trainer_state.json ADDED
@@ -0,0 +1,188 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 2.998433420365535,
+   "eval_steps": 500,
+   "global_step": 2871,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.5221932114882507,
+       "grad_norm": 6.875,
+       "learning_rate": 1.856369613349391e-05,
+       "loss": 0.7949,
+       "step": 500
+     },
+     {
+       "epoch": 0.5221932114882507,
+       "eval_retrival_loss": 0.01873534359037876,
+       "eval_retrival_runtime": 1.6428,
+       "eval_retrival_samples_per_second": 608.716,
+       "eval_retrival_steps_per_second": 19.479,
+       "step": 500
+     },
+     {
+       "epoch": 0.5221932114882507,
+       "eval_sts_loss": 2.652181386947632,
+       "eval_sts_runtime": 2.0073,
+       "eval_sts_samples_per_second": 237.63,
+       "eval_sts_steps_per_second": 7.473,
+       "step": 500
+     },
+     {
+       "epoch": 0.5221932114882507,
+       "eval_reranking_loss": 0.2930990159511566,
+       "eval_reranking_runtime": 5.9616,
+       "eval_reranking_samples_per_second": 167.74,
+       "eval_reranking_steps_per_second": 5.368,
+       "step": 500
+     },
+     {
+       "epoch": 1.0443864229765012,
+       "grad_norm": 5.96875,
+       "learning_rate": 1.4618836502727944e-05,
+       "loss": 0.6813,
+       "step": 1000
+     },
+     {
+       "epoch": 1.0443864229765012,
+       "eval_retrival_loss": 0.013903363607823849,
+       "eval_retrival_runtime": 1.7481,
+       "eval_retrival_samples_per_second": 572.037,
+       "eval_retrival_steps_per_second": 18.305,
+       "step": 1000
+     },
+     {
+       "epoch": 1.0443864229765012,
+       "eval_sts_loss": 2.5108530521392822,
+       "eval_sts_runtime": 2.0678,
+       "eval_sts_samples_per_second": 230.679,
+       "eval_sts_steps_per_second": 7.254,
+       "step": 1000
+     },
+     {
+       "epoch": 1.0443864229765012,
+       "eval_reranking_loss": 0.2695285677909851,
+       "eval_reranking_runtime": 6.0139,
+       "eval_reranking_samples_per_second": 166.28,
+       "eval_reranking_steps_per_second": 5.321,
+       "step": 1000
+     },
+     {
+       "epoch": 1.566579634464752,
+       "grad_norm": 15.875,
+       "learning_rate": 9.32090426406817e-06,
+       "loss": 0.5148,
+       "step": 1500
+     },
+     {
+       "epoch": 1.566579634464752,
+       "eval_retrival_loss": 0.011771922931075096,
+       "eval_retrival_runtime": 1.704,
+       "eval_retrival_samples_per_second": 586.858,
+       "eval_retrival_steps_per_second": 18.779,
+       "step": 1500
+     },
+     {
+       "epoch": 1.566579634464752,
+       "eval_sts_loss": 2.526954412460327,
+       "eval_sts_runtime": 2.021,
+       "eval_sts_samples_per_second": 236.022,
+       "eval_sts_steps_per_second": 7.422,
+       "step": 1500
+     },
+     {
+       "epoch": 1.566579634464752,
+       "eval_reranking_loss": 0.28074678778648376,
+       "eval_reranking_runtime": 5.9437,
+       "eval_reranking_samples_per_second": 168.245,
+       "eval_reranking_steps_per_second": 5.384,
+       "step": 1500
+     },
+     {
+       "epoch": 2.0887728459530024,
+       "grad_norm": 8.625,
+       "learning_rate": 4.221910835622651e-06,
+       "loss": 0.48,
+       "step": 2000
+     },
+     {
+       "epoch": 2.0887728459530024,
+       "eval_retrival_loss": 0.011438765563070774,
+       "eval_retrival_runtime": 1.7039,
+       "eval_retrival_samples_per_second": 586.897,
+       "eval_retrival_steps_per_second": 18.781,
+       "step": 2000
+     },
+     {
+       "epoch": 2.0887728459530024,
+       "eval_sts_loss": 2.541757106781006,
+       "eval_sts_runtime": 2.0493,
+       "eval_sts_samples_per_second": 232.762,
+       "eval_sts_steps_per_second": 7.32,
+       "step": 2000
+     },
+     {
+       "epoch": 2.0887728459530024,
+       "eval_reranking_loss": 0.27911052107810974,
+       "eval_reranking_runtime": 5.9433,
+       "eval_reranking_samples_per_second": 168.256,
+       "eval_reranking_steps_per_second": 5.384,
+       "step": 2000
+     },
+     {
+       "epoch": 2.6109660574412534,
+       "grad_norm": 10.625,
+       "learning_rate": 8.155891806138993e-07,
+       "loss": 0.3782,
+       "step": 2500
+     },
+     {
+       "epoch": 2.6109660574412534,
+       "eval_retrival_loss": 0.01174311526119709,
+       "eval_retrival_runtime": 1.6915,
+       "eval_retrival_samples_per_second": 591.189,
+       "eval_retrival_steps_per_second": 18.918,
+       "step": 2500
+     },
+     {
+       "epoch": 2.6109660574412534,
+       "eval_sts_loss": 2.573981285095215,
+       "eval_sts_runtime": 2.0103,
+       "eval_sts_samples_per_second": 237.277,
+       "eval_sts_steps_per_second": 7.462,
+       "step": 2500
+     },
+     {
+       "epoch": 2.6109660574412534,
+       "eval_reranking_loss": 0.2787380516529083,
+       "eval_reranking_runtime": 5.9757,
+       "eval_reranking_samples_per_second": 167.345,
+       "eval_reranking_steps_per_second": 5.355,
+       "step": 2500
+     }
+   ],
+   "logging_steps": 500,
+   "max_steps": 2871,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 3,
+   "save_steps": 500,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": true
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 0.0,
+   "train_batch_size": 8,
+   "trial_name": null,
+   "trial_params": null
+ }
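
As a cross-check, `max_steps` = 2871 is consistent with the README hyperparameters and the 8 training ranks implied by `rng_state_0.pth` through `rng_state_7.pth`. The exact rounding is an assumption about how the Trainer accounts for `drop_last` and gradient accumulation:

```python
# Hypothetical reconstruction of the step count from the published settings.
dataset_size = 245133             # from the dataset_size tag in README.md
world_size = 8                    # implied by rng_state_0..7
per_device_train_batch_size = 8   # from the README hyperparameters
gradient_accumulation_steps = 4
num_train_epochs = 3

batches_per_epoch = dataset_size // (world_size * per_device_train_batch_size)  # 3830 (drop_last=True)
steps_per_epoch = batches_per_epoch // gradient_accumulation_steps              # 957 optimizer steps
print(steps_per_epoch * num_train_epochs)  # 2871, matching global_step and max_steps
```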
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1f7cbcac4688e2f833e3422598f27db29a54311f506b92751b5f6225e503bf97
+ size 5368
vocab.json ADDED
The diff for this file is too large to render. See raw diff