bobox committed on
Commit a952843
1 Parent(s): c23572b

Training in progress, step 305, checkpoint
checkpoint-305/1_AdvancedWeightedPooling/config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "embed_dim": 768,
+   "num_heads": 4,
+   "dropout": 0.025,
+   "bias": true,
+   "gate_min": 0.05,
+   "gate_max": 0.95,
+   "gate_dropout": 0.05,
+   "dropout_gate_open": 0.0,
+   "dropout_gate_close": 0.0,
+   "CLS_self_attn": 0
+ }
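A minimal sketch of consuming the pooling config added above: parse the JSON and sanity-check that the embedding dimension divides evenly across attention heads (a requirement of multi-head attention). The config text is inlined here; in practice you would read it from `checkpoint-305/1_AdvancedWeightedPooling/config.json`.

```python
import json

# Inlined copy of the config from the hunk above; normally loaded from the
# checkpoint directory instead.
config_text = """
{
  "embed_dim": 768,
  "num_heads": 4,
  "dropout": 0.025,
  "bias": true,
  "gate_min": 0.05,
  "gate_max": 0.95,
  "gate_dropout": 0.05,
  "dropout_gate_open": 0.0,
  "dropout_gate_close": 0.0,
  "CLS_self_attn": 0
}
"""
config = json.loads(config_text)

# Multi-head attention needs embed_dim to split evenly across heads.
assert config["embed_dim"] % config["num_heads"] == 0
head_dim = config["embed_dim"] // config["num_heads"]
print(head_dim)  # 192
```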
checkpoint-305/1_AdvancedWeightedPooling/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7e2a4fc93c8370e5138de3d32718c6a165a0c5dd0d94618938b75dda07f023b9
+ size 14201307
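The three lines above are a git-LFS pointer file, not the weights themselves: the real `pytorch_model.bin` blob is identified by its SHA-256 digest and byte size. A small sketch of verifying a downloaded blob against such a pointer (the blob bytes here are a stand-in, so verification fails):

```python
import hashlib

# Parse the pointer file format: one "key value" pair per line.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:7e2a4fc93c8370e5138de3d32718c6a165a0c5dd0d94618938b75dda07f023b9
size 14201307
"""
fields = dict(line.split(" ", 1) for line in pointer.strip().splitlines())
expected_oid = fields["oid"].removeprefix("sha256:")
expected_size = int(fields["size"])

def verify(blob: bytes) -> bool:
    """Check a blob's size and SHA-256 digest against the pointer."""
    return len(blob) == expected_size and hashlib.sha256(blob).hexdigest() == expected_oid

print(verify(b"not the real weights"))  # False: wrong size and digest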
checkpoint-305/README.md ADDED
@@ -0,0 +1,1132 @@
+ ---
+ base_model: microsoft/deberta-v3-small
+ library_name: sentence-transformers
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ - pearson_manhattan
+ - spearman_manhattan
+ - pearson_euclidean
+ - spearman_euclidean
+ - pearson_dot
+ - spearman_dot
+ - pearson_max
+ - spearman_max
+ - cosine_accuracy
+ - cosine_accuracy_threshold
+ - cosine_f1
+ - cosine_f1_threshold
+ - cosine_precision
+ - cosine_recall
+ - cosine_ap
+ - dot_accuracy
+ - dot_accuracy_threshold
+ - dot_f1
+ - dot_f1_threshold
+ - dot_precision
+ - dot_recall
+ - dot_ap
+ - manhattan_accuracy
+ - manhattan_accuracy_threshold
+ - manhattan_f1
+ - manhattan_f1_threshold
+ - manhattan_precision
+ - manhattan_recall
+ - manhattan_ap
+ - euclidean_accuracy
+ - euclidean_accuracy_threshold
+ - euclidean_f1
+ - euclidean_f1_threshold
+ - euclidean_precision
+ - euclidean_recall
+ - euclidean_ap
+ - max_accuracy
+ - max_accuracy_threshold
+ - max_f1
+ - max_f1_threshold
+ - max_precision
+ - max_recall
+ - max_ap
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:32500
+ - loss:GISTEmbedLoss
+ widget:
+ - source_sentence: phase changes do not change
+   sentences:
+   - The major Atlantic slave trading nations, ordered by trade volume, were the Portuguese,
+     the British, the Spanish, the French, the Dutch, and the Danish. Several had established
+     outposts on the African coast where they purchased slaves from local African leaders.
+   - "phase changes do not change mass. Particles have mass, but mass is energy. \n\
+     \ phase changes do not change energy"
+   - According to the U.S. Census Bureau , the county is a total area of , which has
+     land and ( 0.2 % ) is water .
+ - source_sentence: what jobs can you get with a bachelor degree in anthropology?
+   sentences:
+   - To determine the atomic weight of an element, you should add up protons and neutrons.
+   - '[''Paleontologist*'', ''Archaeologist*'', ''University Professor*'', ''Market
+     Research Analyst*'', ''Primatologist.'', ''Forensic Scientist*'', ''Medical Anthropologist.'',
+     ''Museum Technician.'']'
+   - The wingspan flies , the moth comes depending on the location from July to August
+     .
+ - source_sentence: Identify different forms of energy (e.g., light, sound, heat).
+   sentences:
+   - '`` Irreplaceable '''' '''' remained on the chart for thirty weeks , and was certified
+     double-platinum by the Recording Industry Association of America ( RIAA ) , denoting
+     sales of two million downloads , and had sold over 3,139,000 paid digital downloads
+     in the US as of October 2012 , according to Nielsen SoundScan . '''''
+   - On Rotten Tomatoes , the film has a rating of 63 % , based on 87 reviews , with
+     an average rating of 5.9/10 .
+   - Heat, light, and sound are all different forms of energy.
+ - source_sentence: what is so small it can only be seen with an electron microscope?
+   sentences:
+   - "Viruses are so small that they can be seen only with an electron microscope..\
+     \ Where most viruses are DNA, HIV is an RNA virus. \n HIV is so small it can only\
+     \ be seen with an electron microscope"
+   - The development of modern lasers has opened many doors to both research and applications.
+     A laser beam was used to measure the distance from the Earth to the moon. Lasers
+     are important components of CD players. As the image above illustrates, lasers
+     can provide precise focusing of beams to selectively destroy cancer cells in patients.
+     The ability of a laser to focus precisely is due to high-quality crystals that
+     help give rise to the laser beam. A variety of techniques are used to manufacture
+     pure crystals for use in lasers.
+   - Discussion for (a) This value is the net work done on the package. The person
+     actually does more work than this, because friction opposes the motion. Friction
+     does negative work and removes some of the energy the person expends and converts
+     it to thermal energy. The net work equals the sum of the work done by each individual
+     force. Strategy and Concept for (b) The forces acting on the package are gravity,
+     the normal force, the force of friction, and the applied force. The normal force
+     and force of gravity are each perpendicular to the displacement, and therefore
+     do no work. Solution for (b) The applied force does work.
+ - source_sentence: what aspects of your environment may relate to the epidemic of
+     obesity
+   sentences:
+   - Jan Kromkamp ( born August 17 , 1980 in Makkinga , Netherlands ) is a Dutch footballer
+     .
+   - When chemicals in solution react, the proper way of writing the chemical formulas
+     of the dissolved ionic compounds is in terms of the dissociated ions, not the
+     complete ionic formula. A complete ionic equation is a chemical equation in which
+     the dissolved ionic compounds are written as separated ions. Solubility rules
+     are very useful in determining which ionic compounds are dissolved and which are
+     not. For example, when NaCl(aq) reacts with AgNO3(aq) in a double-replacement
+     reaction to precipitate AgCl(s) and form NaNO3(aq), the complete ionic equation
+     includes NaCl, AgNO3, and NaNO3 written as separated ions:.
+   - Genetic changes in human populations occur too slowly to be responsible for the
+     obesity epidemic. Nevertheless, the variation in how people respond to the environment
+     that promotes physical inactivity and intake of high-calorie foods suggests that
+     genes do play a role in the development of obesity.
+ model-index:
+ - name: SentenceTransformer based on microsoft/deberta-v3-small
+   results:
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts test
+       type: sts-test
+     metrics:
+     - type: pearson_cosine
+       value: 0.3774946012125992
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.4056589966976888
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: 0.3861982631744407
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: 0.4059364545183154
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: 0.38652243004790016
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: 0.4056589966976888
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: 0.3774648453085433
+       name: Pearson Dot
+     - type: spearman_dot
+       value: 0.40563469676275316
+       name: Spearman Dot
+     - type: pearson_max
+       value: 0.38652243004790016
+       name: Pearson Max
+     - type: spearman_max
+       value: 0.4059364545183154
+       name: Spearman Max
+   - task:
+       type: binary-classification
+       name: Binary Classification
+     dataset:
+       name: allNLI dev
+       type: allNLI-dev
+     metrics:
+     - type: cosine_accuracy
+       value: 0.67578125
+       name: Cosine Accuracy
+     - type: cosine_accuracy_threshold
+       value: 0.9427558183670044
+       name: Cosine Accuracy Threshold
+     - type: cosine_f1
+       value: 0.5225225225225225
+       name: Cosine F1
+     - type: cosine_f1_threshold
+       value: 0.8046966791152954
+       name: Cosine F1 Threshold
+     - type: cosine_precision
+       value: 0.3795811518324607
+       name: Cosine Precision
+     - type: cosine_recall
+       value: 0.838150289017341
+       name: Cosine Recall
+     - type: cosine_ap
+       value: 0.4368751759846574
+       name: Cosine Ap
+     - type: dot_accuracy
+       value: 0.67578125
+       name: Dot Accuracy
+     - type: dot_accuracy_threshold
+       value: 724.1080322265625
+       name: Dot Accuracy Threshold
+     - type: dot_f1
+       value: 0.5225225225225225
+       name: Dot F1
+     - type: dot_f1_threshold
+       value: 618.074951171875
+       name: Dot F1 Threshold
+     - type: dot_precision
+       value: 0.3795811518324607
+       name: Dot Precision
+     - type: dot_recall
+       value: 0.838150289017341
+       name: Dot Recall
+     - type: dot_ap
+       value: 0.436842886797982
+       name: Dot Ap
+     - type: manhattan_accuracy
+       value: 0.677734375
+       name: Manhattan Accuracy
+     - type: manhattan_accuracy_threshold
+       value: 223.6764373779297
+       name: Manhattan Accuracy Threshold
+     - type: manhattan_f1
+       value: 0.5239852398523985
+       name: Manhattan F1
+     - type: manhattan_f1_threshold
+       value: 372.31396484375
+       name: Manhattan F1 Threshold
+     - type: manhattan_precision
+       value: 0.38482384823848237
+       name: Manhattan Precision
+     - type: manhattan_recall
+       value: 0.8208092485549133
+       name: Manhattan Recall
+     - type: manhattan_ap
+       value: 0.43892484929307635
+       name: Manhattan Ap
+     - type: euclidean_accuracy
+       value: 0.67578125
+       name: Euclidean Accuracy
+     - type: euclidean_accuracy_threshold
+       value: 9.377331733703613
+       name: Euclidean Accuracy Threshold
+     - type: euclidean_f1
+       value: 0.5225225225225225
+       name: Euclidean F1
+     - type: euclidean_f1_threshold
+       value: 17.321048736572266
+       name: Euclidean F1 Threshold
+     - type: euclidean_precision
+       value: 0.3795811518324607
+       name: Euclidean Precision
+     - type: euclidean_recall
+       value: 0.838150289017341
+       name: Euclidean Recall
+     - type: euclidean_ap
+       value: 0.4368602200677977
+       name: Euclidean Ap
+     - type: max_accuracy
+       value: 0.677734375
+       name: Max Accuracy
+     - type: max_accuracy_threshold
+       value: 724.1080322265625
+       name: Max Accuracy Threshold
+     - type: max_f1
+       value: 0.5239852398523985
+       name: Max F1
+     - type: max_f1_threshold
+       value: 618.074951171875
+       name: Max F1 Threshold
+     - type: max_precision
+       value: 0.38482384823848237
+       name: Max Precision
+     - type: max_recall
+       value: 0.838150289017341
+       name: Max Recall
+     - type: max_ap
+       value: 0.43892484929307635
+       name: Max Ap
+   - task:
+       type: binary-classification
+       name: Binary Classification
+     dataset:
+       name: Qnli dev
+       type: Qnli-dev
+     metrics:
+     - type: cosine_accuracy
+       value: 0.646484375
+       name: Cosine Accuracy
+     - type: cosine_accuracy_threshold
+       value: 0.8057259321212769
+       name: Cosine Accuracy Threshold
+     - type: cosine_f1
+       value: 0.6688102893890675
+       name: Cosine F1
+     - type: cosine_f1_threshold
+       value: 0.7187118530273438
+       name: Cosine F1 Threshold
+     - type: cosine_precision
+       value: 0.538860103626943
+       name: Cosine Precision
+     - type: cosine_recall
+       value: 0.8813559322033898
+       name: Cosine Recall
+     - type: cosine_ap
+       value: 0.6720663622193426
+       name: Cosine Ap
+     - type: dot_accuracy
+       value: 0.646484375
+       name: Dot Accuracy
+     - type: dot_accuracy_threshold
+       value: 618.8643798828125
+       name: Dot Accuracy Threshold
+     - type: dot_f1
+       value: 0.6688102893890675
+       name: Dot F1
+     - type: dot_f1_threshold
+       value: 552.0260009765625
+       name: Dot F1 Threshold
+     - type: dot_precision
+       value: 0.538860103626943
+       name: Dot Precision
+     - type: dot_recall
+       value: 0.8813559322033898
+       name: Dot Recall
+     - type: dot_ap
+       value: 0.672083506527328
+       name: Dot Ap
+     - type: manhattan_accuracy
+       value: 0.6484375
+       name: Manhattan Accuracy
+     - type: manhattan_accuracy_threshold
+       value: 386.58905029296875
+       name: Manhattan Accuracy Threshold
+     - type: manhattan_f1
+       value: 0.6645569620253164
+       name: Manhattan F1
+     - type: manhattan_f1_threshold
+       value: 462.609130859375
+       name: Manhattan F1 Threshold
+     - type: manhattan_precision
+       value: 0.5303030303030303
+       name: Manhattan Precision
+     - type: manhattan_recall
+       value: 0.8898305084745762
+       name: Manhattan Recall
+     - type: manhattan_ap
+       value: 0.6724653688821339
+       name: Manhattan Ap
+     - type: euclidean_accuracy
+       value: 0.646484375
+       name: Euclidean Accuracy
+     - type: euclidean_accuracy_threshold
+       value: 17.27533721923828
+       name: Euclidean Accuracy Threshold
+     - type: euclidean_f1
+       value: 0.6688102893890675
+       name: Euclidean F1
+     - type: euclidean_f1_threshold
+       value: 20.787063598632812
+       name: Euclidean F1 Threshold
+     - type: euclidean_precision
+       value: 0.538860103626943
+       name: Euclidean Precision
+     - type: euclidean_recall
+       value: 0.8813559322033898
+       name: Euclidean Recall
+     - type: euclidean_ap
+       value: 0.6720591998758361
+       name: Euclidean Ap
+     - type: max_accuracy
+       value: 0.6484375
+       name: Max Accuracy
+     - type: max_accuracy_threshold
+       value: 618.8643798828125
+       name: Max Accuracy Threshold
+     - type: max_f1
+       value: 0.6688102893890675
+       name: Max F1
+     - type: max_f1_threshold
+       value: 552.0260009765625
+       name: Max F1 Threshold
+     - type: max_precision
+       value: 0.538860103626943
+       name: Max Precision
+     - type: max_recall
+       value: 0.8898305084745762
+       name: Max Recall
+     - type: max_ap
+       value: 0.6724653688821339
+       name: Max Ap
+ ---
+ 
+ # SentenceTransformer based on microsoft/deberta-v3-small
+ 
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) <!-- at revision a36c739020e01763fe789b4b85e2df55d6180012 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+ 
+ ### Model Sources
+ 
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+ 
+ ### Full Model Architecture
+ 
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model
+   (1): AdvancedWeightedPooling(
+     (alpha_dropout_layer): Dropout(p=0.05, inplace=False)
+     (gate_dropout_layer): Dropout(p=0.0, inplace=False)
+     (linear_cls_Qpj): Linear(in_features=768, out_features=768, bias=True)
+     (linear_attnOut): Linear(in_features=768, out_features=768, bias=True)
+     (mha): MultiheadAttention(
+       (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
+     )
+     (layernorm_output): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+     (layernorm_weightedPooing): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+     (layernorm_attnOut): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+   )
+ )
+ ```
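The module listing above suggests the pooling idea: a projected CLS embedding queries the token embeddings through multi-head attention, and the attended output is layer-normalized into the sentence vector. A rough PyTorch sketch of that idea (not the actual `AdvancedWeightedPooling` code; names like `linear_cls_Qpj` are borrowed from the listing, and the gating/weighted-pooling branches are omitted):

```python
import torch
from torch import nn

embed_dim, num_heads, seq_len, batch = 768, 4, 16, 2
tokens = torch.randn(batch, seq_len, embed_dim)  # stand-in token embeddings
cls = tokens[:, :1, :]                           # [B, 1, D] CLS embedding

linear_cls_Qpj = nn.Linear(embed_dim, embed_dim)
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
layernorm = nn.LayerNorm(embed_dim)

query = linear_cls_Qpj(cls)                      # project CLS into the query space
attn_out, _ = mha(query, tokens, tokens)         # CLS attends over all tokens
pooled = layernorm(attn_out.squeeze(1))          # [B, D] sentence embedding

print(pooled.shape)  # torch.Size([2, 768])
```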
+ 
+ ## Usage
+ 
+ ### Direct Usage (Sentence Transformers)
+ 
+ First install the Sentence Transformers library:
+ 
+ ```bash
+ pip install -U sentence-transformers
+ ```
+ 
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("bobox/DeBERTa3-s-CustomPoolin-toytest4-step1-checkpoints-tmp")
+ # Run inference
+ sentences = [
+     'what aspects of your environment may relate to the epidemic of obesity',
+     'Genetic changes in human populations occur too slowly to be responsible for the obesity epidemic. Nevertheless, the variation in how people respond to the environment that promotes physical inactivity and intake of high-calorie foods suggests that genes do play a role in the development of obesity.',
+     'When chemicals in solution react, the proper way of writing the chemical formulas of the dissolved ionic compounds is in terms of the dissociated ions, not the complete ionic formula. A complete ionic equation is a chemical equation in which the dissolved ionic compounds are written as separated ions. Solubility rules are very useful in determining which ionic compounds are dissolved and which are not. For example, when NaCl(aq) reacts with AgNO3(aq) in a double-replacement reaction to precipitate AgCl(s) and form NaNO3(aq), the complete ionic equation includes NaCl, AgNO3, and NaNO3 written as separated ions:.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+ 
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
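The `model.similarity` call above defaults to cosine similarity. For clarity, here is the same computation done by hand with NumPy, using random vectors as stand-ins for the real embeddings:

```python
import numpy as np

# Stand-in embeddings: 3 vectors of dim 768, the shape model.encode(sentences)
# returns for the three sentences above.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 768))

# Cosine similarity matrix: normalize rows, then take pairwise dot products.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarities = normed @ normed.T

print(similarities.shape)  # (3, 3)
print(np.allclose(np.diag(similarities), 1.0))  # True: each vector matches itself
```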
+ 
+ <!--
+ ### Direct Usage (Transformers)
+ 
+ <details><summary>Click to see the direct usage in Transformers</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+ 
+ You can finetune this model on your own dataset.
+ 
+ <details><summary>Click to expand</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ ## Evaluation
+ 
+ ### Metrics
+ 
+ #### Semantic Similarity
+ * Dataset: `sts-test`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+ 
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | pearson_cosine      | 0.3775     |
+ | **spearman_cosine** | **0.4057** |
+ | pearson_manhattan   | 0.3862     |
+ | spearman_manhattan  | 0.4059     |
+ | pearson_euclidean   | 0.3865     |
+ | spearman_euclidean  | 0.4057     |
+ | pearson_dot         | 0.3775     |
+ | spearman_dot        | 0.4056     |
+ | pearson_max         | 0.3865     |
+ | spearman_max        | 0.4059     |
+ 
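The Pearson/Spearman figures above are plain correlation coefficients between the model's predicted similarities and the gold STS labels; `EmbeddingSimilarityEvaluator` delegates to SciPy for them. A toy illustration with hypothetical scores (four pairs, same ranking in both lists, so Spearman is exactly 1.0 even though Pearson is not):

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical predicted cosine similarities vs. gold STS labels for four pairs.
predicted = [0.9, 0.1, 0.5, 0.7]
gold = [5.0, 0.0, 2.5, 4.0]

# Pearson measures linear agreement; Spearman measures rank agreement.
pearson = pearsonr(predicted, gold)[0]
spearman = spearmanr(predicted, gold).correlation

print(round(spearman, 4))  # 1.0: identical rankings
```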
+ #### Binary Classification
+ * Dataset: `allNLI-dev`
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
+ 
+ | Metric                       | Value      |
+ |:-----------------------------|:-----------|
+ | cosine_accuracy              | 0.6758     |
+ | cosine_accuracy_threshold    | 0.9428     |
+ | cosine_f1                    | 0.5225     |
+ | cosine_f1_threshold          | 0.8047     |
+ | cosine_precision             | 0.3796     |
+ | cosine_recall                | 0.8382     |
+ | cosine_ap                    | 0.4369     |
+ | dot_accuracy                 | 0.6758     |
+ | dot_accuracy_threshold       | 724.108    |
+ | dot_f1                       | 0.5225     |
+ | dot_f1_threshold             | 618.075    |
+ | dot_precision                | 0.3796     |
+ | dot_recall                   | 0.8382     |
+ | dot_ap                       | 0.4368     |
+ | manhattan_accuracy           | 0.6777     |
+ | manhattan_accuracy_threshold | 223.6764   |
+ | manhattan_f1                 | 0.524      |
+ | manhattan_f1_threshold       | 372.314    |
+ | manhattan_precision          | 0.3848     |
+ | manhattan_recall             | 0.8208     |
+ | manhattan_ap                 | 0.4389     |
+ | euclidean_accuracy           | 0.6758     |
+ | euclidean_accuracy_threshold | 9.3773     |
+ | euclidean_f1                 | 0.5225     |
+ | euclidean_f1_threshold       | 17.321     |
+ | euclidean_precision          | 0.3796     |
+ | euclidean_recall             | 0.8382     |
+ | euclidean_ap                 | 0.4369     |
+ | max_accuracy                 | 0.6777     |
+ | max_accuracy_threshold       | 724.108    |
+ | max_f1                       | 0.524      |
+ | max_f1_threshold             | 618.075    |
+ | max_precision                | 0.3848     |
+ | max_recall                   | 0.8382     |
+ | **max_ap**                   | **0.4389** |
+ 
+ #### Binary Classification
+ * Dataset: `Qnli-dev`
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
+ 
+ | Metric                       | Value      |
+ |:-----------------------------|:-----------|
+ | cosine_accuracy              | 0.6465     |
+ | cosine_accuracy_threshold    | 0.8057     |
+ | cosine_f1                    | 0.6688     |
+ | cosine_f1_threshold          | 0.7187     |
+ | cosine_precision             | 0.5389     |
+ | cosine_recall                | 0.8814     |
+ | cosine_ap                    | 0.6721     |
+ | dot_accuracy                 | 0.6465     |
+ | dot_accuracy_threshold       | 618.8644   |
+ | dot_f1                       | 0.6688     |
+ | dot_f1_threshold             | 552.026    |
+ | dot_precision                | 0.5389     |
+ | dot_recall                   | 0.8814     |
+ | dot_ap                       | 0.6721     |
+ | manhattan_accuracy           | 0.6484     |
+ | manhattan_accuracy_threshold | 386.5891   |
+ | manhattan_f1                 | 0.6646     |
+ | manhattan_f1_threshold       | 462.6091   |
+ | manhattan_precision          | 0.5303     |
+ | manhattan_recall             | 0.8898     |
+ | manhattan_ap                 | 0.6725     |
+ | euclidean_accuracy           | 0.6465     |
+ | euclidean_accuracy_threshold | 17.2753    |
+ | euclidean_f1                 | 0.6688     |
+ | euclidean_f1_threshold       | 20.7871    |
+ | euclidean_precision          | 0.5389     |
+ | euclidean_recall             | 0.8814     |
+ | euclidean_ap                 | 0.6721     |
+ | max_accuracy                 | 0.6484     |
+ | max_accuracy_threshold       | 618.8644   |
+ | max_f1                       | 0.6688     |
+ | max_f1_threshold             | 552.026    |
+ | max_precision                | 0.5389     |
+ | max_recall                   | 0.8898     |
+ | **max_ap**                   | **0.6725** |
+ 
+ <!--
+ ## Bias, Risks and Limitations
+ 
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+ 
+ <!--
+ ### Recommendations
+ 
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+ 
+ ## Training Details
+ 
+ ### Training Dataset
+ 
+ #### Unnamed Dataset
+ 
+ 
+ * Size: 32,500 training samples
+ * Columns: <code>sentence1</code> and <code>sentence2</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence1 | sentence2 |
+   |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
+   | type    | string | string |
+   | details | <ul><li>min: 4 tokens</li><li>mean: 29.39 tokens</li><li>max: 323 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 54.42 tokens</li><li>max: 423 tokens</li></ul> |
+ * Samples:
+   | sentence1 | sentence2 |
+   |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+   | <code>In which London road is Harrod’s department store?</code> | <code>Harrods, Brompton Road, London | Shopping/Department Stores in London | LondonTown.com Opening Times Britain's most famous store and possibly the most famous store in the world, Harrods features on many tourist 'must-see' lists - and with good reason. Its humble beginnings date back to 1849, when Charles Henry Harrod opened a small East End grocer and tea merchant business that emphasised impeccable service over value. Today, it occupies a vast seven floor site in London's fashionable Knightsbridge and boasts a phenomenal range of products from pianos and cooking pans to fashion and perfumery. The luxurious Urban Retreat can be found on the sixth floor while newer departments include Superbrands, with 17 boutiques from top international designers, and Salon du Parfums, housing some of the most exceptional and exclusive perfumes in the world. The Food Hall is ostentatious to the core and mouth-wateringly exotic, and the store as a whole is well served with 27 restaurants. At Christmas time the Brompton Road windows are transformed into a magical winter wonderland and Father Christmas takes up residence at the enchanting Christmas Grotto. The summer and winter sales are calendar events in the shopping year, and although both sales are extremely crowded there are some great bargains on offer. �</code> |
+   | <code>e.&#9;in solids the atoms are closely locked in position and can only vibrate, in liquids the atoms and molecules are more loosely connected and can collide with and move past one another, while in gases the atoms or molecules are free to move independently, colliding frequently.</code> | <code>Within a substance, atoms that collide frequently and move independently of one another are most likely in a gas</code> |
+   | <code>Joe Cole was unable to join West Bromwich Albion .</code> | <code>On 16th October Joe Cole took a long hard look at himself realising that he would never get the opportunity to join West Bromwich Albion and joined Coventry City instead.</code> |
+ * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
+   ```json
+   {'guide': SentenceTransformer(
+     (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+     (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+     (2): Normalize()
+   ), 'temperature': 0.025}
+   ```
+ 
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 256
+ - `lr_scheduler_type`: cosine_with_min_lr
+ - `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
+ - `warmup_ratio`: 0.33
+ - `save_safetensors`: False
+ - `fp16`: True
+ - `push_to_hub`: True
+ - `hub_model_id`: bobox/DeBERTa3-s-CustomPoolin-toytest4-step1-checkpoints-tmp
+ - `hub_strategy`: all_checkpoints
+ - `batch_sampler`: no_duplicates
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 256
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 3
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine_with_min_lr
+ - `lr_scheduler_kwargs`: {'num_cycles': 0.5, 'min_lr': 3.3333333333333337e-06}
+ - `warmup_ratio`: 0.33
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: False
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: True
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: bobox/DeBERTa3-s-CustomPoolin-toytest4-step1-checkpoints-tmp
+ - `hub_strategy`: all_checkpoints
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `eval_use_gather_object`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
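With `warmup_ratio: 0.33` and `min_lr` ≈ 3.33e-06, the schedule warms up linearly to the peak learning rate and then follows a cosine curve that bottoms out at the floor instead of zero. A rough sketch of that shape (an illustration of the `cosine_with_min_lr` semantics, not transformers' exact implementation):

```python
import math

def cosine_with_min_lr(step, total_steps, peak_lr=5e-5,
                       min_lr=3.3333333333333337e-06,
                       warmup_ratio=0.33, num_cycles=0.5):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress))
    return min_lr + (peak_lr - min_lr) * cosine

# LR peaks exactly when warmup ends, and the final step sits at min_lr.
```

With `num_cycles: 0.5` the cosine term traverses half a period, so the decay is monotonic from peak to floor.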
+ ### Training Logs
+ <details><summary>Click to expand</summary>
+
+ | Epoch | Step | Training Loss | sts-test_spearman_cosine | allNLI-dev_max_ap | Qnli-dev_max_ap |
+ |:------:|:----:|:-------------:|:------------------------:|:-----------------:|:---------------:|
+ | 0.0010 | 1 | 6.0688 | - | - | - |
+ | 0.0020 | 2 | 7.5576 | - | - | - |
+ | 0.0030 | 3 | 4.6849 | - | - | - |
+ | 0.0039 | 4 | 5.4503 | - | - | - |
+ | 0.0049 | 5 | 5.6057 | - | - | - |
+ | 0.0059 | 6 | 6.3049 | - | - | - |
+ | 0.0069 | 7 | 6.8336 | - | - | - |
+ | 0.0079 | 8 | 5.0777 | - | - | - |
+ | 0.0089 | 9 | 4.8358 | - | - | - |
+ | 0.0098 | 10 | 4.641 | - | - | - |
+ | 0.0108 | 11 | 4.828 | - | - | - |
+ | 0.0118 | 12 | 5.2269 | - | - | - |
+ | 0.0128 | 13 | 5.6772 | - | - | - |
+ | 0.0138 | 14 | 5.1422 | - | - | - |
+ | 0.0148 | 15 | 6.2469 | - | - | - |
+ | 0.0157 | 16 | 4.6802 | - | - | - |
+ | 0.0167 | 17 | 4.5492 | - | - | - |
+ | 0.0177 | 18 | 4.8062 | - | - | - |
+ | 0.0187 | 19 | 7.5141 | - | - | - |
+ | 0.0197 | 20 | 5.5202 | - | - | - |
+ | 0.0207 | 21 | 6.5025 | - | - | - |
+ | 0.0217 | 22 | 7.318 | - | - | - |
+ | 0.0226 | 23 | 4.6458 | - | - | - |
+ | 0.0236 | 24 | 4.6191 | - | - | - |
+ | 0.0246 | 25 | 4.3159 | - | - | - |
+ | 0.0256 | 26 | 6.3677 | - | - | - |
+ | 0.0266 | 27 | 5.6052 | - | - | - |
+ | 0.0276 | 28 | 4.196 | - | - | - |
+ | 0.0285 | 29 | 4.4802 | - | - | - |
+ | 0.0295 | 30 | 4.9193 | - | - | - |
+ | 0.0305 | 31 | 4.0996 | - | - | - |
+ | 0.0315 | 32 | 5.6307 | - | - | - |
+ | 0.0325 | 33 | 4.5745 | - | - | - |
+ | 0.0335 | 34 | 4.4514 | - | - | - |
+ | 0.0344 | 35 | 4.0617 | - | - | - |
+ | 0.0354 | 36 | 5.0298 | - | - | - |
+ | 0.0364 | 37 | 3.9815 | - | - | - |
+ | 0.0374 | 38 | 4.0871 | - | - | - |
+ | 0.0384 | 39 | 4.2378 | - | - | - |
+ | 0.0394 | 40 | 3.8226 | - | - | - |
+ | 0.0404 | 41 | 4.3519 | - | - | - |
+ | 0.0413 | 42 | 3.6345 | - | - | - |
+ | 0.0423 | 43 | 5.0829 | - | - | - |
+ | 0.0433 | 44 | 4.6701 | - | - | - |
+ | 0.0443 | 45 | 4.1371 | - | - | - |
+ | 0.0453 | 46 | 4.2418 | - | - | - |
+ | 0.0463 | 47 | 4.4766 | - | - | - |
+ | 0.0472 | 48 | 4.4797 | - | - | - |
+ | 0.0482 | 49 | 3.8471 | - | - | - |
+ | 0.0492 | 50 | 4.3194 | - | - | - |
+ | 0.0502 | 51 | 3.9426 | - | - | - |
+ | 0.0512 | 52 | 3.5333 | - | - | - |
+ | 0.0522 | 53 | 4.2426 | - | - | - |
+ | 0.0531 | 54 | 3.9816 | - | - | - |
+ | 0.0541 | 55 | 3.663 | - | - | - |
+ | 0.0551 | 56 | 3.9057 | - | - | - |
+ | 0.0561 | 57 | 4.0345 | - | - | - |
+ | 0.0571 | 58 | 3.5233 | - | - | - |
+ | 0.0581 | 59 | 3.7999 | - | - | - |
+ | 0.0591 | 60 | 3.1885 | - | - | - |
+ | 0.0600 | 61 | 3.6013 | - | - | - |
+ | 0.0610 | 62 | 3.392 | - | - | - |
+ | 0.0620 | 63 | 3.3814 | - | - | - |
+ | 0.0630 | 64 | 4.0428 | - | - | - |
+ | 0.0640 | 65 | 3.7825 | - | - | - |
+ | 0.0650 | 66 | 3.4181 | - | - | - |
+ | 0.0659 | 67 | 3.7793 | - | - | - |
+ | 0.0669 | 68 | 3.8344 | - | - | - |
+ | 0.0679 | 69 | 3.2165 | - | - | - |
+ | 0.0689 | 70 | 3.3811 | - | - | - |
+ | 0.0699 | 71 | 3.5984 | - | - | - |
+ | 0.0709 | 72 | 3.8583 | - | - | - |
+ | 0.0719 | 73 | 3.296 | - | - | - |
+ | 0.0728 | 74 | 2.7661 | - | - | - |
+ | 0.0738 | 75 | 2.9805 | - | - | - |
+ | 0.0748 | 76 | 2.566 | - | - | - |
+ | 0.0758 | 77 | 3.258 | - | - | - |
+ | 0.0768 | 78 | 3.3804 | - | - | - |
+ | 0.0778 | 79 | 2.8828 | - | - | - |
+ | 0.0787 | 80 | 3.1077 | - | - | - |
+ | 0.0797 | 81 | 2.9441 | - | - | - |
+ | 0.0807 | 82 | 2.9465 | - | - | - |
+ | 0.0817 | 83 | 2.7088 | - | - | - |
+ | 0.0827 | 84 | 2.9215 | - | - | - |
+ | 0.0837 | 85 | 3.4698 | - | - | - |
+ | 0.0846 | 86 | 2.2414 | - | - | - |
+ | 0.0856 | 87 | 3.1601 | - | - | - |
+ | 0.0866 | 88 | 2.7714 | - | - | - |
+ | 0.0876 | 89 | 3.0311 | - | - | - |
+ | 0.0886 | 90 | 3.0336 | - | - | - |
+ | 0.0896 | 91 | 1.9358 | - | - | - |
+ | 0.0906 | 92 | 2.6031 | - | - | - |
+ | 0.0915 | 93 | 2.7515 | - | - | - |
+ | 0.0925 | 94 | 2.8496 | - | - | - |
+ | 0.0935 | 95 | 1.8015 | - | - | - |
+ | 0.0945 | 96 | 2.8138 | - | - | - |
+ | 0.0955 | 97 | 2.0597 | - | - | - |
+ | 0.0965 | 98 | 2.1053 | - | - | - |
+ | 0.0974 | 99 | 2.6785 | - | - | - |
+ | 0.0984 | 100 | 2.588 | - | - | - |
+ | 0.0994 | 101 | 2.0099 | - | - | - |
+ | 0.1004 | 102 | 2.7947 | - | - | - |
+ | 0.1014 | 103 | 2.3274 | - | - | - |
+ | 0.1024 | 104 | 2.2545 | - | - | - |
+ | 0.1033 | 105 | 2.4575 | - | - | - |
+ | 0.1043 | 106 | 2.4413 | - | - | - |
+ | 0.1053 | 107 | 2.3185 | - | - | - |
+ | 0.1063 | 108 | 2.1577 | - | - | - |
+ | 0.1073 | 109 | 2.1278 | - | - | - |
+ | 0.1083 | 110 | 2.0967 | - | - | - |
+ | 0.1093 | 111 | 2.6142 | - | - | - |
+ | 0.1102 | 112 | 1.8553 | - | - | - |
+ | 0.1112 | 113 | 2.1523 | - | - | - |
+ | 0.1122 | 114 | 2.1726 | - | - | - |
+ | 0.1132 | 115 | 1.8564 | - | - | - |
+ | 0.1142 | 116 | 1.8413 | - | - | - |
+ | 0.1152 | 117 | 2.0441 | - | - | - |
+ | 0.1161 | 118 | 2.2159 | - | - | - |
+ | 0.1171 | 119 | 2.6779 | - | - | - |
+ | 0.1181 | 120 | 2.2976 | - | - | - |
+ | 0.1191 | 121 | 1.9407 | - | - | - |
+ | 0.1201 | 122 | 1.9019 | - | - | - |
+ | 0.1211 | 123 | 2.2149 | - | - | - |
+ | 0.1220 | 124 | 1.6823 | - | - | - |
+ | 0.1230 | 125 | 1.8402 | - | - | - |
+ | 0.1240 | 126 | 1.6914 | - | - | - |
+ | 0.125 | 127 | 2.1626 | - | - | - |
+ | 0.1260 | 128 | 1.6414 | - | - | - |
+ | 0.1270 | 129 | 2.2043 | - | - | - |
+ | 0.1280 | 130 | 1.9987 | - | - | - |
+ | 0.1289 | 131 | 1.8868 | - | - | - |
+ | 0.1299 | 132 | 1.8262 | - | - | - |
+ | 0.1309 | 133 | 2.0404 | - | - | - |
+ | 0.1319 | 134 | 1.9134 | - | - | - |
+ | 0.1329 | 135 | 2.3725 | - | - | - |
+ | 0.1339 | 136 | 1.4127 | - | - | - |
+ | 0.1348 | 137 | 1.6876 | - | - | - |
+ | 0.1358 | 138 | 1.8376 | - | - | - |
+ | 0.1368 | 139 | 1.6992 | - | - | - |
+ | 0.1378 | 140 | 1.5032 | - | - | - |
+ | 0.1388 | 141 | 2.0334 | - | - | - |
+ | 0.1398 | 142 | 2.3581 | - | - | - |
+ | 0.1407 | 143 | 1.4236 | - | - | - |
+ | 0.1417 | 144 | 2.202 | - | - | - |
+ | 0.1427 | 145 | 1.7654 | - | - | - |
+ | 0.1437 | 146 | 1.5748 | - | - | - |
+ | 0.1447 | 147 | 1.7996 | - | - | - |
+ | 0.1457 | 148 | 1.7517 | - | - | - |
+ | 0.1467 | 149 | 1.8933 | - | - | - |
+ | 0.1476 | 150 | 1.2836 | - | - | - |
+ | 0.1486 | 151 | 1.7145 | - | - | - |
+ | 0.1496 | 152 | 1.6499 | - | - | - |
+ | 0.1506 | 153 | 1.8273 | 0.4057 | 0.4389 | 0.6725 |
+ | 0.1516 | 154 | 2.2859 | - | - | - |
+ | 0.1526 | 155 | 1.0833 | - | - | - |
+ | 0.1535 | 156 | 1.6829 | - | - | - |
+ | 0.1545 | 157 | 2.1464 | - | - | - |
+ | 0.1555 | 158 | 1.745 | - | - | - |
+ | 0.1565 | 159 | 1.7319 | - | - | - |
+ | 0.1575 | 160 | 1.6968 | - | - | - |
+ | 0.1585 | 161 | 1.7401 | - | - | - |
+ | 0.1594 | 162 | 1.729 | - | - | - |
+ | 0.1604 | 163 | 2.0782 | - | - | - |
+ | 0.1614 | 164 | 2.6545 | - | - | - |
+ | 0.1624 | 165 | 1.4045 | - | - | - |
+ | 0.1634 | 166 | 1.2937 | - | - | - |
+ | 0.1644 | 167 | 1.1171 | - | - | - |
+ | 0.1654 | 168 | 1.3537 | - | - | - |
+ | 0.1663 | 169 | 1.7028 | - | - | - |
+ | 0.1673 | 170 | 1.4143 | - | - | - |
+ | 0.1683 | 171 | 1.8648 | - | - | - |
+ | 0.1693 | 172 | 1.6768 | - | - | - |
+ | 0.1703 | 173 | 1.9528 | - | - | - |
+ | 0.1713 | 174 | 1.1718 | - | - | - |
+ | 0.1722 | 175 | 1.8176 | - | - | - |
+ | 0.1732 | 176 | 0.8439 | - | - | - |
+ | 0.1742 | 177 | 1.5092 | - | - | - |
+ | 0.1752 | 178 | 1.1947 | - | - | - |
+ | 0.1762 | 179 | 1.6395 | - | - | - |
+ | 0.1772 | 180 | 1.4394 | - | - | - |
+ | 0.1781 | 181 | 1.7548 | - | - | - |
+ | 0.1791 | 182 | 1.1181 | - | - | - |
+ | 0.1801 | 183 | 1.0271 | - | - | - |
+ | 0.1811 | 184 | 2.3108 | - | - | - |
+ | 0.1821 | 185 | 2.1242 | - | - | - |
+ | 0.1831 | 186 | 1.9822 | - | - | - |
+ | 0.1841 | 187 | 2.3605 | - | - | - |
+ | 0.1850 | 188 | 1.5251 | - | - | - |
+ | 0.1860 | 189 | 1.2351 | - | - | - |
+ | 0.1870 | 190 | 1.5859 | - | - | - |
+ | 0.1880 | 191 | 1.8056 | - | - | - |
+ | 0.1890 | 192 | 1.349 | - | - | - |
+ | 0.1900 | 193 | 0.893 | - | - | - |
+ | 0.1909 | 194 | 1.5122 | - | - | - |
+ | 0.1919 | 195 | 1.3875 | - | - | - |
+ | 0.1929 | 196 | 1.29 | - | - | - |
+ | 0.1939 | 197 | 2.2931 | - | - | - |
+ | 0.1949 | 198 | 1.2663 | - | - | - |
+ | 0.1959 | 199 | 1.9712 | - | - | - |
+ | 0.1969 | 200 | 2.3307 | - | - | - |
+ | 0.1978 | 201 | 1.6544 | - | - | - |
+ | 0.1988 | 202 | 1.638 | - | - | - |
+ | 0.1998 | 203 | 1.3412 | - | - | - |
+ | 0.2008 | 204 | 1.4454 | - | - | - |
+ | 0.2018 | 205 | 1.5437 | - | - | - |
+ | 0.2028 | 206 | 1.4921 | - | - | - |
+ | 0.2037 | 207 | 1.4298 | - | - | - |
+ | 0.2047 | 208 | 1.6174 | - | - | - |
+ | 0.2057 | 209 | 1.4137 | - | - | - |
+ | 0.2067 | 210 | 1.5652 | - | - | - |
+ | 0.2077 | 211 | 1.1631 | - | - | - |
+ | 0.2087 | 212 | 1.2351 | - | - | - |
+ | 0.2096 | 213 | 1.7537 | - | - | - |
+ | 0.2106 | 214 | 1.3186 | - | - | - |
+ | 0.2116 | 215 | 1.2258 | - | - | - |
+ | 0.2126 | 216 | 0.7695 | - | - | - |
+ | 0.2136 | 217 | 1.2775 | - | - | - |
+ | 0.2146 | 218 | 1.6795 | - | - | - |
+ | 0.2156 | 219 | 1.2862 | - | - | - |
+ | 0.2165 | 220 | 1.1723 | - | - | - |
+ | 0.2175 | 221 | 1.3322 | - | - | - |
+ | 0.2185 | 222 | 1.7564 | - | - | - |
+ | 0.2195 | 223 | 1.1071 | - | - | - |
+ | 0.2205 | 224 | 1.2011 | - | - | - |
+ | 0.2215 | 225 | 1.2303 | - | - | - |
+ | 0.2224 | 226 | 1.212 | - | - | - |
+ | 0.2234 | 227 | 1.0117 | - | - | - |
+ | 0.2244 | 228 | 1.1907 | - | - | - |
+ | 0.2254 | 229 | 2.1293 | - | - | - |
+ | 0.2264 | 230 | 1.3063 | - | - | - |
+ | 0.2274 | 231 | 1.2841 | - | - | - |
+ | 0.2283 | 232 | 1.3778 | - | - | - |
+ | 0.2293 | 233 | 1.2242 | - | - | - |
+ | 0.2303 | 234 | 0.9227 | - | - | - |
+ | 0.2313 | 235 | 1.2221 | - | - | - |
+ | 0.2323 | 236 | 2.1041 | - | - | - |
+ | 0.2333 | 237 | 1.3341 | - | - | - |
+ | 0.2343 | 238 | 1.0876 | - | - | - |
+ | 0.2352 | 239 | 1.3328 | - | - | - |
+ | 0.2362 | 240 | 1.2958 | - | - | - |
+ | 0.2372 | 241 | 1.1522 | - | - | - |
+ | 0.2382 | 242 | 1.7942 | - | - | - |
+ | 0.2392 | 243 | 1.1325 | - | - | - |
+ | 0.2402 | 244 | 1.6466 | - | - | - |
+ | 0.2411 | 245 | 1.4608 | - | - | - |
+ | 0.2421 | 246 | 0.6375 | - | - | - |
+ | 0.2431 | 247 | 2.0177 | - | - | - |
+ | 0.2441 | 248 | 1.2069 | - | - | - |
+ | 0.2451 | 249 | 0.7639 | - | - | - |
+ | 0.2461 | 250 | 1.3465 | - | - | - |
+ | 0.2470 | 251 | 1.064 | - | - | - |
+ | 0.2480 | 252 | 1.3757 | - | - | - |
+ | 0.2490 | 253 | 1.612 | - | - | - |
+ | 0.25 | 254 | 0.7917 | - | - | - |
+ | 0.2510 | 255 | 1.5515 | - | - | - |
+ | 0.2520 | 256 | 0.799 | - | - | - |
+ | 0.2530 | 257 | 0.9882 | - | - | - |
+ | 0.2539 | 258 | 1.1814 | - | - | - |
+ | 0.2549 | 259 | 0.6394 | - | - | - |
+ | 0.2559 | 260 | 1.4756 | - | - | - |
+ | 0.2569 | 261 | 0.5338 | - | - | - |
+ | 0.2579 | 262 | 0.9779 | - | - | - |
+ | 0.2589 | 263 | 1.5307 | - | - | - |
+ | 0.2598 | 264 | 1.1213 | - | - | - |
+ | 0.2608 | 265 | 0.9482 | - | - | - |
+ | 0.2618 | 266 | 0.9599 | - | - | - |
+ | 0.2628 | 267 | 1.4455 | - | - | - |
+ | 0.2638 | 268 | 1.6496 | - | - | - |
+ | 0.2648 | 269 | 0.7402 | - | - | - |
+ | 0.2657 | 270 | 0.7835 | - | - | - |
+ | 0.2667 | 271 | 0.7821 | - | - | - |
+ | 0.2677 | 272 | 1.5422 | - | - | - |
+ | 0.2687 | 273 | 1.0995 | - | - | - |
+ | 0.2697 | 274 | 1.378 | - | - | - |
+ | 0.2707 | 275 | 1.3562 | - | - | - |
+ | 0.2717 | 276 | 0.7376 | - | - | - |
+ | 0.2726 | 277 | 1.1678 | - | - | - |
+ | 0.2736 | 278 | 1.2989 | - | - | - |
+ | 0.2746 | 279 | 1.9559 | - | - | - |
+ | 0.2756 | 280 | 1.1237 | - | - | - |
+ | 0.2766 | 281 | 0.952 | - | - | - |
+ | 0.2776 | 282 | 1.6629 | - | - | - |
+ | 0.2785 | 283 | 1.871 | - | - | - |
+ | 0.2795 | 284 | 1.5946 | - | - | - |
+ | 0.2805 | 285 | 1.4456 | - | - | - |
+ | 0.2815 | 286 | 1.4085 | - | - | - |
+ | 0.2825 | 287 | 1.1394 | - | - | - |
+ | 0.2835 | 288 | 1.0315 | - | - | - |
+ | 0.2844 | 289 | 1.488 | - | - | - |
+ | 0.2854 | 290 | 1.4006 | - | - | - |
+ | 0.2864 | 291 | 0.9237 | - | - | - |
+ | 0.2874 | 292 | 1.163 | - | - | - |
+ | 0.2884 | 293 | 1.7037 | - | - | - |
+ | 0.2894 | 294 | 0.8715 | - | - | - |
+ | 0.2904 | 295 | 1.2101 | - | - | - |
+ | 0.2913 | 296 | 1.1179 | - | - | - |
+ | 0.2923 | 297 | 1.3986 | - | - | - |
+ | 0.2933 | 298 | 1.7068 | - | - | - |
+ | 0.2943 | 299 | 0.8695 | - | - | - |
+ | 0.2953 | 300 | 1.3778 | - | - | - |
+ | 0.2963 | 301 | 1.2834 | - | - | - |
+ | 0.2972 | 302 | 0.8123 | - | - | - |
+ | 0.2982 | 303 | 1.6521 | - | - | - |
+ | 0.2992 | 304 | 1.1064 | - | - | - |
+ | 0.3002 | 305 | 0.9578 | - | - | - |
+
+ </details>
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.2.1
+ - Transformers: 4.44.2
+ - PyTorch: 2.5.0+cu121
+ - Accelerate: 0.34.2
+ - Datasets: 3.0.2
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### GISTEmbedLoss
+ ```bibtex
+ @misc{solatorio2024gistembed,
+ title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning},
+ author={Aivin V. Solatorio},
+ year={2024},
+ eprint={2402.16829},
+ archivePrefix={arXiv},
+ primaryClass={cs.LG}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
checkpoint-305/added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+ "[MASK]": 128000
+ }
checkpoint-305/config.json ADDED
@@ -0,0 +1,35 @@
+ {
+ "_name_or_path": "microsoft/deberta-v3-small",
+ "architectures": [
+ "DebertaV2Model"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-07,
+ "max_position_embeddings": 512,
+ "max_relative_positions": -1,
+ "model_type": "deberta-v2",
+ "norm_rel_ebd": "layer_norm",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 6,
+ "pad_token_id": 0,
+ "pooler_dropout": 0,
+ "pooler_hidden_act": "gelu",
+ "pooler_hidden_size": 768,
+ "pos_att_type": [
+ "p2c",
+ "c2p"
+ ],
+ "position_biased_input": false,
+ "position_buckets": 256,
+ "relative_attention": true,
+ "share_att_key": true,
+ "torch_dtype": "float32",
+ "transformers_version": "4.44.2",
+ "type_vocab_size": 0,
+ "vocab_size": 128100
+ }
checkpoint-305/config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.2.1",
+ "transformers": "4.44.2",
+ "pytorch": "2.5.0+cu121"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": null
+ }
checkpoint-305/modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_AdvancedWeightedPooling",
+ "type": "__main__.AdvancedWeightedPooling"
+ }
+ ]
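`modules.json` is what `SentenceTransformer` uses to chain modules in ascending `idx` order. Note that the pooling module's `type` is `__main__.AdvancedWeightedPooling`, so that custom class must be defined (or importable) in the loading script before this checkpoint can be deserialized. A small sketch of reading the pipeline order from the file shown above:

```python
import json

# The modules.json content from this checkpoint, inlined for illustration.
modules = json.loads("""
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_AdvancedWeightedPooling", "type": "__main__.AdvancedWeightedPooling"}
]
""")

# Modules are applied in ascending idx order: transformer first, then pooling.
pipeline = [m["type"] for m in sorted(modules, key=lambda m: m["idx"])]
```

Here the empty `path` for idx 0 means the transformer weights live at the checkpoint root, while the pooling weights live under `1_AdvancedWeightedPooling/`.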
checkpoint-305/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:40d96590c8a23f81ef1d1d7cf8887c6c2fe8f1e82ec3125086369be9140a5064
+ size 141824506
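The large binaries in this commit (optimizer state, model weights, tokenizer model) are stored as Git LFS pointer files like the one above: three `key value` lines giving the spec version, content hash, and byte size. A tiny parser sketch for that format:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs pointer file into its key/value fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])  # size is the only numeric field
    return fields

# The optimizer.pt pointer from this checkpoint.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:40d96590c8a23f81ef1d1d7cf8887c6c2fe8f1e82ec3125086369be9140a5064
size 141824506"""
info = parse_lfs_pointer(pointer)
```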
checkpoint-305/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b5cdb4c1fd9f1c98ad21daa454a7a507fe4e5cffc616dd6827e0a70dade6a898
+ size 565251810
checkpoint-305/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:08264892b9cb4b7a4e4c175ee7bfe5b88ecc57e10f027b10e81e11e42c424200
+ size 14244
checkpoint-305/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:30f526cd22b9ea7b7b16277caa0b815306677474381771eb078b44e3891ede9f
+ size 1256
checkpoint-305/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": false
+ }
checkpoint-305/special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+ "bos_token": "[CLS]",
+ "cls_token": "[CLS]",
+ "eos_token": "[SEP]",
+ "mask_token": "[MASK]",
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-305/spm.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
+ size 2464616
checkpoint-305/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
checkpoint-305/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128000": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "[CLS]",
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_lower_case": false,
+ "eos_token": "[SEP]",
+ "mask_token": "[MASK]",
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "sp_model_kwargs": {},
+ "split_by_punct": false,
+ "tokenizer_class": "DebertaV2Tokenizer",
+ "unk_token": "[UNK]",
+ "vocab_type": "spm"
+ }
checkpoint-305/trainer_state.json ADDED
@@ -0,0 +1,2256 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 0.3001968503937008,
+ "eval_steps": 153,
+ "global_step": 305,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.000984251968503937,
+ "grad_norm": NaN,
+ "learning_rate": 0.0,
+ "loss": 6.0688,
+ "step": 1
+ },
+ {
+ "epoch": 0.001968503937007874,
+ "grad_norm": NaN,
+ "learning_rate": 0.0,
+ "loss": 7.5576,
+ "step": 2
+ },
+ {
+ "epoch": 0.002952755905511811,
+ "grad_norm": Infinity,
+ "learning_rate": 0.0,
+ "loss": 4.6849,
+ "step": 3
+ },
+ {
+ "epoch": 0.003937007874015748,
+ "grad_norm": Infinity,
+ "learning_rate": 0.0,
+ "loss": 5.4503,
+ "step": 4
+ },
+ {
+ "epoch": 0.004921259842519685,
+ "grad_norm": 32.947513580322266,
+ "learning_rate": 9.940357852882705e-10,
+ "loss": 5.6057,
+ "step": 5
+ },
+ {
+ "epoch": 0.005905511811023622,
+ "grad_norm": 40.39633560180664,
+ "learning_rate": 1.988071570576541e-09,
+ "loss": 6.3049,
+ "step": 6
+ },
+ {
+ "epoch": 0.006889763779527559,
+ "grad_norm": 44.38965606689453,
+ "learning_rate": 2.9821073558648116e-09,
+ "loss": 6.8336,
+ "step": 7
+ },
+ {
+ "epoch": 0.007874015748031496,
+ "grad_norm": 28.46092987060547,
+ "learning_rate": 3.976143141153082e-09,
+ "loss": 5.0777,
+ "step": 8
+ },
+ {
+ "epoch": 0.008858267716535433,
+ "grad_norm": 25.817373275756836,
+ "learning_rate": 4.970178926441353e-09,
+ "loss": 4.8358,
+ "step": 9
+ },
+ {
+ "epoch": 0.00984251968503937,
+ "grad_norm": 27.06571388244629,
+ "learning_rate": 5.964214711729623e-09,
+ "loss": 4.641,
+ "step": 10
+ },
+ {
+ "epoch": 0.010826771653543307,
+ "grad_norm": 27.857297897338867,
+ "learning_rate": 6.9582504970178946e-09,
+ "loss": 4.828,
+ "step": 11
+ },
+ {
+ "epoch": 0.011811023622047244,
+ "grad_norm": 29.35377311706543,
+ "learning_rate": 7.952286282306164e-09,
+ "loss": 5.2269,
+ "step": 12
+ },
+ {
+ "epoch": 0.012795275590551181,
+ "grad_norm": 32.06682586669922,
+ "learning_rate": 8.946322067594435e-09,
+ "loss": 5.6772,
+ "step": 13
+ },
+ {
+ "epoch": 0.013779527559055118,
+ "grad_norm": 30.774206161499023,
+ "learning_rate": 9.940357852882705e-09,
+ "loss": 5.1422,
+ "step": 14
+ },
+ {
+ "epoch": 0.014763779527559055,
+ "grad_norm": 45.917659759521484,
+ "learning_rate": 1.0934393638170978e-08,
+ "loss": 6.2469,
+ "step": 15
+ },
+ {
+ "epoch": 0.015748031496062992,
+ "grad_norm": 30.189363479614258,
+ "learning_rate": 1.1928429423459246e-08,
+ "loss": 4.6802,
+ "step": 16
+ },
+ {
+ "epoch": 0.01673228346456693,
+ "grad_norm": 25.023223876953125,
+ "learning_rate": 1.2922465208747517e-08,
+ "loss": 4.5492,
+ "step": 17
+ },
+ {
+ "epoch": 0.017716535433070866,
+ "grad_norm": 26.336748123168945,
+ "learning_rate": 1.3916500994035789e-08,
+ "loss": 4.8062,
+ "step": 18
+ },
+ {
+ "epoch": 0.018700787401574805,
+ "grad_norm": 51.350032806396484,
+ "learning_rate": 1.4910536779324056e-08,
+ "loss": 7.5141,
+ "step": 19
+ },
+ {
+ "epoch": 0.01968503937007874,
+ "grad_norm": 33.953224182128906,
+ "learning_rate": 1.590457256461233e-08,
+ "loss": 5.5202,
+ "step": 20
+ },
+ {
+ "epoch": 0.02066929133858268,
+ "grad_norm": 36.828433990478516,
+ "learning_rate": 1.68986083499006e-08,
+ "loss": 6.5025,
+ "step": 21
+ },
+ {
+ "epoch": 0.021653543307086614,
+ "grad_norm": 43.68119812011719,
+ "learning_rate": 1.789264413518887e-08,
+ "loss": 7.318,
+ "step": 22
+ },
+ {
+ "epoch": 0.022637795275590553,
+ "grad_norm": 23.9952392578125,
+ "learning_rate": 1.888667992047714e-08,
+ "loss": 4.6458,
+ "step": 23
+ },
+ {
+ "epoch": 0.023622047244094488,
+ "grad_norm": 19.315523147583008,
+ "learning_rate": 1.988071570576541e-08,
+ "loss": 4.6191,
+ "step": 24
+ },
+ {
+ "epoch": 0.024606299212598427,
+ "grad_norm": 22.747207641601562,
+ "learning_rate": 2.087475149105368e-08,
+ "loss": 4.3159,
+ "step": 25
+ },
+ {
+ "epoch": 0.025590551181102362,
+ "grad_norm": 32.90699768066406,
+ "learning_rate": 2.1868787276341955e-08,
+ "loss": 6.3677,
+ "step": 26
+ },
+ {
+ "epoch": 0.0265748031496063,
+ "grad_norm": 25.224443435668945,
+ "learning_rate": 2.2862823061630224e-08,
+ "loss": 5.6052,
+ "step": 27
+ },
+ {
+ "epoch": 0.027559055118110236,
+ "grad_norm": 16.32752799987793,
+ "learning_rate": 2.3856858846918493e-08,
+ "loss": 4.196,
+ "step": 28
+ },
+ {
+ "epoch": 0.028543307086614175,
+ "grad_norm": 19.50617218017578,
+ "learning_rate": 2.4850894632206765e-08,
+ "loss": 4.4802,
+ "step": 29
+ },
+ {
+ "epoch": 0.02952755905511811,
+ "grad_norm": 21.613880157470703,
+ "learning_rate": 2.5844930417495034e-08,
+ "loss": 4.9193,
+ "step": 30
+ },
+ {
+ "epoch": 0.03051181102362205,
+ "grad_norm": 16.059247970581055,
+ "learning_rate": 2.6838966202783303e-08,
225
+ "loss": 4.0996,
226
+ "step": 31
227
+ },
228
+ {
229
+ "epoch": 0.031496062992125984,
230
+ "grad_norm": 27.45536994934082,
231
+ "learning_rate": 2.7833001988071578e-08,
232
+ "loss": 5.6307,
233
+ "step": 32
234
+ },
235
+ {
236
+ "epoch": 0.03248031496062992,
237
+ "grad_norm": 19.249103546142578,
238
+ "learning_rate": 2.8827037773359847e-08,
239
+ "loss": 4.5745,
240
+ "step": 33
241
+ },
242
+ {
243
+ "epoch": 0.03346456692913386,
244
+ "grad_norm": 17.516521453857422,
245
+ "learning_rate": 2.982107355864811e-08,
246
+ "loss": 4.4514,
247
+ "step": 34
248
+ },
249
+ {
250
+ "epoch": 0.0344488188976378,
251
+ "grad_norm": 11.684857368469238,
252
+ "learning_rate": 3.081510934393639e-08,
253
+ "loss": 4.0617,
254
+ "step": 35
255
+ },
256
+ {
257
+ "epoch": 0.03543307086614173,
258
+ "grad_norm": 16.813453674316406,
259
+ "learning_rate": 3.180914512922466e-08,
260
+ "loss": 5.0298,
261
+ "step": 36
262
+ },
263
+ {
264
+ "epoch": 0.03641732283464567,
265
+ "grad_norm": 10.92259407043457,
266
+ "learning_rate": 3.280318091451293e-08,
267
+ "loss": 3.9815,
268
+ "step": 37
269
+ },
270
+ {
271
+ "epoch": 0.03740157480314961,
272
+ "grad_norm": 11.449631690979004,
273
+ "learning_rate": 3.37972166998012e-08,
274
+ "loss": 4.0871,
275
+ "step": 38
276
+ },
277
+ {
278
+ "epoch": 0.038385826771653545,
279
+ "grad_norm": 11.223074913024902,
280
+ "learning_rate": 3.479125248508947e-08,
281
+ "loss": 4.2378,
282
+ "step": 39
283
+ },
284
+ {
285
+ "epoch": 0.03937007874015748,
286
+ "grad_norm": 9.864290237426758,
287
+ "learning_rate": 3.578528827037774e-08,
288
+ "loss": 3.8226,
289
+ "step": 40
290
+ },
291
+ {
292
+ "epoch": 0.040354330708661415,
293
+ "grad_norm": 11.084328651428223,
294
+ "learning_rate": 3.6779324055666005e-08,
295
+ "loss": 4.3519,
296
+ "step": 41
297
+ },
298
+ {
299
+ "epoch": 0.04133858267716536,
300
+ "grad_norm": 9.965168952941895,
301
+ "learning_rate": 3.777335984095428e-08,
302
+ "loss": 3.6345,
303
+ "step": 42
304
+ },
305
+ {
306
+ "epoch": 0.04232283464566929,
307
+ "grad_norm": 18.375577926635742,
308
+ "learning_rate": 3.8767395626242556e-08,
309
+ "loss": 5.0829,
310
+ "step": 43
311
+ },
312
+ {
313
+ "epoch": 0.04330708661417323,
314
+ "grad_norm": 11.232723236083984,
315
+ "learning_rate": 3.976143141153082e-08,
316
+ "loss": 4.6701,
317
+ "step": 44
318
+ },
319
+ {
320
+ "epoch": 0.04429133858267716,
321
+ "grad_norm": 10.896126747131348,
322
+ "learning_rate": 4.0755467196819094e-08,
323
+ "loss": 4.1371,
324
+ "step": 45
325
+ },
326
+ {
327
+ "epoch": 0.045275590551181105,
328
+ "grad_norm": 10.237566947937012,
329
+ "learning_rate": 4.174950298210736e-08,
330
+ "loss": 4.2418,
331
+ "step": 46
332
+ },
333
+ {
334
+ "epoch": 0.04625984251968504,
335
+ "grad_norm": 13.484391212463379,
336
+ "learning_rate": 4.274353876739563e-08,
337
+ "loss": 4.4766,
338
+ "step": 47
339
+ },
340
+ {
341
+ "epoch": 0.047244094488188976,
342
+ "grad_norm": 9.769365310668945,
343
+ "learning_rate": 4.373757455268391e-08,
344
+ "loss": 4.4797,
345
+ "step": 48
346
+ },
347
+ {
348
+ "epoch": 0.04822834645669291,
349
+ "grad_norm": 9.953036308288574,
350
+ "learning_rate": 4.4731610337972176e-08,
351
+ "loss": 3.8471,
352
+ "step": 49
353
+ },
354
+ {
355
+ "epoch": 0.04921259842519685,
356
+ "grad_norm": 15.378759384155273,
357
+ "learning_rate": 4.572564612326045e-08,
358
+ "loss": 4.3194,
359
+ "step": 50
360
+ },
361
+ {
362
+ "epoch": 0.05019685039370079,
363
+ "grad_norm": 9.474905967712402,
364
+ "learning_rate": 4.6719681908548713e-08,
365
+ "loss": 3.9426,
366
+ "step": 51
367
+ },
368
+ {
369
+ "epoch": 0.051181102362204724,
370
+ "grad_norm": 8.882713317871094,
371
+ "learning_rate": 4.7713717693836986e-08,
372
+ "loss": 3.5333,
373
+ "step": 52
374
+ },
375
+ {
376
+ "epoch": 0.05216535433070866,
377
+ "grad_norm": 14.956167221069336,
378
+ "learning_rate": 4.870775347912525e-08,
379
+ "loss": 4.2426,
380
+ "step": 53
381
+ },
382
+ {
383
+ "epoch": 0.0531496062992126,
384
+ "grad_norm": 9.333856582641602,
385
+ "learning_rate": 4.970178926441353e-08,
386
+ "loss": 3.9816,
387
+ "step": 54
388
+ },
389
+ {
390
+ "epoch": 0.054133858267716536,
391
+ "grad_norm": 9.034503936767578,
392
+ "learning_rate": 5.06958250497018e-08,
393
+ "loss": 3.663,
394
+ "step": 55
395
+ },
396
+ {
397
+ "epoch": 0.05511811023622047,
398
+ "grad_norm": 10.114209175109863,
399
+ "learning_rate": 5.168986083499007e-08,
400
+ "loss": 3.9057,
401
+ "step": 56
402
+ },
403
+ {
404
+ "epoch": 0.05610236220472441,
405
+ "grad_norm": 9.802759170532227,
406
+ "learning_rate": 5.268389662027834e-08,
407
+ "loss": 4.0345,
408
+ "step": 57
409
+ },
410
+ {
411
+ "epoch": 0.05708661417322835,
412
+ "grad_norm": 10.378804206848145,
413
+ "learning_rate": 5.3677932405566605e-08,
414
+ "loss": 3.5233,
415
+ "step": 58
416
+ },
417
+ {
418
+ "epoch": 0.058070866141732284,
419
+ "grad_norm": 10.237386703491211,
420
+ "learning_rate": 5.467196819085488e-08,
421
+ "loss": 3.7999,
422
+ "step": 59
423
+ },
424
+ {
425
+ "epoch": 0.05905511811023622,
426
+ "grad_norm": 11.103493690490723,
427
+ "learning_rate": 5.5666003976143156e-08,
428
+ "loss": 3.1885,
429
+ "step": 60
430
+ },
431
+ {
432
+ "epoch": 0.060039370078740155,
433
+ "grad_norm": 11.967391014099121,
434
+ "learning_rate": 5.666003976143142e-08,
435
+ "loss": 3.6013,
436
+ "step": 61
437
+ },
438
+ {
439
+ "epoch": 0.0610236220472441,
440
+ "grad_norm": 10.79468822479248,
441
+ "learning_rate": 5.7654075546719694e-08,
442
+ "loss": 3.392,
443
+ "step": 62
444
+ },
445
+ {
446
+ "epoch": 0.06200787401574803,
447
+ "grad_norm": 13.18323802947998,
448
+ "learning_rate": 5.864811133200796e-08,
449
+ "loss": 3.3814,
450
+ "step": 63
451
+ },
452
+ {
453
+ "epoch": 0.06299212598425197,
454
+ "grad_norm": 15.541842460632324,
455
+ "learning_rate": 5.964214711729623e-08,
456
+ "loss": 4.0428,
457
+ "step": 64
458
+ },
459
+ {
460
+ "epoch": 0.0639763779527559,
461
+ "grad_norm": 13.166780471801758,
462
+ "learning_rate": 6.06361829025845e-08,
463
+ "loss": 3.7825,
464
+ "step": 65
465
+ },
466
+ {
467
+ "epoch": 0.06496062992125984,
468
+ "grad_norm": 12.48865795135498,
469
+ "learning_rate": 6.163021868787278e-08,
470
+ "loss": 3.4181,
471
+ "step": 66
472
+ },
473
+ {
474
+ "epoch": 0.06594488188976377,
475
+ "grad_norm": 14.15877628326416,
476
+ "learning_rate": 6.262425447316104e-08,
477
+ "loss": 3.7793,
478
+ "step": 67
479
+ },
480
+ {
481
+ "epoch": 0.06692913385826772,
482
+ "grad_norm": 18.16546630859375,
483
+ "learning_rate": 6.361829025844931e-08,
484
+ "loss": 3.8344,
485
+ "step": 68
486
+ },
487
+ {
488
+ "epoch": 0.06791338582677166,
489
+ "grad_norm": 14.758286476135254,
490
+ "learning_rate": 6.461232604373759e-08,
491
+ "loss": 3.2165,
492
+ "step": 69
493
+ },
494
+ {
495
+ "epoch": 0.0688976377952756,
496
+ "grad_norm": 15.924538612365723,
497
+ "learning_rate": 6.560636182902586e-08,
498
+ "loss": 3.3811,
499
+ "step": 70
500
+ },
501
+ {
502
+ "epoch": 0.06988188976377953,
503
+ "grad_norm": 17.181415557861328,
504
+ "learning_rate": 6.660039761431412e-08,
505
+ "loss": 3.5984,
506
+ "step": 71
507
+ },
508
+ {
509
+ "epoch": 0.07086614173228346,
510
+ "grad_norm": 16.691272735595703,
511
+ "learning_rate": 6.75944333996024e-08,
512
+ "loss": 3.8583,
513
+ "step": 72
514
+ },
515
+ {
516
+ "epoch": 0.0718503937007874,
517
+ "grad_norm": 17.478809356689453,
518
+ "learning_rate": 6.858846918489067e-08,
519
+ "loss": 3.296,
520
+ "step": 73
521
+ },
522
+ {
523
+ "epoch": 0.07283464566929133,
524
+ "grad_norm": 15.149248123168945,
525
+ "learning_rate": 6.958250497017893e-08,
526
+ "loss": 2.7661,
527
+ "step": 74
528
+ },
529
+ {
530
+ "epoch": 0.07381889763779527,
531
+ "grad_norm": 15.139701843261719,
532
+ "learning_rate": 7.057654075546721e-08,
533
+ "loss": 2.9805,
534
+ "step": 75
535
+ },
536
+ {
537
+ "epoch": 0.07480314960629922,
538
+ "grad_norm": 14.794472694396973,
539
+ "learning_rate": 7.157057654075548e-08,
540
+ "loss": 2.566,
541
+ "step": 76
542
+ },
543
+ {
544
+ "epoch": 0.07578740157480315,
545
+ "grad_norm": 15.433120727539062,
546
+ "learning_rate": 7.256461232604374e-08,
547
+ "loss": 3.258,
548
+ "step": 77
549
+ },
550
+ {
551
+ "epoch": 0.07677165354330709,
552
+ "grad_norm": 22.558115005493164,
553
+ "learning_rate": 7.355864811133201e-08,
554
+ "loss": 3.3804,
555
+ "step": 78
556
+ },
557
+ {
558
+ "epoch": 0.07775590551181102,
559
+ "grad_norm": 19.630340576171875,
560
+ "learning_rate": 7.455268389662029e-08,
561
+ "loss": 2.8828,
562
+ "step": 79
563
+ },
564
+ {
565
+ "epoch": 0.07874015748031496,
566
+ "grad_norm": 17.1998233795166,
567
+ "learning_rate": 7.554671968190855e-08,
568
+ "loss": 3.1077,
569
+ "step": 80
570
+ },
571
+ {
572
+ "epoch": 0.0797244094488189,
573
+ "grad_norm": 13.535666465759277,
574
+ "learning_rate": 7.654075546719683e-08,
575
+ "loss": 2.9441,
576
+ "step": 81
577
+ },
578
+ {
579
+ "epoch": 0.08070866141732283,
580
+ "grad_norm": 14.031355857849121,
581
+ "learning_rate": 7.753479125248511e-08,
582
+ "loss": 2.9465,
583
+ "step": 82
584
+ },
585
+ {
586
+ "epoch": 0.08169291338582677,
587
+ "grad_norm": 13.701343536376953,
588
+ "learning_rate": 7.852882703777338e-08,
589
+ "loss": 2.7088,
590
+ "step": 83
591
+ },
592
+ {
593
+ "epoch": 0.08267716535433071,
594
+ "grad_norm": 15.102946281433105,
595
+ "learning_rate": 7.952286282306164e-08,
596
+ "loss": 2.9215,
597
+ "step": 84
598
+ },
599
+ {
600
+ "epoch": 0.08366141732283465,
601
+ "grad_norm": 20.15928077697754,
602
+ "learning_rate": 8.051689860834992e-08,
603
+ "loss": 3.4698,
604
+ "step": 85
605
+ },
606
+ {
607
+ "epoch": 0.08464566929133858,
608
+ "grad_norm": 14.807561874389648,
609
+ "learning_rate": 8.151093439363819e-08,
610
+ "loss": 2.2414,
611
+ "step": 86
612
+ },
613
+ {
614
+ "epoch": 0.08562992125984252,
615
+ "grad_norm": 18.390995025634766,
616
+ "learning_rate": 8.250497017892645e-08,
617
+ "loss": 3.1601,
618
+ "step": 87
619
+ },
620
+ {
621
+ "epoch": 0.08661417322834646,
622
+ "grad_norm": 17.47426986694336,
623
+ "learning_rate": 8.349900596421472e-08,
624
+ "loss": 2.7714,
625
+ "step": 88
626
+ },
627
+ {
628
+ "epoch": 0.08759842519685039,
629
+ "grad_norm": 19.336660385131836,
630
+ "learning_rate": 8.4493041749503e-08,
631
+ "loss": 3.0311,
632
+ "step": 89
633
+ },
634
+ {
635
+ "epoch": 0.08858267716535433,
636
+ "grad_norm": 22.308195114135742,
637
+ "learning_rate": 8.548707753479126e-08,
638
+ "loss": 3.0336,
639
+ "step": 90
640
+ },
641
+ {
642
+ "epoch": 0.08956692913385826,
643
+ "grad_norm": 16.85112762451172,
644
+ "learning_rate": 8.648111332007953e-08,
645
+ "loss": 1.9358,
646
+ "step": 91
647
+ },
648
+ {
649
+ "epoch": 0.09055118110236221,
650
+ "grad_norm": 18.357120513916016,
651
+ "learning_rate": 8.747514910536782e-08,
652
+ "loss": 2.6031,
653
+ "step": 92
654
+ },
655
+ {
656
+ "epoch": 0.09153543307086615,
657
+ "grad_norm": 21.082101821899414,
658
+ "learning_rate": 8.846918489065609e-08,
659
+ "loss": 2.7515,
660
+ "step": 93
661
+ },
662
+ {
663
+ "epoch": 0.09251968503937008,
664
+ "grad_norm": 21.116485595703125,
665
+ "learning_rate": 8.946322067594435e-08,
666
+ "loss": 2.8496,
667
+ "step": 94
668
+ },
669
+ {
670
+ "epoch": 0.09350393700787402,
671
+ "grad_norm": 15.967782020568848,
672
+ "learning_rate": 9.045725646123262e-08,
673
+ "loss": 1.8015,
674
+ "step": 95
675
+ },
676
+ {
677
+ "epoch": 0.09448818897637795,
678
+ "grad_norm": 20.430376052856445,
679
+ "learning_rate": 9.14512922465209e-08,
680
+ "loss": 2.8138,
681
+ "step": 96
682
+ },
683
+ {
684
+ "epoch": 0.09547244094488189,
685
+ "grad_norm": 18.769359588623047,
686
+ "learning_rate": 9.244532803180916e-08,
687
+ "loss": 2.0597,
688
+ "step": 97
689
+ },
690
+ {
691
+ "epoch": 0.09645669291338582,
692
+ "grad_norm": 19.360021591186523,
693
+ "learning_rate": 9.343936381709743e-08,
694
+ "loss": 2.1053,
695
+ "step": 98
696
+ },
697
+ {
698
+ "epoch": 0.09744094488188976,
699
+ "grad_norm": 22.48013687133789,
700
+ "learning_rate": 9.44333996023857e-08,
701
+ "loss": 2.6785,
702
+ "step": 99
703
+ },
704
+ {
705
+ "epoch": 0.0984251968503937,
706
+ "grad_norm": 21.50940704345703,
707
+ "learning_rate": 9.542743538767397e-08,
708
+ "loss": 2.588,
709
+ "step": 100
710
+ },
711
+ {
712
+ "epoch": 0.09940944881889764,
713
+ "grad_norm": 17.409393310546875,
714
+ "learning_rate": 9.642147117296224e-08,
715
+ "loss": 2.0099,
716
+ "step": 101
717
+ },
718
+ {
719
+ "epoch": 0.10039370078740158,
720
+ "grad_norm": 22.641996383666992,
721
+ "learning_rate": 9.74155069582505e-08,
722
+ "loss": 2.7947,
723
+ "step": 102
724
+ },
725
+ {
726
+ "epoch": 0.10137795275590551,
727
+ "grad_norm": 18.91701889038086,
728
+ "learning_rate": 9.840954274353878e-08,
729
+ "loss": 2.3274,
730
+ "step": 103
731
+ },
732
+ {
733
+ "epoch": 0.10236220472440945,
734
+ "grad_norm": 21.317251205444336,
735
+ "learning_rate": 9.940357852882706e-08,
736
+ "loss": 2.2545,
737
+ "step": 104
738
+ },
739
+ {
740
+ "epoch": 0.10334645669291338,
741
+ "grad_norm": 24.298248291015625,
742
+ "learning_rate": 1.0039761431411533e-07,
743
+ "loss": 2.4575,
744
+ "step": 105
745
+ },
746
+ {
747
+ "epoch": 0.10433070866141732,
748
+ "grad_norm": 19.284252166748047,
749
+ "learning_rate": 1.013916500994036e-07,
750
+ "loss": 2.4413,
751
+ "step": 106
752
+ },
753
+ {
754
+ "epoch": 0.10531496062992125,
755
+ "grad_norm": 19.619426727294922,
756
+ "learning_rate": 1.0238568588469187e-07,
757
+ "loss": 2.3185,
758
+ "step": 107
759
+ },
760
+ {
761
+ "epoch": 0.1062992125984252,
762
+ "grad_norm": 20.323917388916016,
763
+ "learning_rate": 1.0337972166998014e-07,
764
+ "loss": 2.1577,
765
+ "step": 108
766
+ },
767
+ {
768
+ "epoch": 0.10728346456692914,
769
+ "grad_norm": 20.885499954223633,
770
+ "learning_rate": 1.043737574552684e-07,
771
+ "loss": 2.1278,
772
+ "step": 109
773
+ },
774
+ {
775
+ "epoch": 0.10826771653543307,
776
+ "grad_norm": 21.687036514282227,
777
+ "learning_rate": 1.0536779324055668e-07,
778
+ "loss": 2.0967,
779
+ "step": 110
780
+ },
781
+ {
782
+ "epoch": 0.10925196850393701,
783
+ "grad_norm": 23.031898498535156,
784
+ "learning_rate": 1.0636182902584495e-07,
785
+ "loss": 2.6142,
786
+ "step": 111
787
+ },
788
+ {
789
+ "epoch": 0.11023622047244094,
790
+ "grad_norm": 20.64027976989746,
791
+ "learning_rate": 1.0735586481113321e-07,
792
+ "loss": 1.8553,
793
+ "step": 112
794
+ },
795
+ {
796
+ "epoch": 0.11122047244094488,
797
+ "grad_norm": 18.268877029418945,
798
+ "learning_rate": 1.0834990059642149e-07,
799
+ "loss": 2.1523,
800
+ "step": 113
801
+ },
802
+ {
803
+ "epoch": 0.11220472440944881,
804
+ "grad_norm": 22.190549850463867,
805
+ "learning_rate": 1.0934393638170976e-07,
806
+ "loss": 2.1726,
807
+ "step": 114
808
+ },
809
+ {
810
+ "epoch": 0.11318897637795275,
811
+ "grad_norm": 18.314851760864258,
812
+ "learning_rate": 1.1033797216699802e-07,
813
+ "loss": 1.8564,
814
+ "step": 115
815
+ },
816
+ {
817
+ "epoch": 0.1141732283464567,
818
+ "grad_norm": 19.17725944519043,
819
+ "learning_rate": 1.1133200795228631e-07,
820
+ "loss": 1.8413,
821
+ "step": 116
822
+ },
823
+ {
824
+ "epoch": 0.11515748031496063,
825
+ "grad_norm": 24.518449783325195,
826
+ "learning_rate": 1.1232604373757458e-07,
827
+ "loss": 2.0441,
828
+ "step": 117
829
+ },
830
+ {
831
+ "epoch": 0.11614173228346457,
832
+ "grad_norm": 24.78040313720703,
833
+ "learning_rate": 1.1332007952286284e-07,
834
+ "loss": 2.2159,
835
+ "step": 118
836
+ },
837
+ {
838
+ "epoch": 0.1171259842519685,
839
+ "grad_norm": 28.552289962768555,
840
+ "learning_rate": 1.1431411530815111e-07,
841
+ "loss": 2.6779,
842
+ "step": 119
843
+ },
844
+ {
845
+ "epoch": 0.11811023622047244,
846
+ "grad_norm": 21.049219131469727,
847
+ "learning_rate": 1.1530815109343939e-07,
848
+ "loss": 2.2976,
849
+ "step": 120
850
+ },
851
+ {
852
+ "epoch": 0.11909448818897637,
853
+ "grad_norm": 23.962575912475586,
854
+ "learning_rate": 1.1630218687872765e-07,
855
+ "loss": 1.9407,
856
+ "step": 121
857
+ },
858
+ {
859
+ "epoch": 0.12007874015748031,
860
+ "grad_norm": 20.15193748474121,
861
+ "learning_rate": 1.1729622266401592e-07,
862
+ "loss": 1.9019,
863
+ "step": 122
864
+ },
865
+ {
866
+ "epoch": 0.12106299212598425,
867
+ "grad_norm": 22.563459396362305,
868
+ "learning_rate": 1.182902584493042e-07,
869
+ "loss": 2.2149,
870
+ "step": 123
871
+ },
872
+ {
873
+ "epoch": 0.1220472440944882,
874
+ "grad_norm": 16.86638832092285,
875
+ "learning_rate": 1.1928429423459245e-07,
876
+ "loss": 1.6823,
877
+ "step": 124
878
+ },
879
+ {
880
+ "epoch": 0.12303149606299213,
881
+ "grad_norm": 20.249584197998047,
882
+ "learning_rate": 1.2027833001988073e-07,
883
+ "loss": 1.8402,
884
+ "step": 125
885
+ },
886
+ {
887
+ "epoch": 0.12401574803149606,
888
+ "grad_norm": 19.5487117767334,
889
+ "learning_rate": 1.21272365805169e-07,
890
+ "loss": 1.6914,
891
+ "step": 126
892
+ },
893
+ {
894
+ "epoch": 0.125,
895
+ "grad_norm": 19.093505859375,
896
+ "learning_rate": 1.222664015904573e-07,
897
+ "loss": 2.1626,
898
+ "step": 127
899
+ },
900
+ {
901
+ "epoch": 0.12598425196850394,
902
+ "grad_norm": 17.598743438720703,
903
+ "learning_rate": 1.2326043737574557e-07,
904
+ "loss": 1.6414,
905
+ "step": 128
906
+ },
907
+ {
908
+ "epoch": 0.12696850393700787,
909
+ "grad_norm": 22.712181091308594,
910
+ "learning_rate": 1.2425447316103382e-07,
911
+ "loss": 2.2043,
912
+ "step": 129
913
+ },
914
+ {
915
+ "epoch": 0.1279527559055118,
916
+ "grad_norm": 20.516571044921875,
917
+ "learning_rate": 1.2524850894632207e-07,
918
+ "loss": 1.9987,
919
+ "step": 130
920
+ },
921
+ {
922
+ "epoch": 0.12893700787401574,
923
+ "grad_norm": 22.58023452758789,
924
+ "learning_rate": 1.2624254473161035e-07,
925
+ "loss": 1.8868,
926
+ "step": 131
927
+ },
928
+ {
929
+ "epoch": 0.12992125984251968,
930
+ "grad_norm": 17.800390243530273,
931
+ "learning_rate": 1.2723658051689863e-07,
932
+ "loss": 1.8262,
933
+ "step": 132
934
+ },
935
+ {
936
+ "epoch": 0.1309055118110236,
937
+ "grad_norm": 20.7684383392334,
938
+ "learning_rate": 1.282306163021869e-07,
939
+ "loss": 2.0404,
940
+ "step": 133
941
+ },
942
+ {
943
+ "epoch": 0.13188976377952755,
944
+ "grad_norm": 20.88973617553711,
945
+ "learning_rate": 1.2922465208747519e-07,
946
+ "loss": 1.9134,
947
+ "step": 134
948
+ },
949
+ {
950
+ "epoch": 0.1328740157480315,
951
+ "grad_norm": 21.78728485107422,
952
+ "learning_rate": 1.3021868787276344e-07,
953
+ "loss": 2.3725,
954
+ "step": 135
955
+ },
956
+ {
957
+ "epoch": 0.13385826771653545,
958
+ "grad_norm": 16.885374069213867,
959
+ "learning_rate": 1.3121272365805172e-07,
960
+ "loss": 1.4127,
961
+ "step": 136
962
+ },
963
+ {
964
+ "epoch": 0.13484251968503938,
965
+ "grad_norm": 18.944265365600586,
966
+ "learning_rate": 1.3220675944333997e-07,
967
+ "loss": 1.6876,
968
+ "step": 137
969
+ },
970
+ {
971
+ "epoch": 0.13582677165354332,
972
+ "grad_norm": 18.579692840576172,
973
+ "learning_rate": 1.3320079522862825e-07,
974
+ "loss": 1.8376,
975
+ "step": 138
976
+ },
977
+ {
978
+ "epoch": 0.13681102362204725,
979
+ "grad_norm": 19.68845558166504,
980
+ "learning_rate": 1.3419483101391653e-07,
981
+ "loss": 1.6992,
982
+ "step": 139
983
+ },
984
+ {
985
+ "epoch": 0.1377952755905512,
986
+ "grad_norm": 21.27842903137207,
987
+ "learning_rate": 1.351888667992048e-07,
988
+ "loss": 1.5032,
989
+ "step": 140
990
+ },
991
+ {
992
+ "epoch": 0.13877952755905512,
993
+ "grad_norm": 28.506824493408203,
994
+ "learning_rate": 1.3618290258449306e-07,
995
+ "loss": 2.0334,
996
+ "step": 141
997
+ },
998
+ {
999
+ "epoch": 0.13976377952755906,
1000
+ "grad_norm": 24.97527313232422,
1001
+ "learning_rate": 1.3717693836978134e-07,
1002
+ "loss": 2.3581,
1003
+ "step": 142
1004
+ },
1005
+ {
1006
+ "epoch": 0.140748031496063,
1007
+ "grad_norm": 22.391145706176758,
1008
+ "learning_rate": 1.381709741550696e-07,
1009
+ "loss": 1.4236,
1010
+ "step": 143
1011
+ },
1012
+ {
1013
+ "epoch": 0.14173228346456693,
1014
+ "grad_norm": 22.977148056030273,
1015
+ "learning_rate": 1.3916500994035787e-07,
1016
+ "loss": 2.202,
1017
+ "step": 144
1018
+ },
1019
+ {
1020
+ "epoch": 0.14271653543307086,
1021
+ "grad_norm": 25.096189498901367,
1022
+ "learning_rate": 1.4015904572564615e-07,
1023
+ "loss": 1.7654,
1024
+ "step": 145
1025
+ },
1026
+ {
1027
+ "epoch": 0.1437007874015748,
1028
+ "grad_norm": 28.528640747070312,
1029
+ "learning_rate": 1.4115308151093443e-07,
1030
+ "loss": 1.5748,
1031
+ "step": 146
1032
+ },
1033
+ {
1034
+ "epoch": 0.14468503937007873,
1035
+ "grad_norm": 30.80350685119629,
1036
+ "learning_rate": 1.421471172962227e-07,
1037
+ "loss": 1.7996,
1038
+ "step": 147
1039
+ },
1040
+ {
1041
+ "epoch": 0.14566929133858267,
1042
+ "grad_norm": 22.726850509643555,
1043
+ "learning_rate": 1.4314115308151096e-07,
1044
+ "loss": 1.7517,
1045
+ "step": 148
1046
+ },
1047
+ {
1048
+ "epoch": 0.1466535433070866,
1049
+ "grad_norm": 24.926170349121094,
1050
+ "learning_rate": 1.4413518886679924e-07,
1051
+ "loss": 1.8933,
1052
+ "step": 149
1053
+ },
1054
+ {
1055
+ "epoch": 0.14763779527559054,
1056
+ "grad_norm": 20.570375442504883,
1057
+ "learning_rate": 1.451292246520875e-07,
1058
+ "loss": 1.2836,
1059
+ "step": 150
1060
+ },
1061
+ {
1062
+ "epoch": 0.1486220472440945,
1063
+ "grad_norm": 21.08650779724121,
1064
+ "learning_rate": 1.4612326043737577e-07,
1065
+ "loss": 1.7145,
1066
+ "step": 151
1067
+ },
1068
+ {
1069
+ "epoch": 0.14960629921259844,
1070
+ "grad_norm": 24.37209701538086,
1071
+ "learning_rate": 1.4711729622266402e-07,
1072
+ "loss": 1.6499,
1073
+ "step": 152
1074
+ },
1075
+ {
1076
+ "epoch": 0.15059055118110237,
1077
+ "grad_norm": 26.801511764526367,
1078
+ "learning_rate": 1.4811133200795232e-07,
1079
+ "loss": 1.8273,
1080
+ "step": 153
1081
+ },
1082
+ {
1083
+ "epoch": 0.15059055118110237,
1084
+ "eval_Qnli-dev_cosine_accuracy": 0.646484375,
1085
+ "eval_Qnli-dev_cosine_accuracy_threshold": 0.8057259321212769,
1086
+ "eval_Qnli-dev_cosine_ap": 0.6720663622193426,
1087
+ "eval_Qnli-dev_cosine_f1": 0.6688102893890675,
1088
+ "eval_Qnli-dev_cosine_f1_threshold": 0.7187118530273438,
1089
+ "eval_Qnli-dev_cosine_precision": 0.538860103626943,
1090
+ "eval_Qnli-dev_cosine_recall": 0.8813559322033898,
1091
+ "eval_Qnli-dev_dot_accuracy": 0.646484375,
1092
+ "eval_Qnli-dev_dot_accuracy_threshold": 618.8643798828125,
1093
+ "eval_Qnli-dev_dot_ap": 0.672083506527328,
1094
+ "eval_Qnli-dev_dot_f1": 0.6688102893890675,
1095
+ "eval_Qnli-dev_dot_f1_threshold": 552.0260009765625,
1096
+ "eval_Qnli-dev_dot_precision": 0.538860103626943,
1097
+ "eval_Qnli-dev_dot_recall": 0.8813559322033898,
1098
+ "eval_Qnli-dev_euclidean_accuracy": 0.646484375,
1099
+ "eval_Qnli-dev_euclidean_accuracy_threshold": 17.27533721923828,
1100
+ "eval_Qnli-dev_euclidean_ap": 0.6720591998758361,
1101
+ "eval_Qnli-dev_euclidean_f1": 0.6688102893890675,
1102
+ "eval_Qnli-dev_euclidean_f1_threshold": 20.787063598632812,
1103
+ "eval_Qnli-dev_euclidean_precision": 0.538860103626943,
1104
+ "eval_Qnli-dev_euclidean_recall": 0.8813559322033898,
1105
+ "eval_Qnli-dev_manhattan_accuracy": 0.6484375,
1106
+ "eval_Qnli-dev_manhattan_accuracy_threshold": 386.58905029296875,
1107
+ "eval_Qnli-dev_manhattan_ap": 0.6724653688821339,
1108
+ "eval_Qnli-dev_manhattan_f1": 0.6645569620253164,
1109
+ "eval_Qnli-dev_manhattan_f1_threshold": 462.609130859375,
1110
+ "eval_Qnli-dev_manhattan_precision": 0.5303030303030303,
1111
+ "eval_Qnli-dev_manhattan_recall": 0.8898305084745762,
1112
+ "eval_Qnli-dev_max_accuracy": 0.6484375,
1113
+ "eval_Qnli-dev_max_accuracy_threshold": 618.8643798828125,
1114
+ "eval_Qnli-dev_max_ap": 0.6724653688821339,
1115
+ "eval_Qnli-dev_max_f1": 0.6688102893890675,
1116
+ "eval_Qnli-dev_max_f1_threshold": 552.0260009765625,
1117
+ "eval_Qnli-dev_max_precision": 0.538860103626943,
1118
+ "eval_Qnli-dev_max_recall": 0.8898305084745762,
1119
+ "eval_allNLI-dev_cosine_accuracy": 0.67578125,
1120
+ "eval_allNLI-dev_cosine_accuracy_threshold": 0.9427558183670044,
1121
+ "eval_allNLI-dev_cosine_ap": 0.4368751759846574,
1122
+ "eval_allNLI-dev_cosine_f1": 0.5225225225225225,
1123
+ "eval_allNLI-dev_cosine_f1_threshold": 0.8046966791152954,
1124
+ "eval_allNLI-dev_cosine_precision": 0.3795811518324607,
1125
+ "eval_allNLI-dev_cosine_recall": 0.838150289017341,
1126
+ "eval_allNLI-dev_dot_accuracy": 0.67578125,
1127
+ "eval_allNLI-dev_dot_accuracy_threshold": 724.1080322265625,
1128
+ "eval_allNLI-dev_dot_ap": 0.436842886797982,
1129
+ "eval_allNLI-dev_dot_f1": 0.5225225225225225,
1130
+ "eval_allNLI-dev_dot_f1_threshold": 618.074951171875,
1131
+ "eval_allNLI-dev_dot_precision": 0.3795811518324607,
1132
+ "eval_allNLI-dev_dot_recall": 0.838150289017341,
1133
+ "eval_allNLI-dev_euclidean_accuracy": 0.67578125,
1134
+ "eval_allNLI-dev_euclidean_accuracy_threshold": 9.377331733703613,
1135
+ "eval_allNLI-dev_euclidean_ap": 0.4368602200677977,
1136
+ "eval_allNLI-dev_euclidean_f1": 0.5225225225225225,
1137
+ "eval_allNLI-dev_euclidean_f1_threshold": 17.321048736572266,
1138
+ "eval_allNLI-dev_euclidean_precision": 0.3795811518324607,
1139
+ "eval_allNLI-dev_euclidean_recall": 0.838150289017341,
1140
+ "eval_allNLI-dev_manhattan_accuracy": 0.677734375,
1141
+ "eval_allNLI-dev_manhattan_accuracy_threshold": 223.6764373779297,
1142
+ "eval_allNLI-dev_manhattan_ap": 0.43892484929307635,
1143
+ "eval_allNLI-dev_manhattan_f1": 0.5239852398523985,
1144
+ "eval_allNLI-dev_manhattan_f1_threshold": 372.31396484375,
1145
+ "eval_allNLI-dev_manhattan_precision": 0.38482384823848237,
1146
+ "eval_allNLI-dev_manhattan_recall": 0.8208092485549133,
1147
+ "eval_allNLI-dev_max_accuracy": 0.677734375,
1148
+ "eval_allNLI-dev_max_accuracy_threshold": 724.1080322265625,
1149
+ "eval_allNLI-dev_max_ap": 0.43892484929307635,
1150
+ "eval_allNLI-dev_max_f1": 0.5239852398523985,
1151
+ "eval_allNLI-dev_max_f1_threshold": 618.074951171875,
1152
+ "eval_allNLI-dev_max_precision": 0.38482384823848237,
1153
+ "eval_allNLI-dev_max_recall": 0.838150289017341,
1154
+ "eval_runtime": 1.2344,
1155
+ "eval_samples_per_second": 0.0,
1156
+ "eval_sequential_score": 0.6724653688821339,
1157
+ "eval_steps_per_second": 0.0,
1158
+ "eval_sts-test_pearson_cosine": 0.3774946012125992,
1159
+ "eval_sts-test_pearson_dot": 0.3774648453085433,
1160
+ "eval_sts-test_pearson_euclidean": 0.38652243004790016,
1161
+ "eval_sts-test_pearson_manhattan": 0.3861982631744407,
1162
+ "eval_sts-test_pearson_max": 0.38652243004790016,
1163
+ "eval_sts-test_spearman_cosine": 0.4056589966976888,
1164
+ "eval_sts-test_spearman_dot": 0.40563469676275316,
1165
+ "eval_sts-test_spearman_euclidean": 0.4056589966976888,
1166
+ "eval_sts-test_spearman_manhattan": 0.4059364545183154,
1167
+ "eval_sts-test_spearman_max": 0.4059364545183154,
1168
+ "step": 153
1169
+ },
1170
+ {
1171
+ "epoch": 0.1515748031496063,
1172
+ "grad_norm": 27.144521713256836,
1173
+ "learning_rate": 1.4910536779324058e-07,
1174
+ "loss": 2.2859,
1175
+ "step": 154
1176
+ },
1177
+ {
1178
+ "epoch": 0.15255905511811024,
1179
+ "grad_norm": 16.310321807861328,
1180
+ "learning_rate": 1.5009940357852886e-07,
1181
+ "loss": 1.0833,
1182
+ "step": 155
1183
+ },
1184
+ {
1185
+ "epoch": 0.15354330708661418,
1186
+ "grad_norm": 18.788604736328125,
1187
+ "learning_rate": 1.510934393638171e-07,
1188
+ "loss": 1.6829,
1189
+ "step": 156
1190
+ },
1191
+ {
1192
+ "epoch": 0.1545275590551181,
1193
+ "grad_norm": 23.34528923034668,
1194
+ "learning_rate": 1.5208747514910539e-07,
1195
+ "loss": 2.1464,
1196
+ "step": 157
1197
+ },
1198
+ {
1199
+ "epoch": 0.15551181102362205,
1200
+ "grad_norm": 19.06661033630371,
1201
+ "learning_rate": 1.5308151093439367e-07,
1202
+ "loss": 1.745,
1203
+ "step": 158
1204
+ },
1205
+ {
1206
+ "epoch": 0.15649606299212598,
1207
+ "grad_norm": 23.650297164916992,
1208
+ "learning_rate": 1.5407554671968192e-07,
1209
+ "loss": 1.7319,
1210
+ "step": 159
1211
+ },
1212
+ {
1213
+ "epoch": 0.15748031496062992,
1214
+ "grad_norm": 22.01628303527832,
1215
+ "learning_rate": 1.5506958250497022e-07,
+ "loss": 1.6968,
+ "step": 160
+ },
+ {
+ "epoch": 0.15846456692913385,
+ "grad_norm": 20.905855178833008,
+ "learning_rate": 1.5606361829025848e-07,
+ "loss": 1.7401,
+ "step": 161
+ },
+ {
+ "epoch": 0.1594488188976378,
+ "grad_norm": 22.40980339050293,
+ "learning_rate": 1.5705765407554675e-07,
+ "loss": 1.729,
+ "step": 162
+ },
+ {
+ "epoch": 0.16043307086614172,
+ "grad_norm": 23.574447631835938,
+ "learning_rate": 1.58051689860835e-07,
+ "loss": 2.0782,
+ "step": 163
+ },
+ {
+ "epoch": 0.16141732283464566,
+ "grad_norm": 28.43741798400879,
+ "learning_rate": 1.5904572564612329e-07,
+ "loss": 2.6545,
+ "step": 164
+ },
+ {
+ "epoch": 0.1624015748031496,
+ "grad_norm": 18.536827087402344,
+ "learning_rate": 1.6003976143141154e-07,
+ "loss": 1.4045,
+ "step": 165
+ },
+ {
+ "epoch": 0.16338582677165353,
+ "grad_norm": 17.013965606689453,
+ "learning_rate": 1.6103379721669984e-07,
+ "loss": 1.2937,
+ "step": 166
+ },
+ {
+ "epoch": 0.1643700787401575,
+ "grad_norm": 16.021455764770508,
+ "learning_rate": 1.620278330019881e-07,
+ "loss": 1.1171,
+ "step": 167
+ },
+ {
+ "epoch": 0.16535433070866143,
+ "grad_norm": 20.248655319213867,
+ "learning_rate": 1.6302186878727637e-07,
+ "loss": 1.3537,
+ "step": 168
+ },
+ {
+ "epoch": 0.16633858267716536,
+ "grad_norm": 21.731477737426758,
+ "learning_rate": 1.6401590457256465e-07,
+ "loss": 1.7028,
+ "step": 169
+ },
+ {
+ "epoch": 0.1673228346456693,
+ "grad_norm": 20.372398376464844,
+ "learning_rate": 1.650099403578529e-07,
+ "loss": 1.4143,
+ "step": 170
+ },
+ {
+ "epoch": 0.16830708661417323,
+ "grad_norm": 21.89658546447754,
+ "learning_rate": 1.6600397614314118e-07,
+ "loss": 1.8648,
+ "step": 171
+ },
+ {
+ "epoch": 0.16929133858267717,
+ "grad_norm": 24.402984619140625,
+ "learning_rate": 1.6699801192842944e-07,
+ "loss": 1.6768,
+ "step": 172
+ },
+ {
+ "epoch": 0.1702755905511811,
+ "grad_norm": 28.650039672851562,
+ "learning_rate": 1.6799204771371774e-07,
+ "loss": 1.9528,
+ "step": 173
+ },
+ {
+ "epoch": 0.17125984251968504,
+ "grad_norm": 18.586729049682617,
+ "learning_rate": 1.68986083499006e-07,
+ "loss": 1.1718,
+ "step": 174
+ },
+ {
+ "epoch": 0.17224409448818898,
+ "grad_norm": 22.66750717163086,
+ "learning_rate": 1.6998011928429427e-07,
+ "loss": 1.8176,
+ "step": 175
+ },
+ {
+ "epoch": 0.1732283464566929,
+ "grad_norm": 15.157270431518555,
+ "learning_rate": 1.7097415506958253e-07,
+ "loss": 0.8439,
+ "step": 176
+ },
+ {
+ "epoch": 0.17421259842519685,
+ "grad_norm": 23.08325958251953,
+ "learning_rate": 1.719681908548708e-07,
+ "loss": 1.5092,
+ "step": 177
+ },
+ {
+ "epoch": 0.17519685039370078,
+ "grad_norm": 19.215696334838867,
+ "learning_rate": 1.7296222664015906e-07,
+ "loss": 1.1947,
+ "step": 178
+ },
+ {
+ "epoch": 0.17618110236220472,
+ "grad_norm": 23.268930435180664,
+ "learning_rate": 1.7395626242544734e-07,
+ "loss": 1.6395,
+ "step": 179
+ },
+ {
+ "epoch": 0.17716535433070865,
+ "grad_norm": 24.91169548034668,
+ "learning_rate": 1.7495029821073564e-07,
+ "loss": 1.4394,
+ "step": 180
+ },
+ {
+ "epoch": 0.1781496062992126,
+ "grad_norm": 26.92424201965332,
+ "learning_rate": 1.759443339960239e-07,
+ "loss": 1.7548,
+ "step": 181
+ },
+ {
+ "epoch": 0.17913385826771652,
+ "grad_norm": 22.831315994262695,
+ "learning_rate": 1.7693836978131217e-07,
+ "loss": 1.1181,
+ "step": 182
+ },
+ {
+ "epoch": 0.18011811023622049,
+ "grad_norm": 18.39922332763672,
+ "learning_rate": 1.7793240556660042e-07,
+ "loss": 1.0271,
+ "step": 183
+ },
+ {
+ "epoch": 0.18110236220472442,
+ "grad_norm": 30.0064697265625,
+ "learning_rate": 1.789264413518887e-07,
+ "loss": 2.3108,
+ "step": 184
+ },
+ {
+ "epoch": 0.18208661417322836,
+ "grad_norm": 25.531293869018555,
+ "learning_rate": 1.7992047713717695e-07,
+ "loss": 2.1242,
+ "step": 185
+ },
+ {
+ "epoch": 0.1830708661417323,
+ "grad_norm": 25.3813533782959,
+ "learning_rate": 1.8091451292246523e-07,
+ "loss": 1.9822,
+ "step": 186
+ },
+ {
+ "epoch": 0.18405511811023623,
+ "grad_norm": 22.24681282043457,
+ "learning_rate": 1.819085487077535e-07,
+ "loss": 2.3605,
+ "step": 187
+ },
+ {
+ "epoch": 0.18503937007874016,
+ "grad_norm": 19.291316986083984,
+ "learning_rate": 1.829025844930418e-07,
+ "loss": 1.5251,
+ "step": 188
+ },
+ {
+ "epoch": 0.1860236220472441,
+ "grad_norm": 18.280820846557617,
+ "learning_rate": 1.8389662027833004e-07,
+ "loss": 1.2351,
+ "step": 189
+ },
+ {
+ "epoch": 0.18700787401574803,
+ "grad_norm": 18.341115951538086,
+ "learning_rate": 1.8489065606361832e-07,
+ "loss": 1.5859,
+ "step": 190
+ },
+ {
+ "epoch": 0.18799212598425197,
+ "grad_norm": 21.981475830078125,
+ "learning_rate": 1.8588469184890657e-07,
+ "loss": 1.8056,
+ "step": 191
+ },
+ {
+ "epoch": 0.1889763779527559,
+ "grad_norm": 23.482725143432617,
+ "learning_rate": 1.8687872763419485e-07,
+ "loss": 1.349,
+ "step": 192
+ },
+ {
+ "epoch": 0.18996062992125984,
+ "grad_norm": 15.451715469360352,
+ "learning_rate": 1.8787276341948313e-07,
+ "loss": 0.893,
+ "step": 193
+ },
+ {
+ "epoch": 0.19094488188976377,
+ "grad_norm": 25.24738121032715,
+ "learning_rate": 1.888667992047714e-07,
+ "loss": 1.5122,
+ "step": 194
+ },
+ {
+ "epoch": 0.1919291338582677,
+ "grad_norm": 22.348981857299805,
+ "learning_rate": 1.898608349900597e-07,
+ "loss": 1.3875,
+ "step": 195
+ },
+ {
+ "epoch": 0.19291338582677164,
+ "grad_norm": 22.770132064819336,
+ "learning_rate": 1.9085487077534794e-07,
+ "loss": 1.29,
+ "step": 196
+ },
+ {
+ "epoch": 0.19389763779527558,
+ "grad_norm": 31.219194412231445,
+ "learning_rate": 1.9184890656063622e-07,
+ "loss": 2.2931,
+ "step": 197
+ },
+ {
+ "epoch": 0.19488188976377951,
+ "grad_norm": 24.657859802246094,
+ "learning_rate": 1.9284294234592447e-07,
+ "loss": 1.2663,
+ "step": 198
+ },
+ {
+ "epoch": 0.19586614173228348,
+ "grad_norm": 26.825185775756836,
+ "learning_rate": 1.9383697813121275e-07,
+ "loss": 1.9712,
+ "step": 199
+ },
+ {
+ "epoch": 0.1968503937007874,
+ "grad_norm": 30.97873306274414,
+ "learning_rate": 1.94831013916501e-07,
+ "loss": 2.3307,
+ "step": 200
+ },
+ {
+ "epoch": 0.19783464566929135,
+ "grad_norm": 24.922365188598633,
+ "learning_rate": 1.958250497017893e-07,
+ "loss": 1.6544,
+ "step": 201
+ },
+ {
+ "epoch": 0.19881889763779528,
+ "grad_norm": 22.842063903808594,
+ "learning_rate": 1.9681908548707756e-07,
+ "loss": 1.638,
+ "step": 202
+ },
+ {
+ "epoch": 0.19980314960629922,
+ "grad_norm": 17.376962661743164,
+ "learning_rate": 1.9781312127236584e-07,
+ "loss": 1.3412,
+ "step": 203
+ },
+ {
+ "epoch": 0.20078740157480315,
+ "grad_norm": 17.828535079956055,
+ "learning_rate": 1.9880715705765412e-07,
+ "loss": 1.4454,
+ "step": 204
+ },
+ {
+ "epoch": 0.2017716535433071,
+ "grad_norm": 18.73072624206543,
+ "learning_rate": 1.9980119284294237e-07,
+ "loss": 1.5437,
+ "step": 205
+ },
+ {
+ "epoch": 0.20275590551181102,
+ "grad_norm": 22.05230140686035,
+ "learning_rate": 2.0079522862823065e-07,
+ "loss": 1.4921,
+ "step": 206
+ },
+ {
+ "epoch": 0.20374015748031496,
+ "grad_norm": 18.090892791748047,
+ "learning_rate": 2.017892644135189e-07,
+ "loss": 1.4298,
+ "step": 207
+ },
+ {
+ "epoch": 0.2047244094488189,
+ "grad_norm": 22.94427490234375,
+ "learning_rate": 2.027833001988072e-07,
+ "loss": 1.6174,
+ "step": 208
+ },
+ {
+ "epoch": 0.20570866141732283,
+ "grad_norm": 20.903331756591797,
+ "learning_rate": 2.0377733598409546e-07,
+ "loss": 1.4137,
+ "step": 209
+ },
+ {
+ "epoch": 0.20669291338582677,
+ "grad_norm": 22.67809295654297,
+ "learning_rate": 2.0477137176938374e-07,
+ "loss": 1.5652,
+ "step": 210
+ },
+ {
+ "epoch": 0.2076771653543307,
+ "grad_norm": 16.824064254760742,
+ "learning_rate": 2.05765407554672e-07,
+ "loss": 1.1631,
+ "step": 211
+ },
+ {
+ "epoch": 0.20866141732283464,
+ "grad_norm": 21.46413230895996,
+ "learning_rate": 2.0675944333996027e-07,
+ "loss": 1.2351,
+ "step": 212
+ },
+ {
+ "epoch": 0.20964566929133857,
+ "grad_norm": 22.461610794067383,
+ "learning_rate": 2.0775347912524852e-07,
+ "loss": 1.7537,
+ "step": 213
+ },
+ {
+ "epoch": 0.2106299212598425,
+ "grad_norm": 22.150938034057617,
+ "learning_rate": 2.087475149105368e-07,
+ "loss": 1.3186,
+ "step": 214
+ },
+ {
+ "epoch": 0.21161417322834647,
+ "grad_norm": 22.82622718811035,
+ "learning_rate": 2.097415506958251e-07,
+ "loss": 1.2258,
+ "step": 215
+ },
+ {
+ "epoch": 0.2125984251968504,
+ "grad_norm": 13.514169692993164,
+ "learning_rate": 2.1073558648111336e-07,
+ "loss": 0.7695,
+ "step": 216
+ },
+ {
+ "epoch": 0.21358267716535434,
+ "grad_norm": 22.254596710205078,
+ "learning_rate": 2.1172962226640164e-07,
+ "loss": 1.2775,
+ "step": 217
+ },
+ {
+ "epoch": 0.21456692913385828,
+ "grad_norm": 23.31028938293457,
+ "learning_rate": 2.127236580516899e-07,
+ "loss": 1.6795,
+ "step": 218
+ },
+ {
+ "epoch": 0.2155511811023622,
+ "grad_norm": 24.057025909423828,
+ "learning_rate": 2.1371769383697817e-07,
+ "loss": 1.2862,
+ "step": 219
+ },
+ {
+ "epoch": 0.21653543307086615,
+ "grad_norm": 17.363821029663086,
+ "learning_rate": 2.1471172962226642e-07,
+ "loss": 1.1723,
+ "step": 220
+ },
+ {
+ "epoch": 0.21751968503937008,
+ "grad_norm": 22.69175148010254,
+ "learning_rate": 2.1570576540755473e-07,
+ "loss": 1.3322,
+ "step": 221
+ },
+ {
+ "epoch": 0.21850393700787402,
+ "grad_norm": 24.083465576171875,
+ "learning_rate": 2.1669980119284298e-07,
+ "loss": 1.7564,
+ "step": 222
+ },
+ {
+ "epoch": 0.21948818897637795,
+ "grad_norm": 17.775938034057617,
+ "learning_rate": 2.1769383697813126e-07,
+ "loss": 1.1071,
+ "step": 223
+ },
+ {
+ "epoch": 0.2204724409448819,
+ "grad_norm": 18.539440155029297,
+ "learning_rate": 2.186878727634195e-07,
+ "loss": 1.2011,
+ "step": 224
+ },
+ {
+ "epoch": 0.22145669291338582,
+ "grad_norm": 18.234455108642578,
+ "learning_rate": 2.196819085487078e-07,
+ "loss": 1.2303,
+ "step": 225
+ },
+ {
+ "epoch": 0.22244094488188976,
+ "grad_norm": 20.291522979736328,
+ "learning_rate": 2.2067594433399604e-07,
+ "loss": 1.212,
+ "step": 226
+ },
+ {
+ "epoch": 0.2234251968503937,
+ "grad_norm": 15.60973072052002,
+ "learning_rate": 2.2166998011928432e-07,
+ "loss": 1.0117,
+ "step": 227
+ },
+ {
+ "epoch": 0.22440944881889763,
+ "grad_norm": 20.482995986938477,
+ "learning_rate": 2.2266401590457263e-07,
+ "loss": 1.1907,
+ "step": 228
+ },
+ {
+ "epoch": 0.22539370078740156,
+ "grad_norm": 24.126340866088867,
+ "learning_rate": 2.2365805168986088e-07,
+ "loss": 2.1293,
+ "step": 229
+ },
+ {
+ "epoch": 0.2263779527559055,
+ "grad_norm": 21.279150009155273,
+ "learning_rate": 2.2465208747514916e-07,
+ "loss": 1.3063,
+ "step": 230
+ },
+ {
+ "epoch": 0.22736220472440946,
+ "grad_norm": 20.62582778930664,
+ "learning_rate": 2.256461232604374e-07,
+ "loss": 1.2841,
+ "step": 231
+ },
+ {
+ "epoch": 0.2283464566929134,
+ "grad_norm": 20.63606834411621,
+ "learning_rate": 2.266401590457257e-07,
+ "loss": 1.3778,
+ "step": 232
+ },
+ {
+ "epoch": 0.22933070866141733,
+ "grad_norm": 18.099950790405273,
+ "learning_rate": 2.2763419483101394e-07,
+ "loss": 1.2242,
+ "step": 233
+ },
+ {
+ "epoch": 0.23031496062992127,
+ "grad_norm": 18.96858024597168,
+ "learning_rate": 2.2862823061630222e-07,
+ "loss": 0.9227,
+ "step": 234
+ },
+ {
+ "epoch": 0.2312992125984252,
+ "grad_norm": 22.201169967651367,
+ "learning_rate": 2.296222664015905e-07,
+ "loss": 1.2221,
+ "step": 235
+ },
+ {
+ "epoch": 0.23228346456692914,
+ "grad_norm": 26.657480239868164,
+ "learning_rate": 2.3061630218687878e-07,
+ "loss": 2.1041,
+ "step": 236
+ },
+ {
+ "epoch": 0.23326771653543307,
+ "grad_norm": 25.558757781982422,
+ "learning_rate": 2.3161033797216703e-07,
+ "loss": 1.3341,
+ "step": 237
+ },
+ {
+ "epoch": 0.234251968503937,
+ "grad_norm": 22.0543212890625,
+ "learning_rate": 2.326043737574553e-07,
+ "loss": 1.0876,
+ "step": 238
+ },
+ {
+ "epoch": 0.23523622047244094,
+ "grad_norm": 22.258333206176758,
+ "learning_rate": 2.335984095427436e-07,
+ "loss": 1.3328,
+ "step": 239
+ },
+ {
+ "epoch": 0.23622047244094488,
+ "grad_norm": 21.519908905029297,
+ "learning_rate": 2.3459244532803184e-07,
+ "loss": 1.2958,
+ "step": 240
+ },
+ {
+ "epoch": 0.2372047244094488,
+ "grad_norm": 23.345388412475586,
+ "learning_rate": 2.3558648111332012e-07,
+ "loss": 1.1522,
+ "step": 241
+ },
+ {
+ "epoch": 0.23818897637795275,
+ "grad_norm": 25.306140899658203,
+ "learning_rate": 2.365805168986084e-07,
+ "loss": 1.7942,
+ "step": 242
+ },
+ {
+ "epoch": 0.23917322834645668,
+ "grad_norm": 21.840364456176758,
+ "learning_rate": 2.3757455268389668e-07,
+ "loss": 1.1325,
+ "step": 243
+ },
+ {
+ "epoch": 0.24015748031496062,
+ "grad_norm": 26.5495662689209,
+ "learning_rate": 2.385685884691849e-07,
+ "loss": 1.6466,
+ "step": 244
+ },
+ {
+ "epoch": 0.24114173228346455,
+ "grad_norm": 21.928937911987305,
+ "learning_rate": 2.395626242544732e-07,
+ "loss": 1.4608,
+ "step": 245
+ },
+ {
+ "epoch": 0.2421259842519685,
+ "grad_norm": 16.23383903503418,
+ "learning_rate": 2.4055666003976146e-07,
+ "loss": 0.6375,
+ "step": 246
+ },
+ {
+ "epoch": 0.24311023622047245,
+ "grad_norm": 26.922739028930664,
+ "learning_rate": 2.4155069582504976e-07,
+ "loss": 2.0177,
+ "step": 247
+ },
+ {
+ "epoch": 0.2440944881889764,
+ "grad_norm": 23.482742309570312,
+ "learning_rate": 2.42544731610338e-07,
+ "loss": 1.2069,
+ "step": 248
+ },
+ {
+ "epoch": 0.24507874015748032,
+ "grad_norm": 14.740228652954102,
+ "learning_rate": 2.4353876739562627e-07,
+ "loss": 0.7639,
+ "step": 249
+ },
+ {
+ "epoch": 0.24606299212598426,
+ "grad_norm": 20.238948822021484,
+ "learning_rate": 2.445328031809146e-07,
+ "loss": 1.3465,
+ "step": 250
+ },
+ {
+ "epoch": 0.2470472440944882,
+ "grad_norm": 16.91303253173828,
+ "learning_rate": 2.4552683896620283e-07,
+ "loss": 1.064,
+ "step": 251
+ },
+ {
+ "epoch": 0.24803149606299213,
+ "grad_norm": 22.081308364868164,
+ "learning_rate": 2.4652087475149113e-07,
+ "loss": 1.3757,
+ "step": 252
+ },
+ {
+ "epoch": 0.24901574803149606,
+ "grad_norm": 21.756267547607422,
+ "learning_rate": 2.475149105367794e-07,
+ "loss": 1.612,
+ "step": 253
+ },
+ {
+ "epoch": 0.25,
+ "grad_norm": 16.87566566467285,
+ "learning_rate": 2.4850894632206764e-07,
+ "loss": 0.7917,
+ "step": 254
+ },
+ {
+ "epoch": 0.25098425196850394,
+ "grad_norm": 22.13374900817871,
+ "learning_rate": 2.495029821073559e-07,
+ "loss": 1.5515,
+ "step": 255
+ },
+ {
+ "epoch": 0.25196850393700787,
+ "grad_norm": 18.96138572692871,
+ "learning_rate": 2.5049701789264414e-07,
+ "loss": 0.799,
+ "step": 256
+ },
+ {
+ "epoch": 0.2529527559055118,
+ "grad_norm": 17.10991859436035,
+ "learning_rate": 2.5149105367793245e-07,
+ "loss": 0.9882,
+ "step": 257
+ },
+ {
+ "epoch": 0.25393700787401574,
+ "grad_norm": 15.347769737243652,
+ "learning_rate": 2.524850894632207e-07,
+ "loss": 1.1814,
+ "step": 258
+ },
+ {
+ "epoch": 0.2549212598425197,
+ "grad_norm": 11.857462882995605,
+ "learning_rate": 2.53479125248509e-07,
+ "loss": 0.6394,
+ "step": 259
+ },
+ {
+ "epoch": 0.2559055118110236,
+ "grad_norm": 20.00450325012207,
+ "learning_rate": 2.5447316103379726e-07,
+ "loss": 1.4756,
+ "step": 260
+ },
+ {
+ "epoch": 0.25688976377952755,
+ "grad_norm": 12.062335968017578,
+ "learning_rate": 2.554671968190855e-07,
+ "loss": 0.5338,
+ "step": 261
+ },
+ {
+ "epoch": 0.2578740157480315,
+ "grad_norm": 16.912168502807617,
+ "learning_rate": 2.564612326043738e-07,
+ "loss": 0.9779,
+ "step": 262
+ },
+ {
+ "epoch": 0.2588582677165354,
+ "grad_norm": 22.713825225830078,
+ "learning_rate": 2.5745526838966207e-07,
+ "loss": 1.5307,
+ "step": 263
+ },
+ {
+ "epoch": 0.25984251968503935,
+ "grad_norm": 22.88834571838379,
+ "learning_rate": 2.5844930417495037e-07,
+ "loss": 1.1213,
+ "step": 264
+ },
+ {
+ "epoch": 0.2608267716535433,
+ "grad_norm": 18.133310317993164,
+ "learning_rate": 2.5944333996023857e-07,
+ "loss": 0.9482,
+ "step": 265
+ },
+ {
+ "epoch": 0.2618110236220472,
+ "grad_norm": 19.092018127441406,
+ "learning_rate": 2.604373757455269e-07,
+ "loss": 0.9599,
+ "step": 266
+ },
+ {
+ "epoch": 0.26279527559055116,
+ "grad_norm": 25.283647537231445,
+ "learning_rate": 2.614314115308152e-07,
+ "loss": 1.4455,
+ "step": 267
+ },
+ {
+ "epoch": 0.2637795275590551,
+ "grad_norm": 28.519742965698242,
+ "learning_rate": 2.6242544731610343e-07,
+ "loss": 1.6496,
+ "step": 268
+ },
+ {
+ "epoch": 0.26476377952755903,
+ "grad_norm": 15.56734561920166,
+ "learning_rate": 2.634194831013917e-07,
+ "loss": 0.7402,
+ "step": 269
+ },
+ {
+ "epoch": 0.265748031496063,
+ "grad_norm": 16.450769424438477,
+ "learning_rate": 2.6441351888667994e-07,
+ "loss": 0.7835,
+ "step": 270
+ },
+ {
+ "epoch": 0.26673228346456695,
+ "grad_norm": 18.447402954101562,
+ "learning_rate": 2.6540755467196824e-07,
+ "loss": 0.7821,
+ "step": 271
+ },
+ {
+ "epoch": 0.2677165354330709,
+ "grad_norm": 22.634441375732422,
+ "learning_rate": 2.664015904572565e-07,
+ "loss": 1.5422,
+ "step": 272
+ },
+ {
+ "epoch": 0.2687007874015748,
+ "grad_norm": 20.091428756713867,
+ "learning_rate": 2.6739562624254475e-07,
+ "loss": 1.0995,
+ "step": 273
+ },
+ {
+ "epoch": 0.26968503937007876,
+ "grad_norm": 22.89579200744629,
+ "learning_rate": 2.6838966202783305e-07,
+ "loss": 1.378,
+ "step": 274
+ },
+ {
+ "epoch": 0.2706692913385827,
+ "grad_norm": 25.36764144897461,
+ "learning_rate": 2.693836978131213e-07,
+ "loss": 1.3562,
+ "step": 275
+ },
+ {
+ "epoch": 0.27165354330708663,
+ "grad_norm": 15.717765808105469,
+ "learning_rate": 2.703777335984096e-07,
+ "loss": 0.7376,
+ "step": 276
+ },
+ {
+ "epoch": 0.27263779527559057,
+ "grad_norm": 21.739999771118164,
+ "learning_rate": 2.7137176938369786e-07,
+ "loss": 1.1678,
+ "step": 277
+ },
+ {
+ "epoch": 0.2736220472440945,
+ "grad_norm": 21.12672233581543,
+ "learning_rate": 2.723658051689861e-07,
+ "loss": 1.2989,
+ "step": 278
+ },
+ {
+ "epoch": 0.27460629921259844,
+ "grad_norm": 22.614158630371094,
+ "learning_rate": 2.7335984095427437e-07,
+ "loss": 1.9559,
+ "step": 279
+ },
+ {
+ "epoch": 0.2755905511811024,
+ "grad_norm": 18.742107391357422,
+ "learning_rate": 2.743538767395627e-07,
+ "loss": 1.1237,
+ "step": 280
+ },
+ {
+ "epoch": 0.2765748031496063,
+ "grad_norm": 17.969554901123047,
+ "learning_rate": 2.75347912524851e-07,
+ "loss": 0.952,
+ "step": 281
+ },
+ {
+ "epoch": 0.27755905511811024,
+ "grad_norm": 24.01414680480957,
+ "learning_rate": 2.763419483101392e-07,
+ "loss": 1.6629,
+ "step": 282
+ },
+ {
+ "epoch": 0.2785433070866142,
+ "grad_norm": 21.362796783447266,
+ "learning_rate": 2.773359840954275e-07,
+ "loss": 1.871,
+ "step": 283
+ },
+ {
+ "epoch": 0.2795275590551181,
+ "grad_norm": 19.05644989013672,
+ "learning_rate": 2.7833001988071574e-07,
+ "loss": 1.5946,
+ "step": 284
+ },
+ {
+ "epoch": 0.28051181102362205,
+ "grad_norm": 18.40645408630371,
+ "learning_rate": 2.7932405566600404e-07,
+ "loss": 1.4456,
+ "step": 285
+ },
+ {
+ "epoch": 0.281496062992126,
+ "grad_norm": 18.53467559814453,
+ "learning_rate": 2.803180914512923e-07,
+ "loss": 1.4085,
+ "step": 286
+ },
+ {
+ "epoch": 0.2824803149606299,
+ "grad_norm": 18.1446533203125,
+ "learning_rate": 2.8131212723658055e-07,
+ "loss": 1.1394,
+ "step": 287
+ },
+ {
+ "epoch": 0.28346456692913385,
+ "grad_norm": 15.109747886657715,
+ "learning_rate": 2.8230616302186885e-07,
+ "loss": 1.0315,
+ "step": 288
+ },
+ {
+ "epoch": 0.2844488188976378,
+ "grad_norm": 22.63469696044922,
+ "learning_rate": 2.833001988071571e-07,
+ "loss": 1.488,
+ "step": 289
+ },
+ {
+ "epoch": 0.2854330708661417,
+ "grad_norm": 18.841909408569336,
+ "learning_rate": 2.842942345924454e-07,
+ "loss": 1.4006,
+ "step": 290
+ },
+ {
+ "epoch": 0.28641732283464566,
+ "grad_norm": 20.073745727539062,
+ "learning_rate": 2.852882703777336e-07,
+ "loss": 0.9237,
+ "step": 291
+ },
+ {
+ "epoch": 0.2874015748031496,
+ "grad_norm": 18.586259841918945,
+ "learning_rate": 2.862823061630219e-07,
+ "loss": 1.163,
+ "step": 292
+ },
+ {
+ "epoch": 0.28838582677165353,
+ "grad_norm": 23.599292755126953,
+ "learning_rate": 2.8727634194831017e-07,
+ "loss": 1.7037,
+ "step": 293
+ },
+ {
+ "epoch": 0.28937007874015747,
+ "grad_norm": 16.384347915649414,
+ "learning_rate": 2.8827037773359847e-07,
+ "loss": 0.8715,
+ "step": 294
+ },
+ {
+ "epoch": 0.2903543307086614,
+ "grad_norm": 20.26093101501465,
+ "learning_rate": 2.892644135188867e-07,
+ "loss": 1.2101,
+ "step": 295
+ },
+ {
+ "epoch": 0.29133858267716534,
+ "grad_norm": 19.598831176757812,
+ "learning_rate": 2.90258449304175e-07,
+ "loss": 1.1179,
+ "step": 296
+ },
+ {
+ "epoch": 0.29232283464566927,
+ "grad_norm": 22.65572738647461,
+ "learning_rate": 2.912524850894633e-07,
+ "loss": 1.3986,
+ "step": 297
+ },
+ {
+ "epoch": 0.2933070866141732,
+ "grad_norm": 22.658931732177734,
+ "learning_rate": 2.9224652087475153e-07,
+ "loss": 1.7068,
+ "step": 298
+ },
+ {
+ "epoch": 0.29429133858267714,
+ "grad_norm": 16.94671058654785,
+ "learning_rate": 2.9324055666003984e-07,
+ "loss": 0.8695,
+ "step": 299
+ },
+ {
+ "epoch": 0.2952755905511811,
+ "grad_norm": 20.13766860961914,
+ "learning_rate": 2.9423459244532804e-07,
+ "loss": 1.3778,
+ "step": 300
+ },
+ {
+ "epoch": 0.296259842519685,
+ "grad_norm": 16.939062118530273,
+ "learning_rate": 2.9522862823061634e-07,
+ "loss": 1.2834,
+ "step": 301
+ },
+ {
+ "epoch": 0.297244094488189,
+ "grad_norm": 14.732866287231445,
+ "learning_rate": 2.9622266401590465e-07,
+ "loss": 0.8123,
+ "step": 302
+ },
+ {
+ "epoch": 0.29822834645669294,
+ "grad_norm": 23.4341983795166,
+ "learning_rate": 2.972166998011929e-07,
+ "loss": 1.6521,
+ "step": 303
+ },
+ {
+ "epoch": 0.2992125984251969,
+ "grad_norm": 15.455023765563965,
+ "learning_rate": 2.9821073558648115e-07,
+ "loss": 1.1064,
+ "step": 304
+ },
+ {
+ "epoch": 0.3001968503937008,
+ "grad_norm": 15.683412551879883,
+ "learning_rate": 2.992047713717694e-07,
+ "loss": 0.9578,
+ "step": 305
+ }
+ ],
+ "logging_steps": 1,
+ "max_steps": 3048,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 305,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 0.0,
+ "train_batch_size": 32,
+ "trial_name": null,
+ "trial_params": null
+ }
checkpoint-305/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e4e5a2c389c69a314b44e2abcb7834dfc6e25823a2070d8ca3efd3fc97499c9b
+ size 5688