pawasthy committed
Commit 0bc9461 · verified · 1 Parent(s): dfbe76b

Update README.md

Files changed (1)
  1. README.md +1948 -0
README.md CHANGED
@@ -1,3 +1,1951 @@
1
  ---
2
  license: apache-2.0
3
  ---
1
  ---
2
+ language:
3
+ - en
4
  license: apache-2.0
5
+ library_name: transformers
6
+ tags:
7
+ - language
8
+ - granite
9
+ - embeddings
10
+ model-index:
11
+ - name: ibm-granite/granite-embedding-30m-english
12
+ results:
13
+ - dataset:
14
+ type: mteb/arguana
15
+ name: MTEB ArguAna
16
+ config: default
17
+ split: test
18
+ task:
19
+ type: Retrieval
20
+ metrics:
21
+ - type: map_at_1
22
+ value: 0.31792
23
+ - type: map_at_10
24
+ value: 0.47599
25
+ - type: map_at_100
26
+ value: 0.48425
27
+ - type: map_at_1000
28
+ value: 0.48427
29
+ - type: map_at_3
30
+ value: 0.42757
31
+ - type: map_at_5
32
+ value: 0.45634
33
+ - type: mrr_at_1
34
+ value: 0.32788
35
+ - type: mrr_at_10
36
+ value: 0.47974
37
+ - type: mrr_at_100
38
+ value: 0.48801
39
+ - type: mrr_at_1000
40
+ value: 0.48802
41
+ - type: mrr_at_3
42
+ value: 0.43065
43
+ - type: mrr_at_5
44
+ value: 0.45999
45
+ - type: ndcg_at_1
46
+ value: 0.31792
47
+ - type: ndcg_at_10
48
+ value: 0.56356
49
+ - type: ndcg_at_100
50
+ value: 0.59789
51
+ - type: ndcg_at_1000
52
+ value: 0.59857
53
+ - type: ndcg_at_3
54
+ value: 0.46453
55
+ - type: ndcg_at_5
56
+ value: 0.51623
57
+ - type: precision_at_1
58
+ value: 0.31792
59
+ - type: precision_at_10
60
+ value: 0.08428
61
+ - type: precision_at_100
62
+ value: 0.00991
63
+ - type: precision_at_1000
64
+ value: 0.001
65
+ - type: precision_at_3
66
+ value: 0.19061
67
+ - type: precision_at_5
68
+ value: 0.1394
69
+ - type: recall_at_1
70
+ value: 0.31792
71
+ - type: recall_at_10
72
+ value: 0.84282
73
+ - type: recall_at_100
74
+ value: 0.99075
75
+ - type: recall_at_1000
76
+ value: 0.99644
77
+ - type: recall_at_3
78
+ value: 0.57183
79
+ - type: recall_at_5
80
+ value: 0.69701
81
+ - dataset:
82
+ type: mteb/climate-fever
83
+ name: MTEB ClimateFEVER
84
+ config: default
85
+ split: test
86
+ task:
87
+ type: Retrieval
88
+ metrics:
89
+ - type: map_at_1
90
+ value: 0.13189
91
+ - type: map_at_10
92
+ value: 0.21789
93
+ - type: map_at_100
94
+ value: 0.2358
95
+ - type: map_at_1000
96
+ value: 0.23772
97
+ - type: map_at_3
98
+ value: 0.18513
99
+ - type: map_at_5
100
+ value: 0.20212
101
+ - type: mrr_at_1
102
+ value: 0.29837
103
+ - type: mrr_at_10
104
+ value: 0.41376
105
+ - type: mrr_at_100
106
+ value: 0.42282
107
+ - type: mrr_at_1000
108
+ value: 0.42319
109
+ - type: mrr_at_3
110
+ value: 0.38284
111
+ - type: mrr_at_5
112
+ value: 0.40301
113
+ - type: ndcg_at_1
114
+ value: 0.29837
115
+ - type: ndcg_at_10
116
+ value: 0.30263
117
+ - type: ndcg_at_100
118
+ value: 0.37228
119
+ - type: ndcg_at_1000
120
+ value: 0.40677
121
+ - type: ndcg_at_3
122
+ value: 0.25392
123
+ - type: ndcg_at_5
124
+ value: 0.27153
125
+ - type: precision_at_1
126
+ value: 0.29837
127
+ - type: precision_at_10
128
+ value: 0.09179
129
+ - type: precision_at_100
130
+ value: 0.01659
131
+ - type: precision_at_1000
132
+ value: 0.0023
133
+ - type: precision_at_3
134
+ value: 0.18545
135
+ - type: precision_at_5
136
+ value: 0.14241
137
+ - type: recall_at_1
138
+ value: 0.13189
139
+ - type: recall_at_10
140
+ value: 0.35355
141
+ - type: recall_at_100
142
+ value: 0.59255
143
+ - type: recall_at_1000
144
+ value: 0.78637
145
+ - type: recall_at_3
146
+ value: 0.23255
147
+ - type: recall_at_5
148
+ value: 0.28446
149
+ - dataset:
150
+ type: mteb/cqadupstack-android
151
+ name: MTEB CQADupstackAndroidRetrieval
152
+ config: default
153
+ split: test
154
+ task:
155
+ type: Retrieval
156
+ metrics:
157
+ - type: map_at_1
158
+ value: 0.35797
159
+ - type: map_at_10
160
+ value: 0.47793
161
+ - type: map_at_100
162
+ value: 0.49422
163
+ - type: map_at_1000
164
+ value: 0.49546
165
+ - type: map_at_3
166
+ value: 0.44137
167
+ - type: map_at_5
168
+ value: 0.46063
169
+ - type: mrr_at_1
170
+ value: 0.44206
171
+ - type: mrr_at_10
172
+ value: 0.53808
173
+ - type: mrr_at_100
174
+ value: 0.5454
175
+ - type: mrr_at_1000
176
+ value: 0.54578
177
+ - type: mrr_at_3
178
+ value: 0.51431
179
+ - type: mrr_at_5
180
+ value: 0.5284
181
+ - type: ndcg_at_1
182
+ value: 0.44206
183
+ - type: ndcg_at_10
184
+ value: 0.54106
185
+ - type: ndcg_at_100
186
+ value: 0.59335
187
+ - type: ndcg_at_1000
188
+ value: 0.61015
189
+ - type: ndcg_at_3
190
+ value: 0.49365
191
+ - type: ndcg_at_5
192
+ value: 0.51429
193
+ - type: precision_at_1
194
+ value: 0.44206
195
+ - type: precision_at_10
196
+ value: 0.10443
197
+ - type: precision_at_100
198
+ value: 0.01631
199
+ - type: precision_at_1000
200
+ value: 0.00214
201
+ - type: precision_at_3
202
+ value: 0.23653
203
+ - type: precision_at_5
204
+ value: 0.1691
205
+ - type: recall_at_1
206
+ value: 0.35797
207
+ - type: recall_at_10
208
+ value: 0.65182
209
+ - type: recall_at_100
210
+ value: 0.86654
211
+ - type: recall_at_1000
212
+ value: 0.97131
213
+ - type: recall_at_3
214
+ value: 0.51224
215
+ - type: recall_at_5
216
+ value: 0.57219
217
+ - dataset:
218
+ type: mteb/cqadupstack-english
219
+ name: MTEB CQADupstackEnglishRetrieval
220
+ config: default
221
+ split: test
222
+ task:
223
+ type: Retrieval
224
+ metrics:
225
+ - type: map_at_1
226
+ value: 0.32748
227
+ - type: map_at_10
228
+ value: 0.44138
229
+ - type: map_at_100
230
+ value: 0.45565
231
+ - type: map_at_1000
232
+ value: 0.45698
233
+ - type: map_at_3
234
+ value: 0.40916
235
+ - type: map_at_5
236
+ value: 0.42621
237
+ - type: mrr_at_1
238
+ value: 0.41274
239
+ - type: mrr_at_10
240
+ value: 0.5046
241
+ - type: mrr_at_100
242
+ value: 0.5107
243
+ - type: mrr_at_1000
244
+ value: 0.51109
245
+ - type: mrr_at_3
246
+ value: 0.48238
247
+ - type: mrr_at_5
248
+ value: 0.49563
249
+ - type: ndcg_at_1
250
+ value: 0.41274
251
+ - type: ndcg_at_10
252
+ value: 0.50251
253
+ - type: ndcg_at_100
254
+ value: 0.54725
255
+ - type: ndcg_at_1000
256
+ value: 0.56635
257
+ - type: ndcg_at_3
258
+ value: 0.46023
259
+ - type: ndcg_at_5
260
+ value: 0.47883
261
+ - type: precision_at_1
262
+ value: 0.41274
263
+ - type: precision_at_10
264
+ value: 0.09828
265
+ - type: precision_at_100
266
+ value: 0.01573
267
+ - type: precision_at_1000
268
+ value: 0.00202
269
+ - type: precision_at_3
270
+ value: 0.22718
271
+ - type: precision_at_5
272
+ value: 0.16064
273
+ - type: recall_at_1
274
+ value: 0.32748
275
+ - type: recall_at_10
276
+ value: 0.60322
277
+ - type: recall_at_100
278
+ value: 0.79669
279
+ - type: recall_at_1000
280
+ value: 0.9173
281
+ - type: recall_at_3
282
+ value: 0.47523
283
+ - type: recall_at_5
284
+ value: 0.52957
285
+ - dataset:
286
+ type: mteb/cqadupstack-gaming
287
+ name: MTEB CQADupstackGamingRetrieval
288
+ config: default
289
+ split: test
290
+ task:
291
+ type: Retrieval
292
+ metrics:
293
+ - type: map_at_1
294
+ value: 0.41126
295
+ - type: map_at_10
296
+ value: 0.53661
297
+ - type: map_at_100
298
+ value: 0.54588
299
+ - type: map_at_1000
300
+ value: 0.54638
301
+ - type: map_at_3
302
+ value: 0.50389
303
+ - type: map_at_5
304
+ value: 0.52286
305
+ - type: mrr_at_1
306
+ value: 0.47147
307
+ - type: mrr_at_10
308
+ value: 0.5685
309
+ - type: mrr_at_100
310
+ value: 0.57458
311
+ - type: mrr_at_1000
312
+ value: 0.57487
313
+ - type: mrr_at_3
314
+ value: 0.54431
315
+ - type: mrr_at_5
316
+ value: 0.55957
317
+ - type: ndcg_at_1
318
+ value: 0.47147
319
+ - type: ndcg_at_10
320
+ value: 0.59318
321
+ - type: ndcg_at_100
322
+ value: 0.62972
323
+ - type: ndcg_at_1000
324
+ value: 0.64033
325
+ - type: ndcg_at_3
326
+ value: 0.53969
327
+ - type: ndcg_at_5
328
+ value: 0.56743
329
+ - type: precision_at_1
330
+ value: 0.47147
331
+ - type: precision_at_10
332
+ value: 0.09549
333
+ - type: precision_at_100
334
+ value: 0.01224
335
+ - type: precision_at_1000
336
+ value: 0.00135
337
+ - type: precision_at_3
338
+ value: 0.24159
339
+ - type: precision_at_5
340
+ value: 0.16577
341
+ - type: recall_at_1
342
+ value: 0.41126
343
+ - type: recall_at_10
344
+ value: 0.72691
345
+ - type: recall_at_100
346
+ value: 0.88692
347
+ - type: recall_at_1000
348
+ value: 0.96232
349
+ - type: recall_at_3
350
+ value: 0.58374
351
+ - type: recall_at_5
352
+ value: 0.65226
353
+ - dataset:
354
+ type: mteb/cqadupstack-gis
355
+ name: MTEB CQADupstackGisRetrieval
356
+ config: default
357
+ split: test
358
+ task:
359
+ type: Retrieval
360
+ metrics:
361
+ - type: map_at_1
362
+ value: 0.28464
363
+ - type: map_at_10
364
+ value: 0.3828
365
+ - type: map_at_100
366
+ value: 0.39277
367
+ - type: map_at_1000
368
+ value: 0.39355
369
+ - type: map_at_3
370
+ value: 0.35704
371
+ - type: map_at_5
372
+ value: 0.37116
373
+ - type: mrr_at_1
374
+ value: 0.30734
375
+ - type: mrr_at_10
376
+ value: 0.40422
377
+ - type: mrr_at_100
378
+ value: 0.41297
379
+ - type: mrr_at_1000
380
+ value: 0.41355
381
+ - type: mrr_at_3
382
+ value: 0.38136
383
+ - type: mrr_at_5
384
+ value: 0.39362
385
+ - type: ndcg_at_1
386
+ value: 0.30734
387
+ - type: ndcg_at_10
388
+ value: 0.43564
389
+ - type: ndcg_at_100
390
+ value: 0.48419
391
+ - type: ndcg_at_1000
392
+ value: 0.50404
393
+ - type: ndcg_at_3
394
+ value: 0.38672
395
+ - type: ndcg_at_5
396
+ value: 0.40954
397
+ - type: precision_at_1
398
+ value: 0.30734
399
+ - type: precision_at_10
400
+ value: 0.06633
401
+ - type: precision_at_100
402
+ value: 0.00956
403
+ - type: precision_at_1000
404
+ value: 0.00116
405
+ - type: precision_at_3
406
+ value: 0.16497
407
+ - type: precision_at_5
408
+ value: 0.11254
409
+ - type: recall_at_1
410
+ value: 0.28464
411
+ - type: recall_at_10
412
+ value: 0.57621
413
+ - type: recall_at_100
414
+ value: 0.7966
415
+ - type: recall_at_1000
416
+ value: 0.94633
417
+ - type: recall_at_3
418
+ value: 0.44588
419
+ - type: recall_at_5
420
+ value: 0.50031
421
+ - dataset:
422
+ type: mteb/cqadupstack-mathematica
423
+ name: MTEB CQADupstackMathematicaRetrieval
424
+ config: default
425
+ split: test
426
+ task:
427
+ type: Retrieval
428
+ metrics:
429
+ - type: map_at_1
430
+ value: 0.18119
431
+ - type: map_at_10
432
+ value: 0.27055
433
+ - type: map_at_100
434
+ value: 0.28461
435
+ - type: map_at_1000
436
+ value: 0.28577
437
+ - type: map_at_3
438
+ value: 0.24341
439
+ - type: map_at_5
440
+ value: 0.25861
441
+ - type: mrr_at_1
442
+ value: 0.22886
443
+ - type: mrr_at_10
444
+ value: 0.32234
445
+ - type: mrr_at_100
446
+ value: 0.3328
447
+ - type: mrr_at_1000
448
+ value: 0.3334
449
+ - type: mrr_at_3
450
+ value: 0.29664
451
+ - type: mrr_at_5
452
+ value: 0.31107
453
+ - type: ndcg_at_1
454
+ value: 0.22886
455
+ - type: ndcg_at_10
456
+ value: 0.32749
457
+ - type: ndcg_at_100
458
+ value: 0.39095
459
+ - type: ndcg_at_1000
460
+ value: 0.41656
461
+ - type: ndcg_at_3
462
+ value: 0.27864
463
+ - type: ndcg_at_5
464
+ value: 0.30177
465
+ - type: precision_at_1
466
+ value: 0.22886
467
+ - type: precision_at_10
468
+ value: 0.06169
469
+ - type: precision_at_100
470
+ value: 0.0107
471
+ - type: precision_at_1000
472
+ value: 0.00143
473
+ - type: precision_at_3
474
+ value: 0.13682
475
+ - type: precision_at_5
476
+ value: 0.0995
477
+ - type: recall_at_1
478
+ value: 0.18119
479
+ - type: recall_at_10
480
+ value: 0.44983
481
+ - type: recall_at_100
482
+ value: 0.72396
483
+ - type: recall_at_1000
484
+ value: 0.90223
485
+ - type: recall_at_3
486
+ value: 0.31633
487
+ - type: recall_at_5
488
+ value: 0.37532
489
+ - dataset:
490
+ type: mteb/cqadupstack-physics
491
+ name: MTEB CQADupstackPhysicsRetrieval
492
+ config: default
493
+ split: test
494
+ task:
495
+ type: Retrieval
496
+ metrics:
497
+ - type: map_at_1
498
+ value: 0.30517
499
+ - type: map_at_10
500
+ value: 0.42031
501
+ - type: map_at_100
502
+ value: 0.43415
503
+ - type: map_at_1000
504
+ value: 0.43525
505
+ - type: map_at_3
506
+ value: 0.38443
507
+ - type: map_at_5
508
+ value: 0.40685
509
+ - type: mrr_at_1
510
+ value: 0.38114
511
+ - type: mrr_at_10
512
+ value: 0.47783
513
+ - type: mrr_at_100
514
+ value: 0.48647
515
+ - type: mrr_at_1000
516
+ value: 0.48688
517
+ - type: mrr_at_3
518
+ value: 0.45172
519
+ - type: mrr_at_5
520
+ value: 0.46817
521
+ - type: ndcg_at_1
522
+ value: 0.38114
523
+ - type: ndcg_at_10
524
+ value: 0.4834
525
+ - type: ndcg_at_100
526
+ value: 0.53861
527
+ - type: ndcg_at_1000
528
+ value: 0.55701
529
+ - type: ndcg_at_3
530
+ value: 0.42986
531
+ - type: ndcg_at_5
532
+ value: 0.45893
533
+ - type: precision_at_1
534
+ value: 0.38114
535
+ - type: precision_at_10
536
+ value: 0.08893
537
+ - type: precision_at_100
538
+ value: 0.01375
539
+ - type: precision_at_1000
540
+ value: 0.00172
541
+ - type: precision_at_3
542
+ value: 0.20821
543
+ - type: precision_at_5
544
+ value: 0.15034
545
+ - type: recall_at_1
546
+ value: 0.30517
547
+ - type: recall_at_10
548
+ value: 0.61332
549
+ - type: recall_at_100
550
+ value: 0.84051
551
+ - type: recall_at_1000
552
+ value: 0.95826
553
+ - type: recall_at_3
554
+ value: 0.46015
555
+ - type: recall_at_5
556
+ value: 0.53801
557
+ - dataset:
558
+ type: mteb/cqadupstack-programmers
559
+ name: MTEB CQADupstackProgrammersRetrieval
560
+ config: default
561
+ split: test
562
+ task:
563
+ type: Retrieval
564
+ metrics:
565
+ - type: map_at_1
566
+ value: 0.27396
567
+ - type: map_at_10
568
+ value: 0.38043
569
+ - type: map_at_100
570
+ value: 0.39341
571
+ - type: map_at_1000
572
+ value: 0.39454
573
+ - type: map_at_3
574
+ value: 0.34783
575
+ - type: map_at_5
576
+ value: 0.3663
577
+ - type: mrr_at_1
578
+ value: 0.34247
579
+ - type: mrr_at_10
580
+ value: 0.43681
581
+ - type: mrr_at_100
582
+ value: 0.4451
583
+ - type: mrr_at_1000
584
+ value: 0.44569
585
+ - type: mrr_at_3
586
+ value: 0.41172
587
+ - type: mrr_at_5
588
+ value: 0.42702
589
+ - type: ndcg_at_1
590
+ value: 0.34247
591
+ - type: ndcg_at_10
592
+ value: 0.44065
593
+ - type: ndcg_at_100
594
+ value: 0.49434
595
+ - type: ndcg_at_1000
596
+ value: 0.51682
597
+ - type: ndcg_at_3
598
+ value: 0.38976
599
+ - type: ndcg_at_5
600
+ value: 0.41332
601
+ - type: precision_at_1
602
+ value: 0.34247
603
+ - type: precision_at_10
604
+ value: 0.08059
605
+ - type: precision_at_100
606
+ value: 0.01258
607
+ - type: precision_at_1000
608
+ value: 0.00162
609
+ - type: precision_at_3
610
+ value: 0.1876
611
+ - type: precision_at_5
612
+ value: 0.13333
613
+ - type: recall_at_1
614
+ value: 0.27396
615
+ - type: recall_at_10
616
+ value: 0.56481
617
+ - type: recall_at_100
618
+ value: 0.79012
619
+ - type: recall_at_1000
620
+ value: 0.94182
621
+ - type: recall_at_3
622
+ value: 0.41785
623
+ - type: recall_at_5
624
+ value: 0.48303
625
+ - dataset:
626
+ type: mteb/cqadupstack-stats
627
+ name: MTEB CQADupstackStatsRetrieval
628
+ config: default
629
+ split: test
630
+ task:
631
+ type: Retrieval
632
+ metrics:
633
+ - type: map_at_1
634
+ value: 0.25728
635
+ - type: map_at_10
636
+ value: 0.33903
637
+ - type: map_at_100
638
+ value: 0.34853
639
+ - type: map_at_1000
640
+ value: 0.34944
641
+ - type: map_at_3
642
+ value: 0.31268
643
+ - type: map_at_5
644
+ value: 0.32596
645
+ - type: mrr_at_1
646
+ value: 0.29141
647
+ - type: mrr_at_10
648
+ value: 0.36739
649
+ - type: mrr_at_100
650
+ value: 0.37545
651
+ - type: mrr_at_1000
652
+ value: 0.37608
653
+ - type: mrr_at_3
654
+ value: 0.34407
655
+ - type: mrr_at_5
656
+ value: 0.3568
657
+ - type: ndcg_at_1
658
+ value: 0.29141
659
+ - type: ndcg_at_10
660
+ value: 0.38596
661
+ - type: ndcg_at_100
662
+ value: 0.43375
663
+ - type: ndcg_at_1000
664
+ value: 0.45562
665
+ - type: ndcg_at_3
666
+ value: 0.33861
667
+ - type: ndcg_at_5
668
+ value: 0.35887
669
+ - type: precision_at_1
670
+ value: 0.29141
671
+ - type: precision_at_10
672
+ value: 0.06334
673
+ - type: precision_at_100
674
+ value: 0.00952
675
+ - type: precision_at_1000
676
+ value: 0.00121
677
+ - type: precision_at_3
678
+ value: 0.14826
679
+ - type: precision_at_5
680
+ value: 0.10429
681
+ - type: recall_at_1
682
+ value: 0.25728
683
+ - type: recall_at_10
684
+ value: 0.50121
685
+ - type: recall_at_100
686
+ value: 0.72382
687
+ - type: recall_at_1000
688
+ value: 0.88306
689
+ - type: recall_at_3
690
+ value: 0.36638
691
+ - type: recall_at_5
692
+ value: 0.41689
693
+ - dataset:
694
+ type: mteb/cqadupstack-tex
695
+ name: MTEB CQADupstackTexRetrieval
696
+ config: default
697
+ split: test
698
+ task:
699
+ type: Retrieval
700
+ metrics:
701
+ - type: map_at_1
702
+ value: 0.19911
703
+ - type: map_at_10
704
+ value: 0.2856
705
+ - type: map_at_100
706
+ value: 0.29785
707
+ - type: map_at_1000
708
+ value: 0.29911
709
+ - type: map_at_3
710
+ value: 0.25875
711
+ - type: map_at_5
712
+ value: 0.2741
713
+ - type: mrr_at_1
714
+ value: 0.24054
715
+ - type: mrr_at_10
716
+ value: 0.32483
717
+ - type: mrr_at_100
718
+ value: 0.33464
719
+ - type: mrr_at_1000
720
+ value: 0.33534
721
+ - type: mrr_at_3
722
+ value: 0.30162
723
+ - type: mrr_at_5
724
+ value: 0.31506
725
+ - type: ndcg_at_1
726
+ value: 0.24054
727
+ - type: ndcg_at_10
728
+ value: 0.33723
729
+ - type: ndcg_at_100
730
+ value: 0.39362
731
+ - type: ndcg_at_1000
732
+ value: 0.42065
733
+ - type: ndcg_at_3
734
+ value: 0.29116
735
+ - type: ndcg_at_5
736
+ value: 0.31299
737
+ - type: precision_at_1
738
+ value: 0.24054
739
+ - type: precision_at_10
740
+ value: 0.06194
741
+ - type: precision_at_100
742
+ value: 0.01058
743
+ - type: precision_at_1000
744
+ value: 0.00148
745
+ - type: precision_at_3
746
+ value: 0.13914
747
+ - type: precision_at_5
748
+ value: 0.10076
749
+ - type: recall_at_1
750
+ value: 0.19911
751
+ - type: recall_at_10
752
+ value: 0.45183
753
+ - type: recall_at_100
754
+ value: 0.7025
755
+ - type: recall_at_1000
756
+ value: 0.89222
757
+ - type: recall_at_3
758
+ value: 0.32195
759
+ - type: recall_at_5
760
+ value: 0.37852
761
+ - dataset:
762
+ type: mteb/cqadupstack-unix
763
+ name: MTEB CQADupstackUnixRetrieval
764
+ config: default
765
+ split: test
766
+ task:
767
+ type: Retrieval
768
+ metrics:
769
+ - type: map_at_1
770
+ value: 0.29819
771
+ - type: map_at_10
772
+ value: 0.40073
773
+ - type: map_at_100
774
+ value: 0.41289
775
+ - type: map_at_1000
776
+ value: 0.41375
777
+ - type: map_at_3
778
+ value: 0.36572
779
+ - type: map_at_5
780
+ value: 0.38386
781
+ - type: mrr_at_1
782
+ value: 0.35168
783
+ - type: mrr_at_10
784
+ value: 0.44381
785
+ - type: mrr_at_100
786
+ value: 0.45191
787
+ - type: mrr_at_1000
788
+ value: 0.45234
789
+ - type: mrr_at_3
790
+ value: 0.41402
791
+ - type: mrr_at_5
792
+ value: 0.43039
793
+ - type: ndcg_at_1
794
+ value: 0.35168
795
+ - type: ndcg_at_10
796
+ value: 0.46071
797
+ - type: ndcg_at_100
798
+ value: 0.51351
799
+ - type: ndcg_at_1000
800
+ value: 0.5317
801
+ - type: ndcg_at_3
802
+ value: 0.39972
803
+ - type: ndcg_at_5
804
+ value: 0.42586
805
+ - type: precision_at_1
806
+ value: 0.35168
807
+ - type: precision_at_10
808
+ value: 0.07985
809
+ - type: precision_at_100
810
+ value: 0.01185
811
+ - type: precision_at_1000
812
+ value: 0.00144
813
+ - type: precision_at_3
814
+ value: 0.18221
815
+ - type: precision_at_5
816
+ value: 0.12892
817
+ - type: recall_at_1
818
+ value: 0.29819
819
+ - type: recall_at_10
820
+ value: 0.60075
821
+ - type: recall_at_100
822
+ value: 0.82771
823
+ - type: recall_at_1000
824
+ value: 0.95219
825
+ - type: recall_at_3
826
+ value: 0.43245
827
+ - type: recall_at_5
828
+ value: 0.49931
829
+ - dataset:
830
+ type: mteb/cqadupstack-webmasters
831
+ name: MTEB CQADupstackWebmastersRetrieval
832
+ config: default
833
+ split: test
834
+ task:
835
+ type: Retrieval
836
+ metrics:
837
+ - type: map_at_1
838
+ value: 0.28409
839
+ - type: map_at_10
840
+ value: 0.37621
841
+ - type: map_at_100
842
+ value: 0.39233
843
+ - type: map_at_1000
844
+ value: 0.39471
845
+ - type: map_at_3
846
+ value: 0.34337
847
+ - type: map_at_5
848
+ value: 0.35985
849
+ - type: mrr_at_1
850
+ value: 0.33794
851
+ - type: mrr_at_10
852
+ value: 0.42349
853
+ - type: mrr_at_100
854
+ value: 0.43196
855
+ - type: mrr_at_1000
856
+ value: 0.43237
857
+ - type: mrr_at_3
858
+ value: 0.39526
859
+ - type: mrr_at_5
860
+ value: 0.41087
861
+ - type: ndcg_at_1
862
+ value: 0.33794
863
+ - type: ndcg_at_10
864
+ value: 0.43832
865
+ - type: ndcg_at_100
866
+ value: 0.49514
867
+ - type: ndcg_at_1000
868
+ value: 0.51742
869
+ - type: ndcg_at_3
870
+ value: 0.38442
871
+ - type: ndcg_at_5
872
+ value: 0.40737
873
+ - type: precision_at_1
874
+ value: 0.33794
875
+ - type: precision_at_10
876
+ value: 0.08597
877
+ - type: precision_at_100
878
+ value: 0.01652
879
+ - type: precision_at_1000
880
+ value: 0.00251
881
+ - type: precision_at_3
882
+ value: 0.17787
883
+ - type: precision_at_5
884
+ value: 0.13241
885
+ - type: recall_at_1
886
+ value: 0.28409
887
+ - type: recall_at_10
888
+ value: 0.55388
889
+ - type: recall_at_100
890
+ value: 0.81517
891
+ - type: recall_at_1000
892
+ value: 0.95038
893
+ - type: recall_at_3
894
+ value: 0.40133
895
+ - type: recall_at_5
896
+ value: 0.45913
897
+ - dataset:
898
+ type: mteb/cqadupstack-wordpress
899
+ name: MTEB CQADupstackWordpressRetrieval
900
+ config: default
901
+ split: test
902
+ task:
903
+ type: Retrieval
904
+ metrics:
905
+ - type: map_at_1
906
+ value: 0.24067
907
+ - type: map_at_10
908
+ value: 0.32184
909
+ - type: map_at_100
910
+ value: 0.33357
911
+ - type: map_at_1000
912
+ value: 0.33458
913
+ - type: map_at_3
914
+ value: 0.29492
915
+ - type: map_at_5
916
+ value: 0.3111
917
+ - type: mrr_at_1
918
+ value: 0.26248
919
+ - type: mrr_at_10
920
+ value: 0.34149
921
+ - type: mrr_at_100
922
+ value: 0.35189
923
+ - type: mrr_at_1000
924
+ value: 0.35251
925
+ - type: mrr_at_3
926
+ value: 0.31639
927
+ - type: mrr_at_5
928
+ value: 0.33182
929
+ - type: ndcg_at_1
930
+ value: 0.26248
931
+ - type: ndcg_at_10
932
+ value: 0.36889
933
+ - type: ndcg_at_100
934
+ value: 0.42426
935
+ - type: ndcg_at_1000
936
+ value: 0.44745
937
+ - type: ndcg_at_3
938
+ value: 0.31799
939
+ - type: ndcg_at_5
940
+ value: 0.34563
941
+ - type: precision_at_1
942
+ value: 0.26248
943
+ - type: precision_at_10
944
+ value: 0.05712
945
+ - type: precision_at_100
946
+ value: 0.00915
947
+ - type: precision_at_1000
948
+ value: 0.00123
949
+ - type: precision_at_3
950
+ value: 0.13309
951
+ - type: precision_at_5
952
+ value: 0.09649
953
+ - type: recall_at_1
954
+ value: 0.24067
955
+ - type: recall_at_10
956
+ value: 0.49344
957
+ - type: recall_at_100
958
+ value: 0.7412
959
+ - type: recall_at_1000
960
+ value: 0.91276
961
+ - type: recall_at_3
962
+ value: 0.36272
963
+ - type: recall_at_5
964
+ value: 0.4277
965
+ - dataset:
966
+ type: mteb/dbpedia
967
+ name: MTEB DBPedia
968
+ config: default
969
+ split: test
970
+ task:
971
+ type: Retrieval
972
+ metrics:
973
+ - type: map_at_1
974
+ value: 0.08651
975
+ - type: map_at_10
976
+ value: 0.17628
977
+ - type: map_at_100
978
+ value: 0.23354
979
+ - type: map_at_1000
980
+ value: 0.24827
981
+ - type: map_at_3
982
+ value: 0.1351
983
+ - type: map_at_5
984
+ value: 0.15468
985
+ - type: mrr_at_1
986
+ value: 0.645
987
+ - type: mrr_at_10
988
+ value: 0.71989
989
+ - type: mrr_at_100
990
+ value: 0.72332
991
+ - type: mrr_at_1000
992
+ value: 0.72346
993
+ - type: mrr_at_3
994
+ value: 0.7025
995
+ - type: mrr_at_5
996
+ value: 0.71275
997
+ - type: ndcg_at_1
998
+ value: 0.51375
999
+ - type: ndcg_at_10
1000
+ value: 0.3596
1001
+ - type: ndcg_at_100
1002
+ value: 0.39878
1003
+ - type: ndcg_at_1000
1004
+ value: 0.47931
1005
+ - type: ndcg_at_3
1006
+ value: 0.41275
1007
+ - type: ndcg_at_5
1008
+ value: 0.38297
1009
+ - type: precision_at_1
1010
+ value: 0.645
1011
+ - type: precision_at_10
1012
+ value: 0.2745
1013
+ - type: precision_at_100
1014
+ value: 0.08405
1015
+ - type: precision_at_1000
1016
+ value: 0.01923
1017
+ - type: precision_at_3
1018
+ value: 0.44417
1019
+ - type: precision_at_5
1020
+ value: 0.366
1021
+ - type: recall_at_1
1022
+ value: 0.08651
1023
+ - type: recall_at_10
1024
+ value: 0.22416
1025
+ - type: recall_at_100
1026
+ value: 0.46381
1027
+ - type: recall_at_1000
1028
+ value: 0.71557
1029
+ - type: recall_at_3
1030
+ value: 0.14847
1031
+ - type: recall_at_5
1032
+ value: 0.1804
1033
+ - dataset:
1034
+ type: mteb/fever
1035
+ name: MTEB FEVER
1036
+ config: default
1037
+ split: test
1038
+ task:
1039
+ type: Retrieval
1040
+ metrics:
1041
+ - type: map_at_1
1042
+ value: 0.73211
1043
+ - type: map_at_10
1044
+ value: 0.81463
1045
+ - type: map_at_100
1046
+ value: 0.81622
1047
+ - type: map_at_1000
1048
+ value: 0.81634
1049
+ - type: map_at_3
1050
+ value: 0.805
1051
+ - type: map_at_5
1052
+ value: 0.81134
1053
+ - type: mrr_at_1
1054
+ value: 0.79088
1055
+ - type: mrr_at_10
1056
+ value: 0.86943
1057
+ - type: mrr_at_100
1058
+ value: 0.87017
1059
+ - type: mrr_at_1000
1060
+ value: 0.87018
1061
+ - type: mrr_at_3
1062
+ value: 0.86154
1063
+ - type: mrr_at_5
1064
+ value: 0.867
1065
+ - type: ndcg_at_1
1066
+ value: 0.79088
1067
+ - type: ndcg_at_10
1068
+ value: 0.85528
1069
+ - type: ndcg_at_100
1070
+ value: 0.86134
1071
+ - type: ndcg_at_1000
1072
+ value: 0.86367
1073
+ - type: ndcg_at_3
1074
+ value: 0.83943
1075
+ - type: ndcg_at_5
1076
+ value: 0.84878
1077
+ - type: precision_at_1
1078
+ value: 0.79088
1079
+ - type: precision_at_10
1080
+ value: 0.10132
1081
+ - type: precision_at_100
1082
+ value: 0.01055
1083
+ - type: precision_at_1000
1084
+ value: 0.00109
1085
+ - type: precision_at_3
1086
+ value: 0.31963
1087
+ - type: precision_at_5
1088
+ value: 0.19769
1089
+ - type: recall_at_1
1090
+ value: 0.73211
1091
+ - type: recall_at_10
1092
+ value: 0.92797
1093
+ - type: recall_at_100
1094
+ value: 0.95263
1095
+ - type: recall_at_1000
1096
+ value: 0.96738
1097
+ - type: recall_at_3
1098
+ value: 0.88328
1099
+ - type: recall_at_5
1100
+ value: 0.90821
1101
+ - dataset:
1102
+ type: mteb/fiqa
1103
+ name: MTEB FiQA2018
1104
+ config: default
1105
+ split: test
1106
+ task:
1107
+ type: Retrieval
1108
+ metrics:
1109
+ - type: map_at_1
1110
+ value: 0.18311
1111
+ - type: map_at_10
1112
+ value: 0.29201
1113
+ - type: map_at_100
1114
+ value: 0.3093
1115
+ - type: map_at_1000
1116
+ value: 0.31116
1117
+ - type: map_at_3
1118
+ value: 0.24778
1119
+ - type: map_at_5
1120
+ value: 0.27453
1121
+ - type: mrr_at_1
1122
+ value: 0.35494
1123
+ - type: mrr_at_10
1124
+ value: 0.44489
1125
+ - type: mrr_at_100
1126
+ value: 0.4532
1127
+ - type: mrr_at_1000
1128
+ value: 0.45369
1129
+ - type: mrr_at_3
1130
+ value: 0.41667
1131
+ - type: mrr_at_5
1132
+ value: 0.43418
1133
+ - type: ndcg_at_1
1134
+ value: 0.35494
1135
+ - type: ndcg_at_10
1136
+ value: 0.36868
1137
+ - type: ndcg_at_100
1138
+ value: 0.43463
1139
+ - type: ndcg_at_1000
1140
+ value: 0.46766
1141
+ - type: ndcg_at_3
1142
+ value: 0.32305
1143
+ - type: ndcg_at_5
1144
+ value: 0.34332
1145
+ - type: precision_at_1
1146
+ value: 0.35494
1147
+ - type: precision_at_10
1148
+ value: 0.10324
1149
+ - type: precision_at_100
1150
+ value: 0.01707
1151
+ - type: precision_at_1000
1152
+ value: 0.00229
1153
+ - type: precision_at_3
1154
+ value: 0.21142
1155
+ - type: precision_at_5
1156
+ value: 0.16327
1157
+ - type: recall_at_1
1158
+ value: 0.18311
1159
+ - type: recall_at_10
1160
+ value: 0.43881
1161
+ - type: recall_at_100
1162
+ value: 0.68593
1163
+ - type: recall_at_1000
1164
+ value: 0.8855
1165
+ - type: recall_at_3
1166
+ value: 0.28824
1167
+ - type: recall_at_5
1168
+ value: 0.36178
1169
+ - dataset:
1170
+ type: mteb/hotpotqa
1171
+ name: MTEB HotpotQA
1172
+ config: default
1173
+ split: test
1174
+ task:
1175
+ type: Retrieval
1176
+ metrics:
1177
+ - type: map_at_1
1178
+ value: 0.36766
1179
+ - type: map_at_10
1180
+ value: 0.53639
1181
+ - type: map_at_100
1182
+ value: 0.54532
1183
+ - type: map_at_1000
1184
+ value: 0.54608
1185
+ - type: map_at_3
1186
+ value: 0.50427
1187
+ - type: map_at_5
1188
+ value: 0.5245
1189
+ - type: mrr_at_1
1190
+ value: 0.73531
1191
+ - type: mrr_at_10
1192
+ value: 0.80104
1193
+ - type: mrr_at_100
1194
+ value: 0.80341
1195
+ - type: mrr_at_1000
1196
+ value: 0.80351
1197
+ - type: mrr_at_3
1198
+ value: 0.78949
1199
+ - type: mrr_at_5
1200
+ value: 0.79729
1201
+ - type: ndcg_at_1
1202
+ value: 0.73531
1203
+ - type: ndcg_at_10
1204
+ value: 0.62918
1205
+ - type: ndcg_at_100
1206
+ value: 0.66056
1207
+ - type: ndcg_at_1000
1208
+ value: 0.67554
1209
+ - type: ndcg_at_3
1210
+ value: 0.58247
1211
+ - type: ndcg_at_5
1212
+ value: 0.60905
1213
+ - type: precision_at_1
1214
+ value: 0.73531
1215
+ - type: precision_at_10
1216
+ value: 0.1302
1217
+ - type: precision_at_100
1218
+ value: 0.01546
1219
+ - type: precision_at_1000
1220
+ value: 0.00175
1221
+ - type: precision_at_3
1222
+ value: 0.36556
1223
+ - type: precision_at_5
1224
+ value: 0.24032
1225
+ - type: recall_at_1
1226
+ value: 0.36766
1227
+ - type: recall_at_10
1228
+ value: 0.65098
1229
+ - type: recall_at_100
1230
+ value: 0.77306
1231
+ - type: recall_at_1000
1232
+ value: 0.87252
1233
+ - type: recall_at_3
1234
+ value: 0.54835
1235
+ - type: recall_at_5
1236
+ value: 0.60081
1237
+ - dataset:
1238
+ type: mteb/msmarco
1239
+ name: MTEB MSMARCO
1240
+ config: default
1241
+ split: dev
1242
+ task:
1243
+ type: Retrieval
1244
+ metrics:
1245
+ - type: map_at_1
1246
+ value: 0.14654
1247
+ - type: map_at_10
1248
+ value: 0.2472
1249
+ - type: map_at_100
1250
+ value: 0.25994
1251
+ - type: map_at_1000
1252
+ value: 0.26067
1253
+ - type: map_at_3
1254
+ value: 0.21234
1255
+ - type: map_at_5
1256
+ value: 0.2319
1257
+ - type: mrr_at_1
1258
+ value: 0.15086
1259
+ - type: mrr_at_10
1260
+ value: 0.25184
1261
+ - type: mrr_at_100
1262
+ value: 0.26422
1263
+ - type: mrr_at_1000
1264
+ value: 0.26489
1265
+ - type: mrr_at_3
1266
+ value: 0.21731
1267
+ - type: mrr_at_5
1268
+ value: 0.23674
1269
+ - type: ndcg_at_1
1270
+ value: 0.15086
1271
+ - type: ndcg_at_10
1272
+ value: 0.30711
1273
+ - type: ndcg_at_100
1274
+ value: 0.37221
1275
+ - type: ndcg_at_1000
1276
+ value: 0.39133
1277
+ - type: ndcg_at_3
1278
+ value: 0.23567
1279
+ - type: ndcg_at_5
1280
+ value: 0.27066
1281
+ - type: precision_at_1
1282
+ value: 0.15086
1283
+ - type: precision_at_10
1284
+ value: 0.05132
1285
+ - type: precision_at_100
1286
+ value: 0.00845
1287
+ - type: precision_at_1000
1288
+ value: 0.00101
1289
+ - type: precision_at_3
1290
+ value: 0.10277
1291
+ - type: precision_at_5
1292
+ value: 0.07923
1293
+ - type: recall_at_1
1294
+ value: 0.14654
1295
+ - type: recall_at_10
1296
+ value: 0.49341
1297
+ - type: recall_at_100
1298
+ value: 0.80224
1299
+ - type: recall_at_1000
1300
+ value: 0.95037
1301
+ - type: recall_at_3
1302
+ value: 0.29862
1303
+ - type: recall_at_5
1304
+ value: 0.38274
1305
+ - dataset:
1306
+ type: mteb/nfcorpus
1307
+ name: MTEB NFCorpus
1308
+ config: default
1309
+ split: test
1310
+ task:
1311
+ type: Retrieval
1312
+ metrics:
1313
+ - type: map_at_1
1314
+ value: 0.05452
1315
+ - type: map_at_10
1316
+ value: 0.12758
1317
+ - type: map_at_100
1318
+ value: 0.1593
1319
+ - type: map_at_1000
1320
+ value: 0.17422
1321
+ - type: map_at_3
1322
+ value: 0.0945
1323
+ - type: map_at_5
1324
+ value: 0.1092
1325
+ - type: mrr_at_1
1326
+ value: 0.43963
1327
+ - type: mrr_at_10
1328
+ value: 0.53237
1329
+ - type: mrr_at_100
1330
+ value: 0.53777
1331
+ - type: mrr_at_1000
1332
+ value: 0.53822
1333
+ - type: mrr_at_3
1334
+ value: 0.51445
1335
+ - type: mrr_at_5
1336
+ value: 0.52466
1337
+ - type: ndcg_at_1
1338
+ value: 0.41486
1339
+ - type: ndcg_at_10
1340
+ value: 0.33737
1341
+ - type: ndcg_at_100
1342
+ value: 0.30886
1343
+ - type: ndcg_at_1000
1344
+ value: 0.40018
1345
+ - type: ndcg_at_3
1346
+ value: 0.39324
1347
+ - type: ndcg_at_5
1348
+ value: 0.36949
1349
+ - type: precision_at_1
1350
+ value: 0.43344
1351
+ - type: precision_at_10
1352
+ value: 0.24799
1353
+ - type: precision_at_100
1354
+ value: 0.07895
1355
+ - type: precision_at_1000
1356
+ value: 0.02091
1357
+ - type: precision_at_3
1358
+ value: 0.37152
1359
+ - type: precision_at_5
1360
+ value: 0.31703
1361
+ - type: recall_at_1
1362
+ value: 0.05452
1363
+ - type: recall_at_10
1364
+ value: 0.1712
1365
+ - type: recall_at_100
1366
+ value: 0.30719
1367
+ - type: recall_at_1000
1368
+ value: 0.62766
1369
+ - type: recall_at_3
1370
+ value: 0.10733
1371
+ - type: recall_at_5
1372
+ value: 0.13553
1373
+ - dataset:
1374
+ type: mteb/nq
1375
+ name: MTEB NQ
1376
+ config: default
1377
+ split: test
1378
+ task:
1379
+ type: Retrieval
1380
+ metrics:
1381
+ - type: map_at_1
1382
+ value: 0.29022
1383
+ - type: map_at_10
1384
+ value: 0.4373
1385
+ - type: map_at_100
1386
+ value: 0.44849
1387
+ - type: map_at_1000
1388
+ value: 0.44877
1389
+ - type: map_at_3
1390
+ value: 0.39045
1391
+ - type: map_at_5
1392
+ value: 0.4186
1393
+ - type: mrr_at_1
1394
+ value: 0.32793
1395
+ - type: mrr_at_10
1396
+ value: 0.46243
1397
+ - type: mrr_at_100
1398
+ value: 0.47083
1399
+ - type: mrr_at_1000
1400
+ value: 0.47101
1401
+ - type: mrr_at_3
1402
+ value: 0.42261
1403
+ - type: mrr_at_5
1404
+ value: 0.44775
1405
+ - type: ndcg_at_1
1406
+ value: 0.32793
1407
+ - type: ndcg_at_10
1408
+ value: 0.51631
1409
+ - type: ndcg_at_100
1410
+ value: 0.56287
1411
+ - type: ndcg_at_1000
1412
+ value: 0.56949
1413
+ - type: ndcg_at_3
1414
+ value: 0.42782
1415
+ - type: ndcg_at_5
1416
+ value: 0.47554
1417
+ - type: precision_at_1
1418
+ value: 0.32793
1419
+ - type: precision_at_10
1420
+ value: 0.08737
1421
+ - type: precision_at_100
1422
+ value: 0.01134
1423
+ - type: precision_at_1000
1424
+ value: 0.0012
1425
+ - type: precision_at_3
1426
+ value: 0.19583
1427
+ - type: precision_at_5
1428
+ value: 0.14484
1429
+ - type: recall_at_1
1430
+ value: 0.29022
1431
+ - type: recall_at_10
1432
+ value: 0.73325
1433
+ - type: recall_at_100
1434
+ value: 0.93455
1435
+ - type: recall_at_1000
1436
+ value: 0.98414
1437
+ - type: recall_at_3
1438
+ value: 0.50406
1439
+ - type: recall_at_5
1440
+ value: 0.6145
1441
+ - dataset:
1442
+ type: mteb/quora
1443
+ name: MTEB QuoraRetrieval
1444
+ config: default
1445
+ split: test
1446
+ task:
1447
+ type: Retrieval
1448
+ metrics:
1449
+ - type: map_at_1
1450
+ value: 0.68941
1451
+ - type: map_at_10
1452
+ value: 0.82641
1453
+ - type: map_at_100
1454
+ value: 0.83317
1455
+ - type: map_at_1000
1456
+ value: 0.83337
1457
+ - type: map_at_3
1458
+ value: 0.79604
1459
+ - type: map_at_5
1460
+ value: 0.81525
1461
+ - type: mrr_at_1
1462
+ value: 0.7935
1463
+ - type: mrr_at_10
1464
+ value: 0.85969
1465
+ - type: mrr_at_100
1466
+ value: 0.86094
1467
+ - type: mrr_at_1000
1468
+ value: 0.86095
1469
+ - type: mrr_at_3
1470
+ value: 0.84852
1471
+ - type: mrr_at_5
1472
+ value: 0.85627
1473
+ - type: ndcg_at_1
1474
+ value: 0.7936
1475
+ - type: ndcg_at_10
1476
+ value: 0.86687
1477
+ - type: ndcg_at_100
1478
+ value: 0.88094
1479
+ - type: ndcg_at_1000
1480
+ value: 0.88243
1481
+ - type: ndcg_at_3
1482
+ value: 0.83538
1483
+ - type: ndcg_at_5
1484
+ value: 0.85308
1485
+ - type: precision_at_1
1486
+ value: 0.7936
1487
+ - type: precision_at_10
1488
+ value: 0.13145
1489
+ - type: precision_at_100
1490
+ value: 0.01517
1491
+ - type: precision_at_1000
1492
+ value: 0.00156
1493
+ - type: precision_at_3
1494
+ value: 0.36353
1495
+ - type: precision_at_5
1496
+ value: 0.24044
1497
+ - type: recall_at_1
1498
+ value: 0.68941
1499
+ - type: recall_at_10
1500
+ value: 0.94407
1501
+ - type: recall_at_100
1502
+ value: 0.99226
1503
+ - type: recall_at_1000
1504
+ value: 0.99958
1505
+ - type: recall_at_3
1506
+ value: 0.85502
1507
+ - type: recall_at_5
1508
+ value: 0.90372
1509
+ - dataset:
1510
+ type: mteb/scidocs
1511
+ name: MTEB SCIDOCS
1512
+ config: default
1513
+ split: test
1514
+ task:
1515
+ type: Retrieval
1516
+ metrics:
1517
+ - type: map_at_1
1518
+ value: 0.04988
1519
+ - type: map_at_10
1520
+ value: 0.13553
1521
+ - type: map_at_100
1522
+ value: 0.16136
1523
+ - type: map_at_1000
1524
+ value: 0.16512
1525
+ - type: map_at_3
1526
+ value: 0.09439
1527
+ - type: map_at_5
1528
+ value: 0.1146
1529
+ - type: mrr_at_1
1530
+ value: 0.246
1531
+ - type: mrr_at_10
1532
+ value: 0.36792
1533
+ - type: mrr_at_100
1534
+ value: 0.37973
1535
+ - type: mrr_at_1000
1536
+ value: 0.38011
1537
+ - type: mrr_at_3
1538
+ value: 0.33117
1539
+ - type: mrr_at_5
1540
+ value: 0.35172
1541
+ - type: ndcg_at_1
1542
+ value: 0.246
1543
+ - type: ndcg_at_10
1544
+ value: 0.22542
1545
+ - type: ndcg_at_100
1546
+ value: 0.32326
1547
+ - type: ndcg_at_1000
1548
+ value: 0.3828
1549
+ - type: ndcg_at_3
1550
+ value: 0.20896
1551
+ - type: ndcg_at_5
1552
+ value: 0.18497
1553
+ - type: precision_at_1
1554
+ value: 0.246
1555
+ - type: precision_at_10
1556
+ value: 0.1194
1557
+ - type: precision_at_100
1558
+ value: 0.02616
1559
+ - type: precision_at_1000
1560
+ value: 0.00404
1561
+ - type: precision_at_3
1562
+ value: 0.198
1563
+ - type: precision_at_5
1564
+ value: 0.1654
1565
+ - type: recall_at_1
1566
+ value: 0.04988
1567
+ - type: recall_at_10
1568
+ value: 0.24212
1569
+ - type: recall_at_100
1570
+ value: 0.53105
1571
+ - type: recall_at_1000
1572
+ value: 0.82022
1573
+ - type: recall_at_3
1574
+ value: 0.12047
1575
+ - type: recall_at_5
1576
+ value: 0.16777
1577
+ - dataset:
1578
+ type: mteb/scifact
1579
+ name: MTEB SciFact
1580
+ config: default
1581
+ split: test
1582
+ task:
1583
+ type: Retrieval
1584
+ metrics:
1585
+ - type: map_at_1
1586
+ value: 0.56578
1587
+ - type: map_at_10
1588
+ value: 0.66725
1589
+ - type: map_at_100
1590
+ value: 0.67379
1591
+ - type: map_at_1000
1592
+ value: 0.674
1593
+ - type: map_at_3
1594
+ value: 0.63416
1595
+ - type: map_at_5
1596
+ value: 0.6577
1597
+ - type: mrr_at_1
1598
+ value: 0.59333
1599
+ - type: mrr_at_10
1600
+ value: 0.67533
1601
+ - type: mrr_at_100
1602
+ value: 0.68062
1603
+ - type: mrr_at_1000
1604
+ value: 0.68082
1605
+ - type: mrr_at_3
1606
+ value: 0.64944
1607
+ - type: mrr_at_5
1608
+ value: 0.66928
1609
+ - type: ndcg_at_1
1610
+ value: 0.59333
1611
+ - type: ndcg_at_10
1612
+ value: 0.7127
1613
+ - type: ndcg_at_100
1614
+ value: 0.73889
1615
+ - type: ndcg_at_1000
1616
+ value: 0.7441
1617
+ - type: ndcg_at_3
1618
+ value: 0.65793
1619
+ - type: ndcg_at_5
1620
+ value: 0.69429
1621
+ - type: precision_at_1
1622
+ value: 0.59333
1623
+ - type: precision_at_10
1624
+ value: 0.096
1625
+ - type: precision_at_100
1626
+ value: 0.01087
1627
+ - type: precision_at_1000
1628
+ value: 0.00113
1629
+ - type: precision_at_3
1630
+ value: 0.25556
1631
+ - type: precision_at_5
1632
+ value: 0.17667
1633
+ - type: recall_at_1
1634
+ value: 0.56578
1635
+ - type: recall_at_10
1636
+ value: 0.842
1637
+ - type: recall_at_100
1638
+ value: 0.95667
1639
+ - type: recall_at_1000
1640
+ value: 0.99667
1641
+ - type: recall_at_3
1642
+ value: 0.70072
1643
+ - type: recall_at_5
1644
+ value: 0.79011
1645
+ - dataset:
1646
+ type: mteb/touche2020
1647
+ name: MTEB Touche2020
1648
+ config: default
1649
+ split: test
1650
+ task:
1651
+ type: Retrieval
1652
+ metrics:
1653
+ - type: map_at_1
1654
+ value: 0.01976
1655
+ - type: map_at_10
1656
+ value: 0.09688
1657
+ - type: map_at_100
1658
+ value: 0.15117
1659
+ - type: map_at_1000
1660
+ value: 0.16769
1661
+ - type: map_at_3
1662
+ value: 0.04589
1663
+ - type: map_at_5
1664
+ value: 0.06556
1665
+ - type: mrr_at_1
1666
+ value: 0.26531
1667
+ - type: mrr_at_10
1668
+ value: 0.43863
1669
+ - type: mrr_at_100
1670
+ value: 0.44767
1671
+ - type: mrr_at_1000
1672
+ value: 0.44767
1673
+ - type: mrr_at_3
1674
+ value: 0.39116
1675
+ - type: mrr_at_5
1676
+ value: 0.41156
1677
+ - type: ndcg_at_1
1678
+ value: 0.23469
1679
+ - type: ndcg_at_10
1680
+ value: 0.24029
1681
+ - type: ndcg_at_100
1682
+ value: 0.34425
1683
+ - type: ndcg_at_1000
1684
+ value: 0.46907
1685
+ - type: ndcg_at_3
1686
+ value: 0.25522
1687
+ - type: ndcg_at_5
1688
+ value: 0.24333
1689
+ - type: precision_at_1
1690
+ value: 0.26531
1691
+ - type: precision_at_10
1692
+ value: 0.22449
1693
+ - type: precision_at_100
1694
+ value: 0.07122
1695
+ - type: precision_at_1000
1696
+ value: 0.01527
1697
+ - type: precision_at_3
1698
+ value: 0.27891
1699
+ - type: precision_at_5
1700
+ value: 0.25714
1701
+ - type: recall_at_1
1702
+ value: 0.01976
1703
+ - type: recall_at_10
1704
+ value: 0.16633
1705
+ - type: recall_at_100
1706
+ value: 0.4561
1707
+ - type: recall_at_1000
1708
+ value: 0.82481
1709
+ - type: recall_at_3
1710
+ value: 0.06101
1711
+ - type: recall_at_5
1712
+ value: 0.0968
1713
+ - dataset:
1714
+ type: mteb/trec-covid
1715
+ name: MTEB TRECCOVID
1716
+ config: default
1717
+ split: test
1718
+ task:
1719
+ type: Retrieval
1720
+ metrics:
1721
+ - type: map_at_1
1722
+ value: 0.00211
1723
+ - type: map_at_10
1724
+ value: 0.01526
1725
+ - type: map_at_100
1726
+ value: 0.08863
1727
+ - type: map_at_1000
1728
+ value: 0.23162
1729
+ - type: map_at_3
1730
+ value: 0.00555
1731
+ - type: map_at_5
1732
+ value: 0.00873
1733
+ - type: mrr_at_1
1734
+ value: 0.76
1735
+ - type: mrr_at_10
1736
+ value: 0.8485
1737
+ - type: mrr_at_100
1738
+ value: 0.8485
1739
+ - type: mrr_at_1000
1740
+ value: 0.8485
1741
+ - type: mrr_at_3
1742
+ value: 0.84
1743
+ - type: mrr_at_5
1744
+ value: 0.844
1745
+ - type: ndcg_at_1
1746
+ value: 0.7
1747
+ - type: ndcg_at_10
1748
+ value: 0.63098
1749
+ - type: ndcg_at_100
1750
+ value: 0.49847
1751
+ - type: ndcg_at_1000
1752
+ value: 0.48395
1753
+ - type: ndcg_at_3
1754
+ value: 0.68704
1755
+ - type: ndcg_at_5
1756
+ value: 0.67533
1757
+ - type: precision_at_1
1758
+ value: 0.76
1759
+ - type: precision_at_10
1760
+ value: 0.66
1761
+ - type: precision_at_100
1762
+ value: 0.5134
1763
+ - type: precision_at_1000
1764
+ value: 0.2168
1765
+ - type: precision_at_3
1766
+ value: 0.72667
1767
+ - type: precision_at_5
1768
+ value: 0.716
1769
+ - type: recall_at_1
1770
+ value: 0.00211
1771
+ - type: recall_at_10
1772
+ value: 0.01748
1773
+ - type: recall_at_100
1774
+ value: 0.12448
1775
+ - type: recall_at_1000
1776
+ value: 0.46795
1777
+ - type: recall_at_3
1778
+ value: 0.00593
1779
+ - type: recall_at_5
1780
+ value: 0.00962
1781
+ pipeline_tag: sentence-similarity
1782
  ---
1783
+ # Granite-Embedding-30m-English
1784
+
1785
+ **Model Summary:**
1786
+ Granite-Embedding-30m-English is a 30M-parameter model from the Granite Embeddings suite that generates high-quality text embeddings. It produces 384-dimensional embedding vectors and is trained on a combination of open-source relevance-pair datasets with permissive, enterprise-friendly licenses and datasets collected and generated by IBM. The model is developed using retrieval-oriented pretraining, contrastive finetuning, knowledge distillation, and model merging for improved performance.
1787
+
1788
+ - **Developers:** Granite Embedding Team, IBM
1789
+ - **GitHub Repository:**
1790
+ - **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
1791
+ - **Paper:**
1792
+ - **Release Date**: December 18th, 2024
1793
+ - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
1794
+
1795
+ **Supported Languages:**
1796
+ English.
1797
+
1798
+ **Intended use:**
1799
+ The model is designed to produce fixed-length vector representations of a given text, which can be used for text similarity, retrieval, and search applications.
1800
+
1801
+ **Usage with Sentence Transformers:**
1802
+ The model is compatible with the Sentence Transformers library:
1803
+
1804
+ First, install the Sentence Transformers library:
1805
+ ```shell
1806
+ pip install sentence_transformers
1807
+ ```
1808
+
1809
+ The model can then be used to encode pairs of text and compute the similarity between their representations:
1810
+
1811
+ ```python
1812
+ from sentence_transformers import SentenceTransformer, util
1813
+
1814
+ model_path = "ibm-granite/granite-embedding-30m-english"
1815
+ # Load the Sentence Transformer model
1816
+ model = SentenceTransformer(model_path)
1817
+
1818
+ input_queries = [
1819
+ ' Who made the song My achy breaky heart? ',
1820
+ 'summit define'
1821
+ ]
1822
+
1823
+ input_passages = [
1824
+ "Achy Breaky Heart is a country song written by Don Von Tress. Originally titled Don't Tell My Heart and performed by The Marcy Brothers in 1991. ",
1825
+ "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments."
1826
+ ]
1827
+
1828
+ # encode queries and passages
1829
+ query_embeddings = model.encode(input_queries)
1830
+ passage_embeddings = model.encode(input_passages)
1831
+
1832
+ # calculate cosine similarity
1833
+ print(util.cos_sim(query_embeddings, passage_embeddings))
1834
+ ```
1835
+
1836
+ **Usage with Hugging Face Transformers:**
1837
+ This is a simple example of how to use the Granite-Embedding-30m-English model with the Transformers library and PyTorch.
1838
+
1839
+ First, install the required libraries
1840
+ ```shell
1841
+ pip install transformers torch
1842
+ ```
1843
+
1844
+ The model can then be used to encode text:
1845
+
1846
+ ```python
1847
+ import torch
1848
+ from transformers import AutoModel, AutoTokenizer
1849
+
1850
+ model_path = "ibm-granite/granite-embedding-30m-english"
1851
+
1852
+ # Load the model and tokenizer
1853
+ model = AutoModel.from_pretrained(model_path)
1854
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
1855
+ model.eval()
1856
+
1857
+ input_queries = [
1858
+ ' Who made the song My achy breaky heart? ',
1859
+ 'summit define'
1860
+ ]
1861
+
1862
+ # tokenize inputs
1863
+ tokenized_queries = tokenizer(input_queries, padding=True, truncation=True, return_tensors='pt')
1864
+
1865
+ # encode queries
1866
+ with torch.no_grad():
1867
+     # Queries
1868
+     model_output = model(**tokenized_queries)
1869
+     # Perform pooling. granite-embedding-30m-english uses CLS pooling
1870
+     query_embeddings = model_output[0][:, 0]
1871
+
1872
+ # normalize the embeddings
1873
+ query_embeddings = torch.nn.functional.normalize(query_embeddings, dim=1)
1874
+
1875
+ ```
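
To make the pooling and normalization steps above concrete, here is a minimal, dependency-free sketch of what they compute. It is an illustration only: the helper names (`cls_pool`, `l2_normalize`, `cosine_matrix`) and the toy 3-dimensional vectors are assumptions made for this example and are not part of the model or of any library.

```python
import math


def cls_pool(last_hidden_state):
    """Keep the first-token (CLS) vector of each sequence.

    last_hidden_state: nested list shaped [batch][tokens][hidden].
    """
    return [sequence[0] for sequence in last_hidden_state]


def l2_normalize(vectors):
    """Scale each vector to unit length (zero vectors are left unchanged)."""
    normalized = []
    for vec in vectors:
        norm = math.sqrt(sum(x * x for x in vec))
        normalized.append([x / norm for x in vec] if norm > 0 else list(vec))
    return normalized


def cosine_matrix(queries, passages):
    """Dot products of unit vectors, which equal their cosine similarities."""
    return [[sum(q * p for q, p in zip(qv, pv)) for pv in passages] for qv in queries]


# Toy "model outputs": 2 sequences, 2 tokens each, hidden size 3 (hypothetical values).
toy_queries = [[[3.0, 4.0, 0.0], [9.0, 9.0, 9.0]],
               [[0.0, 0.0, 2.0], [1.0, 1.0, 1.0]]]
toy_passages = [[[6.0, 8.0, 0.0], [5.0, 5.0, 5.0]],
                [[0.0, 1.0, 0.0], [7.0, 7.0, 7.0]]]

query_vecs = l2_normalize(cls_pool(toy_queries))
passage_vecs = l2_normalize(cls_pool(toy_passages))
scores = cosine_matrix(query_vecs, passage_vecs)  # scores[i][j]: query i vs passage j
```

With real model outputs, the same three steps correspond to `model_output[0][:, 0]`, `torch.nn.functional.normalize(..., dim=1)`, and a matrix product between the query and passage embeddings.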
1876
+ **Evaluation:**
1877
+
1878
+ Granite-Embedding-30M-English is twice as fast as other models with similar embedding dimensions while maintaining competitive performance. Its performance on the MTEB Retrieval (i.e., BEIR) and code retrieval (CoIR) benchmarks is reported below. MTEB Retrieval (14) is the average BEIR performance excluding the MS-MARCO task: unlike all other models, Granite-Embedding-30M-English was not trained on MS-MARCO because of the dataset's non-commercial license. The average time required to encode and retrieve per query is also reported.
1879
+
1880
+ | Model | Parameters (M)| Embedding Dimension | MTEB Retrieval (15) | MTEB Retrieval (14) | CoIR (10) | Retrieval Time (seconds/query)|
1881
+ |---------------------------------|-------------:|--------------------:|--------------------:|---------------------:|----------:|------------------------------:|
1882
+ |granite-embedding-30m-english |30 |384 |49.1 |50.4 |47.0 | 0.16 |
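
The two MTEB averages in the table differ only by the MS-MARCO task, so they can be cross-checked against the per-task metadata of this model card. The sketch below assumes that both averages are plain means of per-task nDCG@10 scores (an assumption, not something the card states) and uses the `ndcg_at_10` value of 0.30711 listed for MTEB MSMARCO in the model-index metadata.

```python
# Averages reported in the table above (percent, rounded to one decimal).
avg_15_tasks = 49.1  # MTEB Retrieval (15), includes MS-MARCO
avg_14_tasks = 50.4  # MTEB Retrieval (14), excludes MS-MARCO

# If both are plain means, the MS-MARCO score is the difference of the two sums.
implied_msmarco = 15 * avg_15_tasks - 14 * avg_14_tasks

# nDCG@10 for MTEB MSMARCO from the model-index metadata, as a percentage.
reported_msmarco = 0.30711 * 100

# Each rounded average can be off by up to 0.05, so the implied value
# carries an uncertainty of about (15 + 14) * 0.05 = 1.45 points.
tolerance = (15 + 14) * 0.05

print(f"implied: {implied_msmarco:.1f}, reported: {reported_msmarco:.1f}")
```

Under that assumption, the implied and reported values agree within the rounding tolerance, which is consistent with MS-MARCO being the single task that separates the two columns.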
1883
+
1884
+
1885
+ **Model Architecture:**
1886
+ Granite-Embedding-30m-English is based on an encoder-only, RoBERTa-like transformer architecture, trained internally at IBM Research.
1887
+
1888
+ | Model | granite-embedding-30m-english | granite-embedding-125m-english | granite-embedding-107m-multilingual | granite-embedding-278m-multilingual |
1889
+ | :--------- | :-------:| :--------: | :-----:| :-----:|
1890
+ | Embedding size | **384** | 768 | 384 | 768 |
1891
+ | Number of layers | **6** | 12 | 6 | 12 |
1892
+ | Number of attention heads | **12** | 12 | 12 | 12 |
1893
+ | Intermediate size | **1536** | 3072 | 1536 | 3072 |
1894
+ | Activation Function | **GeLU** | GeLU | GeLU | GeLU |
1895
+ | Vocabulary Size | **50265**| 50265 | 250002 | 250002 |
1896
+ | Max. Sequence Length | **512** | 512 | 512 | 512 |
1897
+ | # Parameters | **30M** | 125M | 107M | 278M |
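
As a rough sanity check, the 30M figure can be approximated from the other rows of the table. The sketch below assumes a standard RoBERTa-style layout (token, position, and single token-type embeddings; biased linear layers; two LayerNorms per layer); the function `estimate_parameters` is a hypothetical helper for this example, and the real checkpoint may differ slightly, for instance through extra position slots or a pooling head.

```python
def estimate_parameters(hidden, layers, intermediate, vocab, max_positions):
    """Rough parameter count for a RoBERTa-style encoder (assumed layout)."""
    layer_norm = 2 * hidden  # scale + bias

    # Token, position, and (single) token-type embeddings, plus one LayerNorm.
    embeddings = vocab * hidden + max_positions * hidden + hidden + layer_norm

    # Self-attention: query, key, value, and output projections with biases.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Feed-forward: up- and down-projection with biases.
    feed_forward = (
        (hidden * intermediate + intermediate)
        + (intermediate * hidden + hidden)
        + layer_norm
    )

    return embeddings + layers * (attention + feed_forward)


total = estimate_parameters(
    hidden=384, layers=6, intermediate=1536, vocab=50265, max_positions=512
)
print(f"{total / 1e6:.1f}M parameters")
```

Note that the number of attention heads does not enter the count, since the heads only partition the same projection matrices. Under these assumptions, roughly two thirds of the parameters sit in the token embedding matrix.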
1898
+
1899
+
1900
+ **Training Data:**
1901
+ Overall, the training data consists of four key sources: (1) unsupervised title-body paired data scraped from the web, (2) publicly available paired data with permissive, enterprise-friendly licenses, (3) IBM-internal paired data targeting specific technical domains, and (4) IBM-generated synthetic data. The data is listed below:
1902
+
1903
+ | **Dataset** | **Num. Pairs** |
1904
+ |----------------------------------------------------|:---------------:|
1905
+ | SPECTER citation triplets | 684,100 |
1906
+ | Stack Exchange Duplicate questions (titles) | 304,525 |
1907
+ | Stack Exchange Duplicate questions (bodies) | 250,519 |
1908
+ | Stack Exchange Duplicate questions (titles+bodies) | 250,460 |
1909
+ | Natural Questions (NQ) | 100,231 |
1910
+ | SQuAD2.0 | 87,599 |
1911
+ | PAQ (Question, Answer) pairs | 64,371,441 |
1912
+ | Stack Exchange (Title, Answer) pairs | 4,067,139 |
1913
+ | Stack Exchange (Title, Body) pairs | 23,978,013 |
1914
+ | Stack Exchange (Title+Body, Answer) pairs | 187,195 |
1915
+ | S2ORC Citation pairs (Titles) | 52,603,982 |
1916
+ | S2ORC (Title, Abstract) | 41,769,185 |
1917
+ | S2ORC (Citations, abstracts) | 52,603,982 |
1918
+ | WikiAnswers Duplicate question pairs | 77,427,422 |
1919
+ | SearchQA | 582,261 |
1920
+ | HotpotQA | 85,000 |
1921
+ | Fever | 109,810 |
1922
+ | Arxiv | 2,358,545 |
1923
+ | Wikipedia | 20,745,403 |
1924
+ | PubMed | 20,000,000 |
1925
+ | Miracl En Pairs | 9,016 |
1926
+ | DBPedia Title-Body Pairs | 4,635,922 |
1927
+ | Synthetic: Query-Wikipedia Passage | 1,879,093 |
1928
+ | Synthetic: Fact Verification | 9,888 |
1929
+ | IBM Internal Triples | 40,290 |
1930
+ | IBM Internal Title-Body Pairs | 1,524,586 |
1931
+
1932
+ Notably, we do not use the popular MS-MARCO retrieval dataset in our training corpus because of its non-commercial license, whereas other open-source models train on it because of its high quality.
1933
+
1934
+ **Infrastructure:**
1935
+ We train Granite Embedding Models on IBM's computing cluster, Cognitive Compute Cluster, which is outfitted with NVIDIA A100 80GB GPUs. This cluster provides a scalable and efficient infrastructure for training our models over multiple GPUs.
1936
+
1937
+ **Ethical Considerations and Limitations:**
1938
+ The data used to train the base language model was filtered to remove text containing hate, abuse, and profanity. Granite-Embedding-30m-English is trained only for English texts, and has a context length of 512 tokens (longer texts will be truncated to this size).
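
Because inputs longer than 512 tokens are truncated, one common workaround is to split a long document into overlapping windows and embed each window separately. The following is a minimal sketch of that idea on plain token-id lists; `split_into_windows`, the overlap of 64 tokens, and the fake token ids are all assumptions made for illustration, not part of the model or its tokenizer.

```python
def split_into_windows(token_ids, max_length=512, overlap=64):
    """Split a token-id list into overlapping windows of at most max_length."""
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must be non-negative and smaller than max_length")

    stride = max_length - overlap
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_length])
        # Stop once the current window reaches the end of the input.
        if start + max_length >= len(token_ids):
            break
        start += stride
    return windows


# Hypothetical document of 1200 tokens, represented by dummy ids.
fake_token_ids = list(range(1200))
windows = split_into_windows(fake_token_ids)
print([len(window) for window in windows])
```

In practice the tokenizer typically adds special tokens to each sequence, so `max_length` may need to be slightly below 512 to leave room for them. How the per-window embeddings are combined (for example, averaging them or indexing each window separately) is a design choice that this model card does not prescribe.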
1939
+
1940
+
1941
+ <!-- ## Citation
1942
+ ```
1943
+ @misc{granite-embedding-models,
1944
+ author = {author 1, author2, ...},
1945
+ title = {},
1946
+ journal = {},
1947
+ volume = {},
1948
+ year = {2024},
1949
+ url = {https://arxiv.org/abs/0000.00000},
1950
+ }
1951
+ ``` -->