jncraton committed on
Commit
4588fe7
·
verified ·
1 Parent(s): 3890c7f

Upload folder using huggingface_hub

Files changed (7)
  1. README.md +1950 -0
  2. config.json +7 -0
  3. model.bin +3 -0
  4. special_tokens_map.json +51 -0
  5. tokenizer.json +0 -0
  6. tokenizer_config.json +61 -0
  7. vocabulary.json +0 -0
README.md ADDED
@@ -0,0 +1,1950 @@
1
+ ---
2
+ language:
3
+ - en
4
+ license: apache-2.0
5
+ library_name: transformers
6
+ tags:
7
+ - language
8
+ - granite
9
+ - embeddings
10
+ model-index:
11
+ - name: ibm-granite/granite-embedding-125m-english
12
+ results:
13
+ - dataset:
14
+ type: mteb/arguana
15
+ name: MTEB ArguAna
16
+ config: default
17
+ split: test
18
+ task:
19
+ type: Retrieval
20
+ metrics:
21
+ - type: map_at_1
22
+ value: 0.33642
23
+ - type: map_at_10
24
+ value: 0.49716
25
+ - type: map_at_100
26
+ value: 0.50519
27
+ - type: map_at_1000
28
+ value: 0.50521
29
+ - type: map_at_3
30
+ value: 0.45057
31
+ - type: map_at_5
32
+ value: 0.47774
33
+ - type: mrr_at_1
34
+ value: 0.34922
35
+ - type: mrr_at_10
36
+ value: 0.50197
37
+ - type: mrr_at_100
38
+ value: 0.50992
39
+ - type: mrr_at_1000
40
+ value: 0.50994
41
+ - type: mrr_at_3
42
+ value: 0.45484
43
+ - type: mrr_at_5
44
+ value: 0.48272
45
+ - type: ndcg_at_1
46
+ value: 0.33642
47
+ - type: ndcg_at_10
48
+ value: 0.58401
49
+ - type: ndcg_at_100
50
+ value: 0.6157
51
+ - type: ndcg_at_1000
52
+ value: 0.61608
53
+ - type: ndcg_at_3
54
+ value: 0.48825
55
+ - type: ndcg_at_5
56
+ value: 0.53689
57
+ - type: precision_at_1
58
+ value: 0.33642
59
+ - type: precision_at_10
60
+ value: 0.08606
61
+ - type: precision_at_100
62
+ value: 0.00994
63
+ - type: precision_at_1000
64
+ value: 0.001
65
+ - type: precision_at_3
66
+ value: 0.19915
67
+ - type: precision_at_5
68
+ value: 0.14296
69
+ - type: recall_at_1
70
+ value: 0.33642
71
+ - type: recall_at_10
72
+ value: 0.8606
73
+ - type: recall_at_100
74
+ value: 0.9936
75
+ - type: recall_at_1000
76
+ value: 0.99644
77
+ - type: recall_at_3
78
+ value: 0.59744
79
+ - type: recall_at_5
80
+ value: 0.71479
81
+ - dataset:
82
+ type: mteb/climate-fever
83
+ name: MTEB ClimateFEVER
84
+ config: default
85
+ split: test
86
+ task:
87
+ type: Retrieval
88
+ metrics:
89
+ - type: map_at_1
90
+ value: 0.1457
91
+ - type: map_at_10
92
+ value: 0.24102
93
+ - type: map_at_100
94
+ value: 0.25826
95
+ - type: map_at_1000
96
+ value: 0.26021
97
+ - type: map_at_3
98
+ value: 0.20346
99
+ - type: map_at_5
100
+ value: 0.22228
101
+ - type: mrr_at_1
102
+ value: 0.32573
103
+ - type: mrr_at_10
104
+ value: 0.44411
105
+ - type: mrr_at_100
106
+ value: 0.45176
107
+ - type: mrr_at_1000
108
+ value: 0.45209
109
+ - type: mrr_at_3
110
+ value: 0.4126
111
+ - type: mrr_at_5
112
+ value: 0.43312
113
+ - type: ndcg_at_1
114
+ value: 0.32573
115
+ - type: ndcg_at_10
116
+ value: 0.3315
117
+ - type: ndcg_at_100
118
+ value: 0.39898
119
+ - type: ndcg_at_1000
120
+ value: 0.43151
121
+ - type: ndcg_at_3
122
+ value: 0.27683
123
+ - type: ndcg_at_5
124
+ value: 0.29538
125
+ - type: precision_at_1
126
+ value: 0.32573
127
+ - type: precision_at_10
128
+ value: 0.10176
129
+ - type: precision_at_100
130
+ value: 0.01754
131
+ - type: precision_at_1000
132
+ value: 0.00236
133
+ - type: precision_at_3
134
+ value: 0.20347
135
+ - type: precision_at_5
136
+ value: 0.15505
137
+ - type: recall_at_1
138
+ value: 0.1457
139
+ - type: recall_at_10
140
+ value: 0.38825
141
+ - type: recall_at_100
142
+ value: 0.62237
143
+ - type: recall_at_1000
144
+ value: 0.8022
145
+ - type: recall_at_3
146
+ value: 0.25245
147
+ - type: recall_at_5
148
+ value: 0.30821
149
+ - dataset:
150
+ type: mteb/cqadupstack-android
151
+ name: MTEB CQADupstackAndroidRetrieval
152
+ config: default
153
+ split: test
154
+ task:
155
+ type: Retrieval
156
+ metrics:
157
+ - type: map_at_1
158
+ value: 0.36964
159
+ - type: map_at_10
160
+ value: 0.5043
161
+ - type: map_at_100
162
+ value: 0.52066
163
+ - type: map_at_1000
164
+ value: 0.52175
165
+ - type: map_at_3
166
+ value: 0.46001
167
+ - type: map_at_5
168
+ value: 0.48312
169
+ - type: mrr_at_1
170
+ value: 0.45923
171
+ - type: mrr_at_10
172
+ value: 0.56733
173
+ - type: mrr_at_100
174
+ value: 0.57292
175
+ - type: mrr_at_1000
176
+ value: 0.57321
177
+ - type: mrr_at_3
178
+ value: 0.54053
179
+ - type: mrr_at_5
180
+ value: 0.55556
181
+ - type: ndcg_at_1
182
+ value: 0.45923
183
+ - type: ndcg_at_10
184
+ value: 0.57667
185
+ - type: ndcg_at_100
186
+ value: 0.62373
187
+ - type: ndcg_at_1000
188
+ value: 0.6368
189
+ - type: ndcg_at_3
190
+ value: 0.51843
191
+ - type: ndcg_at_5
192
+ value: 0.54257
193
+ - type: precision_at_1
194
+ value: 0.45923
195
+ - type: precision_at_10
196
+ value: 0.11316
197
+ - type: precision_at_100
198
+ value: 0.01705
199
+ - type: precision_at_1000
200
+ value: 0.00216
201
+ - type: precision_at_3
202
+ value: 0.2537
203
+ - type: precision_at_5
204
+ value: 0.1814
205
+ - type: recall_at_1
206
+ value: 0.36964
207
+ - type: recall_at_10
208
+ value: 0.71234
209
+ - type: recall_at_100
210
+ value: 0.90421
211
+ - type: recall_at_1000
212
+ value: 0.98296
213
+ - type: recall_at_3
214
+ value: 0.53655
215
+ - type: recall_at_5
216
+ value: 0.60996
217
+ - dataset:
218
+ type: mteb/cqadupstack-english
219
+ name: MTEB CQADupstackEnglishRetrieval
220
+ config: default
221
+ split: test
222
+ task:
223
+ type: Retrieval
224
+ metrics:
225
+ - type: map_at_1
226
+ value: 0.36198
227
+ - type: map_at_10
228
+ value: 0.49199
229
+ - type: map_at_100
230
+ value: 0.50602
231
+ - type: map_at_1000
232
+ value: 0.50736
233
+ - type: map_at_3
234
+ value: 0.45678
235
+ - type: map_at_5
236
+ value: 0.47605
237
+ - type: mrr_at_1
238
+ value: 0.45478
239
+ - type: mrr_at_10
240
+ value: 0.55075
241
+ - type: mrr_at_100
242
+ value: 0.55656
243
+ - type: mrr_at_1000
244
+ value: 0.55688
245
+ - type: mrr_at_3
246
+ value: 0.52887
247
+ - type: mrr_at_5
248
+ value: 0.54282
249
+ - type: ndcg_at_1
250
+ value: 0.45478
251
+ - type: ndcg_at_10
252
+ value: 0.55505
253
+ - type: ndcg_at_100
254
+ value: 0.59606
255
+ - type: ndcg_at_1000
256
+ value: 0.61255
257
+ - type: ndcg_at_3
258
+ value: 0.51124
259
+ - type: ndcg_at_5
260
+ value: 0.53166
261
+ - type: precision_at_1
262
+ value: 0.45478
263
+ - type: precision_at_10
264
+ value: 0.10752
265
+ - type: precision_at_100
266
+ value: 0.01666
267
+ - type: precision_at_1000
268
+ value: 0.00211
269
+ - type: precision_at_3
270
+ value: 0.25053
271
+ - type: precision_at_5
272
+ value: 0.17694
273
+ - type: recall_at_1
274
+ value: 0.36198
275
+ - type: recall_at_10
276
+ value: 0.66465
277
+ - type: recall_at_100
278
+ value: 0.83632
279
+ - type: recall_at_1000
280
+ value: 0.93276
281
+ - type: recall_at_3
282
+ value: 0.53207
283
+ - type: recall_at_5
284
+ value: 0.59169
285
+ - dataset:
286
+ type: mteb/cqadupstack-gaming
287
+ name: MTEB CQADupstackGamingRetrieval
288
+ config: default
289
+ split: test
290
+ task:
291
+ type: Retrieval
292
+ metrics:
293
+ - type: map_at_1
294
+ value: 0.44157
295
+ - type: map_at_10
296
+ value: 0.57753
297
+ - type: map_at_100
298
+ value: 0.58698
299
+ - type: map_at_1000
300
+ value: 0.5874
301
+ - type: map_at_3
302
+ value: 0.54223
303
+ - type: map_at_5
304
+ value: 0.56307
305
+ - type: mrr_at_1
306
+ value: 0.50094
307
+ - type: mrr_at_10
308
+ value: 0.607
309
+ - type: mrr_at_100
310
+ value: 0.6126
311
+ - type: mrr_at_1000
312
+ value: 0.6128
313
+ - type: mrr_at_3
314
+ value: 0.58265
315
+ - type: mrr_at_5
316
+ value: 0.59817
317
+ - type: ndcg_at_1
318
+ value: 0.50094
319
+ - type: ndcg_at_10
320
+ value: 0.63641
321
+ - type: ndcg_at_100
322
+ value: 0.67055
323
+ - type: ndcg_at_1000
324
+ value: 0.67855
325
+ - type: ndcg_at_3
326
+ value: 0.58022
327
+ - type: ndcg_at_5
328
+ value: 0.6097
329
+ - type: precision_at_1
330
+ value: 0.50094
331
+ - type: precision_at_10
332
+ value: 0.10182
333
+ - type: precision_at_100
334
+ value: 0.01278
335
+ - type: precision_at_1000
336
+ value: 0.00138
337
+ - type: precision_at_3
338
+ value: 0.2581
339
+ - type: precision_at_5
340
+ value: 0.17755
341
+ - type: recall_at_1
342
+ value: 0.44157
343
+ - type: recall_at_10
344
+ value: 0.7778
345
+ - type: recall_at_100
346
+ value: 0.92244
347
+ - type: recall_at_1000
348
+ value: 0.9781
349
+ - type: recall_at_3
350
+ value: 0.63087
351
+ - type: recall_at_5
352
+ value: 0.70172
353
+ - dataset:
354
+ type: mteb/cqadupstack-gis
355
+ name: MTEB CQADupstackGisRetrieval
356
+ config: default
357
+ split: test
358
+ task:
359
+ type: Retrieval
360
+ metrics:
361
+ - type: map_at_1
362
+ value: 0.29532
363
+ - type: map_at_10
364
+ value: 0.40214
365
+ - type: map_at_100
366
+ value: 0.41289
367
+ - type: map_at_1000
368
+ value: 0.41359
369
+ - type: map_at_3
370
+ value: 0.37086
371
+ - type: map_at_5
372
+ value: 0.38889
373
+ - type: mrr_at_1
374
+ value: 0.3209
375
+ - type: mrr_at_10
376
+ value: 0.42423
377
+ - type: mrr_at_100
378
+ value: 0.43342
379
+ - type: mrr_at_1000
380
+ value: 0.43395
381
+ - type: mrr_at_3
382
+ value: 0.39736
383
+ - type: mrr_at_5
384
+ value: 0.41307
385
+ - type: ndcg_at_1
386
+ value: 0.3209
387
+ - type: ndcg_at_10
388
+ value: 0.46075
389
+ - type: ndcg_at_100
390
+ value: 0.5103
391
+ - type: ndcg_at_1000
392
+ value: 0.52668
393
+ - type: ndcg_at_3
394
+ value: 0.40149
395
+ - type: ndcg_at_5
396
+ value: 0.43111
397
+ - type: precision_at_1
398
+ value: 0.3209
399
+ - type: precision_at_10
400
+ value: 0.07141
401
+ - type: precision_at_100
402
+ value: 0.01018
403
+ - type: precision_at_1000
404
+ value: 0.00118
405
+ - type: precision_at_3
406
+ value: 0.17175
407
+ - type: precision_at_5
408
+ value: 0.12068
409
+ - type: recall_at_1
410
+ value: 0.29532
411
+ - type: recall_at_10
412
+ value: 0.62025
413
+ - type: recall_at_100
414
+ value: 0.83829
415
+ - type: recall_at_1000
416
+ value: 0.95995
417
+ - type: recall_at_3
418
+ value: 0.4603
419
+ - type: recall_at_5
420
+ value: 0.53089
421
+ - dataset:
422
+ type: mteb/cqadupstack-mathematica
423
+ name: MTEB CQADupstackMathematicaRetrieval
424
+ config: default
425
+ split: test
426
+ task:
427
+ type: Retrieval
428
+ metrics:
429
+ - type: map_at_1
430
+ value: 0.18944
431
+ - type: map_at_10
432
+ value: 0.29611
433
+ - type: map_at_100
434
+ value: 0.31063
435
+ - type: map_at_1000
436
+ value: 0.31174
437
+ - type: map_at_3
438
+ value: 0.26098
439
+ - type: map_at_5
440
+ value: 0.28151
441
+ - type: mrr_at_1
442
+ value: 0.23756
443
+ - type: mrr_at_10
444
+ value: 0.34491
445
+ - type: mrr_at_100
446
+ value: 0.35457
447
+ - type: mrr_at_1000
448
+ value: 0.35512
449
+ - type: mrr_at_3
450
+ value: 0.3126
451
+ - type: mrr_at_5
452
+ value: 0.3317
453
+ - type: ndcg_at_1
454
+ value: 0.23756
455
+ - type: ndcg_at_10
456
+ value: 0.36015
457
+ - type: ndcg_at_100
458
+ value: 0.42175
459
+ - type: ndcg_at_1000
460
+ value: 0.44607
461
+ - type: ndcg_at_3
462
+ value: 0.29725
463
+ - type: ndcg_at_5
464
+ value: 0.32879
465
+ - type: precision_at_1
466
+ value: 0.23756
467
+ - type: precision_at_10
468
+ value: 0.06928
469
+ - type: precision_at_100
470
+ value: 0.01153
471
+ - type: precision_at_1000
472
+ value: 0.00149
473
+ - type: precision_at_3
474
+ value: 0.14635
475
+ - type: precision_at_5
476
+ value: 0.1107
477
+ - type: recall_at_1
478
+ value: 0.18944
479
+ - type: recall_at_10
480
+ value: 0.50691
481
+ - type: recall_at_100
482
+ value: 0.76503
483
+ - type: recall_at_1000
484
+ value: 0.93624
485
+ - type: recall_at_3
486
+ value: 0.33611
487
+ - type: recall_at_5
488
+ value: 0.41427
489
+ - dataset:
490
+ type: mteb/cqadupstack-physics
491
+ name: MTEB CQADupstackPhysicsRetrieval
492
+ config: default
493
+ split: test
494
+ task:
495
+ type: Retrieval
496
+ metrics:
497
+ - type: map_at_1
498
+ value: 0.33824
499
+ - type: map_at_10
500
+ value: 0.46868
501
+ - type: map_at_100
502
+ value: 0.48306
503
+ - type: map_at_1000
504
+ value: 0.48406
505
+ - type: map_at_3
506
+ value: 0.43335
507
+ - type: map_at_5
508
+ value: 0.45279
509
+ - type: mrr_at_1
510
+ value: 0.42348
511
+ - type: mrr_at_10
512
+ value: 0.52972
513
+ - type: mrr_at_100
514
+ value: 0.53707
515
+ - type: mrr_at_1000
516
+ value: 0.53734
517
+ - type: mrr_at_3
518
+ value: 0.50722
519
+ - type: mrr_at_5
520
+ value: 0.52012
521
+ - type: ndcg_at_1
522
+ value: 0.42348
523
+ - type: ndcg_at_10
524
+ value: 0.53504
525
+ - type: ndcg_at_100
526
+ value: 0.58899
527
+ - type: ndcg_at_1000
528
+ value: 0.60323
529
+ - type: ndcg_at_3
530
+ value: 0.48478
531
+ - type: ndcg_at_5
532
+ value: 0.5079
533
+ - type: precision_at_1
534
+ value: 0.42348
535
+ - type: precision_at_10
536
+ value: 0.0975
537
+ - type: precision_at_100
538
+ value: 0.01466
539
+ - type: precision_at_1000
540
+ value: 0.00177
541
+ - type: precision_at_3
542
+ value: 0.23741
543
+ - type: precision_at_5
544
+ value: 0.16439
545
+ - type: recall_at_1
546
+ value: 0.33824
547
+ - type: recall_at_10
548
+ value: 0.67142
549
+ - type: recall_at_100
550
+ value: 0.89134
551
+ - type: recall_at_1000
552
+ value: 0.97816
553
+ - type: recall_at_3
554
+ value: 0.52305
555
+ - type: recall_at_5
556
+ value: 0.58804
557
+ - dataset:
558
+ type: mteb/cqadupstack-programmers
559
+ name: MTEB CQADupstackProgrammersRetrieval
560
+ config: default
561
+ split: test
562
+ task:
563
+ type: Retrieval
564
+ metrics:
565
+ - type: map_at_1
566
+ value: 0.30125
567
+ - type: map_at_10
568
+ value: 0.42119
569
+ - type: map_at_100
570
+ value: 0.43599
571
+ - type: map_at_1000
572
+ value: 0.4369
573
+ - type: map_at_3
574
+ value: 0.38018
575
+ - type: map_at_5
576
+ value: 0.40368
577
+ - type: mrr_at_1
578
+ value: 0.37557
579
+ - type: mrr_at_10
580
+ value: 0.47573
581
+ - type: mrr_at_100
582
+ value: 0.4846
583
+ - type: mrr_at_1000
584
+ value: 0.48499
585
+ - type: mrr_at_3
586
+ value: 0.44654
587
+ - type: mrr_at_5
588
+ value: 0.4644
589
+ - type: ndcg_at_1
590
+ value: 0.37557
591
+ - type: ndcg_at_10
592
+ value: 0.48743
593
+ - type: ndcg_at_100
594
+ value: 0.54458
595
+ - type: ndcg_at_1000
596
+ value: 0.56076
597
+ - type: ndcg_at_3
598
+ value: 0.42573
599
+ - type: ndcg_at_5
600
+ value: 0.45528
601
+ - type: precision_at_1
602
+ value: 0.37557
603
+ - type: precision_at_10
604
+ value: 0.09269
605
+ - type: precision_at_100
606
+ value: 0.01401
607
+ - type: precision_at_1000
608
+ value: 0.0017
609
+ - type: precision_at_3
610
+ value: 0.20624
611
+ - type: precision_at_5
612
+ value: 0.15068
613
+ - type: recall_at_1
614
+ value: 0.30125
615
+ - type: recall_at_10
616
+ value: 0.62619
617
+ - type: recall_at_100
618
+ value: 0.86574
619
+ - type: recall_at_1000
620
+ value: 0.97102
621
+ - type: recall_at_3
622
+ value: 0.45437
623
+ - type: recall_at_5
624
+ value: 0.53197
625
+ - dataset:
626
+ type: mteb/cqadupstack-stats
627
+ name: MTEB CQADupstackStatsRetrieval
628
+ config: default
629
+ split: test
630
+ task:
631
+ type: Retrieval
632
+ metrics:
633
+ - type: map_at_1
634
+ value: 0.29193
635
+ - type: map_at_10
636
+ value: 0.37529
637
+ - type: map_at_100
638
+ value: 0.38614
639
+ - type: map_at_1000
640
+ value: 0.38714
641
+ - type: map_at_3
642
+ value: 0.34897
643
+ - type: map_at_5
644
+ value: 0.36273
645
+ - type: mrr_at_1
646
+ value: 0.32669
647
+ - type: mrr_at_10
648
+ value: 0.40288
649
+ - type: mrr_at_100
650
+ value: 0.41177
651
+ - type: mrr_at_1000
652
+ value: 0.41241
653
+ - type: mrr_at_3
654
+ value: 0.38037
655
+ - type: mrr_at_5
656
+ value: 0.39195
657
+ - type: ndcg_at_1
658
+ value: 0.32669
659
+ - type: ndcg_at_10
660
+ value: 0.42353
661
+ - type: ndcg_at_100
662
+ value: 0.47424
663
+ - type: ndcg_at_1000
664
+ value: 0.4959
665
+ - type: ndcg_at_3
666
+ value: 0.37604
667
+ - type: ndcg_at_5
668
+ value: 0.39682
669
+ - type: precision_at_1
670
+ value: 0.32669
671
+ - type: precision_at_10
672
+ value: 0.06871
673
+ - type: precision_at_100
674
+ value: 0.01008
675
+ - type: precision_at_1000
676
+ value: 0.00126
677
+ - type: precision_at_3
678
+ value: 0.16309
679
+ - type: precision_at_5
680
+ value: 0.11288
681
+ - type: recall_at_1
682
+ value: 0.29193
683
+ - type: recall_at_10
684
+ value: 0.54159
685
+ - type: recall_at_100
686
+ value: 0.77267
687
+ - type: recall_at_1000
688
+ value: 0.92805
689
+ - type: recall_at_3
690
+ value: 0.41014
691
+ - type: recall_at_5
692
+ value: 0.46248
693
+ - dataset:
694
+ type: mteb/cqadupstack-tex
695
+ name: MTEB CQADupstackTexRetrieval
696
+ config: default
697
+ split: test
698
+ task:
699
+ type: Retrieval
700
+ metrics:
701
+ - type: map_at_1
702
+ value: 0.21217
703
+ - type: map_at_10
704
+ value: 0.30848
705
+ - type: map_at_100
706
+ value: 0.32173
707
+ - type: map_at_1000
708
+ value: 0.32296
709
+ - type: map_at_3
710
+ value: 0.27882
711
+ - type: map_at_5
712
+ value: 0.29537
713
+ - type: mrr_at_1
714
+ value: 0.25946
715
+ - type: mrr_at_10
716
+ value: 0.35091
717
+ - type: mrr_at_100
718
+ value: 0.36047
719
+ - type: mrr_at_1000
720
+ value: 0.36111
721
+ - type: mrr_at_3
722
+ value: 0.32485
723
+ - type: mrr_at_5
724
+ value: 0.33964
725
+ - type: ndcg_at_1
726
+ value: 0.25946
727
+ - type: ndcg_at_10
728
+ value: 0.3655
729
+ - type: ndcg_at_100
730
+ value: 0.42328
731
+ - type: ndcg_at_1000
732
+ value: 0.44783
733
+ - type: ndcg_at_3
734
+ value: 0.31463
735
+ - type: ndcg_at_5
736
+ value: 0.33803
737
+ - type: precision_at_1
738
+ value: 0.25946
739
+ - type: precision_at_10
740
+ value: 0.06793
741
+ - type: precision_at_100
742
+ value: 0.01138
743
+ - type: precision_at_1000
744
+ value: 0.00155
745
+ - type: precision_at_3
746
+ value: 0.1513
747
+ - type: precision_at_5
748
+ value: 0.10991
749
+ - type: recall_at_1
750
+ value: 0.21217
751
+ - type: recall_at_10
752
+ value: 0.49327
753
+ - type: recall_at_100
754
+ value: 0.7472
755
+ - type: recall_at_1000
756
+ value: 0.91637
757
+ - type: recall_at_3
758
+ value: 0.34993
759
+ - type: recall_at_5
760
+ value: 0.41029
761
+ - dataset:
762
+ type: mteb/cqadupstack-unix
763
+ name: MTEB CQADupstackUnixRetrieval
764
+ config: default
765
+ split: test
766
+ task:
767
+ type: Retrieval
768
+ metrics:
769
+ - type: map_at_1
770
+ value: 0.34303
771
+ - type: map_at_10
772
+ value: 0.45312
773
+ - type: map_at_100
774
+ value: 0.46563
775
+ - type: map_at_1000
776
+ value: 0.4664
777
+ - type: map_at_3
778
+ value: 0.4143
779
+ - type: map_at_5
780
+ value: 0.43633
781
+ - type: mrr_at_1
782
+ value: 0.40112
783
+ - type: mrr_at_10
784
+ value: 0.49097
785
+ - type: mrr_at_100
786
+ value: 0.49966
787
+ - type: mrr_at_1000
788
+ value: 0.50006
789
+ - type: mrr_at_3
790
+ value: 0.46129
791
+ - type: mrr_at_5
792
+ value: 0.47901
793
+ - type: ndcg_at_1
794
+ value: 0.40112
795
+ - type: ndcg_at_10
796
+ value: 0.513
797
+ - type: ndcg_at_100
798
+ value: 0.56534
799
+ - type: ndcg_at_1000
800
+ value: 0.58048
801
+ - type: ndcg_at_3
802
+ value: 0.4491
803
+ - type: ndcg_at_5
804
+ value: 0.48048
805
+ - type: precision_at_1
806
+ value: 0.40112
807
+ - type: precision_at_10
808
+ value: 0.08806
809
+ - type: precision_at_100
810
+ value: 0.01266
811
+ - type: precision_at_1000
812
+ value: 0.00149
813
+ - type: precision_at_3
814
+ value: 0.20211
815
+ - type: precision_at_5
816
+ value: 0.14496
817
+ - type: recall_at_1
818
+ value: 0.34303
819
+ - type: recall_at_10
820
+ value: 0.65508
821
+ - type: recall_at_100
822
+ value: 0.8753
823
+ - type: recall_at_1000
824
+ value: 0.9742
825
+ - type: recall_at_3
826
+ value: 0.48465
827
+ - type: recall_at_5
828
+ value: 0.56374
829
+ - dataset:
830
+ type: mteb/cqadupstack-webmasters
831
+ name: MTEB CQADupstackWebmastersRetrieval
832
+ config: default
833
+ split: test
834
+ task:
835
+ type: Retrieval
836
+ metrics:
837
+ - type: map_at_1
838
+ value: 0.30312
839
+ - type: map_at_10
840
+ value: 0.40931
841
+ - type: map_at_100
842
+ value: 0.42893
843
+ - type: map_at_1000
844
+ value: 0.4312
845
+ - type: map_at_3
846
+ value: 0.37527
847
+ - type: map_at_5
848
+ value: 0.3936
849
+ - type: mrr_at_1
850
+ value: 0.36364
851
+ - type: mrr_at_10
852
+ value: 0.45677
853
+ - type: mrr_at_100
854
+ value: 0.46753
855
+ - type: mrr_at_1000
856
+ value: 0.46787
857
+ - type: mrr_at_3
858
+ value: 0.42918
859
+ - type: mrr_at_5
860
+ value: 0.4443
861
+ - type: ndcg_at_1
862
+ value: 0.36364
863
+ - type: ndcg_at_10
864
+ value: 0.47301
865
+ - type: ndcg_at_100
866
+ value: 0.53698
867
+ - type: ndcg_at_1000
868
+ value: 0.55503
869
+ - type: ndcg_at_3
870
+ value: 0.41875
871
+ - type: ndcg_at_5
872
+ value: 0.44316
873
+ - type: precision_at_1
874
+ value: 0.36364
875
+ - type: precision_at_10
876
+ value: 0.09032
877
+ - type: precision_at_100
878
+ value: 0.01806
879
+ - type: precision_at_1000
880
+ value: 0.00258
881
+ - type: precision_at_3
882
+ value: 0.19499
883
+ - type: precision_at_5
884
+ value: 0.1415
885
+ - type: recall_at_1
886
+ value: 0.30312
887
+ - type: recall_at_10
888
+ value: 0.59418
889
+ - type: recall_at_100
890
+ value: 0.8656
891
+ - type: recall_at_1000
892
+ value: 0.97412
893
+ - type: recall_at_3
894
+ value: 0.44251
895
+ - type: recall_at_5
896
+ value: 0.50457
897
+ - dataset:
898
+ type: mteb/cqadupstack-wordpress
899
+ name: MTEB CQADupstackWordpressRetrieval
900
+ config: default
901
+ split: test
902
+ task:
903
+ type: Retrieval
904
+ metrics:
905
+ - type: map_at_1
906
+ value: 0.23851
907
+ - type: map_at_10
908
+ value: 0.33429
909
+ - type: map_at_100
910
+ value: 0.34482
911
+ - type: map_at_1000
912
+ value: 0.3457
913
+ - type: map_at_3
914
+ value: 0.30271
915
+ - type: map_at_5
916
+ value: 0.31905
917
+ - type: mrr_at_1
918
+ value: 0.25693
919
+ - type: mrr_at_10
920
+ value: 0.35383
921
+ - type: mrr_at_100
922
+ value: 0.36295
923
+ - type: mrr_at_1000
924
+ value: 0.36346
925
+ - type: mrr_at_3
926
+ value: 0.32532
927
+ - type: mrr_at_5
928
+ value: 0.3402
929
+ - type: ndcg_at_1
930
+ value: 0.25693
931
+ - type: ndcg_at_10
932
+ value: 0.39196
933
+ - type: ndcg_at_100
934
+ value: 0.44501
935
+ - type: ndcg_at_1000
936
+ value: 0.46482
937
+ - type: ndcg_at_3
938
+ value: 0.33
939
+ - type: ndcg_at_5
940
+ value: 0.35736
941
+ - type: precision_at_1
942
+ value: 0.25693
943
+ - type: precision_at_10
944
+ value: 0.06433
945
+ - type: precision_at_100
946
+ value: 0.00989
947
+ - type: precision_at_1000
948
+ value: 0.00128
949
+ - type: precision_at_3
950
+ value: 0.14295
951
+ - type: precision_at_5
952
+ value: 0.10277
953
+ - type: recall_at_1
954
+ value: 0.23851
955
+ - type: recall_at_10
956
+ value: 0.55036
957
+ - type: recall_at_100
958
+ value: 0.79592
959
+ - type: recall_at_1000
960
+ value: 0.94283
961
+ - type: recall_at_3
962
+ value: 0.38435
963
+ - type: recall_at_5
964
+ value: 0.44872
965
+ - dataset:
966
+ type: mteb/dbpedia
967
+ name: MTEB DBPedia
968
+ config: default
969
+ split: test
970
+ task:
971
+ type: Retrieval
972
+ metrics:
973
+ - type: map_at_1
974
+ value: 0.0871
975
+ - type: map_at_10
976
+ value: 0.19218
977
+ - type: map_at_100
978
+ value: 0.26291
979
+ - type: map_at_1000
980
+ value: 0.27985
981
+ - type: map_at_3
982
+ value: 0.13974
983
+ - type: map_at_5
984
+ value: 0.16104
985
+ - type: mrr_at_1
986
+ value: 0.6725
987
+ - type: mrr_at_10
988
+ value: 0.75037
989
+ - type: mrr_at_100
990
+ value: 0.75318
991
+ - type: mrr_at_1000
992
+ value: 0.75325
993
+ - type: mrr_at_3
994
+ value: 0.73833
995
+ - type: mrr_at_5
996
+ value: 0.74308
997
+ - type: ndcg_at_1
998
+ value: 0.54375
999
+ - type: ndcg_at_10
1000
+ value: 0.39409
1001
+ - type: ndcg_at_100
1002
+ value: 0.44382
1003
+ - type: ndcg_at_1000
1004
+ value: 0.52485
1005
+ - type: ndcg_at_3
1006
+ value: 0.44463
1007
+ - type: ndcg_at_5
1008
+ value: 0.41276
1009
+ - type: precision_at_1
1010
+ value: 0.6725
1011
+ - type: precision_at_10
1012
+ value: 0.3055
1013
+ - type: precision_at_100
1014
+ value: 0.09588
1015
+ - type: precision_at_1000
1016
+ value: 0.02118
1017
+ - type: precision_at_3
1018
+ value: 0.48167
1019
+ - type: precision_at_5
1020
+ value: 0.394
1021
+ - type: recall_at_1
1022
+ value: 0.0871
1023
+ - type: recall_at_10
1024
+ value: 0.2527
1025
+ - type: recall_at_100
1026
+ value: 0.5185
1027
+ - type: recall_at_1000
1028
+ value: 0.76491
1029
+ - type: recall_at_3
1030
+ value: 0.15516
1031
+ - type: recall_at_5
1032
+ value: 0.18907
1033
+ - dataset:
1034
+ type: mteb/fever
1035
+ name: MTEB FEVER
1036
+ config: default
1037
+ split: test
1038
+ task:
1039
+ type: Retrieval
1040
+ metrics:
1041
+ - type: map_at_1
1042
+ value: 0.78993
1043
+ - type: map_at_10
1044
+ value: 0.8502
1045
+ - type: map_at_100
1046
+ value: 0.85186
1047
+ - type: map_at_1000
1048
+ value: 0.852
1049
+ - type: map_at_3
1050
+ value: 0.8437
1051
+ - type: map_at_5
1052
+ value: 0.84812
1053
+ - type: mrr_at_1
1054
+ value: 0.85179
1055
+ - type: mrr_at_10
1056
+ value: 0.90744
1057
+ - type: mrr_at_100
1058
+ value: 0.90799
1059
+ - type: mrr_at_1000
1060
+ value: 0.90801
1061
+ - type: mrr_at_3
1062
+ value: 0.90322
1063
+ - type: mrr_at_5
1064
+ value: 0.90622
1065
+ - type: ndcg_at_1
1066
+ value: 0.85179
1067
+ - type: ndcg_at_10
1068
+ value: 0.88229
1069
+ - type: ndcg_at_100
1070
+ value: 0.8884
1071
+ - type: ndcg_at_1000
1072
+ value: 0.89116
1073
+ - type: ndcg_at_3
1074
+ value: 0.87304
1075
+ - type: ndcg_at_5
1076
+ value: 0.87862
1077
+ - type: precision_at_1
1078
+ value: 0.85179
1079
+ - type: precision_at_10
1080
+ value: 0.10129
1081
+ - type: precision_at_100
1082
+ value: 0.0106
1083
+ - type: precision_at_1000
1084
+ value: 0.0011
1085
+ - type: precision_at_3
1086
+ value: 0.32543
1087
+ - type: precision_at_5
1088
+ value: 0.19931
1089
+ - type: recall_at_1
1090
+ value: 0.78993
1091
+ - type: recall_at_10
1092
+ value: 0.92685
1093
+ - type: recall_at_100
1094
+ value: 0.9516
1095
+ - type: recall_at_1000
1096
+ value: 0.96943
1097
+ - type: recall_at_3
1098
+ value: 0.89965
1099
+ - type: recall_at_5
1100
+ value: 0.91562
1101
+ - dataset:
1102
+ type: mteb/fiqa
1103
+ name: MTEB FiQA2018
1104
+ config: default
1105
+ split: test
1106
+ task:
1107
+ type: Retrieval
1108
+ metrics:
1109
+ - type: map_at_1
1110
+ value: 0.22586
1111
+ - type: map_at_10
1112
+ value: 0.36836
1113
+ - type: map_at_100
1114
+ value: 0.38863
1115
+ - type: map_at_1000
1116
+ value: 0.39041
1117
+ - type: map_at_3
1118
+ value: 0.32445
1119
+ - type: map_at_5
1120
+ value: 0.34951
1121
+ - type: mrr_at_1
1122
+ value: 0.44599
1123
+ - type: mrr_at_10
1124
+ value: 0.53471
1125
+ - type: mrr_at_100
1126
+ value: 0.54186
1127
+ - type: mrr_at_1000
1128
+ value: 0.54223
1129
+ - type: mrr_at_3
1130
+ value: 0.51157
1131
+ - type: mrr_at_5
1132
+ value: 0.52423
1133
+ - type: ndcg_at_1
1134
+ value: 0.44599
1135
+ - type: ndcg_at_10
1136
+ value: 0.44931
1137
+ - type: ndcg_at_100
1138
+ value: 0.51914
1139
+ - type: ndcg_at_1000
1140
+ value: 0.54674
1141
+ - type: ndcg_at_3
1142
+ value: 0.41597
1143
+ - type: ndcg_at_5
1144
+ value: 0.42611
1145
+ - type: precision_at_1
1146
+ value: 0.44599
1147
+ - type: precision_at_10
1148
+ value: 0.12346
1149
+ - type: precision_at_100
1150
+ value: 0.01951
1151
+ - type: precision_at_1000
1152
+ value: 0.00244
1153
+ - type: precision_at_3
1154
+ value: 0.27623
1155
+ - type: precision_at_5
1156
+ value: 0.20093
1157
+ - type: recall_at_1
1158
+ value: 0.22586
1159
+ - type: recall_at_10
1160
+ value: 0.5152
1161
+ - type: recall_at_100
1162
+ value: 0.77251
1163
+ - type: recall_at_1000
1164
+ value: 0.93503
1165
+ - type: recall_at_3
1166
+ value: 0.37802
1167
+ - type: recall_at_5
1168
+ value: 0.4386
1169
+ - dataset:
1170
+ type: mteb/hotpotqa
1171
+ name: MTEB HotpotQA
1172
+ config: default
1173
+ split: test
1174
+ task:
1175
+ type: Retrieval
1176
+ metrics:
1177
+ - type: map_at_1
1178
+ value: 0.38177
1179
+ - type: map_at_10
1180
+ value: 0.59021
1181
+ - type: map_at_100
1182
+ value: 0.59924
1183
+ - type: map_at_1000
1184
+ value: 0.59989
1185
+ - type: map_at_3
1186
+ value: 0.55553
1187
+ - type: map_at_5
1188
+ value: 0.57773
1189
+ - type: mrr_at_1
1190
+ value: 0.76354
1191
+ - type: mrr_at_10
1192
+ value: 0.827
1193
+ - type: mrr_at_100
1194
+ value: 0.82887
1195
+ - type: mrr_at_1000
1196
+ value: 0.82896
1197
+ - type: mrr_at_3
1198
+ value: 0.8172
1199
+ - type: mrr_at_5
1200
+ value: 0.82338
1201
+ - type: ndcg_at_1
1202
+ value: 0.76354
1203
+ - type: ndcg_at_10
1204
+ value: 0.67775
1205
+ - type: ndcg_at_100
1206
+ value: 0.70849
1207
+ - type: ndcg_at_1000
1208
+ value: 0.7215
1209
+ - type: ndcg_at_3
1210
+ value: 0.629
1211
+ - type: ndcg_at_5
1212
+ value: 0.65679
1213
+ - type: precision_at_1
1214
+ value: 0.76354
1215
+ - type: precision_at_10
1216
+ value: 0.14176
1217
+ - type: precision_at_100
1218
+ value: 0.01656
1219
+ - type: precision_at_1000
1220
+ value: 0.00183
1221
+ - type: precision_at_3
1222
+ value: 0.40113
1223
+ - type: precision_at_5
1224
+ value: 0.26255
1225
+ - type: recall_at_1
1226
+ value: 0.38177
1227
+ - type: recall_at_10
1228
+ value: 0.70878
1229
+ - type: recall_at_100
1230
+ value: 0.82822
1231
+ - type: recall_at_1000
1232
+ value: 0.91472
1233
+ - type: recall_at_3
1234
+ value: 0.60169
1235
+ - type: recall_at_5
1236
+ value: 0.65638
1237
+ - dataset:
1238
+ type: mteb/msmarco
1239
+ name: MTEB MSMARCO
1240
+ config: default
1241
+ split: dev
1242
+ task:
1243
+ type: Retrieval
1244
+ metrics:
1245
+ - type: map_at_1
1246
+ value: 0.15062
1247
+ - type: map_at_10
1248
+ value: 0.26008
1249
+ - type: map_at_100
1250
+ value: 0.27305
1251
+ - type: map_at_1000
1252
+ value: 0.27373
1253
+ - type: map_at_3
1254
+ value: 0.22236
1255
+ - type: map_at_5
1256
+ value: 0.24362
1257
+ - type: mrr_at_1
1258
+ value: 0.15444
1259
+ - type: mrr_at_10
1260
+ value: 0.26458
1261
+ - type: mrr_at_100
1262
+ value: 0.27718
1263
+ - type: mrr_at_1000
1264
+ value: 0.2778
1265
+ - type: mrr_at_3
1266
+ value: 0.22701
1267
+ - type: mrr_at_5
1268
+ value: 0.24844
1269
+ - type: ndcg_at_1
1270
+ value: 0.15444
1271
+ - type: ndcg_at_10
1272
+ value: 0.32495
1273
+ - type: ndcg_at_100
1274
+ value: 0.38957
1275
+ - type: ndcg_at_1000
1276
+ value: 0.40684
1277
+ - type: ndcg_at_3
1278
+ value: 0.24745
1279
+ - type: ndcg_at_5
1280
+ value: 0.2856
1281
+ - type: precision_at_1
1282
+ value: 0.15444
1283
+ - type: precision_at_10
1284
+ value: 0.05486
1285
+ - type: precision_at_100
1286
+ value: 0.00875
1287
+ - type: precision_at_1000
1288
+ value: 0.00102
1289
+ - type: precision_at_3
1290
+ value: 0.1086
1291
+ - type: precision_at_5
1292
+ value: 0.08441
1293
+ - type: recall_at_1
1294
+ value: 0.15062
1295
+ - type: recall_at_10
1296
+ value: 0.5272
1297
+ - type: recall_at_100
1298
+ value: 0.83006
1299
+ - type: recall_at_1000
1300
+ value: 0.96263
1301
+ - type: recall_at_3
1302
+ value: 0.31556
1303
+ - type: recall_at_5
1304
+ value: 0.40706
1305
+ - dataset:
1306
+ type: mteb/nfcorpus
1307
+ name: MTEB NFCorpus
1308
+ config: default
1309
+ split: test
1310
+ task:
1311
+ type: Retrieval
1312
+ metrics:
1313
+ - type: map_at_1
1314
+ value: 0.06126
1315
+ - type: map_at_10
1316
+ value: 0.14152
1317
+ - type: map_at_100
1318
+ value: 0.1827
1319
+ - type: map_at_1000
1320
+ value: 0.1988
1321
+ - type: map_at_3
1322
+ value: 0.10301
1323
+ - type: map_at_5
1324
+ value: 0.12085
1325
+ - type: mrr_at_1
1326
+ value: 0.47988
1327
+ - type: mrr_at_10
1328
+ value: 0.5692
1329
+ - type: mrr_at_100
1330
+ value: 0.57428
1331
+ - type: mrr_at_1000
1332
+ value: 0.57482
1333
+ - type: mrr_at_3
1334
+ value: 0.55315
1335
+ - type: mrr_at_5
1336
+ value: 0.56352
1337
+ - type: ndcg_at_1
1338
+ value: 0.45356
1339
+ - type: ndcg_at_10
1340
+ value: 0.3725
1341
+ - type: ndcg_at_100
1342
+ value: 0.34496
1343
+ - type: ndcg_at_1000
1344
+ value: 0.43374
1345
+ - type: ndcg_at_3
1346
+ value: 0.42643
1347
+ - type: ndcg_at_5
1348
+ value: 0.40882
1349
+ - type: precision_at_1
1350
+ value: 0.47368
1351
+ - type: precision_at_10
1352
+ value: 0.2774
1353
+ - type: precision_at_100
1354
+ value: 0.09071
1355
+ - type: precision_at_1000
1356
+ value: 0.02226
1357
+ - type: precision_at_3
1358
+ value: 0.40144
1359
+ - type: precision_at_5
1360
+ value: 0.35913
1361
+ - type: recall_at_1
1362
+ value: 0.06126
1363
+ - type: recall_at_10
1364
+ value: 0.18427
1365
+ - type: recall_at_100
1366
+ value: 0.35018
1367
+ - type: recall_at_1000
1368
+ value: 0.6766
1369
+ - type: recall_at_3
1370
+ value: 0.11706
1371
+ - type: recall_at_5
1372
+ value: 0.14419
1373
+ - dataset:
1374
+ type: mteb/nq
1375
+ name: MTEB NQ
1376
+ config: default
1377
+ split: test
1378
+ task:
1379
+ type: Retrieval
1380
+ metrics:
1381
+ - type: map_at_1
1382
+ value: 0.33053
1383
+ - type: map_at_10
1384
+ value: 0.49739
1385
+ - type: map_at_100
1386
+ value: 0.50626
1387
+ - type: map_at_1000
1388
+ value: 0.50647
1389
+ - type: map_at_3
1390
+ value: 0.4491
1391
+ - type: map_at_5
1392
+ value: 0.4783
1393
+ - type: mrr_at_1
1394
+ value: 0.37254
1395
+ - type: mrr_at_10
1396
+ value: 0.52222
1397
+ - type: mrr_at_100
1398
+ value: 0.52855
1399
+ - type: mrr_at_1000
1400
+ value: 0.52869
1401
+ - type: mrr_at_3
1402
+ value: 0.48445
1403
+ - type: mrr_at_5
1404
+ value: 0.50834
1405
+ - type: ndcg_at_1
1406
+ value: 0.37254
1407
+ - type: ndcg_at_10
1408
+ value: 0.58044
1409
+ - type: ndcg_at_100
1410
+ value: 0.61613
1411
+ - type: ndcg_at_1000
1412
+ value: 0.62046
1413
+ - type: ndcg_at_3
1414
+ value: 0.49219
1415
+ - type: ndcg_at_5
1416
+ value: 0.54037
1417
+ - type: precision_at_1
1418
+ value: 0.37254
1419
+ - type: precision_at_10
1420
+ value: 0.09655
1421
+ - type: precision_at_100
1422
+ value: 0.01167
1423
+ - type: precision_at_1000
1424
+ value: 0.00121
1425
+ - type: precision_at_3
1426
+ value: 0.22538
1427
+ - type: precision_at_5
1428
+ value: 0.16344
1429
+ - type: recall_at_1
1430
+ value: 0.33053
1431
+ - type: recall_at_10
1432
+ value: 0.8076
1433
+ - type: recall_at_100
1434
+ value: 0.95862
1435
+ - type: recall_at_1000
1436
+ value: 0.99044
1437
+ - type: recall_at_3
1438
+ value: 0.58157
1439
+ - type: recall_at_5
1440
+ value: 0.69235
1441
+ - dataset:
1442
+ type: mteb/quora
1443
+ name: MTEB QuoraRetrieval
1444
+ config: default
1445
+ split: test
1446
+ task:
1447
+ type: Retrieval
1448
+ metrics:
1449
+ - type: map_at_1
1450
+ value: 0.70056
1451
+ - type: map_at_10
1452
+ value: 0.84009
1453
+ - type: map_at_100
1454
+ value: 0.84661
1455
+ - type: map_at_1000
1456
+ value: 0.84678
1457
+ - type: map_at_3
1458
+ value: 0.81036
1459
+ - type: map_at_5
1460
+ value: 0.82923
1461
+ - type: mrr_at_1
1462
+ value: 0.8062
1463
+ - type: mrr_at_10
1464
+ value: 0.86971
1465
+ - type: mrr_at_100
1466
+ value: 0.87079
1467
+ - type: mrr_at_1000
1468
+ value: 0.8708
1469
+ - type: mrr_at_3
1470
+ value: 0.85943
1471
+ - type: mrr_at_5
1472
+ value: 0.86664
1473
+ - type: ndcg_at_1
1474
+ value: 0.8064
1475
+ - type: ndcg_at_10
1476
+ value: 0.87821
1477
+ - type: ndcg_at_100
1478
+ value: 0.89091
1479
+ - type: ndcg_at_1000
1480
+ value: 0.89202
1481
+ - type: ndcg_at_3
1482
+ value: 0.849
1483
+ - type: ndcg_at_5
1484
+ value: 0.86544
1485
+ - type: precision_at_1
1486
+ value: 0.8064
1487
+ - type: precision_at_10
1488
+ value: 0.13347
1489
+ - type: precision_at_100
1490
+ value: 0.01527
1491
+ - type: precision_at_1000
1492
+ value: 0.00157
1493
+ - type: precision_at_3
1494
+ value: 0.37153
1495
+ - type: precision_at_5
1496
+ value: 0.2448
1497
+ - type: recall_at_1
1498
+ value: 0.70056
1499
+ - type: recall_at_10
1500
+ value: 0.95148
1501
+ - type: recall_at_100
1502
+ value: 0.99474
1503
+ - type: recall_at_1000
1504
+ value: 0.99977
1505
+ - type: recall_at_3
1506
+ value: 0.86773
1507
+ - type: recall_at_5
1508
+ value: 0.91396
1509
+ - dataset:
1510
+ type: mteb/scidocs
1511
+ name: MTEB SCIDOCS
1512
+ config: default
1513
+ split: test
1514
+ task:
1515
+ type: Retrieval
1516
+ metrics:
1517
+ - type: map_at_1
1518
+ value: 0.05737
1519
+ - type: map_at_10
1520
+ value: 0.14896
1521
+ - type: map_at_100
1522
+ value: 0.17646
1523
+ - type: map_at_1000
1524
+ value: 0.1803
1525
+ - type: map_at_3
1526
+ value: 0.10474
1527
+ - type: map_at_5
1528
+ value: 0.12656
1529
+ - type: mrr_at_1
1530
+ value: 0.281
1531
+ - type: mrr_at_10
1532
+ value: 0.39579
1533
+ - type: mrr_at_100
1534
+ value: 0.40687
1535
+ - type: mrr_at_1000
1536
+ value: 0.40722
1537
+ - type: mrr_at_3
1538
+ value: 0.35917
1539
+ - type: mrr_at_5
1540
+ value: 0.38097
1541
+ - type: ndcg_at_1
1542
+ value: 0.281
1543
+ - type: ndcg_at_10
1544
+ value: 0.24146
1545
+ - type: ndcg_at_100
1546
+ value: 0.339
1547
+ - type: ndcg_at_1000
1548
+ value: 0.39728
1549
+ - type: ndcg_at_3
1550
+ value: 0.22721
1551
+ - type: ndcg_at_5
1552
+ value: 0.20015
1553
+ - type: precision_at_1
1554
+ value: 0.281
1555
+ - type: precision_at_10
1556
+ value: 0.1254
1557
+ - type: precision_at_100
1558
+ value: 0.02651
1559
+ - type: precision_at_1000
1560
+ value: 0.00404
1561
+ - type: precision_at_3
1562
+ value: 0.212
1563
+ - type: precision_at_5
1564
+ value: 0.176
1565
+ - type: recall_at_1
1566
+ value: 0.05737
1567
+ - type: recall_at_10
1568
+ value: 0.254
1569
+ - type: recall_at_100
1570
+ value: 0.53772
1571
+ - type: recall_at_1000
1572
+ value: 0.82013
1573
+ - type: recall_at_3
1574
+ value: 0.12897
1575
+ - type: recall_at_5
1576
+ value: 0.17855
1577
+ - dataset:
1578
+ type: mteb/scifact
1579
+ name: MTEB SciFact
1580
+ config: default
1581
+ split: test
1582
+ task:
1583
+ type: Retrieval
1584
+ metrics:
1585
+ - type: map_at_1
1586
+ value: 0.60011
1587
+ - type: map_at_10
1588
+ value: 0.70101
1589
+ - type: map_at_100
1590
+ value: 0.70687
1591
+ - type: map_at_1000
1592
+ value: 0.70699
1593
+ - type: map_at_3
1594
+ value: 0.67135
1595
+ - type: map_at_5
1596
+ value: 0.6878
1597
+ - type: mrr_at_1
1598
+ value: 0.62667
1599
+ - type: mrr_at_10
1600
+ value: 0.71022
1601
+ - type: mrr_at_100
1602
+ value: 0.71484
1603
+ - type: mrr_at_1000
1604
+ value: 0.71496
1605
+ - type: mrr_at_3
1606
+ value: 0.68944
1607
+ - type: mrr_at_5
1608
+ value: 0.69961
1609
+ - type: ndcg_at_1
1610
+ value: 0.62667
1611
+ - type: ndcg_at_10
1612
+ value: 0.7472
1613
+ - type: ndcg_at_100
1614
+ value: 0.76961
1615
+ - type: ndcg_at_1000
1616
+ value: 0.77294
1617
+ - type: ndcg_at_3
1618
+ value: 0.69776
1619
+ - type: ndcg_at_5
1620
+ value: 0.71964
1621
+ - type: precision_at_1
1622
+ value: 0.62667
1623
+ - type: precision_at_10
1624
+ value: 0.09933
1625
+ - type: precision_at_100
1626
+ value: 0.01103
1627
+ - type: precision_at_1000
1628
+ value: 0.00113
1629
+ - type: precision_at_3
1630
+ value: 0.27
1631
+ - type: precision_at_5
1632
+ value: 0.178
1633
+ - type: recall_at_1
1634
+ value: 0.60011
1635
+ - type: recall_at_10
1636
+ value: 0.878
1637
+ - type: recall_at_100
1638
+ value: 0.97333
1639
+ - type: recall_at_1000
1640
+ value: 1
1641
+ - type: recall_at_3
1642
+ value: 0.74839
1643
+ - type: recall_at_5
1644
+ value: 0.80028
1645
+ - dataset:
1646
+ type: mteb/touche2020
1647
+ name: MTEB Touche2020
1648
+ config: default
1649
+ split: test
1650
+ task:
1651
+ type: Retrieval
1652
+ metrics:
1653
+ - type: map_at_1
1654
+ value: 0.02152
1655
+ - type: map_at_10
1656
+ value: 0.07747
1657
+ - type: map_at_100
1658
+ value: 0.1364
1659
+ - type: map_at_1000
1660
+ value: 0.15235
1661
+ - type: map_at_3
1662
+ value: 0.04103
1663
+ - type: map_at_5
1664
+ value: 0.05482
1665
+ - type: mrr_at_1
1666
+ value: 0.26531
1667
+ - type: mrr_at_10
1668
+ value: 0.41399
1669
+ - type: mrr_at_100
1670
+ value: 0.43047
1671
+ - type: mrr_at_1000
1672
+ value: 0.43047
1673
+ - type: mrr_at_3
1674
+ value: 0.38776
1675
+ - type: mrr_at_5
1676
+ value: 0.40612
1677
+ - type: ndcg_at_1
1678
+ value: 0.23469
1679
+ - type: ndcg_at_10
1680
+ value: 0.20147
1681
+ - type: ndcg_at_100
1682
+ value: 0.3279
1683
+ - type: ndcg_at_1000
1684
+ value: 0.45324
1685
+ - type: ndcg_at_3
1686
+ value: 0.22555
1687
+ - type: ndcg_at_5
1688
+ value: 0.2097
1689
+ - type: precision_at_1
1690
+ value: 0.26531
1691
+ - type: precision_at_10
1692
+ value: 0.17755
1693
+ - type: precision_at_100
1694
+ value: 0.07082
1695
+ - type: precision_at_1000
1696
+ value: 0.01547
1697
+ - type: precision_at_3
1698
+ value: 0.2449
1699
+ - type: precision_at_5
1700
+ value: 0.21633
1701
+ - type: recall_at_1
1702
+ value: 0.02152
1703
+ - type: recall_at_10
1704
+ value: 0.13331
1705
+ - type: recall_at_100
1706
+ value: 0.4535
1707
+ - type: recall_at_1000
1708
+ value: 0.83447
1709
+ - type: recall_at_3
1710
+ value: 0.05531
1711
+ - type: recall_at_5
1712
+ value: 0.08029
1713
+ - dataset:
1714
+ type: mteb/trec-covid
1715
+ name: MTEB TRECCOVID
1716
+ config: default
1717
+ split: test
1718
+ task:
1719
+ type: Retrieval
1720
+ metrics:
1721
+ - type: map_at_1
1722
+ value: 0.00202
1723
+ - type: map_at_10
1724
+ value: 0.01727
1725
+ - type: map_at_100
1726
+ value: 0.10906
1727
+ - type: map_at_1000
1728
+ value: 0.2894
1729
+ - type: map_at_3
1730
+ value: 0.00553
1731
+ - type: map_at_5
1732
+ value: 0.00924
1733
+ - type: mrr_at_1
1734
+ value: 0.74
1735
+ - type: mrr_at_10
1736
+ value: 0.85667
1737
+ - type: mrr_at_100
1738
+ value: 0.85667
1739
+ - type: mrr_at_1000
1740
+ value: 0.85667
1741
+ - type: mrr_at_3
1742
+ value: 0.85667
1743
+ - type: mrr_at_5
1744
+ value: 0.85667
1745
+ - type: ndcg_at_1
1746
+ value: 0.66
1747
+ - type: ndcg_at_10
1748
+ value: 0.69259
1749
+ - type: ndcg_at_100
1750
+ value: 0.57274
1751
+ - type: ndcg_at_1000
1752
+ value: 0.55462
1753
+ - type: ndcg_at_3
1754
+ value: 0.70654
1755
+ - type: ndcg_at_5
1756
+ value: 0.71611
1757
+ - type: precision_at_1
1758
+ value: 0.74
1759
+ - type: precision_at_10
1760
+ value: 0.748
1761
+ - type: precision_at_100
1762
+ value: 0.5962
1763
+ - type: precision_at_1000
1764
+ value: 0.24842
1765
+ - type: precision_at_3
1766
+ value: 0.77333
1767
+ - type: precision_at_5
1768
+ value: 0.788
1769
+ - type: recall_at_1
1770
+ value: 0.00202
1771
+ - type: recall_at_10
1772
+ value: 0.02001
1773
+ - type: recall_at_100
1774
+ value: 0.14801
1775
+ - type: recall_at_1000
1776
+ value: 0.53939
1777
+ - type: recall_at_3
1778
+ value: 0.00609
1779
+ - type: recall_at_5
1780
+ value: 0.01048
1781
+ pipeline_tag: sentence-similarity
1782
+ ---
1783
+ # Granite-Embedding-125m-English
1784
+
1785
+ **Model Summary:**
1786
+ Granite-Embedding-125m-English is a 125M parameter dense biencoder embedding model from the Granite Embeddings suite that can be used to generate high-quality text embeddings. This model produces embedding vectors of size 768. Unlike most other open-source models, this model was trained only on open-source relevance-pair datasets with permissive, enterprise-friendly licenses, plus IBM-collected and IBM-generated datasets. While maintaining competitive scores on academic benchmarks such as BEIR, this model also performs well on many enterprise use cases. The model is developed using retrieval-oriented pretraining, contrastive finetuning, and knowledge distillation.
1787
+
1788
+ - **Developers:** Granite Embedding Team, IBM
1789
+ - **GitHub Repository:** [ibm-granite/granite-embedding-models](https://github.com/ibm-granite/granite-embedding-models)
1790
+ - **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
1791
+ - **Paper:** Coming Soon
1792
+ - **Release Date**: December 18th, 2024
1793
+ - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
1794
+
1795
+ **Supported Languages:**
1796
+ English.
1797
+
1798
+ **Intended use:**
1799
+ The model is designed to produce fixed length vector representations for a given text, which can be used for text similarity, retrieval, and search applications.
1800
+
1801
+ **Usage with Sentence Transformers:**
1802
+ The model is compatible with the Sentence Transformers library and is easy to use:
1803
+
1804
+ First, install the sentence transformers library
1805
+ ```shell
1806
+ pip install sentence_transformers
1807
+ ```
1808
+
1809
+ The model can then be used to encode pairs of text and find the similarity between their representations
1810
+
1811
+ ```python
1812
+ from sentence_transformers import SentenceTransformer, util
1813
+
1814
+ model_path = "ibm-granite/granite-embedding-125m-english"
1815
+ # Load the Sentence Transformer model
1816
+ model = SentenceTransformer(model_path)
1817
+
1818
+ input_queries = [
1819
+ ' Who made the song My achy breaky heart? ',
1820
+ 'summit define'
1821
+ ]
1822
+
1823
+ input_passages = [
1824
+ "Achy Breaky Heart is a country song written by Don Von Tress. Originally titled Don't Tell My Heart and performed by The Marcy Brothers in 1991. ",
1825
+ "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments."
1826
+ ]
1827
+
1828
+ # encode queries and passages
1829
+ query_embeddings = model.encode(input_queries)
1830
+ passage_embeddings = model.encode(input_passages)
1831
+
1832
+ # calculate cosine similarity
1833
+ print(util.cos_sim(query_embeddings, passage_embeddings))
1834
+ ```
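+
+ With two queries and two passages, `util.cos_sim` returns a 2x2 tensor in which entry (i, j) is the cosine similarity between query i and passage j; the matched query-passage pairs above correspond to the diagonal entries.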
1835
+
1836
+ **Usage with Huggingface Transformers:**
1837
+ This is a simple example of how to use the Granite-Embedding-125m-English model with the Transformers library and PyTorch.
1838
+
1839
+ First, install the required libraries
1840
+ ```shell
1841
+ pip install transformers torch
1842
+ ```
1843
+
1844
+ The model can then be used to encode pairs of text
1845
+
1846
+ ```python
1847
+ import torch
1848
+ from transformers import AutoModel, AutoTokenizer
1849
+
1850
+ model_path = "ibm-granite/granite-embedding-125m-english"
1851
+
1852
+ # Load the model and tokenizer
1853
+ model = AutoModel.from_pretrained(model_path)
1854
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
1855
+ model.eval()
1856
+
1857
+ input_queries = [
1858
+ ' Who made the song My achy breaky heart? ',
1859
+ 'summit define'
1860
+ ]
1861
+
1862
+ # tokenize inputs
1863
+ tokenized_queries = tokenizer(input_queries, padding=True, truncation=True, return_tensors='pt')
1864
+
1865
+ # encode queries
1866
+ with torch.no_grad():
1867
+ # Queries
1868
+ model_output = model(**tokenized_queries)
1869
+ # Perform pooling. granite-embedding-125m-english uses CLS Pooling
1870
+ query_embeddings = model_output[0][:, 0]
1871
+
1872
+ # normalize the embeddings
1873
+ query_embeddings = torch.nn.functional.normalize(query_embeddings, dim=1)
1874
+
1875
+ ```
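+
+ The example above stops at normalized query embeddings. As a minimal continuation (reusing `model`, `tokenizer`, and `query_embeddings` from the snippet, with illustrative passage texts), passages can be encoded with the same CLS pooling and scored against the queries; for normalized vectors the dot product equals cosine similarity.
+
+ ```python
+ # illustrative passages; reuses model, tokenizer, and query_embeddings from above
+ input_passages = [
+     "Achy Breaky Heart is a country song written by Don Von Tress.",
+     "Definition of summit: the highest point of a mountain; the highest level."
+ ]
+
+ tokenized_passages = tokenizer(input_passages, padding=True, truncation=True, return_tensors='pt')
+
+ with torch.no_grad():
+     passage_output = model(**tokenized_passages)
+     # CLS pooling, matching the query encoding above
+     passage_embeddings = passage_output[0][:, 0]
+
+ # normalize the passage embeddings
+ passage_embeddings = torch.nn.functional.normalize(passage_embeddings, dim=1)
+
+ # dot product of normalized embeddings = cosine similarity
+ scores = query_embeddings @ passage_embeddings.T
+ print(scores)
+ ```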
1876
+ **Evaluation:**
1877
+
1878
+ The performance of the Granite-Embedding-125M-English model on MTEB Retrieval (i.e., BEIR) and code retrieval (CoIR) benchmarks is reported below.
1879
+
1880
+ | Model | Parameters (M) | Embedding Dimension | MTEB Retrieval (15) | CoIR (10) |
1881
+ |---------------------------------|:------------:|:-------------------:|:-------------------: |:----------:|
1882
+ |granite-embedding-125m-english |125 |768 |52.3 |50.3 |
1883
+
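+ Scores of this kind can be reproduced with the open-source `mteb` package; the sketch below is only illustrative (the task subset, output folder, and exact API details are assumptions and may vary across `mteb` versions).
+
+ ```python
+ from mteb import MTEB
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("ibm-granite/granite-embedding-125m-english")
+
+ # a small subset of the MTEB retrieval (BEIR) tasks reported above
+ evaluation = MTEB(tasks=["ArguAna", "SciFact", "NFCorpus"])
+ results = evaluation.run(model, output_folder="mteb_results")
+ ```
+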
1884
+ **Model Architecture:**
1885
+ Granite-Embedding-125m-English is based on an encoder-only, RoBERTa-like transformer architecture, trained internally at IBM Research.
1886
+
1887
+ | Model | granite-embedding-30m-english | granite-embedding-125m-english | granite-embedding-107m-multilingual | granite-embedding-278m-multilingual |
1888
+ | :--------- | :-------:| :--------: | :-----:| :-----:|
1889
+ | Embedding size | 384 | **768** | 384 | 768 |
1890
+ | Number of layers | 6 | **12** | 6 | 12 |
1891
+ | Number of attention heads | 12 | **12** | 12 | 12 |
1892
+ | Intermediate size | 1536 | **3072** | 1536 | 3072 |
1893
+ | Activation Function | GeLU | **GeLU** | GeLU | GeLU |
1894
+ | Vocabulary Size | 50265| **50265** | 250002 | 250002 |
1895
+ | Max. Sequence Length | 512 | **512** | 512 | 512 |
1896
+ | # Parameters | 30M | **125M** | 107M | 278M |
1897
+
1898
+
1899
+ **Training Data:**
1900
+ Overall, the training data consists of four key sources: (1) unsupervised title-body paired data scraped from the web, (2) publicly available paired data with permissive, enterprise-friendly licenses, (3) IBM-internal paired data targeting specific technical domains, and (4) IBM-generated synthetic data. The data is listed below:
1901
+
1902
+ | **Dataset** | **Num. Pairs** |
1903
+ |----------------------------------------------------|:---------------:|
1904
+ | SPECTER citation triplets | 684,100 |
1905
+ | Stack Exchange Duplicate questions (titles) | 304,525 |
1906
+ | Stack Exchange Duplicate questions (bodies) | 250,519 |
1907
+ | Stack Exchange Duplicate questions (titles+bodies) | 250,460 |
1908
+ | Natural Questions (NQ) | 100,231 |
1909
+ | SQuAD2.0 | 87,599 |
1910
+ | PAQ (Question, Answer) pairs | 64,371,441 |
1911
+ | Stack Exchange (Title, Answer) pairs | 4,067,139 |
1912
+ | Stack Exchange (Title, Body) pairs | 23,978,013 |
1913
+ | Stack Exchange (Title+Body, Answer) pairs | 187,195 |
1914
+ | S2ORC Citation pairs (Titles) | 52,603,982 |
1915
+ | S2ORC (Title, Abstract) | 41,769,185 |
1916
+ | S2ORC (Citations, abstracts) | 52,603,982 |
1917
+ | WikiAnswers Duplicate question pairs | 77,427,422 |
1918
+ | SearchQA | 582,261 |
1919
+ | HotpotQA | 85,000 |
1920
+ | Fever | 109,810 |
1921
+ | Arxiv | 2,358,545 |
1922
+ | Wikipedia | 20,745,403 |
1923
+ | PubMed | 20,000,000 |
1924
+ | Miracl En Pairs | 9,016 |
1925
+ | DBPedia Title-Body Pairs | 4,635,922 |
1926
+ | Synthetic: Query-Wikipedia Passage | 1,879,093 |
1927
+ | Synthetic: Fact Verification | 9,888 |
1928
+ | IBM Internal Triples | 40,290 |
1929
+ | IBM Internal Title-Body Pairs | 1,524,586 |
1930
+
1931
+ Notably, we do not use the popular MS-MARCO retrieval dataset in our training corpus due to its non-commercial license, even though other open-source models train on it because of its high quality.
1932
+
1933
+ **Infrastructure:**
1934
+ We train Granite Embedding Models on IBM's computing cluster, Cognitive Compute Cluster, which is outfitted with NVIDIA A100 80GB GPUs. This cluster provides a scalable and efficient infrastructure for training our models over multiple GPUs.
1935
+
1936
+ **Ethical Considerations and Limitations:**
1937
+ The data used to train the base language model was filtered to remove text containing hate, abuse, and profanity. Granite-Embedding-125m-English is trained only on English texts and has a context length of 512 tokens (longer texts will be truncated to this size).
1938
+
1939
+
1940
+ <!-- ## Citation
1941
+ ```
1942
+ @misc{granite-embedding-models,
1943
+ author = {author 1, author2, ...},
1944
+ title = {},
1945
+ journal = {},
1946
+ volume = {},
1947
+ year = {2024},
1948
+ url = {https://arxiv.org/abs/0000.00000},
1949
+ }
1950
+ ``` -->
config.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "bos_token": "<s>",
3
+ "eos_token": "</s>",
4
+ "layer_norm_epsilon": 1e-05,
5
+ "multi_query_attention": false,
6
+ "unk_token": "<unk>"
7
+ }
model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5540cde121e37f10ba85db71ff244844ddd7eb50f7d3d1e515faa020a8baa24c
3
+ size 125709805
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": true,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "cls_token": {
10
+ "content": "<s>",
11
+ "lstrip": false,
12
+ "normalized": true,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "eos_token": {
17
+ "content": "</s>",
18
+ "lstrip": false,
19
+ "normalized": true,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "mask_token": {
24
+ "content": "<mask>",
25
+ "lstrip": true,
26
+ "normalized": true,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "pad_token": {
31
+ "content": "<pad>",
32
+ "lstrip": false,
33
+ "normalized": true,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ },
37
+ "sep_token": {
38
+ "content": "</s>",
39
+ "lstrip": false,
40
+ "normalized": true,
41
+ "rstrip": false,
42
+ "single_word": false
43
+ },
44
+ "unk_token": {
45
+ "content": "<unk>",
46
+ "lstrip": false,
47
+ "normalized": true,
48
+ "rstrip": false,
49
+ "single_word": false
50
+ }
51
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,61 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "<s>",
6
+ "lstrip": false,
7
+ "normalized": true,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "<pad>",
14
+ "lstrip": false,
15
+ "normalized": true,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "</s>",
22
+ "lstrip": false,
23
+ "normalized": true,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "3": {
29
+ "content": "<unk>",
30
+ "lstrip": false,
31
+ "normalized": true,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "50264": {
37
+ "content": "<mask>",
38
+ "lstrip": true,
39
+ "normalized": true,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ }
44
+ },
45
+ "bos_token": "<s>",
46
+ "clean_up_tokenization_spaces": true,
47
+ "cls_token": "<s>",
48
+ "eos_token": "</s>",
49
+ "errors": "replace",
50
+ "mask_token": "<mask>",
51
+ "max_length": 512,
52
+ "model_max_length": 512,
53
+ "pad_token": "<pad>",
54
+ "sep_token": "</s>",
55
+ "stride": 0,
56
+ "tokenizer_class": "RobertaTokenizer",
57
+ "trim_offsets": true,
58
+ "truncation_side": "right",
59
+ "truncation_strategy": "longest_first",
60
+ "unk_token": "<unk>"
61
+ }
vocabulary.json ADDED
The diff for this file is too large to render. See raw diff