thenlper committed on
Commit
c153a79
·
1 Parent(s): 83280df

Upload README.md

Files changed (1)
  1. README.md +1144 -0
README.md CHANGED
@@ -1,3 +1,1147 @@
1
  ---
2
  license: mit
3
  ---
1
  ---
2
+ tags:
3
+ - mteb
4
+ - sentence-similarity
5
+ - sentence-transformers
6
+ - Sentence Transformers
7
+ model-index:
8
+ - name: gte-base-zh
9
+ results:
10
+ - task:
11
+ type: STS
12
+ dataset:
13
+ type: C-MTEB/AFQMC
14
+ name: MTEB AFQMC
15
+ config: default
16
+ split: validation
17
+ revision: None
18
+ metrics:
19
+ - type: cos_sim_pearson
20
+ value: 44.45621572456527
21
+ - type: cos_sim_spearman
22
+ value: 49.06500895667604
23
+ - type: euclidean_pearson
24
+ value: 47.55002064096053
25
+ - type: euclidean_spearman
26
+ value: 49.06500895667604
27
+ - type: manhattan_pearson
28
+ value: 47.429900262366715
29
+ - type: manhattan_spearman
30
+ value: 48.95704890278774
31
+ - task:
32
+ type: STS
33
+ dataset:
34
+ type: C-MTEB/ATEC
35
+ name: MTEB ATEC
36
+ config: default
37
+ split: test
38
+ revision: None
39
+ metrics:
40
+ - type: cos_sim_pearson
41
+ value: 44.31699346653116
42
+ - type: cos_sim_spearman
43
+ value: 50.83133156721432
44
+ - type: euclidean_pearson
45
+ value: 51.36086517946001
46
+ - type: euclidean_spearman
47
+ value: 50.83132818894256
48
+ - type: manhattan_pearson
49
+ value: 51.255926461574084
50
+ - type: manhattan_spearman
51
+ value: 50.73460147395406
52
+ - task:
53
+ type: Classification
54
+ dataset:
55
+ type: mteb/amazon_reviews_multi
56
+ name: MTEB AmazonReviewsClassification (zh)
57
+ config: zh
58
+ split: test
59
+ revision: 1399c76144fd37290681b995c656ef9b2e06e26d
60
+ metrics:
61
+ - type: accuracy
62
+ value: 45.818000000000005
63
+ - type: f1
64
+ value: 43.998253644678144
65
+ - task:
66
+ type: STS
67
+ dataset:
68
+ type: C-MTEB/BQ
69
+ name: MTEB BQ
70
+ config: default
71
+ split: test
72
+ revision: None
73
+ metrics:
74
+ - type: cos_sim_pearson
75
+ value: 63.47477451918581
76
+ - type: cos_sim_spearman
77
+ value: 65.49832607366159
78
+ - type: euclidean_pearson
79
+ value: 64.11399760832107
80
+ - type: euclidean_spearman
81
+ value: 65.49832260877398
82
+ - type: manhattan_pearson
83
+ value: 64.02541311484639
84
+ - type: manhattan_spearman
85
+ value: 65.42436057501452
86
+ - task:
87
+ type: Clustering
88
+ dataset:
89
+ type: C-MTEB/CLSClusteringP2P
90
+ name: MTEB CLSClusteringP2P
91
+ config: default
92
+ split: test
93
+ revision: None
94
+ metrics:
95
+ - type: v_measure
96
+ value: 42.58046835435111
97
+ - task:
98
+ type: Clustering
99
+ dataset:
100
+ type: C-MTEB/CLSClusteringS2S
101
+ name: MTEB CLSClusteringS2S
102
+ config: default
103
+ split: test
104
+ revision: None
105
+ metrics:
106
+ - type: v_measure
107
+ value: 40.42134173217685
108
+ - task:
109
+ type: Reranking
110
+ dataset:
111
+ type: C-MTEB/CMedQAv1-reranking
112
+ name: MTEB CMedQAv1
113
+ config: default
114
+ split: test
115
+ revision: None
116
+ metrics:
117
+ - type: map
118
+ value: 86.79079943923792
119
+ - type: mrr
120
+ value: 88.81341269841269
121
+ - task:
122
+ type: Reranking
123
+ dataset:
124
+ type: C-MTEB/CMedQAv2-reranking
125
+ name: MTEB CMedQAv2
126
+ config: default
127
+ split: test
128
+ revision: None
129
+ metrics:
130
+ - type: map
131
+ value: 87.20186031249037
132
+ - type: mrr
133
+ value: 89.46551587301587
134
+ - task:
135
+ type: Retrieval
136
+ dataset:
137
+ type: C-MTEB/CmedqaRetrieval
138
+ name: MTEB CmedqaRetrieval
139
+ config: default
140
+ split: dev
141
+ revision: None
142
+ metrics:
143
+ - type: map_at_1
144
+ value: 25.098
145
+ - type: map_at_10
146
+ value: 37.759
147
+ - type: map_at_100
148
+ value: 39.693
149
+ - type: map_at_1000
150
+ value: 39.804
151
+ - type: map_at_3
152
+ value: 33.477000000000004
153
+ - type: map_at_5
154
+ value: 35.839
155
+ - type: mrr_at_1
156
+ value: 38.06
157
+ - type: mrr_at_10
158
+ value: 46.302
159
+ - type: mrr_at_100
160
+ value: 47.370000000000005
161
+ - type: mrr_at_1000
162
+ value: 47.412
163
+ - type: mrr_at_3
164
+ value: 43.702999999999996
165
+ - type: mrr_at_5
166
+ value: 45.213
167
+ - type: ndcg_at_1
168
+ value: 38.06
169
+ - type: ndcg_at_10
170
+ value: 44.375
171
+ - type: ndcg_at_100
172
+ value: 51.849999999999994
173
+ - type: ndcg_at_1000
174
+ value: 53.725
175
+ - type: ndcg_at_3
176
+ value: 38.97
177
+ - type: ndcg_at_5
178
+ value: 41.193000000000005
179
+ - type: precision_at_1
180
+ value: 38.06
181
+ - type: precision_at_10
182
+ value: 9.934999999999999
183
+ - type: precision_at_100
184
+ value: 1.599
185
+ - type: precision_at_1000
186
+ value: 0.183
187
+ - type: precision_at_3
188
+ value: 22.072
189
+ - type: precision_at_5
190
+ value: 16.089000000000002
191
+ - type: recall_at_1
192
+ value: 25.098
193
+ - type: recall_at_10
194
+ value: 55.264
195
+ - type: recall_at_100
196
+ value: 85.939
197
+ - type: recall_at_1000
198
+ value: 98.44800000000001
199
+ - type: recall_at_3
200
+ value: 39.122
201
+ - type: recall_at_5
202
+ value: 45.948
203
+ - task:
204
+ type: PairClassification
205
+ dataset:
206
+ type: C-MTEB/CMNLI
207
+ name: MTEB Cmnli
208
+ config: default
209
+ split: validation
210
+ revision: None
211
+ metrics:
212
+ - type: cos_sim_accuracy
213
+ value: 78.02766085387853
214
+ - type: cos_sim_ap
215
+ value: 85.59982802559004
216
+ - type: cos_sim_f1
217
+ value: 79.57103418984921
218
+ - type: cos_sim_precision
219
+ value: 72.88465279128575
220
+ - type: cos_sim_recall
221
+ value: 87.60813654430676
222
+ - type: dot_accuracy
223
+ value: 78.02766085387853
224
+ - type: dot_ap
225
+ value: 85.59604477360719
226
+ - type: dot_f1
227
+ value: 79.57103418984921
228
+ - type: dot_precision
229
+ value: 72.88465279128575
230
+ - type: dot_recall
231
+ value: 87.60813654430676
232
+ - type: euclidean_accuracy
233
+ value: 78.02766085387853
234
+ - type: euclidean_ap
235
+ value: 85.59982802559004
236
+ - type: euclidean_f1
237
+ value: 79.57103418984921
238
+ - type: euclidean_precision
239
+ value: 72.88465279128575
240
+ - type: euclidean_recall
241
+ value: 87.60813654430676
242
+ - type: manhattan_accuracy
243
+ value: 77.9795550210463
244
+ - type: manhattan_ap
245
+ value: 85.58042267497707
246
+ - type: manhattan_f1
247
+ value: 79.40344001741781
248
+ - type: manhattan_precision
249
+ value: 74.29211652067632
250
+ - type: manhattan_recall
251
+ value: 85.27004909983633
252
+ - type: max_accuracy
253
+ value: 78.02766085387853
254
+ - type: max_ap
255
+ value: 85.59982802559004
256
+ - type: max_f1
257
+ value: 79.57103418984921
258
+ - task:
259
+ type: Retrieval
260
+ dataset:
261
+ type: C-MTEB/CovidRetrieval
262
+ name: MTEB CovidRetrieval
263
+ config: default
264
+ split: dev
265
+ revision: None
266
+ metrics:
267
+ - type: map_at_1
268
+ value: 62.144
269
+ - type: map_at_10
270
+ value: 71.589
271
+ - type: map_at_100
272
+ value: 72.066
273
+ - type: map_at_1000
274
+ value: 72.075
275
+ - type: map_at_3
276
+ value: 69.916
277
+ - type: map_at_5
278
+ value: 70.806
279
+ - type: mrr_at_1
280
+ value: 62.275999999999996
281
+ - type: mrr_at_10
282
+ value: 71.57
283
+ - type: mrr_at_100
284
+ value: 72.048
285
+ - type: mrr_at_1000
286
+ value: 72.057
287
+ - type: mrr_at_3
288
+ value: 69.89800000000001
289
+ - type: mrr_at_5
290
+ value: 70.84700000000001
291
+ - type: ndcg_at_1
292
+ value: 62.381
293
+ - type: ndcg_at_10
294
+ value: 75.74
295
+ - type: ndcg_at_100
296
+ value: 77.827
297
+ - type: ndcg_at_1000
298
+ value: 78.044
299
+ - type: ndcg_at_3
300
+ value: 72.307
301
+ - type: ndcg_at_5
302
+ value: 73.91499999999999
303
+ - type: precision_at_1
304
+ value: 62.381
305
+ - type: precision_at_10
306
+ value: 8.946
307
+ - type: precision_at_100
308
+ value: 0.988
309
+ - type: precision_at_1000
310
+ value: 0.101
311
+ - type: precision_at_3
312
+ value: 26.554
313
+ - type: precision_at_5
314
+ value: 16.733
315
+ - type: recall_at_1
316
+ value: 62.144
317
+ - type: recall_at_10
318
+ value: 88.567
319
+ - type: recall_at_100
320
+ value: 97.84
321
+ - type: recall_at_1000
322
+ value: 99.473
323
+ - type: recall_at_3
324
+ value: 79.083
325
+ - type: recall_at_5
326
+ value: 83.035
327
+ - task:
328
+ type: Retrieval
329
+ dataset:
330
+ type: C-MTEB/DuRetrieval
331
+ name: MTEB DuRetrieval
332
+ config: default
333
+ split: dev
334
+ revision: None
335
+ metrics:
336
+ - type: map_at_1
337
+ value: 24.665
338
+ - type: map_at_10
339
+ value: 74.91600000000001
340
+ - type: map_at_100
341
+ value: 77.981
342
+ - type: map_at_1000
343
+ value: 78.032
344
+ - type: map_at_3
345
+ value: 51.015
346
+ - type: map_at_5
347
+ value: 64.681
348
+ - type: mrr_at_1
349
+ value: 86.5
350
+ - type: mrr_at_10
351
+ value: 90.78399999999999
352
+ - type: mrr_at_100
353
+ value: 90.859
354
+ - type: mrr_at_1000
355
+ value: 90.863
356
+ - type: mrr_at_3
357
+ value: 90.375
358
+ - type: mrr_at_5
359
+ value: 90.66199999999999
360
+ - type: ndcg_at_1
361
+ value: 86.5
362
+ - type: ndcg_at_10
363
+ value: 83.635
364
+ - type: ndcg_at_100
365
+ value: 86.926
366
+ - type: ndcg_at_1000
367
+ value: 87.425
368
+ - type: ndcg_at_3
369
+ value: 81.28999999999999
370
+ - type: ndcg_at_5
371
+ value: 80.549
372
+ - type: precision_at_1
373
+ value: 86.5
374
+ - type: precision_at_10
375
+ value: 40.544999999999995
376
+ - type: precision_at_100
377
+ value: 4.748
378
+ - type: precision_at_1000
379
+ value: 0.48700000000000004
380
+ - type: precision_at_3
381
+ value: 72.68299999999999
382
+ - type: precision_at_5
383
+ value: 61.86000000000001
384
+ - type: recall_at_1
385
+ value: 24.665
386
+ - type: recall_at_10
387
+ value: 85.72
388
+ - type: recall_at_100
389
+ value: 96.116
390
+ - type: recall_at_1000
391
+ value: 98.772
392
+ - type: recall_at_3
393
+ value: 53.705999999999996
394
+ - type: recall_at_5
395
+ value: 70.42699999999999
396
+ - task:
397
+ type: Retrieval
398
+ dataset:
399
+ type: C-MTEB/EcomRetrieval
400
+ name: MTEB EcomRetrieval
401
+ config: default
402
+ split: dev
403
+ revision: None
404
+ metrics:
405
+ - type: map_at_1
406
+ value: 54.0
407
+ - type: map_at_10
408
+ value: 64.449
409
+ - type: map_at_100
410
+ value: 64.937
411
+ - type: map_at_1000
412
+ value: 64.946
413
+ - type: map_at_3
414
+ value: 61.85000000000001
415
+ - type: map_at_5
416
+ value: 63.525
417
+ - type: mrr_at_1
418
+ value: 54.0
419
+ - type: mrr_at_10
420
+ value: 64.449
421
+ - type: mrr_at_100
422
+ value: 64.937
423
+ - type: mrr_at_1000
424
+ value: 64.946
425
+ - type: mrr_at_3
426
+ value: 61.85000000000001
427
+ - type: mrr_at_5
428
+ value: 63.525
429
+ - type: ndcg_at_1
430
+ value: 54.0
431
+ - type: ndcg_at_10
432
+ value: 69.56400000000001
433
+ - type: ndcg_at_100
434
+ value: 71.78999999999999
435
+ - type: ndcg_at_1000
436
+ value: 72.021
437
+ - type: ndcg_at_3
438
+ value: 64.334
439
+ - type: ndcg_at_5
440
+ value: 67.368
441
+ - type: precision_at_1
442
+ value: 54.0
443
+ - type: precision_at_10
444
+ value: 8.559999999999999
445
+ - type: precision_at_100
446
+ value: 0.9570000000000001
447
+ - type: precision_at_1000
448
+ value: 0.098
449
+ - type: precision_at_3
450
+ value: 23.833
451
+ - type: precision_at_5
452
+ value: 15.78
453
+ - type: recall_at_1
454
+ value: 54.0
455
+ - type: recall_at_10
456
+ value: 85.6
457
+ - type: recall_at_100
458
+ value: 95.7
459
+ - type: recall_at_1000
460
+ value: 97.5
461
+ - type: recall_at_3
462
+ value: 71.5
463
+ - type: recall_at_5
464
+ value: 78.9
465
+ - task:
466
+ type: Classification
467
+ dataset:
468
+ type: C-MTEB/IFlyTek-classification
469
+ name: MTEB IFlyTek
470
+ config: default
471
+ split: validation
472
+ revision: None
473
+ metrics:
474
+ - type: accuracy
475
+ value: 48.61869949980762
476
+ - type: f1
477
+ value: 36.49337336098832
478
+ - task:
479
+ type: Classification
480
+ dataset:
481
+ type: C-MTEB/JDReview-classification
482
+ name: MTEB JDReview
483
+ config: default
484
+ split: test
485
+ revision: None
486
+ metrics:
487
+ - type: accuracy
488
+ value: 85.94746716697938
489
+ - type: ap
490
+ value: 53.75927589310753
491
+ - type: f1
492
+ value: 80.53821597736138
493
+ - task:
494
+ type: STS
495
+ dataset:
496
+ type: C-MTEB/LCQMC
497
+ name: MTEB LCQMC
498
+ config: default
499
+ split: test
500
+ revision: None
501
+ metrics:
502
+ - type: cos_sim_pearson
503
+ value: 68.77445518082875
504
+ - type: cos_sim_spearman
505
+ value: 74.05909185405268
506
+ - type: euclidean_pearson
507
+ value: 72.92870557009725
508
+ - type: euclidean_spearman
509
+ value: 74.05909628639644
510
+ - type: manhattan_pearson
511
+ value: 72.92072580598351
512
+ - type: manhattan_spearman
513
+ value: 74.0304390211741
514
+ - task:
515
+ type: Reranking
516
+ dataset:
517
+ type: C-MTEB/Mmarco-reranking
518
+ name: MTEB MMarcoReranking
519
+ config: default
520
+ split: dev
521
+ revision: None
522
+ metrics:
523
+ - type: map
524
+ value: 27.643607073221975
525
+ - type: mrr
526
+ value: 26.646825396825395
527
+ - task:
528
+ type: Retrieval
529
+ dataset:
530
+ type: C-MTEB/MMarcoRetrieval
531
+ name: MTEB MMarcoRetrieval
532
+ config: default
533
+ split: dev
534
+ revision: None
535
+ metrics:
536
+ - type: map_at_1
537
+ value: 65.10000000000001
538
+ - type: map_at_10
539
+ value: 74.014
540
+ - type: map_at_100
541
+ value: 74.372
542
+ - type: map_at_1000
543
+ value: 74.385
544
+ - type: map_at_3
545
+ value: 72.179
546
+ - type: map_at_5
547
+ value: 73.37700000000001
548
+ - type: mrr_at_1
549
+ value: 67.364
550
+ - type: mrr_at_10
551
+ value: 74.68
552
+ - type: mrr_at_100
553
+ value: 74.992
554
+ - type: mrr_at_1000
555
+ value: 75.003
556
+ - type: mrr_at_3
557
+ value: 73.054
558
+ - type: mrr_at_5
559
+ value: 74.126
560
+ - type: ndcg_at_1
561
+ value: 67.364
562
+ - type: ndcg_at_10
563
+ value: 77.704
564
+ - type: ndcg_at_100
565
+ value: 79.29899999999999
566
+ - type: ndcg_at_1000
567
+ value: 79.637
568
+ - type: ndcg_at_3
569
+ value: 74.232
570
+ - type: ndcg_at_5
571
+ value: 76.264
572
+ - type: precision_at_1
573
+ value: 67.364
574
+ - type: precision_at_10
575
+ value: 9.397
576
+ - type: precision_at_100
577
+ value: 1.019
578
+ - type: precision_at_1000
579
+ value: 0.105
580
+ - type: precision_at_3
581
+ value: 27.942
582
+ - type: precision_at_5
583
+ value: 17.837
584
+ - type: recall_at_1
585
+ value: 65.10000000000001
586
+ - type: recall_at_10
587
+ value: 88.416
588
+ - type: recall_at_100
589
+ value: 95.61
590
+ - type: recall_at_1000
591
+ value: 98.261
592
+ - type: recall_at_3
593
+ value: 79.28
594
+ - type: recall_at_5
595
+ value: 84.108
596
+ - task:
597
+ type: Classification
598
+ dataset:
599
+ type: mteb/amazon_massive_intent
600
+ name: MTEB MassiveIntentClassification (zh-CN)
601
+ config: zh-CN
602
+ split: test
603
+ revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
604
+ metrics:
605
+ - type: accuracy
606
+ value: 73.315400134499
607
+ - type: f1
608
+ value: 70.81060697693198
609
+ - task:
610
+ type: Classification
611
+ dataset:
612
+ type: mteb/amazon_massive_scenario
613
+ name: MTEB MassiveScenarioClassification (zh-CN)
614
+ config: zh-CN
615
+ split: test
616
+ revision: 7d571f92784cd94a019292a1f45445077d0ef634
617
+ metrics:
618
+ - type: accuracy
619
+ value: 76.78883658372563
620
+ - type: f1
621
+ value: 76.21512438791976
622
+ - task:
623
+ type: Retrieval
624
+ dataset:
625
+ type: C-MTEB/MedicalRetrieval
626
+ name: MTEB MedicalRetrieval
627
+ config: default
628
+ split: dev
629
+ revision: None
630
+ metrics:
631
+ - type: map_at_1
632
+ value: 55.300000000000004
633
+ - type: map_at_10
634
+ value: 61.879
635
+ - type: map_at_100
636
+ value: 62.434
637
+ - type: map_at_1000
638
+ value: 62.476
639
+ - type: map_at_3
640
+ value: 60.417
641
+ - type: map_at_5
642
+ value: 61.297000000000004
643
+ - type: mrr_at_1
644
+ value: 55.400000000000006
645
+ - type: mrr_at_10
646
+ value: 61.92100000000001
647
+ - type: mrr_at_100
648
+ value: 62.476
649
+ - type: mrr_at_1000
650
+ value: 62.517999999999994
651
+ - type: mrr_at_3
652
+ value: 60.483
653
+ - type: mrr_at_5
654
+ value: 61.338
655
+ - type: ndcg_at_1
656
+ value: 55.300000000000004
657
+ - type: ndcg_at_10
658
+ value: 64.937
659
+ - type: ndcg_at_100
660
+ value: 67.848
661
+ - type: ndcg_at_1000
662
+ value: 68.996
663
+ - type: ndcg_at_3
664
+ value: 61.939
665
+ - type: ndcg_at_5
666
+ value: 63.556999999999995
667
+ - type: precision_at_1
668
+ value: 55.300000000000004
669
+ - type: precision_at_10
670
+ value: 7.449999999999999
671
+ - type: precision_at_100
672
+ value: 0.886
673
+ - type: precision_at_1000
674
+ value: 0.098
675
+ - type: precision_at_3
676
+ value: 22.1
677
+ - type: precision_at_5
678
+ value: 14.06
679
+ - type: recall_at_1
680
+ value: 55.300000000000004
681
+ - type: recall_at_10
682
+ value: 74.5
683
+ - type: recall_at_100
684
+ value: 88.6
685
+ - type: recall_at_1000
686
+ value: 97.7
687
+ - type: recall_at_3
688
+ value: 66.3
689
+ - type: recall_at_5
690
+ value: 70.3
691
+ - task:
692
+ type: Classification
693
+ dataset:
694
+ type: C-MTEB/MultilingualSentiment-classification
695
+ name: MTEB MultilingualSentiment
696
+ config: default
697
+ split: validation
698
+ revision: None
699
+ metrics:
700
+ - type: accuracy
701
+ value: 75.79
702
+ - type: f1
703
+ value: 75.58944709087194
704
+ - task:
705
+ type: PairClassification
706
+ dataset:
707
+ type: C-MTEB/OCNLI
708
+ name: MTEB Ocnli
709
+ config: default
710
+ split: validation
711
+ revision: None
712
+ metrics:
713
+ - type: cos_sim_accuracy
714
+ value: 71.5755278830536
715
+ - type: cos_sim_ap
716
+ value: 75.27777388526098
717
+ - type: cos_sim_f1
718
+ value: 75.04604051565377
719
+ - type: cos_sim_precision
720
+ value: 66.53061224489795
721
+ - type: cos_sim_recall
722
+ value: 86.06124604012672
723
+ - type: dot_accuracy
724
+ value: 71.5755278830536
725
+ - type: dot_ap
726
+ value: 75.27765883143745
727
+ - type: dot_f1
728
+ value: 75.04604051565377
729
+ - type: dot_precision
730
+ value: 66.53061224489795
731
+ - type: dot_recall
732
+ value: 86.06124604012672
733
+ - type: euclidean_accuracy
734
+ value: 71.5755278830536
735
+ - type: euclidean_ap
736
+ value: 75.27762982049899
737
+ - type: euclidean_f1
738
+ value: 75.04604051565377
739
+ - type: euclidean_precision
740
+ value: 66.53061224489795
741
+ - type: euclidean_recall
742
+ value: 86.06124604012672
743
+ - type: manhattan_accuracy
744
+ value: 71.41310232809963
745
+ - type: manhattan_ap
746
+ value: 75.11908556317425
747
+ - type: manhattan_f1
748
+ value: 75.0118091639112
749
+ - type: manhattan_precision
750
+ value: 67.86324786324786
751
+ - type: manhattan_recall
752
+ value: 83.84371700105596
753
+ - type: max_accuracy
754
+ value: 71.5755278830536
755
+ - type: max_ap
756
+ value: 75.27777388526098
757
+ - type: max_f1
758
+ value: 75.04604051565377
759
+ - task:
760
+ type: Classification
761
+ dataset:
762
+ type: C-MTEB/OnlineShopping-classification
763
+ name: MTEB OnlineShopping
764
+ config: default
765
+ split: test
766
+ revision: None
767
+ metrics:
768
+ - type: accuracy
769
+ value: 93.36
770
+ - type: ap
771
+ value: 91.66871784150999
772
+ - type: f1
773
+ value: 93.35216314755989
774
+ - task:
775
+ type: STS
776
+ dataset:
777
+ type: C-MTEB/PAWSX
778
+ name: MTEB PAWSX
779
+ config: default
780
+ split: test
781
+ revision: None
782
+ metrics:
783
+ - type: cos_sim_pearson
784
+ value: 24.21926662784366
785
+ - type: cos_sim_spearman
786
+ value: 27.969680921064644
787
+ - type: euclidean_pearson
788
+ value: 28.75506415195721
789
+ - type: euclidean_spearman
790
+ value: 27.969593815056058
791
+ - type: manhattan_pearson
792
+ value: 28.90608040712011
793
+ - type: manhattan_spearman
794
+ value: 28.07097299964309
795
+ - task:
796
+ type: STS
797
+ dataset:
798
+ type: C-MTEB/QBQTC
799
+ name: MTEB QBQTC
800
+ config: default
801
+ split: test
802
+ revision: None
803
+ metrics:
804
+ - type: cos_sim_pearson
805
+ value: 33.4112661812038
806
+ - type: cos_sim_spearman
807
+ value: 35.192765228905174
808
+ - type: euclidean_pearson
809
+ value: 33.57803958232971
810
+ - type: euclidean_spearman
811
+ value: 35.19270413260232
812
+ - type: manhattan_pearson
813
+ value: 33.75933288702631
814
+ - type: manhattan_spearman
815
+ value: 35.362780488430126
816
+ - task:
817
+ type: STS
818
+ dataset:
819
+ type: mteb/sts22-crosslingual-sts
820
+ name: MTEB STS22 (zh)
821
+ config: zh
822
+ split: test
823
+ revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
824
+ metrics:
825
+ - type: cos_sim_pearson
826
+ value: 62.178764479940206
827
+ - type: cos_sim_spearman
828
+ value: 63.644049344272155
829
+ - type: euclidean_pearson
830
+ value: 61.97852518030118
831
+ - type: euclidean_spearman
832
+ value: 63.644049344272155
833
+ - type: manhattan_pearson
834
+ value: 62.3931275533103
835
+ - type: manhattan_spearman
836
+ value: 63.68720814152202
837
+ - task:
838
+ type: STS
839
+ dataset:
840
+ type: C-MTEB/STSB
841
+ name: MTEB STSB
842
+ config: default
843
+ split: test
844
+ revision: None
845
+ metrics:
846
+ - type: cos_sim_pearson
847
+ value: 81.09847341753118
848
+ - type: cos_sim_spearman
849
+ value: 81.46211495319093
850
+ - type: euclidean_pearson
851
+ value: 80.97905808856734
852
+ - type: euclidean_spearman
853
+ value: 81.46177732221445
854
+ - type: manhattan_pearson
855
+ value: 80.8737913286308
856
+ - type: manhattan_spearman
857
+ value: 81.41142532907402
858
+ - task:
859
+ type: Reranking
860
+ dataset:
861
+ type: C-MTEB/T2Reranking
862
+ name: MTEB T2Reranking
863
+ config: default
864
+ split: dev
865
+ revision: None
866
+ metrics:
867
+ - type: map
868
+ value: 66.36295416100998
869
+ - type: mrr
870
+ value: 76.42041058129412
871
+ - task:
872
+ type: Retrieval
873
+ dataset:
874
+ type: C-MTEB/T2Retrieval
875
+ name: MTEB T2Retrieval
876
+ config: default
877
+ split: dev
878
+ revision: None
879
+ metrics:
880
+ - type: map_at_1
881
+ value: 26.898
882
+ - type: map_at_10
883
+ value: 75.089
884
+ - type: map_at_100
885
+ value: 78.786
886
+ - type: map_at_1000
887
+ value: 78.86
888
+ - type: map_at_3
889
+ value: 52.881
890
+ - type: map_at_5
891
+ value: 64.881
892
+ - type: mrr_at_1
893
+ value: 88.984
894
+ - type: mrr_at_10
895
+ value: 91.681
896
+ - type: mrr_at_100
897
+ value: 91.77300000000001
898
+ - type: mrr_at_1000
899
+ value: 91.777
900
+ - type: mrr_at_3
901
+ value: 91.205
902
+ - type: mrr_at_5
903
+ value: 91.486
904
+ - type: ndcg_at_1
905
+ value: 88.984
906
+ - type: ndcg_at_10
907
+ value: 83.083
908
+ - type: ndcg_at_100
909
+ value: 86.955
910
+ - type: ndcg_at_1000
911
+ value: 87.665
912
+ - type: ndcg_at_3
913
+ value: 84.661
914
+ - type: ndcg_at_5
915
+ value: 83.084
916
+ - type: precision_at_1
917
+ value: 88.984
918
+ - type: precision_at_10
919
+ value: 41.311
920
+ - type: precision_at_100
921
+ value: 4.978
922
+ - type: precision_at_1000
923
+ value: 0.515
924
+ - type: precision_at_3
925
+ value: 74.074
926
+ - type: precision_at_5
927
+ value: 61.956999999999994
928
+ - type: recall_at_1
929
+ value: 26.898
930
+ - type: recall_at_10
931
+ value: 82.03200000000001
932
+ - type: recall_at_100
933
+ value: 94.593
934
+ - type: recall_at_1000
935
+ value: 98.188
936
+ - type: recall_at_3
937
+ value: 54.647999999999996
938
+ - type: recall_at_5
939
+ value: 68.394
940
+ - task:
941
+ type: Classification
942
+ dataset:
943
+ type: C-MTEB/TNews-classification
944
+ name: MTEB TNews
945
+ config: default
946
+ split: validation
947
+ revision: None
948
+ metrics:
949
+ - type: accuracy
950
+ value: 53.648999999999994
951
+ - type: f1
952
+ value: 51.87788185753318
953
+ - task:
954
+ type: Clustering
955
+ dataset:
956
+ type: C-MTEB/ThuNewsClusteringP2P
957
+ name: MTEB ThuNewsClusteringP2P
958
+ config: default
959
+ split: test
960
+ revision: None
961
+ metrics:
962
+ - type: v_measure
963
+ value: 68.81293224496076
964
+ - task:
965
+ type: Clustering
966
+ dataset:
967
+ type: C-MTEB/ThuNewsClusteringS2S
968
+ name: MTEB ThuNewsClusteringS2S
969
+ config: default
970
+ split: test
971
+ revision: None
972
+ metrics:
973
+ - type: v_measure
974
+ value: 63.60504270553153
975
+ - task:
976
+ type: Retrieval
977
+ dataset:
978
+ type: C-MTEB/VideoRetrieval
979
+ name: MTEB VideoRetrieval
980
+ config: default
981
+ split: dev
982
+ revision: None
983
+ metrics:
984
+ - type: map_at_1
985
+ value: 59.3
986
+ - type: map_at_10
987
+ value: 69.89
988
+ - type: map_at_100
989
+ value: 70.261
990
+ - type: map_at_1000
991
+ value: 70.27
992
+ - type: map_at_3
993
+ value: 67.93299999999999
994
+ - type: map_at_5
995
+ value: 69.10300000000001
996
+ - type: mrr_at_1
997
+ value: 59.3
998
+ - type: mrr_at_10
999
+ value: 69.89
1000
+ - type: mrr_at_100
1001
+ value: 70.261
1002
+ - type: mrr_at_1000
1003
+ value: 70.27
1004
+ - type: mrr_at_3
1005
+ value: 67.93299999999999
1006
+ - type: mrr_at_5
1007
+ value: 69.10300000000001
1008
+ - type: ndcg_at_1
1009
+ value: 59.3
1010
+ - type: ndcg_at_10
1011
+ value: 74.67099999999999
1012
+ - type: ndcg_at_100
1013
+ value: 76.371
1014
+ - type: ndcg_at_1000
1015
+ value: 76.644
1016
+ - type: ndcg_at_3
1017
+ value: 70.678
1018
+ - type: ndcg_at_5
1019
+ value: 72.783
1020
+ - type: precision_at_1
1021
+ value: 59.3
1022
+ - type: precision_at_10
1023
+ value: 8.95
1024
+ - type: precision_at_100
1025
+ value: 0.972
1026
+ - type: precision_at_1000
1027
+ value: 0.099
1028
+ - type: precision_at_3
1029
+ value: 26.200000000000003
1030
+ - type: precision_at_5
1031
+ value: 16.74
1032
+ - type: recall_at_1
1033
+ value: 59.3
1034
+ - type: recall_at_10
1035
+ value: 89.5
1036
+ - type: recall_at_100
1037
+ value: 97.2
1038
+ - type: recall_at_1000
1039
+ value: 99.4
1040
+ - type: recall_at_3
1041
+ value: 78.60000000000001
1042
+ - type: recall_at_5
1043
+ value: 83.7
1044
+ - task:
1045
+ type: Classification
1046
+ dataset:
1047
+ type: C-MTEB/waimai-classification
1048
+ name: MTEB Waimai
1049
+ config: default
1050
+ split: test
1051
+ revision: None
1052
+ metrics:
1053
+ - type: accuracy
1054
+ value: 88.07000000000001
1055
+ - type: ap
1056
+ value: 72.68881791758656
1057
+ - type: f1
1058
+ value: 86.647906274628
1059
+ language:
1060
+ - zh
1061
  license: mit
1062
  ---
1063
+
1064
+ # gte-base-zh
1065
+
1066
+ General Text Embeddings (GTE) model. [Towards General Text Embeddings with Multi-stage Contrastive Learning](https://arxiv.org/abs/2308.03281)
1067
+
1068
+ The GTE models are trained by Alibaba DAMO Academy. They are mainly based on the BERT framework and are currently available in several sizes for both Chinese and English. The models are trained on a large-scale corpus of relevance text pairs covering a wide range of domains and scenarios, which allows them to be applied to various downstream text-embedding tasks, including **information retrieval**, **semantic textual similarity**, and **text reranking**.
1069
+
1070
+ ## Model List
1071
+
1072
+ | Models | Language | Max Sequence Length | Dimension | Model Size |
1073
+ |:-----: | :-----: |:-----: |:-----: |:-----: |
1074
+ |[GTE-large-zh](https://huggingface.co/thenlper/gte-large-zh) | Chinese | 512 | 1024 | 0.67GB |
1075
+ |[GTE-base-zh](https://huggingface.co/thenlper/gte-base-zh) | Chinese | 512 | 1024 | 0.67GB |
1076
+ |[GTE-small-zh](https://huggingface.co/thenlper/gte-small-zh) | Chinese | 512 | 1024 | 0.67GB |
1077
+ |[GTE-large](https://huggingface.co/thenlper/gte-large) | English | 512 | 1024 | 0.67GB |
1078
+ |[GTE-base](https://huggingface.co/thenlper/gte-base) | English | 512 | 1024 | 0.67GB |
1079
+ |[GTE-small](https://huggingface.co/thenlper/gte-small) | English | 512 | 1024 | 0.67GB |
1080
+
1081
+
1082
+ ## Metrics
1083
+
1084
+ We compared the performance of the GTE models with other popular text embedding models on the MTEB benchmark (C-MTEB for Chinese). For more detailed comparison results, please refer to the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard).
1085
+
1086
+
1087
+ ## Usage
1088
+
1089
+ Code example
1090
+
1091
+ ```python
1092
+ import torch.nn.functional as F
1093
+ from torch import Tensor
1094
+ from transformers import AutoTokenizer, AutoModel
1095
+
1096
+ input_texts = [
1097
+ "中国的首都是哪里",
1098
+ "你喜欢去哪里旅游",
1099
+ "北京",
1100
+ "今天中午吃什么"
1101
+ ]
1102
+
1103
+ tokenizer = AutoTokenizer.from_pretrained("thenlper/gte-base-zh")
1104
+ model = AutoModel.from_pretrained("thenlper/gte-base-zh")
1105
+
1106
+ # Tokenize the input texts
1107
+ batch_dict = tokenizer(input_texts, max_length=512, padding=True, truncation=True, return_tensors='pt')
1108
+
1109
+ outputs = model(**batch_dict)
1110
+ embeddings = outputs.last_hidden_state[:, 0]
1111
+
1112
+ # (Optionally) normalize embeddings
1113
+ embeddings = F.normalize(embeddings, p=2, dim=1)
1114
+ scores = (embeddings[:1] @ embeddings[1:].T) * 100
1115
+ print(scores.tolist())
1116
+ ```
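The scoring step in the example above normalizes each embedding to unit length and then takes a dot product, which is the same as computing cosine similarity. The following is a minimal pure-Python sketch of that arithmetic. The helper names `l2_normalize` and `scaled_cosine_scores` are hypothetical, and the 3-dimensional vectors are toy values, not real model outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def scaled_cosine_scores(query, candidates, scale=100.0):
    """Dot product of unit vectors (cosine similarity), multiplied by `scale`."""
    q = l2_normalize(query)
    return [
        scale * sum(a * b for a, b in zip(q, l2_normalize(c)))
        for c in candidates
    ]


# Toy vectors standing in for embeddings; illustrative values only.
query = [1.0, 0.0, 0.0]
candidates = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [1.0, 1.0, 0.0]]
scores = scaled_cosine_scores(query, candidates)
print(scores)
```

Because the vectors are normalized first, the length of a candidate does not affect its score: `[2.0, 0.0, 0.0]` points in the same direction as the query and gets the maximum score.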
1117
+
1118
+ Use with sentence-transformers:
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.util import cos_sim
+
+ # "What is the capital of China" / "The capital of China is Beijing"
+ sentences = ['中国的首都是哪里', '中国的首都是北京']
+
+ model = SentenceTransformer('thenlper/gte-base-zh')
+ embeddings = model.encode(sentences)
+ print(cos_sim(embeddings[0], embeddings[1]))
+ ```
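For retrieval or reranking, similarity scores like the one printed above are usually sorted to pick the best-matching candidates. The sketch below shows only that sorting step. The helper `rank_candidates` is a hypothetical name, and the scores are hand-picked placeholders, not outputs of any GTE model.

```python
def rank_candidates(candidates, scores, top_k=3):
    """Return the `top_k` (candidate, score) pairs, highest score first."""
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    # sorted() is stable, so candidates with equal scores keep their input order.
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Placeholder identifiers and scores chosen by hand for illustration.
candidates = ["doc_a", "doc_b", "doc_c", "doc_d"]
scores = [0.12, 0.87, 0.45, 0.87]
top = rank_candidates(candidates, scores, top_k=2)
print(top)
```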
1129
+
1130
+ ### Limitation
1131
+
1132
+ This model supports Chinese text only, and inputs longer than 512 tokens are truncated to 512 tokens.
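One common workaround for the 512-token limit, which this model card does not itself prescribe, is to split a long token sequence into overlapping windows and embed each window separately. The sketch below covers only the splitting step on a plain list of token ids. The helper `split_into_windows` is hypothetical, and the default `max_len=510` assumes two positions are reserved for BERT-style special tokens.

```python
def split_into_windows(token_ids, max_len=510, overlap=64):
    """Split a token-id list into overlapping windows of at most `max_len` ids."""
    if max_len <= overlap:
        raise ValueError("max_len must be larger than overlap")
    step = max_len - overlap
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + max_len])
        # Stop once the current window already reaches the end of the sequence.
        if start + max_len >= len(token_ids):
            break
    return windows


# Placeholder ids standing in for a tokenized long document.
token_ids = list(range(1200))
windows = split_into_windows(token_ids)
print([len(w) for w in windows])
```

How the per-window embeddings are combined afterwards (for example by averaging, or by keeping the best-scoring window) is a separate design choice that depends on the task.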
1133
+
1134
+ ### Citation
1135
+
1136
+ If you find our paper or models helpful, please consider citing them as follows:
1137
+
+ ```
+ @misc{li2023general,
+       title={Towards General Text Embeddings with Multi-stage Contrastive Learning},
+       author={Zehan Li and Xin Zhang and Yanzhao Zhang and Dingkun Long and Pengjun Xie and Meishan Zhang},
+       year={2023},
+       eprint={2308.03281},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL}
+ }
+ ```