---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- mteb
model-index:
- name: mmlw-e5-base
results:
- task:
type: Clustering
dataset:
type: PL-MTEB/8tags-clustering
name: MTEB 8TagsClustering
config: default
split: test
revision: None
metrics:
- type: v_measure
value: 30.249113010261492
- task:
type: Classification
dataset:
type: PL-MTEB/allegro-reviews
name: MTEB AllegroReviews
config: default
split: test
revision: None
metrics:
- type: accuracy
value: 36.3817097415507
- type: f1
value: 32.77742158736663
- task:
type: Retrieval
dataset:
type: arguana-pl
name: MTEB ArguAna-PL
config: default
split: test
revision: None
metrics:
- type: map_at_1
value: 32.646
- type: map_at_10
value: 49.488
- type: map_at_100
value: 50.190999999999995
- type: map_at_1000
value: 50.194
- type: map_at_3
value: 44.749
- type: map_at_5
value: 47.571999999999996
- type: mrr_at_1
value: 34.211000000000006
- type: mrr_at_10
value: 50.112
- type: mrr_at_100
value: 50.836000000000006
- type: mrr_at_1000
value: 50.839
- type: mrr_at_3
value: 45.614
- type: mrr_at_5
value: 48.242000000000004
- type: ndcg_at_1
value: 32.646
- type: ndcg_at_10
value: 58.396
- type: ndcg_at_100
value: 61.285000000000004
- type: ndcg_at_1000
value: 61.358999999999995
- type: ndcg_at_3
value: 48.759
- type: ndcg_at_5
value: 53.807
- type: precision_at_1
value: 32.646
- type: precision_at_10
value: 8.663
- type: precision_at_100
value: 0.9900000000000001
- type: precision_at_1000
value: 0.1
- type: precision_at_3
value: 20.128
- type: precision_at_5
value: 14.509
- type: recall_at_1
value: 32.646
- type: recall_at_10
value: 86.629
- type: recall_at_100
value: 99.004
- type: recall_at_1000
value: 99.57300000000001
- type: recall_at_3
value: 60.38400000000001
- type: recall_at_5
value: 72.54599999999999
- task:
type: Classification
dataset:
type: PL-MTEB/cbd
name: MTEB CBD
config: default
split: test
revision: None
metrics:
- type: accuracy
value: 65.53999999999999
- type: ap
value: 19.75395945379771
- type: f1
value: 55.00481388401326
- task:
type: PairClassification
dataset:
type: PL-MTEB/cdsce-pairclassification
name: MTEB CDSC-E
config: default
split: test
revision: None
metrics:
- type: cos_sim_accuracy
value: 89.5
- type: cos_sim_ap
value: 77.26879308078568
- type: cos_sim_f1
value: 65.13157894736842
- type: cos_sim_precision
value: 86.8421052631579
- type: cos_sim_recall
value: 52.10526315789473
- type: dot_accuracy
value: 88.0
- type: dot_ap
value: 69.17235659054914
- type: dot_f1
value: 65.71428571428571
- type: dot_precision
value: 71.875
- type: dot_recall
value: 60.526315789473685
- type: euclidean_accuracy
value: 89.5
- type: euclidean_ap
value: 77.1905400565015
- type: euclidean_f1
value: 64.91803278688525
- type: euclidean_precision
value: 86.08695652173914
- type: euclidean_recall
value: 52.10526315789473
- type: manhattan_accuracy
value: 89.5
- type: manhattan_ap
value: 77.19531778873724
- type: manhattan_f1
value: 64.72491909385113
- type: manhattan_precision
value: 84.03361344537815
- type: manhattan_recall
value: 52.63157894736842
- type: max_accuracy
value: 89.5
- type: max_ap
value: 77.26879308078568
- type: max_f1
value: 65.71428571428571
- task:
type: STS
dataset:
type: PL-MTEB/cdscr-sts
name: MTEB CDSC-R
config: default
split: test
revision: None
metrics:
- type: cos_sim_pearson
value: 93.18498922236566
- type: cos_sim_spearman
value: 93.26224500108704
- type: euclidean_pearson
value: 92.25462061070286
- type: euclidean_spearman
value: 93.18623989769242
- type: manhattan_pearson
value: 92.16291103586255
- type: manhattan_spearman
value: 93.14403078934417
- task:
type: Retrieval
dataset:
type: dbpedia-pl
name: MTEB DBPedia-PL
config: default
split: test
revision: None
metrics:
- type: map_at_1
value: 8.268
- type: map_at_10
value: 17.391000000000002
- type: map_at_100
value: 24.266
- type: map_at_1000
value: 25.844
- type: map_at_3
value: 12.636
- type: map_at_5
value: 14.701
- type: mrr_at_1
value: 62.74999999999999
- type: mrr_at_10
value: 70.25200000000001
- type: mrr_at_100
value: 70.601
- type: mrr_at_1000
value: 70.613
- type: mrr_at_3
value: 68.083
- type: mrr_at_5
value: 69.37100000000001
- type: ndcg_at_1
value: 51.87500000000001
- type: ndcg_at_10
value: 37.185
- type: ndcg_at_100
value: 41.949
- type: ndcg_at_1000
value: 49.523
- type: ndcg_at_3
value: 41.556
- type: ndcg_at_5
value: 39.278
- type: precision_at_1
value: 63.24999999999999
- type: precision_at_10
value: 29.225
- type: precision_at_100
value: 9.745
- type: precision_at_1000
value: 2.046
- type: precision_at_3
value: 43.833
- type: precision_at_5
value: 37.9
- type: recall_at_1
value: 8.268
- type: recall_at_10
value: 22.542
- type: recall_at_100
value: 48.154
- type: recall_at_1000
value: 72.62100000000001
- type: recall_at_3
value: 13.818
- type: recall_at_5
value: 17.137
- task:
type: Retrieval
dataset:
type: fiqa-pl
name: MTEB FiQA-PL
config: default
split: test
revision: None
metrics:
- type: map_at_1
value: 16.489
- type: map_at_10
value: 26.916
- type: map_at_100
value: 28.582
- type: map_at_1000
value: 28.774
- type: map_at_3
value: 23.048
- type: map_at_5
value: 24.977
- type: mrr_at_1
value: 33.642
- type: mrr_at_10
value: 41.987
- type: mrr_at_100
value: 42.882
- type: mrr_at_1000
value: 42.93
- type: mrr_at_3
value: 39.48
- type: mrr_at_5
value: 40.923
- type: ndcg_at_1
value: 33.488
- type: ndcg_at_10
value: 34.528
- type: ndcg_at_100
value: 41.085
- type: ndcg_at_1000
value: 44.474000000000004
- type: ndcg_at_3
value: 30.469
- type: ndcg_at_5
value: 31.618000000000002
- type: precision_at_1
value: 33.488
- type: precision_at_10
value: 9.783999999999999
- type: precision_at_100
value: 1.6389999999999998
- type: precision_at_1000
value: 0.22699999999999998
- type: precision_at_3
value: 20.525
- type: precision_at_5
value: 15.093
- type: recall_at_1
value: 16.489
- type: recall_at_10
value: 42.370000000000005
- type: recall_at_100
value: 67.183
- type: recall_at_1000
value: 87.211
- type: recall_at_3
value: 27.689999999999998
- type: recall_at_5
value: 33.408
- task:
type: Retrieval
dataset:
type: hotpotqa-pl
name: MTEB HotpotQA-PL
config: default
split: test
revision: None
metrics:
- type: map_at_1
value: 37.373
- type: map_at_10
value: 57.509
- type: map_at_100
value: 58.451
- type: map_at_1000
value: 58.524
- type: map_at_3
value: 54.064
- type: map_at_5
value: 56.257999999999996
- type: mrr_at_1
value: 74.895
- type: mrr_at_10
value: 81.233
- type: mrr_at_100
value: 81.461
- type: mrr_at_1000
value: 81.47
- type: mrr_at_3
value: 80.124
- type: mrr_at_5
value: 80.862
- type: ndcg_at_1
value: 74.747
- type: ndcg_at_10
value: 66.249
- type: ndcg_at_100
value: 69.513
- type: ndcg_at_1000
value: 70.896
- type: ndcg_at_3
value: 61.312
- type: ndcg_at_5
value: 64.132
- type: precision_at_1
value: 74.747
- type: precision_at_10
value: 13.873
- type: precision_at_100
value: 1.641
- type: precision_at_1000
value: 0.182
- type: precision_at_3
value: 38.987
- type: precision_at_5
value: 25.621
- type: recall_at_1
value: 37.373
- type: recall_at_10
value: 69.365
- type: recall_at_100
value: 82.039
- type: recall_at_1000
value: 91.148
- type: recall_at_3
value: 58.48100000000001
- type: recall_at_5
value: 64.051
- task:
type: Retrieval
dataset:
type: msmarco-pl
name: MTEB MSMARCO-PL
config: default
split: validation
revision: None
metrics:
- type: map_at_1
value: 16.753999999999998
- type: map_at_10
value: 26.764
- type: map_at_100
value: 27.929
- type: map_at_1000
value: 27.994999999999997
- type: map_at_3
value: 23.527
- type: map_at_5
value: 25.343
- type: mrr_at_1
value: 17.192
- type: mrr_at_10
value: 27.141
- type: mrr_at_100
value: 28.269
- type: mrr_at_1000
value: 28.327999999999996
- type: mrr_at_3
value: 23.906
- type: mrr_at_5
value: 25.759999999999998
- type: ndcg_at_1
value: 17.177999999999997
- type: ndcg_at_10
value: 32.539
- type: ndcg_at_100
value: 38.383
- type: ndcg_at_1000
value: 40.132
- type: ndcg_at_3
value: 25.884
- type: ndcg_at_5
value: 29.15
- type: precision_at_1
value: 17.177999999999997
- type: precision_at_10
value: 5.268
- type: precision_at_100
value: 0.823
- type: precision_at_1000
value: 0.097
- type: precision_at_3
value: 11.122
- type: precision_at_5
value: 8.338
- type: recall_at_1
value: 16.753999999999998
- type: recall_at_10
value: 50.388
- type: recall_at_100
value: 77.86999999999999
- type: recall_at_1000
value: 91.55
- type: recall_at_3
value: 32.186
- type: recall_at_5
value: 40.048
- task:
type: Classification
dataset:
type: mteb/amazon_massive_intent
name: MTEB MassiveIntentClassification (pl)
config: pl
split: test
revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
metrics:
- type: accuracy
value: 70.9280430396772
- type: f1
value: 68.7099581466286
- task:
type: Classification
dataset:
type: mteb/amazon_massive_scenario
name: MTEB MassiveScenarioClassification (pl)
config: pl
split: test
revision: 7d571f92784cd94a019292a1f45445077d0ef634
metrics:
- type: accuracy
value: 74.76126429051783
- type: f1
value: 74.72274307018111
- task:
type: Retrieval
dataset:
type: nfcorpus-pl
name: MTEB NFCorpus-PL
config: default
split: test
revision: None
metrics:
- type: map_at_1
value: 5.348
- type: map_at_10
value: 12.277000000000001
- type: map_at_100
value: 15.804000000000002
- type: map_at_1000
value: 17.277
- type: map_at_3
value: 8.783000000000001
- type: map_at_5
value: 10.314
- type: mrr_at_1
value: 43.963
- type: mrr_at_10
value: 52.459999999999994
- type: mrr_at_100
value: 53.233
- type: mrr_at_1000
value: 53.26499999999999
- type: mrr_at_3
value: 50.464
- type: mrr_at_5
value: 51.548
- type: ndcg_at_1
value: 40.711999999999996
- type: ndcg_at_10
value: 33.709
- type: ndcg_at_100
value: 31.398
- type: ndcg_at_1000
value: 40.042
- type: ndcg_at_3
value: 37.85
- type: ndcg_at_5
value: 36.260999999999996
- type: precision_at_1
value: 43.344
- type: precision_at_10
value: 25.851000000000003
- type: precision_at_100
value: 8.279
- type: precision_at_1000
value: 2.085
- type: precision_at_3
value: 36.326
- type: precision_at_5
value: 32.074000000000005
- type: recall_at_1
value: 5.348
- type: recall_at_10
value: 16.441
- type: recall_at_100
value: 32.975
- type: recall_at_1000
value: 64.357
- type: recall_at_3
value: 9.841999999999999
- type: recall_at_5
value: 12.463000000000001
- task:
type: Retrieval
dataset:
type: nq-pl
name: MTEB NQ-PL
config: default
split: test
revision: None
metrics:
- type: map_at_1
value: 24.674
- type: map_at_10
value: 37.672
- type: map_at_100
value: 38.767
- type: map_at_1000
value: 38.82
- type: map_at_3
value: 33.823
- type: map_at_5
value: 36.063
- type: mrr_at_1
value: 27.839000000000002
- type: mrr_at_10
value: 40.129
- type: mrr_at_100
value: 41.008
- type: mrr_at_1000
value: 41.048
- type: mrr_at_3
value: 36.718
- type: mrr_at_5
value: 38.841
- type: ndcg_at_1
value: 27.839000000000002
- type: ndcg_at_10
value: 44.604
- type: ndcg_at_100
value: 49.51
- type: ndcg_at_1000
value: 50.841
- type: ndcg_at_3
value: 37.223
- type: ndcg_at_5
value: 41.073
- type: precision_at_1
value: 27.839000000000002
- type: precision_at_10
value: 7.5
- type: precision_at_100
value: 1.03
- type: precision_at_1000
value: 0.116
- type: precision_at_3
value: 17.005
- type: precision_at_5
value: 12.399000000000001
- type: recall_at_1
value: 24.674
- type: recall_at_10
value: 63.32299999999999
- type: recall_at_100
value: 85.088
- type: recall_at_1000
value: 95.143
- type: recall_at_3
value: 44.157999999999994
- type: recall_at_5
value: 53.054
- task:
type: Classification
dataset:
type: laugustyniak/abusive-clauses-pl
name: MTEB PAC
config: default
split: test
revision: None
metrics:
- type: accuracy
value: 64.5033304373009
- type: ap
value: 75.81507275237081
- type: f1
value: 62.24617820785985
- task:
type: PairClassification
dataset:
type: PL-MTEB/ppc-pairclassification
name: MTEB PPC
config: default
split: test
revision: None
metrics:
- type: cos_sim_accuracy
value: 85.39999999999999
- type: cos_sim_ap
value: 91.75881977787009
- type: cos_sim_f1
value: 87.79264214046823
- type: cos_sim_precision
value: 88.68243243243244
- type: cos_sim_recall
value: 86.9205298013245
- type: dot_accuracy
value: 71.0
- type: dot_ap
value: 82.97829049033108
- type: dot_f1
value: 78.77055039313797
- type: dot_precision
value: 69.30817610062893
- type: dot_recall
value: 91.22516556291392
- type: euclidean_accuracy
value: 85.2
- type: euclidean_ap
value: 91.85245521151309
- type: euclidean_f1
value: 87.64607679465777
- type: euclidean_precision
value: 88.38383838383838
- type: euclidean_recall
value: 86.9205298013245
- type: manhattan_accuracy
value: 85.39999999999999
- type: manhattan_ap
value: 91.85497100160649
- type: manhattan_f1
value: 87.77219430485762
- type: manhattan_precision
value: 88.8135593220339
- type: manhattan_recall
value: 86.75496688741721
- type: max_accuracy
value: 85.39999999999999
- type: max_ap
value: 91.85497100160649
- type: max_f1
value: 87.79264214046823
- task:
type: PairClassification
dataset:
type: PL-MTEB/psc-pairclassification
name: MTEB PSC
config: default
split: test
revision: None
metrics:
- type: cos_sim_accuracy
value: 97.58812615955473
- type: cos_sim_ap
value: 99.14945370088302
- type: cos_sim_f1
value: 96.06060606060606
- type: cos_sim_precision
value: 95.48192771084338
- type: cos_sim_recall
value: 96.64634146341463
- type: dot_accuracy
value: 95.17625231910947
- type: dot_ap
value: 97.05592933601112
- type: dot_f1
value: 92.14501510574019
- type: dot_precision
value: 91.31736526946108
- type: dot_recall
value: 92.98780487804879
- type: euclidean_accuracy
value: 97.6808905380334
- type: euclidean_ap
value: 99.18538119402824
- type: euclidean_f1
value: 96.20637329286798
- type: euclidean_precision
value: 95.77039274924472
- type: euclidean_recall
value: 96.64634146341463
- type: manhattan_accuracy
value: 97.58812615955473
- type: manhattan_ap
value: 99.17870990853292
- type: manhattan_f1
value: 96.02446483180427
- type: manhattan_precision
value: 96.31901840490798
- type: manhattan_recall
value: 95.73170731707317
- type: max_accuracy
value: 97.6808905380334
- type: max_ap
value: 99.18538119402824
- type: max_f1
value: 96.20637329286798
- task:
type: Classification
dataset:
type: PL-MTEB/polemo2_in
name: MTEB PolEmo2.0-IN
config: default
split: test
revision: None
metrics:
- type: accuracy
value: 68.69806094182825
- type: f1
value: 68.0619984307764
- task:
type: Classification
dataset:
type: PL-MTEB/polemo2_out
name: MTEB PolEmo2.0-OUT
config: default
split: test
revision: None
metrics:
- type: accuracy
value: 35.80971659919028
- type: f1
value: 31.13081621324864
- task:
type: Retrieval
dataset:
type: quora-pl
name: MTEB Quora-PL
config: default
split: test
revision: None
metrics:
- type: map_at_1
value: 66.149
- type: map_at_10
value: 80.133
- type: map_at_100
value: 80.845
- type: map_at_1000
value: 80.866
- type: map_at_3
value: 76.983
- type: map_at_5
value: 78.938
- type: mrr_at_1
value: 76.09
- type: mrr_at_10
value: 83.25099999999999
- type: mrr_at_100
value: 83.422
- type: mrr_at_1000
value: 83.42500000000001
- type: mrr_at_3
value: 82.02199999999999
- type: mrr_at_5
value: 82.831
- type: ndcg_at_1
value: 76.14999999999999
- type: ndcg_at_10
value: 84.438
- type: ndcg_at_100
value: 86.048
- type: ndcg_at_1000
value: 86.226
- type: ndcg_at_3
value: 80.97999999999999
- type: ndcg_at_5
value: 82.856
- type: precision_at_1
value: 76.14999999999999
- type: precision_at_10
value: 12.985
- type: precision_at_100
value: 1.513
- type: precision_at_1000
value: 0.156
- type: precision_at_3
value: 35.563
- type: precision_at_5
value: 23.586
- type: recall_at_1
value: 66.149
- type: recall_at_10
value: 93.195
- type: recall_at_100
value: 98.924
- type: recall_at_1000
value: 99.885
- type: recall_at_3
value: 83.439
- type: recall_at_5
value: 88.575
- task:
type: Retrieval
dataset:
type: scidocs-pl
name: MTEB SCIDOCS-PL
config: default
split: test
revision: None
metrics:
- type: map_at_1
value: 3.688
- type: map_at_10
value: 10.23
- type: map_at_100
value: 12.077
- type: map_at_1000
value: 12.382
- type: map_at_3
value: 7.149
- type: map_at_5
value: 8.689
- type: mrr_at_1
value: 18.2
- type: mrr_at_10
value: 28.816999999999997
- type: mrr_at_100
value: 29.982
- type: mrr_at_1000
value: 30.058
- type: mrr_at_3
value: 25.983
- type: mrr_at_5
value: 27.418
- type: ndcg_at_1
value: 18.2
- type: ndcg_at_10
value: 17.352999999999998
- type: ndcg_at_100
value: 24.859
- type: ndcg_at_1000
value: 30.535
- type: ndcg_at_3
value: 16.17
- type: ndcg_at_5
value: 14.235000000000001
- type: precision_at_1
value: 18.2
- type: precision_at_10
value: 9.19
- type: precision_at_100
value: 2.01
- type: precision_at_1000
value: 0.338
- type: precision_at_3
value: 15.5
- type: precision_at_5
value: 12.78
- type: recall_at_1
value: 3.688
- type: recall_at_10
value: 18.632
- type: recall_at_100
value: 40.822
- type: recall_at_1000
value: 68.552
- type: recall_at_3
value: 9.423
- type: recall_at_5
value: 12.943
- task:
type: PairClassification
dataset:
type: PL-MTEB/sicke-pl-pairclassification
name: MTEB SICK-E-PL
config: default
split: test
revision: None
metrics:
- type: cos_sim_accuracy
value: 83.12270688952303
- type: cos_sim_ap
value: 76.4528312253856
- type: cos_sim_f1
value: 68.69627507163324
- type: cos_sim_precision
value: 69.0922190201729
- type: cos_sim_recall
value: 68.30484330484332
- type: dot_accuracy
value: 79.20913167549939
- type: dot_ap
value: 65.03147071986633
- type: dot_f1
value: 62.812160694896846
- type: dot_precision
value: 50.74561403508772
- type: dot_recall
value: 82.4074074074074
- type: euclidean_accuracy
value: 83.16347329800244
- type: euclidean_ap
value: 76.49405838298205
- type: euclidean_f1
value: 68.66738120757414
- type: euclidean_precision
value: 68.88888888888889
- type: euclidean_recall
value: 68.44729344729345
- type: manhattan_accuracy
value: 83.16347329800244
- type: manhattan_ap
value: 76.5080551733795
- type: manhattan_f1
value: 68.73883529832084
- type: manhattan_precision
value: 68.9605734767025
- type: manhattan_recall
value: 68.51851851851852
- type: max_accuracy
value: 83.16347329800244
- type: max_ap
value: 76.5080551733795
- type: max_f1
value: 68.73883529832084
- task:
type: STS
dataset:
type: PL-MTEB/sickr-pl-sts
name: MTEB SICK-R-PL
config: default
split: test
revision: None
metrics:
- type: cos_sim_pearson
value: 82.60225159739653
- type: cos_sim_spearman
value: 76.76667220288542
- type: euclidean_pearson
value: 80.16302518898615
- type: euclidean_spearman
value: 76.76131897866455
- type: manhattan_pearson
value: 80.11881021613914
- type: manhattan_spearman
value: 76.74246419368048
- task:
type: STS
dataset:
type: mteb/sts22-crosslingual-sts
name: MTEB STS22 (pl)
config: pl
split: test
revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
metrics:
- type: cos_sim_pearson
value: 38.2744776092718
- type: cos_sim_spearman
value: 40.35664941442517
- type: euclidean_pearson
value: 29.148502128336585
- type: euclidean_spearman
value: 40.45531563224982
- type: manhattan_pearson
value: 29.124177399433098
- type: manhattan_spearman
value: 40.2801387844354
- task:
type: Retrieval
dataset:
type: scifact-pl
name: MTEB SciFact-PL
config: default
split: test
revision: None
metrics:
- type: map_at_1
value: 52.994
- type: map_at_10
value: 63.612
- type: map_at_100
value: 64.294
- type: map_at_1000
value: 64.325
- type: map_at_3
value: 61.341
- type: map_at_5
value: 62.366
- type: mrr_at_1
value: 56.667
- type: mrr_at_10
value: 65.333
- type: mrr_at_100
value: 65.89399999999999
- type: mrr_at_1000
value: 65.91900000000001
- type: mrr_at_3
value: 63.666999999999994
- type: mrr_at_5
value: 64.36699999999999
- type: ndcg_at_1
value: 56.333
- type: ndcg_at_10
value: 68.292
- type: ndcg_at_100
value: 71.136
- type: ndcg_at_1000
value: 71.90100000000001
- type: ndcg_at_3
value: 64.387
- type: ndcg_at_5
value: 65.546
- type: precision_at_1
value: 56.333
- type: precision_at_10
value: 9.133
- type: precision_at_100
value: 1.0630000000000002
- type: precision_at_1000
value: 0.11299999999999999
- type: precision_at_3
value: 25.556
- type: precision_at_5
value: 16.267
- type: recall_at_1
value: 52.994
- type: recall_at_10
value: 81.178
- type: recall_at_100
value: 93.767
- type: recall_at_1000
value: 99.667
- type: recall_at_3
value: 69.906
- type: recall_at_5
value: 73.18299999999999
- task:
type: Retrieval
dataset:
type: trec-covid-pl
name: MTEB TRECCOVID-PL
config: default
split: test
revision: None
metrics:
- type: map_at_1
value: 0.231
- type: map_at_10
value: 1.822
- type: map_at_100
value: 10.134
- type: map_at_1000
value: 24.859
- type: map_at_3
value: 0.615
- type: map_at_5
value: 0.9939999999999999
- type: mrr_at_1
value: 84.0
- type: mrr_at_10
value: 90.4
- type: mrr_at_100
value: 90.4
- type: mrr_at_1000
value: 90.4
- type: mrr_at_3
value: 89.0
- type: mrr_at_5
value: 90.4
- type: ndcg_at_1
value: 81.0
- type: ndcg_at_10
value: 73.333
- type: ndcg_at_100
value: 55.35099999999999
- type: ndcg_at_1000
value: 49.875
- type: ndcg_at_3
value: 76.866
- type: ndcg_at_5
value: 75.472
- type: precision_at_1
value: 86.0
- type: precision_at_10
value: 78.2
- type: precision_at_100
value: 57.18
- type: precision_at_1000
value: 22.332
- type: precision_at_3
value: 82.0
- type: precision_at_5
value: 81.2
- type: recall_at_1
value: 0.231
- type: recall_at_10
value: 2.056
- type: recall_at_100
value: 13.468
- type: recall_at_1000
value: 47.038999999999994
- type: recall_at_3
value: 0.6479999999999999
- type: recall_at_5
value: 1.088
language: pl
license: apache-2.0
widget:
- source_sentence: "query: Jak dożyć 100 lat?"
  sentences:
  - "passage: Trzeba zdrowo się odżywiać i uprawiać sport."
  - "passage: Trzeba pić alkohol, imprezować i jeździć szybkimi autami."
  - "passage: Gdy trwała kampania politycy zapewniali, że rozprawią się z zakazem niedzielnego handlu."
---
<h1 align="center">MMLW-e5-base</h1>
MMLW (muszę mieć lepszą wiadomość, Polish for "I must have a better message") is a family of neural text encoders for Polish.
This is a distilled model that can be used to generate embeddings applicable to many tasks, such as semantic similarity, clustering, and information retrieval. The model can also serve as a base for further fine-tuning.
It transforms texts into 768-dimensional vectors.
The model was initialized with a multilingual E5 checkpoint and then trained with the [multilingual knowledge distillation method](https://aclanthology.org/2020.emnlp-main.365/) on a diverse corpus of 60 million Polish-English text pairs. We utilised [English FlagEmbeddings (BGE)](https://huggingface.co/BAAI/bge-base-en) as teacher models for distillation.
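As a rough illustration of the distillation objective described in the linked paper, the sketch below computes the loss for a single Polish-English pair: the student's embeddings of both the English sentence and its Polish translation are pulled towards the teacher's embedding of the English sentence using mean squared error. The function names and the toy vectors are assumptions made for this example, and this is not the actual training code used for this model.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(teacher_en, student_en, student_pl):
    """Loss for one translation pair (hypothetical helper).

    teacher_en: teacher embedding of the English sentence
    student_en: student embedding of the same English sentence
    student_pl: student embedding of the Polish translation
    """
    return mse(teacher_en, student_en) + mse(teacher_en, student_pl)


# Toy 2-dimensional vectors standing in for real 768-dimensional embeddings.
loss = distillation_loss([1.0, 0.0], [0.0, 0.0], [1.0, 2.0])
print(loss)
```

In a real setup the loss would be averaged over a batch and minimised with respect to the student's parameters only, while the teacher stays frozen.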
## Usage (Sentence-Transformers)
⚠️ Our embedding models require the use of specific prefixes and suffixes when encoding texts. For this model, queries should be prefixed with **"query: "** and passages with **"passage: "** ⚠️
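One way to avoid forgetting the prefixes is to apply them in a single place. The helper below is a minimal sketch; `add_prefix` and the constant names are hypothetical and not part of sentence-transformers or this model's code.

```python
QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "


def add_prefix(texts, prefix):
    """Prepend `prefix` to every text that does not already start with it."""
    return [t if t.startswith(prefix) else prefix + t for t in texts]


prefixed_queries = add_prefix(["Jak dożyć 100 lat?"], QUERY_PREFIX)
prefixed_passages = add_prefix(
    ["Trzeba zdrowo się odżywiać i uprawiać sport."], PASSAGE_PREFIX
)
print(prefixed_queries[0])
```

Checking for an existing prefix keeps the helper safe to call twice on the same input.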
You can use the model like this with [sentence-transformers](https://www.SBERT.net):
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Prefixes the model expects for queries and passages.
query_prefix = "query: "
answer_prefix = "passage: "

queries = [query_prefix + "Jak dożyć 100 lat?"]
answers = [
    answer_prefix + "Trzeba zdrowo się odżywiać i uprawiać sport.",
    answer_prefix + "Trzeba pić alkohol, imprezować i jeździć szybkimi autami.",
    answer_prefix + "Gdy trwała kampania politycy zapewniali, że rozprawią się z zakazem niedzielnego handlu.",
]

model = SentenceTransformer("sdadas/mmlw-e5-base")
queries_emb = model.encode(queries, convert_to_tensor=True, show_progress_bar=False)
answers_emb = model.encode(answers, convert_to_tensor=True, show_progress_bar=False)

# Index of the passage most similar to the single query.
best_answer = cos_sim(queries_emb, answers_emb).argmax().item()
print(answers[best_answer])
# passage: Trzeba zdrowo się odżywiać i uprawiać sport.
```
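The example above prints only the best match. To rank all passages, the same cosine-similarity scoring can be written out by hand. The sketch below uses plain Python and toy 2-dimensional vectors in place of real model embeddings; `cosine` and `rank_passages` are hypothetical helper names introduced for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs sorted from most to least similar."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Toy vectors standing in for the 768-dimensional embeddings.
toy_query = [1.0, 0.0]
toy_passages = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
ranking = rank_passages(toy_query, toy_passages)
print(ranking)
```

With real embeddings, the vectors would come from `model.encode(...)` and the ranking could be truncated to the top few passages.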
## Evaluation Results
- The model achieves an **Average Score** of **59.71** on the Polish Massive Text Embedding Benchmark (MTEB). See [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for detailed results.
- The model achieves **NDCG@10** of **53.56** on the Polish Information Retrieval Benchmark. See [PIRB Leaderboard](https://huggingface.co/spaces/sdadas/pirb) for detailed results.
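
For readers unfamiliar with the NDCG@10 metric reported above, the following is a minimal sketch of how it can be computed for a single query. It assumes linear gain and that the input list holds the relevance labels of all judged documents in ranked order; the function names are chosen for this example, and benchmark implementations may differ in details.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels of retrieved documents, in the order the system ranked them.
print(ndcg_at_k([0, 1]))
```

A benchmark score is the mean of this per-query value over all queries, typically reported as a percentage.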
## Acknowledgements
This model was trained with the support of the A100 GPU cluster provided by the Gdansk University of Technology within the TASK center initiative.
## Citation
```bibtex
@article{dadas2024pirb,
title={{PIRB}: A Comprehensive Benchmark of Polish Dense and Hybrid Text Retrieval Methods},
author={Sławomir Dadas and Michał Perełkiewicz and Rafał Poświata},
year={2024},
eprint={2402.13350},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```