|
--- |
|
pipeline_tag: sentence-similarity |
|
tags: |
|
- sentence-transformers |
|
- feature-extraction |
|
- sentence-similarity |
|
- transformers |
|
- mteb |
|
model-index: |
|
- name: mmlw-e5-base |
|
results: |
|
- task: |
|
type: Clustering |
|
dataset: |
|
type: PL-MTEB/8tags-clustering |
|
name: MTEB 8TagsClustering |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: v_measure |
|
value: 30.249113010261492 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/allegro-reviews |
|
name: MTEB AllegroReviews |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 36.3817097415507 |
|
- type: f1 |
|
value: 32.77742158736663 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: arguana-pl |
|
name: MTEB ArguAna-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 32.646 |
|
- type: map_at_10 |
|
value: 49.488 |
|
- type: map_at_100 |
|
value: 50.190999999999995 |
|
- type: map_at_1000 |
|
value: 50.194 |
|
- type: map_at_3 |
|
value: 44.749 |
|
- type: map_at_5 |
|
value: 47.571999999999996 |
|
- type: mrr_at_1 |
|
value: 34.211000000000006 |
|
- type: mrr_at_10 |
|
value: 50.112 |
|
- type: mrr_at_100 |
|
value: 50.836000000000006 |
|
- type: mrr_at_1000 |
|
value: 50.839 |
|
- type: mrr_at_3 |
|
value: 45.614 |
|
- type: mrr_at_5 |
|
value: 48.242000000000004 |
|
- type: ndcg_at_1 |
|
value: 32.646 |
|
- type: ndcg_at_10 |
|
value: 58.396 |
|
- type: ndcg_at_100 |
|
value: 61.285000000000004 |
|
- type: ndcg_at_1000 |
|
value: 61.358999999999995 |
|
- type: ndcg_at_3 |
|
value: 48.759 |
|
- type: ndcg_at_5 |
|
value: 53.807 |
|
- type: precision_at_1 |
|
value: 32.646 |
|
- type: precision_at_10 |
|
value: 8.663 |
|
- type: precision_at_100 |
|
value: 0.9900000000000001 |
|
- type: precision_at_1000 |
|
value: 0.1 |
|
- type: precision_at_3 |
|
value: 20.128 |
|
- type: precision_at_5 |
|
value: 14.509 |
|
- type: recall_at_1 |
|
value: 32.646 |
|
- type: recall_at_10 |
|
value: 86.629 |
|
- type: recall_at_100 |
|
value: 99.004 |
|
- type: recall_at_1000 |
|
value: 99.57300000000001 |
|
- type: recall_at_3 |
|
value: 60.38400000000001 |
|
- type: recall_at_5 |
|
value: 72.54599999999999 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/cbd |
|
name: MTEB CBD |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 65.53999999999999 |
|
- type: ap |
|
value: 19.75395945379771 |
|
- type: f1 |
|
value: 55.00481388401326 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/cdsce-pairclassification |
|
name: MTEB CDSC-E |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 89.5 |
|
- type: cos_sim_ap |
|
value: 77.26879308078568 |
|
- type: cos_sim_f1 |
|
value: 65.13157894736842 |
|
- type: cos_sim_precision |
|
value: 86.8421052631579 |
|
- type: cos_sim_recall |
|
value: 52.10526315789473 |
|
- type: dot_accuracy |
|
value: 88.0 |
|
- type: dot_ap |
|
value: 69.17235659054914 |
|
- type: dot_f1 |
|
value: 65.71428571428571 |
|
- type: dot_precision |
|
value: 71.875 |
|
- type: dot_recall |
|
value: 60.526315789473685 |
|
- type: euclidean_accuracy |
|
value: 89.5 |
|
- type: euclidean_ap |
|
value: 77.1905400565015 |
|
- type: euclidean_f1 |
|
value: 64.91803278688525 |
|
- type: euclidean_precision |
|
value: 86.08695652173914 |
|
- type: euclidean_recall |
|
value: 52.10526315789473 |
|
- type: manhattan_accuracy |
|
value: 89.5 |
|
- type: manhattan_ap |
|
value: 77.19531778873724 |
|
- type: manhattan_f1 |
|
value: 64.72491909385113 |
|
- type: manhattan_precision |
|
value: 84.03361344537815 |
|
- type: manhattan_recall |
|
value: 52.63157894736842 |
|
- type: max_accuracy |
|
value: 89.5 |
|
- type: max_ap |
|
value: 77.26879308078568 |
|
- type: max_f1 |
|
value: 65.71428571428571 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: PL-MTEB/cdscr-sts |
|
name: MTEB CDSC-R |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 93.18498922236566 |
|
- type: cos_sim_spearman |
|
value: 93.26224500108704 |
|
- type: euclidean_pearson |
|
value: 92.25462061070286 |
|
- type: euclidean_spearman |
|
value: 93.18623989769242 |
|
- type: manhattan_pearson |
|
value: 92.16291103586255 |
|
- type: manhattan_spearman |
|
value: 93.14403078934417 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: dbpedia-pl |
|
name: MTEB DBPedia-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 8.268 |
|
- type: map_at_10 |
|
value: 17.391000000000002 |
|
- type: map_at_100 |
|
value: 24.266 |
|
- type: map_at_1000 |
|
value: 25.844 |
|
- type: map_at_3 |
|
value: 12.636 |
|
- type: map_at_5 |
|
value: 14.701 |
|
- type: mrr_at_1 |
|
value: 62.74999999999999 |
|
- type: mrr_at_10 |
|
value: 70.25200000000001 |
|
- type: mrr_at_100 |
|
value: 70.601 |
|
- type: mrr_at_1000 |
|
value: 70.613 |
|
- type: mrr_at_3 |
|
value: 68.083 |
|
- type: mrr_at_5 |
|
value: 69.37100000000001 |
|
- type: ndcg_at_1 |
|
value: 51.87500000000001 |
|
- type: ndcg_at_10 |
|
value: 37.185 |
|
- type: ndcg_at_100 |
|
value: 41.949 |
|
- type: ndcg_at_1000 |
|
value: 49.523 |
|
- type: ndcg_at_3 |
|
value: 41.556 |
|
- type: ndcg_at_5 |
|
value: 39.278 |
|
- type: precision_at_1 |
|
value: 63.24999999999999 |
|
- type: precision_at_10 |
|
value: 29.225 |
|
- type: precision_at_100 |
|
value: 9.745 |
|
- type: precision_at_1000 |
|
value: 2.046 |
|
- type: precision_at_3 |
|
value: 43.833 |
|
- type: precision_at_5 |
|
value: 37.9 |
|
- type: recall_at_1 |
|
value: 8.268 |
|
- type: recall_at_10 |
|
value: 22.542 |
|
- type: recall_at_100 |
|
value: 48.154 |
|
- type: recall_at_1000 |
|
value: 72.62100000000001 |
|
- type: recall_at_3 |
|
value: 13.818 |
|
- type: recall_at_5 |
|
value: 17.137 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: fiqa-pl |
|
name: MTEB FiQA-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 16.489 |
|
- type: map_at_10 |
|
value: 26.916 |
|
- type: map_at_100 |
|
value: 28.582 |
|
- type: map_at_1000 |
|
value: 28.774 |
|
- type: map_at_3 |
|
value: 23.048 |
|
- type: map_at_5 |
|
value: 24.977 |
|
- type: mrr_at_1 |
|
value: 33.642 |
|
- type: mrr_at_10 |
|
value: 41.987 |
|
- type: mrr_at_100 |
|
value: 42.882 |
|
- type: mrr_at_1000 |
|
value: 42.93 |
|
- type: mrr_at_3 |
|
value: 39.48 |
|
- type: mrr_at_5 |
|
value: 40.923 |
|
- type: ndcg_at_1 |
|
value: 33.488 |
|
- type: ndcg_at_10 |
|
value: 34.528 |
|
- type: ndcg_at_100 |
|
value: 41.085 |
|
- type: ndcg_at_1000 |
|
value: 44.474000000000004 |
|
- type: ndcg_at_3 |
|
value: 30.469 |
|
- type: ndcg_at_5 |
|
value: 31.618000000000002 |
|
- type: precision_at_1 |
|
value: 33.488 |
|
- type: precision_at_10 |
|
value: 9.783999999999999 |
|
- type: precision_at_100 |
|
value: 1.6389999999999998 |
|
- type: precision_at_1000 |
|
value: 0.22699999999999998 |
|
- type: precision_at_3 |
|
value: 20.525 |
|
- type: precision_at_5 |
|
value: 15.093 |
|
- type: recall_at_1 |
|
value: 16.489 |
|
- type: recall_at_10 |
|
value: 42.370000000000005 |
|
- type: recall_at_100 |
|
value: 67.183 |
|
- type: recall_at_1000 |
|
value: 87.211 |
|
- type: recall_at_3 |
|
value: 27.689999999999998 |
|
- type: recall_at_5 |
|
value: 33.408 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: hotpotqa-pl |
|
name: MTEB HotpotQA-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 37.373 |
|
- type: map_at_10 |
|
value: 57.509 |
|
- type: map_at_100 |
|
value: 58.451 |
|
- type: map_at_1000 |
|
value: 58.524 |
|
- type: map_at_3 |
|
value: 54.064 |
|
- type: map_at_5 |
|
value: 56.257999999999996 |
|
- type: mrr_at_1 |
|
value: 74.895 |
|
- type: mrr_at_10 |
|
value: 81.233 |
|
- type: mrr_at_100 |
|
value: 81.461 |
|
- type: mrr_at_1000 |
|
value: 81.47 |
|
- type: mrr_at_3 |
|
value: 80.124 |
|
- type: mrr_at_5 |
|
value: 80.862 |
|
- type: ndcg_at_1 |
|
value: 74.747 |
|
- type: ndcg_at_10 |
|
value: 66.249 |
|
- type: ndcg_at_100 |
|
value: 69.513 |
|
- type: ndcg_at_1000 |
|
value: 70.896 |
|
- type: ndcg_at_3 |
|
value: 61.312 |
|
- type: ndcg_at_5 |
|
value: 64.132 |
|
- type: precision_at_1 |
|
value: 74.747 |
|
- type: precision_at_10 |
|
value: 13.873 |
|
- type: precision_at_100 |
|
value: 1.641 |
|
- type: precision_at_1000 |
|
value: 0.182 |
|
- type: precision_at_3 |
|
value: 38.987 |
|
- type: precision_at_5 |
|
value: 25.621 |
|
- type: recall_at_1 |
|
value: 37.373 |
|
- type: recall_at_10 |
|
value: 69.365 |
|
- type: recall_at_100 |
|
value: 82.039 |
|
- type: recall_at_1000 |
|
value: 91.148 |
|
- type: recall_at_3 |
|
value: 58.48100000000001 |
|
- type: recall_at_5 |
|
value: 64.051 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: msmarco-pl |
|
name: MTEB MSMARCO-PL |
|
config: default |
|
split: validation |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 16.753999999999998 |
|
- type: map_at_10 |
|
value: 26.764 |
|
- type: map_at_100 |
|
value: 27.929 |
|
- type: map_at_1000 |
|
value: 27.994999999999997 |
|
- type: map_at_3 |
|
value: 23.527 |
|
- type: map_at_5 |
|
value: 25.343 |
|
- type: mrr_at_1 |
|
value: 17.192 |
|
- type: mrr_at_10 |
|
value: 27.141 |
|
- type: mrr_at_100 |
|
value: 28.269 |
|
- type: mrr_at_1000 |
|
value: 28.327999999999996 |
|
- type: mrr_at_3 |
|
value: 23.906 |
|
- type: mrr_at_5 |
|
value: 25.759999999999998 |
|
- type: ndcg_at_1 |
|
value: 17.177999999999997 |
|
- type: ndcg_at_10 |
|
value: 32.539 |
|
- type: ndcg_at_100 |
|
value: 38.383 |
|
- type: ndcg_at_1000 |
|
value: 40.132 |
|
- type: ndcg_at_3 |
|
value: 25.884 |
|
- type: ndcg_at_5 |
|
value: 29.15 |
|
- type: precision_at_1 |
|
value: 17.177999999999997 |
|
- type: precision_at_10 |
|
value: 5.268 |
|
- type: precision_at_100 |
|
value: 0.823 |
|
- type: precision_at_1000 |
|
value: 0.097 |
|
- type: precision_at_3 |
|
value: 11.122 |
|
- type: precision_at_5 |
|
value: 8.338 |
|
- type: recall_at_1 |
|
value: 16.753999999999998 |
|
- type: recall_at_10 |
|
value: 50.388 |
|
- type: recall_at_100 |
|
value: 77.86999999999999 |
|
- type: recall_at_1000 |
|
value: 91.55 |
|
- type: recall_at_3 |
|
value: 32.186 |
|
- type: recall_at_5 |
|
value: 40.048 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: mteb/amazon_massive_intent |
|
name: MTEB MassiveIntentClassification (pl) |
|
config: pl |
|
split: test |
|
revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 |
|
metrics: |
|
- type: accuracy |
|
value: 70.9280430396772 |
|
- type: f1 |
|
value: 68.7099581466286 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: mteb/amazon_massive_scenario |
|
name: MTEB MassiveScenarioClassification (pl) |
|
config: pl |
|
split: test |
|
revision: 7d571f92784cd94a019292a1f45445077d0ef634 |
|
metrics: |
|
- type: accuracy |
|
value: 74.76126429051783 |
|
- type: f1 |
|
value: 74.72274307018111 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: nfcorpus-pl |
|
name: MTEB NFCorpus-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 5.348 |
|
- type: map_at_10 |
|
value: 12.277000000000001 |
|
- type: map_at_100 |
|
value: 15.804000000000002 |
|
- type: map_at_1000 |
|
value: 17.277 |
|
- type: map_at_3 |
|
value: 8.783000000000001 |
|
- type: map_at_5 |
|
value: 10.314 |
|
- type: mrr_at_1 |
|
value: 43.963 |
|
- type: mrr_at_10 |
|
value: 52.459999999999994 |
|
- type: mrr_at_100 |
|
value: 53.233 |
|
- type: mrr_at_1000 |
|
value: 53.26499999999999 |
|
- type: mrr_at_3 |
|
value: 50.464 |
|
- type: mrr_at_5 |
|
value: 51.548 |
|
- type: ndcg_at_1 |
|
value: 40.711999999999996 |
|
- type: ndcg_at_10 |
|
value: 33.709 |
|
- type: ndcg_at_100 |
|
value: 31.398 |
|
- type: ndcg_at_1000 |
|
value: 40.042 |
|
- type: ndcg_at_3 |
|
value: 37.85 |
|
- type: ndcg_at_5 |
|
value: 36.260999999999996 |
|
- type: precision_at_1 |
|
value: 43.344 |
|
- type: precision_at_10 |
|
value: 25.851000000000003 |
|
- type: precision_at_100 |
|
value: 8.279 |
|
- type: precision_at_1000 |
|
value: 2.085 |
|
- type: precision_at_3 |
|
value: 36.326 |
|
- type: precision_at_5 |
|
value: 32.074000000000005 |
|
- type: recall_at_1 |
|
value: 5.348 |
|
- type: recall_at_10 |
|
value: 16.441 |
|
- type: recall_at_100 |
|
value: 32.975 |
|
- type: recall_at_1000 |
|
value: 64.357 |
|
- type: recall_at_3 |
|
value: 9.841999999999999 |
|
- type: recall_at_5 |
|
value: 12.463000000000001 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: nq-pl |
|
name: MTEB NQ-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 24.674 |
|
- type: map_at_10 |
|
value: 37.672 |
|
- type: map_at_100 |
|
value: 38.767 |
|
- type: map_at_1000 |
|
value: 38.82 |
|
- type: map_at_3 |
|
value: 33.823 |
|
- type: map_at_5 |
|
value: 36.063 |
|
- type: mrr_at_1 |
|
value: 27.839000000000002 |
|
- type: mrr_at_10 |
|
value: 40.129 |
|
- type: mrr_at_100 |
|
value: 41.008 |
|
- type: mrr_at_1000 |
|
value: 41.048 |
|
- type: mrr_at_3 |
|
value: 36.718 |
|
- type: mrr_at_5 |
|
value: 38.841 |
|
- type: ndcg_at_1 |
|
value: 27.839000000000002 |
|
- type: ndcg_at_10 |
|
value: 44.604 |
|
- type: ndcg_at_100 |
|
value: 49.51 |
|
- type: ndcg_at_1000 |
|
value: 50.841 |
|
- type: ndcg_at_3 |
|
value: 37.223 |
|
- type: ndcg_at_5 |
|
value: 41.073 |
|
- type: precision_at_1 |
|
value: 27.839000000000002 |
|
- type: precision_at_10 |
|
value: 7.5 |
|
- type: precision_at_100 |
|
value: 1.03 |
|
- type: precision_at_1000 |
|
value: 0.116 |
|
- type: precision_at_3 |
|
value: 17.005 |
|
- type: precision_at_5 |
|
value: 12.399000000000001 |
|
- type: recall_at_1 |
|
value: 24.674 |
|
- type: recall_at_10 |
|
value: 63.32299999999999 |
|
- type: recall_at_100 |
|
value: 85.088 |
|
- type: recall_at_1000 |
|
value: 95.143 |
|
- type: recall_at_3 |
|
value: 44.157999999999994 |
|
- type: recall_at_5 |
|
value: 53.054 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: laugustyniak/abusive-clauses-pl |
|
name: MTEB PAC |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 64.5033304373009 |
|
- type: ap |
|
value: 75.81507275237081 |
|
- type: f1 |
|
value: 62.24617820785985 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/ppc-pairclassification |
|
name: MTEB PPC |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 85.39999999999999 |
|
- type: cos_sim_ap |
|
value: 91.75881977787009 |
|
- type: cos_sim_f1 |
|
value: 87.79264214046823 |
|
- type: cos_sim_precision |
|
value: 88.68243243243244 |
|
- type: cos_sim_recall |
|
value: 86.9205298013245 |
|
- type: dot_accuracy |
|
value: 71.0 |
|
- type: dot_ap |
|
value: 82.97829049033108 |
|
- type: dot_f1 |
|
value: 78.77055039313797 |
|
- type: dot_precision |
|
value: 69.30817610062893 |
|
- type: dot_recall |
|
value: 91.22516556291392 |
|
- type: euclidean_accuracy |
|
value: 85.2 |
|
- type: euclidean_ap |
|
value: 91.85245521151309 |
|
- type: euclidean_f1 |
|
value: 87.64607679465777 |
|
- type: euclidean_precision |
|
value: 88.38383838383838 |
|
- type: euclidean_recall |
|
value: 86.9205298013245 |
|
- type: manhattan_accuracy |
|
value: 85.39999999999999 |
|
- type: manhattan_ap |
|
value: 91.85497100160649 |
|
- type: manhattan_f1 |
|
value: 87.77219430485762 |
|
- type: manhattan_precision |
|
value: 88.8135593220339 |
|
- type: manhattan_recall |
|
value: 86.75496688741721 |
|
- type: max_accuracy |
|
value: 85.39999999999999 |
|
- type: max_ap |
|
value: 91.85497100160649 |
|
- type: max_f1 |
|
value: 87.79264214046823 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/psc-pairclassification |
|
name: MTEB PSC |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 97.58812615955473 |
|
- type: cos_sim_ap |
|
value: 99.14945370088302 |
|
- type: cos_sim_f1 |
|
value: 96.06060606060606 |
|
- type: cos_sim_precision |
|
value: 95.48192771084338 |
|
- type: cos_sim_recall |
|
value: 96.64634146341463 |
|
- type: dot_accuracy |
|
value: 95.17625231910947 |
|
- type: dot_ap |
|
value: 97.05592933601112 |
|
- type: dot_f1 |
|
value: 92.14501510574019 |
|
- type: dot_precision |
|
value: 91.31736526946108 |
|
- type: dot_recall |
|
value: 92.98780487804879 |
|
- type: euclidean_accuracy |
|
value: 97.6808905380334 |
|
- type: euclidean_ap |
|
value: 99.18538119402824 |
|
- type: euclidean_f1 |
|
value: 96.20637329286798 |
|
- type: euclidean_precision |
|
value: 95.77039274924472 |
|
- type: euclidean_recall |
|
value: 96.64634146341463 |
|
- type: manhattan_accuracy |
|
value: 97.58812615955473 |
|
- type: manhattan_ap |
|
value: 99.17870990853292 |
|
- type: manhattan_f1 |
|
value: 96.02446483180427 |
|
- type: manhattan_precision |
|
value: 96.31901840490798 |
|
- type: manhattan_recall |
|
value: 95.73170731707317 |
|
- type: max_accuracy |
|
value: 97.6808905380334 |
|
- type: max_ap |
|
value: 99.18538119402824 |
|
- type: max_f1 |
|
value: 96.20637329286798 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/polemo2_in |
|
name: MTEB PolEmo2.0-IN |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 68.69806094182825 |
|
- type: f1 |
|
value: 68.0619984307764 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/polemo2_out |
|
name: MTEB PolEmo2.0-OUT |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 35.80971659919028 |
|
- type: f1 |
|
value: 31.13081621324864 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: quora-pl |
|
name: MTEB Quora-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 66.149 |
|
- type: map_at_10 |
|
value: 80.133 |
|
- type: map_at_100 |
|
value: 80.845 |
|
- type: map_at_1000 |
|
value: 80.866 |
|
- type: map_at_3 |
|
value: 76.983 |
|
- type: map_at_5 |
|
value: 78.938 |
|
- type: mrr_at_1 |
|
value: 76.09 |
|
- type: mrr_at_10 |
|
value: 83.25099999999999 |
|
- type: mrr_at_100 |
|
value: 83.422 |
|
- type: mrr_at_1000 |
|
value: 83.42500000000001 |
|
- type: mrr_at_3 |
|
value: 82.02199999999999 |
|
- type: mrr_at_5 |
|
value: 82.831 |
|
- type: ndcg_at_1 |
|
value: 76.14999999999999 |
|
- type: ndcg_at_10 |
|
value: 84.438 |
|
- type: ndcg_at_100 |
|
value: 86.048 |
|
- type: ndcg_at_1000 |
|
value: 86.226 |
|
- type: ndcg_at_3 |
|
value: 80.97999999999999 |
|
- type: ndcg_at_5 |
|
value: 82.856 |
|
- type: precision_at_1 |
|
value: 76.14999999999999 |
|
- type: precision_at_10 |
|
value: 12.985 |
|
- type: precision_at_100 |
|
value: 1.513 |
|
- type: precision_at_1000 |
|
value: 0.156 |
|
- type: precision_at_3 |
|
value: 35.563 |
|
- type: precision_at_5 |
|
value: 23.586 |
|
- type: recall_at_1 |
|
value: 66.149 |
|
- type: recall_at_10 |
|
value: 93.195 |
|
- type: recall_at_100 |
|
value: 98.924 |
|
- type: recall_at_1000 |
|
value: 99.885 |
|
- type: recall_at_3 |
|
value: 83.439 |
|
- type: recall_at_5 |
|
value: 88.575 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: scidocs-pl |
|
name: MTEB SCIDOCS-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 3.688 |
|
- type: map_at_10 |
|
value: 10.23 |
|
- type: map_at_100 |
|
value: 12.077 |
|
- type: map_at_1000 |
|
value: 12.382 |
|
- type: map_at_3 |
|
value: 7.149 |
|
- type: map_at_5 |
|
value: 8.689 |
|
- type: mrr_at_1 |
|
value: 18.2 |
|
- type: mrr_at_10 |
|
value: 28.816999999999997 |
|
- type: mrr_at_100 |
|
value: 29.982 |
|
- type: mrr_at_1000 |
|
value: 30.058 |
|
- type: mrr_at_3 |
|
value: 25.983 |
|
- type: mrr_at_5 |
|
value: 27.418 |
|
- type: ndcg_at_1 |
|
value: 18.2 |
|
- type: ndcg_at_10 |
|
value: 17.352999999999998 |
|
- type: ndcg_at_100 |
|
value: 24.859 |
|
- type: ndcg_at_1000 |
|
value: 30.535 |
|
- type: ndcg_at_3 |
|
value: 16.17 |
|
- type: ndcg_at_5 |
|
value: 14.235000000000001 |
|
- type: precision_at_1 |
|
value: 18.2 |
|
- type: precision_at_10 |
|
value: 9.19 |
|
- type: precision_at_100 |
|
value: 2.01 |
|
- type: precision_at_1000 |
|
value: 0.338 |
|
- type: precision_at_3 |
|
value: 15.5 |
|
- type: precision_at_5 |
|
value: 12.78 |
|
- type: recall_at_1 |
|
value: 3.688 |
|
- type: recall_at_10 |
|
value: 18.632 |
|
- type: recall_at_100 |
|
value: 40.822 |
|
- type: recall_at_1000 |
|
value: 68.552 |
|
- type: recall_at_3 |
|
value: 9.423 |
|
- type: recall_at_5 |
|
value: 12.943 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/sicke-pl-pairclassification |
|
name: MTEB SICK-E-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 83.12270688952303 |
|
- type: cos_sim_ap |
|
value: 76.4528312253856 |
|
- type: cos_sim_f1 |
|
value: 68.69627507163324 |
|
- type: cos_sim_precision |
|
value: 69.0922190201729 |
|
- type: cos_sim_recall |
|
value: 68.30484330484332 |
|
- type: dot_accuracy |
|
value: 79.20913167549939 |
|
- type: dot_ap |
|
value: 65.03147071986633 |
|
- type: dot_f1 |
|
value: 62.812160694896846 |
|
- type: dot_precision |
|
value: 50.74561403508772 |
|
- type: dot_recall |
|
value: 82.4074074074074 |
|
- type: euclidean_accuracy |
|
value: 83.16347329800244 |
|
- type: euclidean_ap |
|
value: 76.49405838298205 |
|
- type: euclidean_f1 |
|
value: 68.66738120757414 |
|
- type: euclidean_precision |
|
value: 68.88888888888889 |
|
- type: euclidean_recall |
|
value: 68.44729344729345 |
|
- type: manhattan_accuracy |
|
value: 83.16347329800244 |
|
- type: manhattan_ap |
|
value: 76.5080551733795 |
|
- type: manhattan_f1 |
|
value: 68.73883529832084 |
|
- type: manhattan_precision |
|
value: 68.9605734767025 |
|
- type: manhattan_recall |
|
value: 68.51851851851852 |
|
- type: max_accuracy |
|
value: 83.16347329800244 |
|
- type: max_ap |
|
value: 76.5080551733795 |
|
- type: max_f1 |
|
value: 68.73883529832084 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: PL-MTEB/sickr-pl-sts |
|
name: MTEB SICK-R-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 82.60225159739653 |
|
- type: cos_sim_spearman |
|
value: 76.76667220288542 |
|
- type: euclidean_pearson |
|
value: 80.16302518898615 |
|
- type: euclidean_spearman |
|
value: 76.76131897866455 |
|
- type: manhattan_pearson |
|
value: 80.11881021613914 |
|
- type: manhattan_spearman |
|
value: 76.74246419368048 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: mteb/sts22-crosslingual-sts |
|
name: MTEB STS22 (pl) |
|
config: pl |
|
split: test |
|
revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 38.2744776092718 |
|
- type: cos_sim_spearman |
|
value: 40.35664941442517 |
|
- type: euclidean_pearson |
|
value: 29.148502128336585 |
|
- type: euclidean_spearman |
|
value: 40.45531563224982 |
|
- type: manhattan_pearson |
|
value: 29.124177399433098 |
|
- type: manhattan_spearman |
|
value: 40.2801387844354 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: scifact-pl |
|
name: MTEB SciFact-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 52.994 |
|
- type: map_at_10 |
|
value: 63.612 |
|
- type: map_at_100 |
|
value: 64.294 |
|
- type: map_at_1000 |
|
value: 64.325 |
|
- type: map_at_3 |
|
value: 61.341 |
|
- type: map_at_5 |
|
value: 62.366 |
|
- type: mrr_at_1 |
|
value: 56.667 |
|
- type: mrr_at_10 |
|
value: 65.333 |
|
- type: mrr_at_100 |
|
value: 65.89399999999999 |
|
- type: mrr_at_1000 |
|
value: 65.91900000000001 |
|
- type: mrr_at_3 |
|
value: 63.666999999999994 |
|
- type: mrr_at_5 |
|
value: 64.36699999999999 |
|
- type: ndcg_at_1 |
|
value: 56.333 |
|
- type: ndcg_at_10 |
|
value: 68.292 |
|
- type: ndcg_at_100 |
|
value: 71.136 |
|
- type: ndcg_at_1000 |
|
value: 71.90100000000001 |
|
- type: ndcg_at_3 |
|
value: 64.387 |
|
- type: ndcg_at_5 |
|
value: 65.546 |
|
- type: precision_at_1 |
|
value: 56.333 |
|
- type: precision_at_10 |
|
value: 9.133 |
|
- type: precision_at_100 |
|
value: 1.0630000000000002 |
|
- type: precision_at_1000 |
|
value: 0.11299999999999999 |
|
- type: precision_at_3 |
|
value: 25.556 |
|
- type: precision_at_5 |
|
value: 16.267 |
|
- type: recall_at_1 |
|
value: 52.994 |
|
- type: recall_at_10 |
|
value: 81.178 |
|
- type: recall_at_100 |
|
value: 93.767 |
|
- type: recall_at_1000 |
|
value: 99.667 |
|
- type: recall_at_3 |
|
value: 69.906 |
|
- type: recall_at_5 |
|
value: 73.18299999999999 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: trec-covid-pl |
|
name: MTEB TRECCOVID-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 0.231 |
|
- type: map_at_10 |
|
value: 1.822 |
|
- type: map_at_100 |
|
value: 10.134 |
|
- type: map_at_1000 |
|
value: 24.859 |
|
- type: map_at_3 |
|
value: 0.615 |
|
- type: map_at_5 |
|
value: 0.9939999999999999 |
|
- type: mrr_at_1 |
|
value: 84.0 |
|
- type: mrr_at_10 |
|
value: 90.4 |
|
- type: mrr_at_100 |
|
value: 90.4 |
|
- type: mrr_at_1000 |
|
value: 90.4 |
|
- type: mrr_at_3 |
|
value: 89.0 |
|
- type: mrr_at_5 |
|
value: 90.4 |
|
- type: ndcg_at_1 |
|
value: 81.0 |
|
- type: ndcg_at_10 |
|
value: 73.333 |
|
- type: ndcg_at_100 |
|
value: 55.35099999999999 |
|
- type: ndcg_at_1000 |
|
value: 49.875 |
|
- type: ndcg_at_3 |
|
value: 76.866 |
|
- type: ndcg_at_5 |
|
value: 75.472 |
|
- type: precision_at_1 |
|
value: 86.0 |
|
- type: precision_at_10 |
|
value: 78.2 |
|
- type: precision_at_100 |
|
value: 57.18 |
|
- type: precision_at_1000 |
|
value: 22.332 |
|
- type: precision_at_3 |
|
value: 82.0 |
|
- type: precision_at_5 |
|
value: 81.2 |
|
- type: recall_at_1 |
|
value: 0.231 |
|
- type: recall_at_10 |
|
value: 2.056 |
|
- type: recall_at_100 |
|
value: 13.468 |
|
- type: recall_at_1000 |
|
value: 47.038999999999994 |
|
- type: recall_at_3 |
|
value: 0.6479999999999999 |
|
- type: recall_at_5 |
|
value: 1.088 |
|
language: pl |
|
license: apache-2.0 |
|
widget: |
|
- source_sentence: "query: Jak dożyć 100 lat?" |
|
sentences: |
|
- "passage: Trzeba zdrowo się odżywiać i uprawiać sport." |
|
- "passage: Trzeba pić alkohol, imprezować i jeździć szybkimi autami." |
|
- "passage: Gdy trwała kampania politycy zapewniali, że rozprawią się z zakazem niedzielnego handlu." |
|
|
|
--- |
|
|

<h1 align="center">MMLW-e5-base</h1>

MMLW (muszę mieć lepszą wiadomość, "I must have a better message") is a family of neural text encoders for Polish.
This is a distilled model that generates embeddings suitable for many tasks, such as semantic similarity, clustering, and information retrieval. The model can also serve as a base for further fine-tuning.
It transforms texts into 768-dimensional vectors.

The model was initialized from a multilingual E5 checkpoint and then trained with the [multilingual knowledge distillation method](https://aclanthology.org/2020.emnlp-main.365/) on a diverse corpus of 60 million Polish-English text pairs. We used [English FlagEmbeddings (BGE)](https://huggingface.co/BAAI/bge-base-en) as teacher models for distillation.

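
As a rough illustration of the distillation objective described in the linked paper, the sketch below computes the loss for a single Polish-English pair: the student is pushed to place both the English sentence and its Polish translation near the teacher's embedding of the English sentence. The function names and toy vectors are hypothetical, and the actual training setup for this model (batching, loss weighting, optimizer) is not described in this card.

```python
from typing import Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(teacher_en: Vector, student_en: Vector, student_pl: Vector) -> float:
    """Loss for one English-Polish pair (hypothetical helper).

    The student should map both the English sentence and its Polish
    translation close to the teacher's embedding of the English sentence.
    """
    return mse(teacher_en, student_en) + mse(teacher_en, student_pl)


# Toy 3-dimensional vectors standing in for 768-dimensional embeddings.
teacher_en = [1.0, 0.0, 0.0]
student_en = [1.0, 0.0, 0.0]  # student already matches the teacher on English
student_pl = [0.0, 0.0, 0.0]  # but not yet on the Polish translation

loss = distillation_loss(teacher_en, student_en, student_pl)
print(loss)
```

Only the Polish term contributes to the loss in this toy case, which is the signal that would pull the Polish embedding toward the teacher's vector space.
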
## Usage (Sentence-Transformers)

⚠️ Our embedding models require specific prefixes and suffixes when encoding texts. For this model, prefix queries with **"query: "** and passages with **"passage: "**. ⚠️

You can use the model with [sentence-transformers](https://www.SBERT.net) like this:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Prefixes required by this model.
query_prefix = "query: "
answer_prefix = "passage: "

# Query: "How to live to be 100?"
queries = [query_prefix + "Jak dożyć 100 lat?"]
answers = [
    # "You need to eat healthily and exercise."
    answer_prefix + "Trzeba zdrowo się odżywiać i uprawiać sport.",
    # "You need to drink alcohol, party, and drive fast cars."
    answer_prefix + "Trzeba pić alkohol, imprezować i jeździć szybkimi autami.",
    # "During the campaign, politicians promised to deal with the Sunday trading ban."
    answer_prefix + "Gdy trwała kampania politycy zapewniali, że rozprawią się z zakazem niedzielnego handlu."
]

model = SentenceTransformer("sdadas/mmlw-e5-base")
queries_emb = model.encode(queries, convert_to_tensor=True, show_progress_bar=False)
answers_emb = model.encode(answers, convert_to_tensor=True, show_progress_bar=False)

# Pick the passage with the highest cosine similarity to the query.
best_answer = cos_sim(queries_emb, answers_emb).argmax().item()
print(answers[best_answer])
# passage: Trzeba zdrowo się odżywiać i uprawiać sport.
```
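
Because a forgotten or doubled prefix is an easy mistake to make, a small helper can apply the prefixes consistently. The sketch below is one possible approach, not part of the model's API: `with_prefix`, `cosine`, and `rank_passages` are hypothetical names, and the two-dimensional vectors are toy stand-ins for real embeddings so the ranking logic can be run without downloading the model.

```python
import math
from typing import List, Sequence

QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "


def with_prefix(text: str, prefix: str) -> str:
    """Add the prefix unless the text already starts with it."""
    return text if text.startswith(prefix) else prefix + text


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_passages(query_vec: Sequence[float], passage_vecs: Sequence[Sequence[float]]) -> List[int]:
    """Return passage indices sorted by descending cosine similarity."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


query_text = with_prefix("Jak dożyć 100 lat?", QUERY_PREFIX)

# Toy vectors; in practice these would come from model.encode(...).
query_vec = [1.0, 0.0]
passage_vecs = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
order = rank_passages(query_vec, passage_vecs)
print(query_text, order)
```

Returning the full ordering rather than only the best index makes the same helper usable for top-k retrieval.
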

## Evaluation Results

- The model achieves an **Average Score** of **59.71** on the Polish Massive Text Embedding Benchmark (MTEB). See [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for detailed results.
- The model achieves **NDCG@10** of **53.56** on the Polish Information Retrieval Benchmark. See [PIRB Leaderboard](https://huggingface.co/spaces/sdadas/pirb) for detailed results.

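
For readers unfamiliar with the retrieval metric, the following is a minimal sketch of how NDCG@k can be computed for a single query. It assumes the common formulation with linear gain and a log2 rank discount; the benchmarks' own evaluation code may differ in details, and the function names here are hypothetical. Leaderboard figures are per-query values averaged over all queries and shown on a 0-100 scale.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k ranked results."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float], all_relevances: Sequence[float], k: int = 10) -> float:
    """NDCG@k for one query.

    ranked_relevances: relevance grades of the retrieved documents, in ranked order.
    all_relevances: relevance grades of every judged-relevant document for the
        query, used to build the ideal ranking.
    """
    ideal = sorted(all_relevances, reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked_relevances, k) / idcg if idcg > 0 else 0.0


# Two relevant documents exist; the system ranks them at positions 1 and 3.
score = ndcg_at_k([1, 0, 1], [1, 1], k=10)
print(round(score, 4))
```

A ranking that places both relevant documents first would score 1.0, so the gap from 1.0 reflects how far relevant documents sit from the top of the list.
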
## Acknowledgements

This model was trained with the support of the A100 GPU cluster provided by the Gdansk University of Technology within the TASK center initiative.

## Citation

```bibtex
@article{dadas2024pirb,
  title={{PIRB}: A Comprehensive Benchmark of Polish Dense and Hybrid Text Retrieval Methods},
  author={Sławomir Dadas and Michał Perełkiewicz and Rafał Poświata},
  year={2024},
  eprint={2402.13350},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```