CQADupstackRetrieval evaluation inquiry
CQADupstackRetrieval is divided into 12 datasets, but on the leaderboard there is no reference to which subset was used for evaluation.
Is it the English subset, or the average NDCG@10 over all of them?
Afaik they only have an English subset?
The score for all datasets in the Retrieval tab is NDCG@10, computed for each individual dataset.
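For illustration, a single subset can be evaluated on its own with the mteb package, roughly as below; the model name and output folder are just placeholders, and each run writes its own NDCG@10 result:

from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Placeholder model; any SentenceTransformer-compatible encoder can be used.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Evaluate one CQADupstack subset; the result file holds NDCG@10 for that subset only.
evaluation = MTEB(tasks=["CQADupstackAndroidRetrieval"])
evaluation.run(model, output_folder="results/all-MiniLM-L6-v2")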
Hello, after getting the NDCG@10 score for each subset, such as CQADupstackAndroidRetrieval, CQADupstackEnglishRetrieval, ..., how do I get the final CQADupstackRetrieval score as reported on the leaderboard?
Is the CQADupstackRetrieval score on the leaderboard the average of all subsets, or the score of one particular subset?
I found this code in https://github.com/embeddings-benchmark/mteb/blob/main/scripts/mteb_meta.py:
MTEB(tasks=[ds_name.replace("CQADupstackRetrieval", "CQADupstackAndroidRetrieval")]).tasks[0].description
Does this mean that the CQADupstackRetrieval score reported on the leaderboard is actually the score of CQADupstackAndroidRetrieval?
python scripts/average_cqadupstack.py path/to/your/results/folder
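For anyone who wants to reproduce the aggregation by hand: as far as I understand, the leaderboard number is the mean NDCG@10 over the twelve CQADupstack subsets, which is what the averaging script computes. Below is a minimal sketch of that aggregation, assuming one JSON result file per subset with an ndcg_at_10 field under the test split (adjust paths and keys to match your actual result files):

import json
from pathlib import Path

# The twelve CQADupstack subsets as they are named in MTEB.
SUBSETS = [
    "CQADupstackAndroidRetrieval", "CQADupstackEnglishRetrieval",
    "CQADupstackGamingRetrieval", "CQADupstackGisRetrieval",
    "CQADupstackMathematicaRetrieval", "CQADupstackPhysicsRetrieval",
    "CQADupstackProgrammersRetrieval", "CQADupstackStatsRetrieval",
    "CQADupstackTexRetrieval", "CQADupstackUnixRetrieval",
    "CQADupstackWebmastersRetrieval", "CQADupstackWordpressRetrieval",
]

def average_cqadupstack(results_folder: str) -> float:
    # Assumed layout: <results_folder>/<TaskName>.json containing {"test": {"ndcg_at_10": ...}}.
    scores = []
    for task in SUBSETS:
        with open(Path(results_folder) / f"{task}.json") as f:
            scores.append(json.load(f)["test"]["ndcg_at_10"])
    return sum(scores) / len(scores)

print(average_cqadupstack("path/to/your/results/folder"))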
Thanks!