idolezal committed on
Commit 2c4907f · 2 Parent(s): cfb07ff feb5abd

Merge branch 'main' of hf.co:spaces/CZLC/BenCzechMark

Files changed (1)
  1. content.py +2 -0
content.py CHANGED
@@ -10,6 +10,8 @@ Here, you can compare models on tasks in the Czech language or submit your own m
  - Visit the **Submission** page to learn about how to submit your model.
  - Check out the **About** page for a brief overview of our evaluation protocol, win score mechanism, citation details, and future plans for this benchmark.
  - __How scoring works__:
+ - On each task, we score every model using one of our metrics (Accuracy for multiple choice tasks, Word Perplexity for language modeling, AUROC for classification).
+ - On each task for each model pair, we perform a _duel_: a statistical significance test (with a 5% alpha level) to determine if the model's improvement in the metric is significant.
  - For each task, the __Duel Win Score__ reflects the proportion of duels a model has won.
  - Category scores are calculated by averaging scores across all tasks within that category. When viewing a specific category (other than Overall), the "Average" column displays the Category Duel Win Scores.
  - The __Overall__ Duel Win Score is the average across all category scores. When selecting the Overall category, the "Average" column shows the Overall Duel Win Score.
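
The scoring bullets in this hunk can be illustrated with a minimal sketch. It is not the leaderboard's actual implementation: the function names, the data layout, and the idea of passing in precomputed p-values are all assumptions made for illustration, and the diff does not say which significance test is used. Only the 5% alpha level and the averaging steps come from the text.

```python
from statistics import mean

ALPHA = 0.05  # significance level stated in the description


def duel_win_scores(p_values, alpha=ALPHA):
    """Per-task Duel Win Score for every model.

    p_values[(a, b)] is a hypothetical p-value for the claim
    "model a is significantly better than model b" on one task.
    How that p-value is obtained is outside this sketch.
    """
    models = sorted({m for pair in p_values for m in pair})
    scores = {}
    for a in models:
        # All duels in which `a` is the challenger.
        duels = [p for (x, _), p in p_values.items() if x == a]
        # A duel counts as won when the test is significant at `alpha`.
        scores[a] = sum(p < alpha for p in duels) / len(duels) if duels else 0.0
    return scores


def aggregate(per_task, categories):
    """Average task scores into category scores, then categories into Overall.

    per_task[task][model] is a Duel Win Score;
    categories[name] is the list of tasks in that category.
    """
    models = list(next(iter(per_task.values())))
    category_scores = {
        name: {m: mean(per_task[t][m] for t in tasks) for m in models}
        for name, tasks in categories.items()
    }
    overall = {m: mean(category_scores[c][m] for c in category_scores) for m in models}
    return category_scores, overall
```

Note that Overall is the mean of category scores, not of task scores, so a category with a single task weighs as much as a category with many.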