Update content.py
Fixed articles in the description, and reformatted for clarity.

content.py CHANGED (+10 -10)

@@ -5,18 +5,18 @@ HEADER_MARKDOWN = """
 # 🇨🇿 BenCzechMark

 Welcome to the leaderboard!
-Here you can compare models on tasks in Czech language
+Here, you can compare models on tasks in the Czech language or submit your own model. We use a modified fork of [lm-evaluation-harness](https://github.com/DCGM/lm-evaluation-harness) to evaluate every model under the same protocol.

-
- …
-- See **About** page for brief description of our evaluation protocol & win score mechanism, citation information, and future directions for this benchmark.
+
+- Visit the **Submission** page to learn about how to submit your model.
+- Check out the **About** page for a brief overview of our evaluation protocol, win score mechanism, citation details, and future plans for this benchmark.
 - __How scoring works__:
- …
-- Category scores are …
-- __Overall__ Duel Win …
-- All public submissions are …
- …
- …
+- For each task, the __Duel Win Score__ reflects the proportion of duels a model has won.
+- Category scores are calculated by averaging scores across all tasks within that category. When viewing a specific category (other than Overall), the "Average" column displays the Category Duel Win Scores.
+- The __Overall__ Duel Win Score is the average across all category scores. When selecting the Overall category, the "Average" column shows the Overall Duel Win Score.
+- All public submissions are available in the [CZLC/LLM_benchmark_data](https://huggingface.co/datasets/CZLC/LLM_benchmark_data) dataset.
+- On the submission page, __you can view your model's results on the leaderboard without publishing them__.
+- The first step is "pre-submission." After this is complete (significance tests may take up to an hour), you can choose to submit the results if you wish.
+

 """
 LEADERBOARD_TAB_TITLE_MARKDOWN = """
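
For context on the evaluation pipeline mentioned in the new header text: lm-evaluation-harness exposes a Python entry point, `lm_eval.simple_evaluate`, that scores a model on a list of tasks. The sketch below shows the general shape of such a run using upstream placeholder names; BenCzechMark's actual task definitions live in the DCGM fork, which may differ from upstream.

```python
import lm_eval

# Sketch of an lm-evaluation-harness run. The checkpoint and task names
# are upstream placeholders, not BenCzechMark's; the benchmark's own
# tasks are defined in the DCGM fork linked in the header text.
results = lm_eval.simple_evaluate(
    model="hf",                     # Hugging Face transformers backend
    model_args="pretrained=gpt2",   # placeholder checkpoint
    tasks=["hellaswag"],            # placeholder task name
    batch_size=8,
)
print(results["results"])           # per-task metrics keyed by task name
```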
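The scoring bullets added in this commit compose in three steps: a per-task Duel Win Score (proportion of duels won), a per-category average of task scores, and an Overall score that averages the category scores. Here is a minimal sketch of that aggregation with made-up tasks and categories; the real duel bookkeeping and significance testing happen in the leaderboard backend.

```python
from statistics import mean

# Hypothetical duel records for one model: task -> (duels won, duels fought).
# Task and category names are illustrative only.
task_duels = {
    "czech_summarization": (7, 10),
    "czech_qa": (9, 10),
    "czech_sentiment": (4, 10),
}
categories = {
    "NLG": ["czech_summarization"],
    "NLU": ["czech_qa", "czech_sentiment"],
}

def task_duel_win_score(task: str) -> float:
    """Per-task Duel Win Score: proportion of duels the model won."""
    won, fought = task_duels[task]
    return won / fought

def category_duel_win_score(category: str) -> float:
    """Category score: average of task scores within the category."""
    return mean(task_duel_win_score(t) for t in categories[category])

def overall_duel_win_score() -> float:
    """Overall score: average across category scores, not raw tasks."""
    return mean(category_duel_win_score(c) for c in categories)

print({c: category_duel_win_score(c) for c in categories})  # {'NLG': 0.7, 'NLU': 0.65}
print(overall_duel_win_score())                             # 0.675
```

Averaging category scores, rather than pooling all tasks directly, keeps a category with many tasks from dominating the Overall Duel Win Score.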
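Finally, content.py only holds the markdown copy; the Space's UI code is outside this diff. As a rough sketch of how a Gradio leaderboard typically renders such constants (the layout and tab name below are assumptions, not taken from the repository):

```python
import gradio as gr

from content import HEADER_MARKDOWN, LEADERBOARD_TAB_TITLE_MARKDOWN

with gr.Blocks() as demo:
    # Header text edited in this commit.
    gr.Markdown(HEADER_MARKDOWN)
    with gr.Tab("Leaderboard"):
        gr.Markdown(LEADERBOARD_TAB_TITLE_MARKDOWN)
        # leaderboard table, category selector, etc. would follow here

demo.launch()
```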