Update content.py
Fixed articles in the description, and reformatted for clarity.

content.py CHANGED (+10 -10)

@@ -5,18 +5,18 @@ HEADER_MARKDOWN = """
 # 🇨🇿 BenCzechMark

 Welcome to the leaderboard!
-Here you can compare models on tasks in Czech language
+Here, you can compare models on tasks in the Czech language or submit your own model. We use a modified fork of [lm-evaluation-harness](https://github.com/DCGM/lm-evaluation-harness) to evaluate every model under the same protocol.

-
- …
-- See **About** page for brief description of our evaluation protocol & win score mechanism, citation information, and future directions for this benchmark.
+
+- Visit the **Submission** page to learn about how to submit your model.
+- Check out the **About** page for a brief overview of our evaluation protocol, win score mechanism, citation details, and future plans for this benchmark.
 - __How scoring works__:
- …
-- Category scores are …
-- __Overall__ Duel Win …
-- All public submissions are …
- …
- …
+- For each task, the __Duel Win Score__ reflects the proportion of duels a model has won.
+- Category scores are calculated by averaging scores across all tasks within that category. When viewing a specific category (other than Overall), the "Average" column displays the Category Duel Win Scores.
+- The __Overall__ Duel Win Score is the average across all category scores. When selecting the Overall category, the "Average" column shows the Overall Duel Win Score.
+- All public submissions are available in the [CZLC/LLM_benchmark_data](https://huggingface.co/datasets/CZLC/LLM_benchmark_data) dataset.
+- On the submission page, __you can view your model's results on the leaderboard without publishing them__.
+- The first step is "pre-submission." After this is complete (significance tests may take up to an hour), you can choose to submit the results if you wish.
+

 """
 LEADERBOARD_TAB_TITLE_MARKDOWN = """
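
For context on the evaluation pipeline mentioned in the new header text: lm-evaluation-harness exposes a Python entry point, `lm_eval.simple_evaluate`, that scores a model on a list of tasks. The sketch below shows the general shape of such a run using upstream placeholder names; BenCzechMark's actual task definitions live in the DCGM fork, which may differ from upstream.

```python
import lm_eval

# Sketch of an lm-evaluation-harness run. The checkpoint and task names
# are upstream placeholders, not BenCzechMark's; the benchmark's own
# tasks are defined in the DCGM fork linked in the header text.
results = lm_eval.simple_evaluate(
    model="hf",                     # Hugging Face transformers backend
    model_args="pretrained=gpt2",   # placeholder checkpoint
    tasks=["hellaswag"],            # placeholder task name
    batch_size=8,
)
print(results["results"])           # per-task metrics keyed by task name
```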
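The scoring bullets added in this commit compose in three steps: a per-task Duel Win Score (proportion of duels won), a per-category average of task scores, and an Overall score that averages the category scores. Here is a minimal sketch of that aggregation with made-up tasks and categories; the real duel bookkeeping and significance testing happen in the leaderboard backend.

```python
from statistics import mean

# Hypothetical duel records for one model: task -> (duels won, duels fought).
# Task and category names are illustrative only.
task_duels = {
    "czech_summarization": (7, 10),
    "czech_qa": (9, 10),
    "czech_sentiment": (4, 10),
}
categories = {
    "NLG": ["czech_summarization"],
    "NLU": ["czech_qa", "czech_sentiment"],
}

def task_duel_win_score(task: str) -> float:
    """Per-task Duel Win Score: proportion of duels the model won."""
    won, fought = task_duels[task]
    return won / fought

def category_duel_win_score(category: str) -> float:
    """Category score: average of task scores within the category."""
    return mean(task_duel_win_score(t) for t in categories[category])

def overall_duel_win_score() -> float:
    """Overall score: average across category scores, not raw tasks."""
    return mean(category_duel_win_score(c) for c in categories)

print({c: category_duel_win_score(c) for c in categories})  # {'NLG': 0.7, 'NLU': 0.65}
print(overall_duel_win_score())                             # 0.675
```

Averaging category scores, rather than pooling all tasks directly, keeps a category with many tasks from dominating the Overall Duel Win Score.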
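Finally, content.py only holds the markdown copy; the Space's UI code is outside this diff. As a rough sketch of how a Gradio leaderboard typically renders such constants (the layout and tab name below are assumptions, not taken from the repository):

```python
import gradio as gr

from content import HEADER_MARKDOWN, LEADERBOARD_TAB_TITLE_MARKDOWN

with gr.Blocks() as demo:
    # Header text edited in this commit.
    gr.Markdown(HEADER_MARKDOWN)
    with gr.Tab("Leaderboard"):
        gr.Markdown(LEADERBOARD_TAB_TITLE_MARKDOWN)
        # leaderboard table, category selector, etc. would follow here

demo.launch()
```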