Update src/about.py
src/about.py CHANGED (+1 -0)
@@ -93,6 +93,7 @@ And here find all the translated benchmarks provided by the Language evaluation
 
 
 To ensure a fair and unbiased assessment of the models' true capabilities, all evaluations are conducted in zero-shot settings `0-shots`. This approach eliminates any potential advantage from task-specific fine-tuning, providing a clear indication of how well the models can generalize to new tasks.
+
 Also, given the nature of the tasks, which include multiple-choice and yes/no questions, the leaderboard primarily uses normalized log likelihood accuracy `loglikelihood_acc_norm` for all tasks. This metric was chosen for its ability to provide a clear and fair measurement of model performance across different types of questions.
 
 
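For readers unfamiliar with the metric named in the diff, the sketch below illustrates how a length-normalized log-likelihood accuracy in the spirit of `loglikelihood_acc_norm` can be scored for multiple-choice items. This is a minimal, assumed illustration only, not the Space's actual implementation: the `score_choice` callable, the `items` structure, and the toy example are hypothetical stand-ins for whatever the leaderboard's evaluation harness uses internally.

# Minimal sketch (assumed, not the Space's actual code) of length-normalized
# log-likelihood accuracy for multiple-choice items. `score_choice` stands in
# for a routine returning the summed log-probability of a candidate answer
# given the zero-shot prompt; `items` is a hypothetical (prompt, choices,
# gold_index) structure.

def loglikelihood_acc_norm(items, score_choice):
    correct = 0
    for prompt, choices, gold_index in items:
        # Normalize by answer length so longer answers are not penalized
        # simply for accumulating more negative log-probability.
        normalized = [
            score_choice(prompt, choice) / max(len(choice), 1)
            for choice in choices
        ]
        prediction = max(range(len(choices)), key=lambda i: normalized[i])
        correct += int(prediction == gold_index)
    return correct / len(items)

# Toy usage with a dummy scorer that assigns the gold answer a higher
# log-probability; prints 1.0 (one item, answered correctly).
items = [("Q: What is 2 + 2?", ["3", "4"], 1)]
print(loglikelihood_acc_norm(items, lambda p, c: 0.0 if c == "4" else -5.0))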