Files changed (1)
  1. content.py +2 -2
content.py CHANGED
@@ -14,8 +14,8 @@ Both multilingual and language-specific LLMs are welcome in this leaderboard.
  We currently evaluate models over four benchmarks:
 
  - <a href="https://arxiv.org/abs/1803.05457" target="_blank"> AI2 Reasoning Challenge </a> (25-shot)
- - <a href="https://arxiv.org/abs/1905.07830" target="_blank"> HellaSwag </a> (10-shot)
- - <a href="https://arxiv.org/abs/2009.03300" target="_blank"> MMLU </a> (5-shot)
+ - <a href="https://arxiv.org/abs/1905.07830" target="_blank"> HellaSwag </a> (0-shot)
+ - <a href="https://arxiv.org/abs/2009.03300" target="_blank"> MMLU </a> (25-shot)
  - <a href="https://arxiv.org/abs/2109.07958" target="_blank"> TruthfulQA </a> (0-shot)
 
  The evaluation data was translated into these languages using ChatGPT (gpt-35-turbo).