Spaces: uonlp/open_multilingual_llm_leaderboard

Update content.py

by vietlai-kensho - opened Feb 3

Files changed (1)

content.py CHANGED

@@ -14,8 +14,8 @@ Both multilingual and language-specific LLMs are welcome in this leaderboard.
 We currently evaluate models over four benchmarks:
 - <a href="https://arxiv.org/abs/1803.05457" target="_blank">  AI2 Reasoning Challenge </a> (25-shot)
-- <a href="https://arxiv.org/abs/1905.07830" target="_blank">  HellaSwag </a> (10-shot)
-- <a href="https://arxiv.org/abs/2009.03300" target="_blank">  MMLU </a>  (5-shot)
+- <a href="https://arxiv.org/abs/1905.07830" target="_blank">  HellaSwag </a> (0-shot)
+- <a href="https://arxiv.org/abs/2009.03300" target="_blank">  MMLU </a>  (25-shot)
 - <a href="https://arxiv.org/abs/2109.07958" target="_blank">  TruthfulQA </a> (0-shot)
 The evaluation data was translated into these languages using ChatGPT (gpt-35-turbo).