Spaces: sparse-generative-ai/open-moe-llm-leaderboard
chivier committed on May 16, 2024

Commit 271706f (1 parent: e6c97c0)

sync from github

Files changed (3)

src/backend/envs.py CHANGED

@@ -58,7 +58,7 @@ class Tasks(Enum):
     # task20 = Task("race", "acc", "RACE", 0)
     task21 = Task("mmlu", "acc", "MMLU", 5)
     task22 = Task("gsm8k_custom", "em", "GSM8K", 5)
-    task23 = Task("gsm8k_cot", "em", "GSM8K", 8)
+    # task23 = Task("gsm8k_cot", "em", "GSM8K", 8)
 EVAL_REQUESTS_PATH_BACKEND = os.path.join(CACHE_PATH, "eval-queue-bk")

src/display/about.py CHANGED

@@ -12,12 +12,15 @@ The OPEN-MOE-LLM-LEADERBOARD includes generation and multiple choice tasks to me
 Tasks:
 - **Generation Self-consistancy** -- [SelfCheckGPT](https://github.com/potsawee/selfcheckgpt)
 - **Multiple Choice Performance** -- [MMLU](https://arxiv.org/abs/2009.03300)
+- **Mathematics Problem-Solving Performance** -- [GSM8K](https://arxiv.org/abs/2110.14168)
 Columns and Metrics:
 - Method: The MOE LLMs inference framework.
 - E2E(s): Average End to End generation time in seconds.
 - PRE(s): Prefilling Time of input prompt in seconds.
 - T/s: Tokens throughout per second.
+- MBU(%): Model Bandwidth Utilization.
+- MFU(%): Model FLOPs Utilization.
 - Precision: The precison of used model.
 """
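
Note on the two new columns: the diff only names MBU and MFU and does not say how they are computed. The sketch below uses the commonly cited definitions (achieved memory bandwidth or FLOP rate divided by the hardware peak) and may not match what this leaderboard actually does. All function and parameter names are hypothetical. For an MoE model, `param_bytes` and `flops_per_token` would presumably cover only the experts actually activated per step; that choice is left to the caller here.

```python
def model_bandwidth_utilization(param_bytes, kv_cache_bytes,
                                seconds_per_token, peak_bytes_per_s):
    """MBU in percent (assumed definition, not taken from the repo).

    Achieved bandwidth is approximated as the bytes that must be read to
    produce one output token, divided by the time spent on that token.
    """
    achieved_bytes_per_s = (param_bytes + kv_cache_bytes) / seconds_per_token
    return 100.0 * achieved_bytes_per_s / peak_bytes_per_s


def model_flops_utilization(flops_per_token, tokens_per_second,
                            peak_flops_per_s):
    """MFU in percent (assumed definition, not taken from the repo).

    Achieved FLOP rate is the FLOPs needed per token times the observed
    token throughput.
    """
    achieved_flops_per_s = flops_per_token * tokens_per_second
    return 100.0 * achieved_flops_per_s / peak_flops_per_s


# Illustrative numbers only: 14 GB of weights, 1 GB of KV cache,
# 15 ms per output token, on hardware with 2 TB/s peak bandwidth.
mbu = model_bandwidth_utilization(14e9, 1e9, 0.015, 2e12)

# Illustrative numbers only: 14 GFLOPs per token at 1000 tokens/s,
# on hardware with 312 TFLOP/s peak.
mfu = model_flops_utilization(14e9, 1000, 312e12)
```

With these made-up inputs, `mbu` comes out at about 50% and `mfu` at about 4.5%, which is the typical pattern for small-batch decoding: bound by memory bandwidth rather than by compute.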

src/display/utils.py CHANGED

@@ -82,7 +82,7 @@ class Tasks(Enum):
     selfcheck = Task("selfcheckgpt", "max-selfcheckgpt", "SelfCheckGPT")
     mmlu = Task("mmlu", "acc", "MMLU") #MMLU/Acc (5-shot)
     gsm8k = Task("gsm8k_custom", "em", "GSM8K") #GSM8K/EM (5-shot)
-    gsm8k_cot = Task("gsm8k_cot", "em", "GSM8K COT") #GSM8K COT/EM (5-shot)
+    # gsm8k_cot = Task("gsm8k_cot", "em", "GSM8K COT") #GSM8K COT/EM (5-shot)
 # These classes are for user facing column names,
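
Note on the effect of this change: commenting out an `Enum` member removes it from iteration, so any code that loops over `Tasks` stops seeing `gsm8k_cot`. The sketch below illustrates that under stated assumptions: the `Task` dataclass and its field names (`benchmark`, `metric`, `col_name`) are guesses based on the three positional arguments visible in the diff, and `column_names` is a hypothetical helper, not a function from the repo.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Task:
    # Field names are assumptions; the diff only shows three positional strings.
    benchmark: str
    metric: str
    col_name: str


class Tasks(Enum):
    selfcheck = Task("selfcheckgpt", "max-selfcheckgpt", "SelfCheckGPT")
    mmlu = Task("mmlu", "acc", "MMLU")
    gsm8k = Task("gsm8k_custom", "em", "GSM8K")
    # gsm8k_cot = Task("gsm8k_cot", "em", "GSM8K COT")  # disabled, as in the commit


def column_names():
    """Hypothetical helper: collect display names by iterating the enum."""
    return [task.value.col_name for task in Tasks]


names = column_names()
```

Because the member is only commented out, re-enabling the task later is a one-line change in each of the two `Tasks` enums touched by this commit.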