Spaces: SeaLLMs/LLM_Leaderboard_for_SEA
isakzhang committed on Nov 26, 2024

Commit cf3c7c9 (verified)

1 Parent(s): b0f2c39

Update src/display/about.py

Files changed (1)

src/display/about.py

@@ -37,8 +37,8 @@ This leaderboard evaluates Large Language Models (LLMs) on Southeast Asian (SEA)
 INTRODUCTION_TEXT = """
 This leaderboard evaluates Large Language Models (LLMs) on Southeast Asian (SEA) languages through two comprehensive benchmarks - SeaExam and SeaBench:
-* **SeaExam** assesses world knowledge and reasoning capabilities through exam-style questions [[data (public)](https://huggingface.co/datasets/SeaLLMs/SeaExam)] [[eval code](https://github.com/DAMO-NLP-SG/SeaExam)]
-* **SeaBench** evaluates instruction-following abilities and multi-turn conversational skills. [[data (public)](https://huggingface.co/datasets/SeaLLMs/SeaBench)] [[eval code](https://github.com/DAMO-NLP-SG/SeaBench?tab=readme-ov-file)]
+* **SeaExam** assesses world knowledge and reasoning capabilities through exam-style questions (for both base and chat version models) [[data (public)](https://huggingface.co/datasets/SeaLLMs/SeaExam)] [[eval code](https://github.com/DAMO-NLP-SG/SeaExam)]
+* **SeaBench** evaluates instruction-following abilities and multi-turn conversational skills (thus only for chat version models). [[data (public)](https://huggingface.co/datasets/SeaLLMs/SeaBench)] [[eval code](https://github.com/DAMO-NLP-SG/SeaBench?tab=readme-ov-file)]
 Below are the aggregated results for SeaExam and SeaBench, shown both the public dataset ("pub") - which you can download via the link above - and our in-house held-out private dataset ("prv").