Commit cf3c7c9 (verified) · committed by isakzhang · 1 Parent(s): b0f2c39

Update src/display/about.py

Files changed (1)
  1. src/display/about.py +2 -2
src/display/about.py CHANGED
@@ -37,8 +37,8 @@ This leaderboard evaluates Large Language Models (LLMs) on Southeast Asian (SEA)
 
  INTRODUCTION_TEXT = """
  This leaderboard evaluates Large Language Models (LLMs) on Southeast Asian (SEA) languages through two comprehensive benchmarks - SeaExam and SeaBench:
- * **SeaExam** assesses world knowledge and reasoning capabilities through exam-style questions [[data (public)](https://huggingface.co/datasets/SeaLLMs/SeaExam)] [[eval code](https://github.com/DAMO-NLP-SG/SeaExam)]
- * **SeaBench** evaluates instruction-following abilities and multi-turn conversational skills. [[data (public)](https://huggingface.co/datasets/SeaLLMs/SeaBench)] [[eval code](https://github.com/DAMO-NLP-SG/SeaBench?tab=readme-ov-file)]
+ * **SeaExam** assesses world knowledge and reasoning capabilities through exam-style questions (for both base and chat version models) [[data (public)](https://huggingface.co/datasets/SeaLLMs/SeaExam)] [[eval code](https://github.com/DAMO-NLP-SG/SeaExam)]
+ * **SeaBench** evaluates instruction-following abilities and multi-turn conversational skills (thus only for chat version models). [[data (public)](https://huggingface.co/datasets/SeaLLMs/SeaBench)] [[eval code](https://github.com/DAMO-NLP-SG/SeaBench?tab=readme-ov-file)]
 
  Below are the aggregated results for SeaExam and SeaBench, shown both the public dataset ("pub") - which you can download via the link above - and our in-house held-out private dataset ("prv").
 