Gregor Betz committed
Commit 3fc8d52
1 Parent(s): 058891a
Files changed (1)
  1. src/display/about.py +26 -7
src/display/about.py CHANGED
@@ -49,16 +49,35 @@ Each `regime` has a different _accuracy gain Δ_, and the leaderboard reports (f

## How is it different from other leaderboards?

- Performance leaderboards like the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) or [YALL](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard) do a great job in ranking models according task performance.
+ Performance leaderboards like the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) or [YALL](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard) do a great job in ranking models according to task performance.

Unlike these leaderboards, the `/\/` Open CoT Leaderboard assesses a model's ability to effectively reason about a `task`:

- |🤗 Open LLM Leaderboard |`/\/` Open CoT Leaderboard |
- |---|---|
- |Can `model` solve `task`?|Can `model` do CoT to improve in `task`?|
- |Measures `task` performance.|Measures ability to reason (about `task`).|
- |Metric: absolute accuracy.|Metric: relative accuracy gain.|
- |Covers broad spectrum of `tasks`.|Focuses on critical thinking `tasks`.|
+
+ <table>
+ <tr style="text-align:center;">
+ <td>🤗 Open LLM Leaderboard </td>
+ <td>`/\/` Open CoT Leaderboard </td>
+ </tr>
+ <tr>
+ <td>Can `model` solve `task`?</td>
+ <td>Can `model` do CoT to improve in `task`?</td>
+ </tr>
+ <tr>
+ <td>Measures `task` performance.</td>
+ <td>Measures ability to reason (about `task`).</td>
+ </tr>
+ <tr>
+ <td>Metric: absolute accuracy.</td>
+ <td>Metric: relative accuracy gain.</td>
+ </tr>
+ <tr>
+ <td>Covers broad spectrum of `tasks`.</td>
+ <td>Focuses on critical thinking `tasks`.</td>
+ </tr>
+ </table>
+
+


## Test dataset selection (`tasks`)
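
The table's key contrast is the metric: absolute accuracy versus the accuracy gain Δ from chain-of-thought. As a minimal sketch of that difference, assuming Δ is simply CoT accuracy minus baseline accuracy (the function and variable names below are illustrative and not taken from the leaderboard's code):

```python
# Illustrative sketch only: names and the exact definition of Δ are
# assumptions, not the leaderboard's actual implementation.

def absolute_accuracy(scores: list[int]) -> float:
    """Absolute accuracy: share of task items answered correctly
    (the Open LLM Leaderboard-style metric)."""
    return sum(scores) / len(scores)

def accuracy_gain(base_scores: list[int], cot_scores: list[int]) -> float:
    """Accuracy gain Δ: improvement of chain-of-thought answers over
    direct answers on the same items (the Open CoT Leaderboard-style
    metric, assuming Δ is a simple difference)."""
    return absolute_accuracy(cot_scores) - absolute_accuracy(base_scores)

# Example: direct answering gets 3/5 right, CoT gets 4/5 right, so Δ = 0.2.
base = [1, 0, 1, 0, 1]   # 0/1 correctness per item without CoT
cot = [1, 1, 1, 0, 1]    # 0/1 correctness per item with CoT
print(accuracy_gain(base, cot))  # 0.2
```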