Gregor Betz committed on
Commit
058891a
•
1 Parent(s): f621b6a

description

Files changed (1)
  1. src/display/about.py +5 -2
src/display/about.py CHANGED
@@ -51,10 +51,13 @@ Each `regime` has a different _accuracy gain Δ_, and the leaderboard reports (f
 
 Performance leaderboards like the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) or [YALL](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard) do a great job in ranking models according to task performance.
 
+Unlike these leaderboards, the `/\/` Open CoT Leaderboard assesses a model's ability to effectively reason about a `task`:
+
 |🤗 Open LLM Leaderboard |`/\/` Open CoT Leaderboard |
 |---|---|
-|Can `model` solve task?|Does `model` do CoT to improve in task?|
-|Measures absolute performance.|Measures relative performance gains.|
+|Can `model` solve `task`?|Can `model` do CoT to improve in `task`?|
+|Measures `task` performance.|Measures ability to reason (about `task`).|
+|Metric: absolute accuracy.|Metric: relative accuracy gain.|
 |Covers broad spectrum of `tasks`.|Focuses on critical thinking `tasks`.|
 
 