djstrong committed on
Commit 26d544d · 1 Parent(s): c7cf816

description update

Files changed (1)
  1. src/about.py +7 -5
src/about.py CHANGED
@@ -73,11 +73,7 @@ Almost every task has two versions: regex and multiple choice.
 * _g suffix means that a model needs to generate an answer (only suitable for instructions-based models)
 * _mc suffix means that a model is scored against every possible class (suitable also for base models)
 
-Average columns are normalized against scores by "Baseline (majority class)". Tasks taken into account while calculating averages:
-* Average: {', '.join(all_tasks)}
-* Avg g: {', '.join(g_tasks)}
-* Avg mc: {', '.join(mc_tasks)}
-* Acg RAG: {', '.join(rag_tasks)}
+Average columns are normalized against scores by "Baseline (majority class)".
 
 * `,chat` suffix means that a model is tested using chat templates
 * `,chat,multiturn` suffix means that a model is tested using chat templates and fewshot examples are treated as a multi-turn conversation
@@ -102,6 +98,12 @@ or join our [Discord SpeakLeash](https://discord.gg/FfYp4V6y3R)
 
 ## Tasks
 
+Tasks taken into account while calculating averages:
+* Average: {', '.join(all_tasks)}
+* Avg g: {', '.join(g_tasks)}
+* Avg mc: {', '.join(mc_tasks)}
+* Avg RAG: {', '.join(rag_tasks)}
+
 | Task | Dataset | Metric | Type |
 |---------------------------------|---------------------------------------|-----------|-----------------|
 | polemo2_in | allegro/klej-polemo2-in | accuracy | generate_until |
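
The `{', '.join(...)}` placeholders in the moved lines are Python f-string expressions, so the task lists are rendered into the description text when `src/about.py` is evaluated. The following is a minimal sketch of that rendering, not the repository's actual code: the task names, the way `all_tasks` is built, and the helper `render_average_tasks` are hypothetical, since the diff does not show where `all_tasks`, `g_tasks`, `mc_tasks`, and `rag_tasks` are defined.

```python
# Hypothetical task lists; the real ones are defined elsewhere in the
# repository and are not visible in this diff.
g_tasks = ["polemo2_in_g", "polemo2_out_g"]
mc_tasks = ["polemo2_in_mc", "polemo2_out_mc"]
rag_tasks = ["example_rag_task_g"]
# Assumption: the overall average covers every task list combined.
all_tasks = g_tasks + mc_tasks + rag_tasks


def render_average_tasks(all_tasks, g_tasks, mc_tasks, rag_tasks):
    """Render the 'Tasks taken into account' block the way an f-string would."""
    return f"""Tasks taken into account while calculating averages:
* Average: {', '.join(all_tasks)}
* Avg g: {', '.join(g_tasks)}
* Avg mc: {', '.join(mc_tasks)}
* Avg RAG: {', '.join(rag_tasks)}
"""


section = render_average_tasks(all_tasks, g_tasks, mc_tasks, rag_tasks)
print(section)
```

Each bullet becomes one comma-separated line of task names, which is presumably why the commit moves the block under the `## Tasks` heading, next to the task table.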