description update
src/about.py CHANGED (+7 -5)
@@ -73,11 +73,7 @@ Almost every task has two versions: regex and multiple choice.
 * _g suffix means that a model needs to generate an answer (only suitable for instructions-based models)
 * _mc suffix means that a model is scored against every possible class (suitable also for base models)
 
-Average columns are normalized against scores by "Baseline (majority class)".
-* Average: {', '.join(all_tasks)}
-* Avg g: {', '.join(g_tasks)}
-* Avg mc: {', '.join(mc_tasks)}
-* Acg RAG: {', '.join(rag_tasks)}
+Average columns are normalized against scores by "Baseline (majority class)".
 
 * `,chat` suffix means that a model is tested using chat templates
 * `,chat,multiturn` suffix means that a model is tested using chat templates and fewshot examples are treated as a multi-turn conversation
@@ -102,6 +98,12 @@ or join our [Discord SpeakLeash](https://discord.gg/FfYp4V6y3R)
 
 ## Tasks
 
+Tasks taken into account while calculating averages:
+* Average: {', '.join(all_tasks)}
+* Avg g: {', '.join(g_tasks)}
+* Avg mc: {', '.join(mc_tasks)}
+* Avg RAG: {', '.join(rag_tasks)}
+
 | Task | Dataset | Metric | Type |
 |---------------------------------|---------------------------------------|-----------|-----------------|
 | polemo2_in | allegro/klej-polemo2-in | accuracy | generate_until |
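
The moved bullet list is filled in from four Python lists via f-string placeholders. A minimal sketch of how that rendering might look is given here. The diff does not show how `all_tasks`, `g_tasks`, `mc_tasks` and `rag_tasks` are built, so the task names, the suffix-based grouping, and the helper `render_average_notes` are all hypothetical and only illustrate the templating.

```python
# Hypothetical task names: the real lists are defined elsewhere in the
# repository and are not part of this diff.
all_tasks = ["polemo2_in_g", "polemo2_in_mc", "example_rag_g"]

# Assumption: the g / mc groups follow the _g and _mc suffix convention
# described in the about text.
g_tasks = [task for task in all_tasks if task.endswith("_g")]
mc_tasks = [task for task in all_tasks if task.endswith("_mc")]

# Assumption: RAG tasks are listed explicitly (placeholder name).
rag_tasks = ["example_rag_g"]


def render_average_notes(all_tasks, g_tasks, mc_tasks, rag_tasks):
    """Render the 'Tasks taken into account' block added by this commit."""
    return f"""Tasks taken into account while calculating averages:
* Average: {', '.join(all_tasks)}
* Avg g: {', '.join(g_tasks)}
* Avg mc: {', '.join(mc_tasks)}
* Avg RAG: {', '.join(rag_tasks)}
"""


notes = render_average_notes(all_tasks, g_tasks, mc_tasks, rag_tasks)
print(notes)
```

Keeping the four lists as the single source for both the averages and the rendered text would mean the description cannot drift from what is actually averaged, which is presumably why the about text interpolates them rather than hard-coding task names.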