INFO: 2024-11-15 17:16:28,761: llmtf.base.evaluator: Starting eval on ['darumeru/multiq']
INFO: 2024-11-15 17:16:28,761: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:16:28,761: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:16:31,122: llmtf.base.evaluator: Starting eval on ['darumeru/parus']
INFO: 2024-11-15 17:16:31,123: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:16:31,123: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:16:33,057: llmtf.base.darumeru/MultiQ: Loading Dataset: 4.30s
INFO: 2024-11-15 17:16:33,199: llmtf.base.darumeru/PARus: Loading Dataset: 2.08s
INFO: 2024-11-15 17:16:35,422: llmtf.base.darumeru/PARus: Processing Dataset: 2.22s
INFO: 2024-11-15 17:16:35,422: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-11-15 17:16:35,431: llmtf.base.darumeru/PARus: {'acc': 0.28}
INFO: 2024-11-15 17:16:35,432: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:16:35,432: llmtf.base.evaluator: mean darumeru/PARus 0.280 0.280
INFO: 2024-11-15 17:16:43,944: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa']
INFO: 2024-11-15 17:16:43,944: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:16:43,944: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:16:47,099: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 3.16s
INFO: 2024-11-15 17:17:11,239: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 24.14s
INFO: 2024-11-15 17:17:11,239: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-11-15 17:17:11,250: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.47465635738831613, 'f1_macro': 0.4733795691558009}
INFO: 2024-11-15 17:17:11,257: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:17:11,258: llmtf.base.evaluator: mean darumeru/PARus darumeru/ruOpenBookQA 0.377 0.280 0.474
INFO: 2024-11-15 17:17:20,171: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd']
INFO: 2024-11-15 17:17:20,171: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:17:20,171: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:17:22,512: llmtf.base.darumeru/RWSD: Loading Dataset: 2.34s
INFO: 2024-11-15 17:17:25,027: llmtf.base.darumeru/RWSD: Processing Dataset: 2.51s
INFO: 2024-11-15 17:17:25,028: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-11-15 17:17:25,028: llmtf.base.darumeru/RWSD: {'acc': 0.4362745098039216}
INFO: 2024-11-15 17:17:25,029: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:17:25,029: llmtf.base.evaluator: mean darumeru/PARus darumeru/RWSD darumeru/ruOpenBookQA 0.397 0.280 0.436 0.474
INFO: 2024-11-15 17:17:33,677: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-11-15 17:17:33,677: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:17:33,678: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:19:15,640: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 101.96s
INFO: 2024-11-15 17:22:11,674: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 176.03s
INFO: 2024-11-15 17:22:11,674: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-11-15 17:22:11,735: llmtf.base.nlpcoreteam/ruMMLU:
                                       metric
subject
abstract_algebra                     0.270000
anatomy                              0.392593
astronomy                            0.421053
business_ethics                      0.410000
clinical_knowledge                   0.426415
college_biology                      0.340278
college_chemistry                    0.200000
college_computer_science             0.330000
college_mathematics                  0.300000
college_medicine                     0.358382
college_physics                      0.294118
computer_security                    0.540000
conceptual_physics                   0.331915
econometrics                         0.359649
electrical_engineering               0.496552
elementary_mathematics               0.399471
formal_logic                         0.269841
global_facts                         0.390000
high_school_biology                  0.393548
high_school_chemistry                0.418719
high_school_computer_science         0.610000
high_school_european_history         0.478788
high_school_geography                0.525253
high_school_government_and_politics  0.388601
high_school_macroeconomics           0.335897
high_school_mathematics              0.366667
high_school_microeconomics           0.407563
high_school_physics                  0.331126
high_school_psychology               0.436697
high_school_statistics               0.356481
high_school_us_history               0.401961
high_school_world_history            0.535865
human_aging                          0.403587
human_sexuality                      0.412214
international_law                    0.652893
jurisprudence                        0.500000
logical_fallacies                    0.374233
machine_learning                     0.366071
management                           0.456311
marketing                            0.645299
medical_genetics                     0.440000
miscellaneous                        0.429119
moral_disputes                       0.447977
moral_scenarios                      0.242458
nutrition                            0.408497
philosophy                           0.459807
prehistory                           0.398148
professional_accounting              0.329787
professional_law                     0.327249
professional_medicine                0.294118
professional_psychology              0.370915
public_relations                     0.436364
security_studies                     0.416327
sociology                            0.562189
us_foreign_policy                    0.610000
virology                             0.373494
world_religions                      0.456140
INFO: 2024-11-15 17:22:11,743: llmtf.base.nlpcoreteam/ruMMLU:
                                   metric
subject
STEM                             0.375889
humanities                       0.426566
other (business, health, misc.)  0.411257
social sciences                  0.438472
INFO: 2024-11-15 17:22:11,748: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.41304613612768604}
INFO: 2024-11-15 17:22:11,782: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:22:11,783: llmtf.base.evaluator: mean darumeru/PARus darumeru/RWSD darumeru/ruOpenBookQA nlpcoreteam/ruMMLU 0.401 0.280 0.436 0.474 0.413
INFO: 2024-11-15 17:22:20,448: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-11-15 17:22:20,448: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:22:20,448: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:22:24,310: llmtf.base.daru/treewayabstractive: Loading Dataset: 3.86s
INFO: 2024-11-15 17:24:23,418: llmtf.base.darumeru/MultiQ: Processing Dataset: 470.36s
INFO: 2024-11-15 17:24:23,419: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-11-15 17:24:23,420: llmtf.base.darumeru/MultiQ: {'f1': 0.22938942868828982, 'em': 0.13479923518164436}
INFO: 2024-11-15 17:24:23,424: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:24:23,425: llmtf.base.evaluator: mean darumeru/MultiQ darumeru/PARus darumeru/RWSD darumeru/ruOpenBookQA nlpcoreteam/ruMMLU 0.357 0.182 0.280 0.436 0.474 0.413
INFO: 2024-11-15 17:24:32,018: llmtf.base.evaluator: Starting eval on ['darumeru/rcb']
INFO: 2024-11-15 17:24:32,018: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:24:32,019: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:24:34,515: llmtf.base.darumeru/RCB: Loading Dataset: 2.50s
INFO: 2024-11-15 17:24:37,191: llmtf.base.darumeru/RCB: Processing Dataset: 2.68s
INFO: 2024-11-15 17:24:37,191: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-11-15 17:24:37,195: llmtf.base.darumeru/RCB: {'acc': 0.4636363636363636, 'f1_macro': 0.4278154677497561}
INFO: 2024-11-15 17:24:37,195: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:24:37,196: llmtf.base.evaluator: mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA nlpcoreteam/ruMMLU 0.372 0.182 0.280 0.446 0.436 0.474 0.413
INFO: 2024-11-15 17:24:45,714: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree']
INFO: 2024-11-15 17:24:45,714: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:24:45,714: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:24:48,302: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.59s
INFO: 2024-11-15 17:24:49,696: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 1.39s
INFO: 2024-11-15 17:24:49,696: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-11-15 17:24:49,699: llmtf.base.darumeru/ruWorldTree: {'acc': 0.6571428571428571, 'f1_macro': 0.6549941370855985}
INFO: 2024-11-15 17:24:49,699: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:24:49,700: llmtf.base.evaluator: mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU 0.412 0.182 0.280 0.446 0.436 0.474 0.656 0.413
INFO: 2024-11-15 17:24:58,281: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-11-15 17:24:58,282: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:24:58,282: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:25:11,202: llmtf.base.daru/treewayextractive: Loading Dataset: 12.92s
INFO: 2024-11-15 17:26:13,880: llmtf.base.daru/treewayabstractive: Processing Dataset: 229.57s
INFO: 2024-11-15 17:26:13,880: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-11-15 17:26:13,881: llmtf.base.daru/treewayabstractive: {'rouge1': 0.31763876629967247, 'rouge2': 0.10272501116299726}
INFO: 2024-11-15 17:26:13,882: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:26:13,883: llmtf.base.evaluator: mean daru/treewayabstractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU 0.387 0.210 0.182 0.280 0.446 0.436 0.474 0.656 0.413
INFO: 2024-11-15 17:26:58,894: llmtf.base.daru/treewayextractive: Processing Dataset: 107.69s
INFO: 2024-11-15 17:26:58,894: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-11-15 17:26:59,122: llmtf.base.daru/treewayextractive: {'r-prec': 0.3720740981240981}
INFO: 2024-11-15 17:26:59,162: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:26:59,164: llmtf.base.evaluator: mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU 0.385 0.210 0.372 0.182 0.280 0.446 0.436 0.474 0.656 0.413
INFO: 2024-11-15 17:27:07,864: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-11-15 17:27:07,864: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:27:07,864: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:28:52,751: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 104.89s
INFO: 2024-11-15 17:31:30,875: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 158.12s
INFO: 2024-11-15 17:31:30,875: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-11-15 17:31:30,937: llmtf.base.nlpcoreteam/enMMLU:
                                       metric
subject
abstract_algebra                     0.360000
anatomy                              0.525926
astronomy                            0.480263
business_ethics                      0.510000
clinical_knowledge                   0.581132
college_biology                      0.500000
college_chemistry                    0.280000
college_computer_science             0.390000
college_mathematics                  0.290000
college_medicine                     0.485549
college_physics                      0.294118
computer_security                    0.640000
conceptual_physics                   0.459574
econometrics                         0.394737
electrical_engineering               0.572414
elementary_mathematics               0.428571
formal_logic                         0.333333
global_facts                         0.350000
high_school_biology                  0.567742
high_school_chemistry                0.477833
high_school_computer_science         0.660000
high_school_european_history         0.612121
high_school_geography                0.601010
high_school_government_and_politics  0.580311
high_school_macroeconomics           0.453846
high_school_mathematics              0.325926
high_school_microeconomics           0.537815
high_school_physics                  0.350993
high_school_psychology               0.645872
high_school_statistics               0.435185
high_school_us_history               0.549020
high_school_world_history            0.649789
human_aging                          0.533632
human_sexuality                      0.526718
international_law                    0.669421
jurisprudence                        0.564815
logical_fallacies                    0.607362
machine_learning                     0.366071
management                           0.621359
marketing                            0.760684
medical_genetics                     0.450000
miscellaneous                        0.583653
moral_disputes                       0.531792
moral_scenarios                      0.242458
nutrition                            0.549020
philosophy                           0.517685
prehistory                           0.540123
professional_accounting              0.382979
professional_law                     0.362451
professional_medicine                0.345588
professional_psychology              0.449346
public_relations                     0.509091
security_studies                     0.555102
sociology                            0.641791
us_foreign_policy                    0.680000
virology                             0.427711
world_religions                      0.619883
INFO: 2024-11-15 17:31:30,945: llmtf.base.nlpcoreteam/enMMLU:
                                   metric
subject
STEM                             0.437705
humanities                       0.523096
other (business, health, misc.)  0.507659
social sciences                  0.547970
INFO: 2024-11-15 17:31:30,950: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.5041077125064222}
INFO: 2024-11-15 17:31:30,983: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:31:30,985: llmtf.base.evaluator: mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU 0.397 0.210 0.372 0.182 0.280 0.446 0.436 0.474 0.656 0.504 0.413
INFO: 2024-11-15 17:31:39,751: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru']
INFO: 2024-11-15 17:31:39,751: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-15 17:31:39,751: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-15 17:31:42,286: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.53s
INFO: 2024-11-15 17:34:56,464: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 194.18s
INFO: 2024-11-15 17:34:56,464: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-11-15 17:34:56,465: llmtf.base.darumeru/cp_para_ru: {'tokens_per_word': 1.902600722678228, 'symbol_per_token': 3.932343331088908, 'len': 0.967827425390648, 'lcs': 0.23}
INFO: 2024-11-15 17:34:56,466: llmtf.base.evaluator: Ended eval
INFO: 2024-11-15 17:34:56,466: llmtf.base.evaluator: mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU 0.382 0.210 0.372 0.182 0.280 0.446 0.436 0.230 0.474 0.656 0.504 0.413