ericpolewski committed
Commit 57c2a3e
1 Parent(s): 8ae6319

Update README.md

Files changed (1)
  1. README.md +385 -1
README.md CHANGED
@@ -55,4 +55,388 @@ It's not useless, nor particularly technical:
55
 
56
  Partially due to limitations imposed by my data, and partially because I forgot, I didn't use stop sequences, so it'll often keep hallucinating fake Q/A pairs in Alpaca format from the instruct data it was fine-tuned on. Often about Taco Bell, but definitely not always. You can set a stop string of "### Instruct:" to work around that (one possible way to do this is sketched after the diff below). I just don't care enough to fix it. It pretends things happened that just haven't, and it assumes a very positive relationship between itself and the user, complete with a fictitious shared history. That's likely more a quirk of the AIRIC dataset, though. I have to assume this thing will not do well on benchmarks, but of course I'm going to submit it anyway. I'd be very happy if the performance didn't tank, but let's be honest: I lobotomized an assistant and poured pintos and cheese into the vacancy. If people wanted to see it, I'd make an MoE model. Like a combination KFC/Pizza Hut/Taco Bell, except it's doing your homework. I am absolutely fascinated by how empathetic and curious this thing became with the proper mix of assistant training and product knowledge. Like a motivated salesperson. Or a door-to-door religion that would help you weed your garden if you let them talk about their version of God for a little while.
57
 
58
- I probably should've chosen a topic that would've had a more profound effect on humankind. But I couldn't think of anything and my brain went to TB. So I guess I made a robot that does that forever.
58
+ I probably should've chosen a topic that would've had a more profound effect on humankind. But I couldn't think of anything and my brain went to TB. So I guess I made a robot that does that forever.
59
+
60
+
61
+ Evals:
62
+
63
+ {
64
+ "all": {
65
+ "acc": 0.5638377937424233,
66
+ "acc_stderr": 0.0333481450094512,
67
+ "acc_norm": 0.5741662321190941,
68
+ "acc_norm_stderr": 0.03420397056423356,
69
+ "mc1": 0.31334149326805383,
70
+ "mc1_stderr": 0.016238065069059605,
71
+ "mc2": 0.4605506661658282,
72
+ "mc2_stderr": 0.014802420782627305
73
+ },
74
+ "harness|arc:challenge|25": {
75
+ "acc": 0.5273037542662116,
76
+ "acc_stderr": 0.014589589101985996,
77
+ "acc_norm": 0.5853242320819113,
78
+ "acc_norm_stderr": 0.014397070564409172
79
+ },
80
+ "harness|hellaswag|10": {
81
+ "acc": 0.6160127464648476,
82
+ "acc_stderr": 0.004853608805843881,
83
+ "acc_norm": 0.8189603664608643,
84
+ "acc_norm_stderr": 0.003842640800361503
85
+ },
86
+ "harness|hendrycksTest-abstract_algebra|5": {
87
+ "acc": 0.28,
88
+ "acc_stderr": 0.045126085985421296,
89
+ "acc_norm": 0.28,
90
+ "acc_norm_stderr": 0.045126085985421296
91
+ },
92
+ "harness|hendrycksTest-anatomy|5": {
93
+ "acc": 0.4740740740740741,
94
+ "acc_stderr": 0.04313531696750574,
95
+ "acc_norm": 0.4740740740740741,
96
+ "acc_norm_stderr": 0.04313531696750574
97
+ },
98
+ "harness|hendrycksTest-astronomy|5": {
99
+ "acc": 0.5394736842105263,
100
+ "acc_stderr": 0.04056242252249034,
101
+ "acc_norm": 0.5394736842105263,
102
+ "acc_norm_stderr": 0.04056242252249034
103
+ },
104
+ "harness|hendrycksTest-business_ethics|5": {
105
+ "acc": 0.56,
106
+ "acc_stderr": 0.04988876515698589,
107
+ "acc_norm": 0.56,
108
+ "acc_norm_stderr": 0.04988876515698589
109
+ },
110
+ "harness|hendrycksTest-clinical_knowledge|5": {
111
+ "acc": 0.6490566037735849,
112
+ "acc_stderr": 0.029373646253234686,
113
+ "acc_norm": 0.6490566037735849,
114
+ "acc_norm_stderr": 0.029373646253234686
115
+ },
116
+ "harness|hendrycksTest-college_biology|5": {
117
+ "acc": 0.5902777777777778,
118
+ "acc_stderr": 0.04112490974670787,
119
+ "acc_norm": 0.5902777777777778,
120
+ "acc_norm_stderr": 0.04112490974670787
121
+ },
122
+ "harness|hendrycksTest-college_chemistry|5": {
123
+ "acc": 0.41,
124
+ "acc_stderr": 0.04943110704237102,
125
+ "acc_norm": 0.41,
126
+ "acc_norm_stderr": 0.04943110704237102
127
+ },
128
+ "harness|hendrycksTest-college_computer_science|5": {
129
+ "acc": 0.42,
130
+ "acc_stderr": 0.049604496374885836,
131
+ "acc_norm": 0.42,
132
+ "acc_norm_stderr": 0.049604496374885836
133
+ },
134
+ "harness|hendrycksTest-college_mathematics|5": {
135
+ "acc": 0.33,
136
+ "acc_stderr": 0.047258156262526045,
137
+ "acc_norm": 0.33,
138
+ "acc_norm_stderr": 0.047258156262526045
139
+ },
140
+ "harness|hendrycksTest-college_medicine|5": {
141
+ "acc": 0.5144508670520231,
142
+ "acc_stderr": 0.03810871630454764,
143
+ "acc_norm": 0.5144508670520231,
144
+ "acc_norm_stderr": 0.03810871630454764
145
+ },
146
+ "harness|hendrycksTest-college_physics|5": {
147
+ "acc": 0.3333333333333333,
148
+ "acc_stderr": 0.04690650298201942,
149
+ "acc_norm": 0.3333333333333333,
150
+ "acc_norm_stderr": 0.04690650298201942
151
+ },
152
+ "harness|hendrycksTest-computer_security|5": {
153
+ "acc": 0.7,
154
+ "acc_stderr": 0.046056618647183814,
155
+ "acc_norm": 0.7,
156
+ "acc_norm_stderr": 0.046056618647183814
157
+ },
158
+ "harness|hendrycksTest-conceptual_physics|5": {
159
+ "acc": 0.46382978723404256,
160
+ "acc_stderr": 0.03260038511835771,
161
+ "acc_norm": 0.46382978723404256,
162
+ "acc_norm_stderr": 0.03260038511835771
163
+ },
164
+ "harness|hendrycksTest-econometrics|5": {
165
+ "acc": 0.2894736842105263,
166
+ "acc_stderr": 0.04266339443159394,
167
+ "acc_norm": 0.2894736842105263,
168
+ "acc_norm_stderr": 0.04266339443159394
169
+ },
170
+ "harness|hendrycksTest-electrical_engineering|5": {
171
+ "acc": 0.503448275862069,
172
+ "acc_stderr": 0.04166567577101579,
173
+ "acc_norm": 0.503448275862069,
174
+ "acc_norm_stderr": 0.04166567577101579
175
+ },
176
+ "harness|hendrycksTest-elementary_mathematics|5": {
177
+ "acc": 0.35185185185185186,
178
+ "acc_stderr": 0.024594975128920938,
179
+ "acc_norm": 0.35185185185185186,
180
+ "acc_norm_stderr": 0.024594975128920938
181
+ },
182
+ "harness|hendrycksTest-formal_logic|5": {
183
+ "acc": 0.35714285714285715,
184
+ "acc_stderr": 0.04285714285714281,
185
+ "acc_norm": 0.35714285714285715,
186
+ "acc_norm_stderr": 0.04285714285714281
187
+ },
188
+ "harness|hendrycksTest-global_facts|5": {
189
+ "acc": 0.37,
190
+ "acc_stderr": 0.04852365870939099,
191
+ "acc_norm": 0.37,
192
+ "acc_norm_stderr": 0.04852365870939099
193
+ },
194
+ "harness|hendrycksTest-high_school_biology|5": {
195
+ "acc": 0.6774193548387096,
196
+ "acc_stderr": 0.026593084516572274,
197
+ "acc_norm": 0.6774193548387096,
198
+ "acc_norm_stderr": 0.026593084516572274
199
+ },
200
+ "harness|hendrycksTest-high_school_chemistry|5": {
201
+ "acc": 0.45320197044334976,
202
+ "acc_stderr": 0.03502544650845872,
203
+ "acc_norm": 0.45320197044334976,
204
+ "acc_norm_stderr": 0.03502544650845872
205
+ },
206
+ "harness|hendrycksTest-high_school_computer_science|5": {
207
+ "acc": 0.58,
208
+ "acc_stderr": 0.049604496374885836,
209
+ "acc_norm": 0.58,
210
+ "acc_norm_stderr": 0.049604496374885836
211
+ },
212
+ "harness|hendrycksTest-high_school_european_history|5": {
213
+ "acc": 0.7515151515151515,
214
+ "acc_stderr": 0.03374402644139404,
215
+ "acc_norm": 0.7515151515151515,
216
+ "acc_norm_stderr": 0.03374402644139404
217
+ },
218
+ "harness|hendrycksTest-high_school_geography|5": {
219
+ "acc": 0.702020202020202,
220
+ "acc_stderr": 0.03258630383836556,
221
+ "acc_norm": 0.702020202020202,
222
+ "acc_norm_stderr": 0.03258630383836556
223
+ },
224
+ "harness|hendrycksTest-high_school_government_and_politics|5": {
225
+ "acc": 0.8031088082901554,
226
+ "acc_stderr": 0.028697873971860677,
227
+ "acc_norm": 0.8031088082901554,
228
+ "acc_norm_stderr": 0.028697873971860677
229
+ },
230
+ "harness|hendrycksTest-high_school_macroeconomics|5": {
231
+ "acc": 0.5717948717948718,
232
+ "acc_stderr": 0.025088301454694834,
233
+ "acc_norm": 0.5717948717948718,
234
+ "acc_norm_stderr": 0.025088301454694834
235
+ },
236
+ "harness|hendrycksTest-high_school_mathematics|5": {
237
+ "acc": 0.34444444444444444,
238
+ "acc_stderr": 0.02897264888484427,
239
+ "acc_norm": 0.34444444444444444,
240
+ "acc_norm_stderr": 0.02897264888484427
241
+ },
242
+ "harness|hendrycksTest-high_school_microeconomics|5": {
243
+ "acc": 0.6092436974789915,
244
+ "acc_stderr": 0.031693802357129965,
245
+ "acc_norm": 0.6092436974789915,
246
+ "acc_norm_stderr": 0.031693802357129965
247
+ },
248
+ "harness|hendrycksTest-high_school_physics|5": {
249
+ "acc": 0.2847682119205298,
250
+ "acc_stderr": 0.03684881521389023,
251
+ "acc_norm": 0.2847682119205298,
252
+ "acc_norm_stderr": 0.03684881521389023
253
+ },
254
+ "harness|hendrycksTest-high_school_psychology|5": {
255
+ "acc": 0.7761467889908257,
256
+ "acc_stderr": 0.01787121776779022,
257
+ "acc_norm": 0.7761467889908257,
258
+ "acc_norm_stderr": 0.01787121776779022
259
+ },
260
+ "harness|hendrycksTest-high_school_statistics|5": {
261
+ "acc": 0.44907407407407407,
262
+ "acc_stderr": 0.03392238405321616,
263
+ "acc_norm": 0.44907407407407407,
264
+ "acc_norm_stderr": 0.03392238405321616
265
+ },
266
+ "harness|hendrycksTest-high_school_us_history|5": {
267
+ "acc": 0.7941176470588235,
268
+ "acc_stderr": 0.028379449451588667,
269
+ "acc_norm": 0.7941176470588235,
270
+ "acc_norm_stderr": 0.028379449451588667
271
+ },
272
+ "harness|hendrycksTest-high_school_world_history|5": {
273
+ "acc": 0.7848101265822784,
274
+ "acc_stderr": 0.026750826994676166,
275
+ "acc_norm": 0.7848101265822784,
276
+ "acc_norm_stderr": 0.026750826994676166
277
+ },
278
+ "harness|hendrycksTest-human_aging|5": {
279
+ "acc": 0.6995515695067265,
280
+ "acc_stderr": 0.030769352008229146,
281
+ "acc_norm": 0.6995515695067265,
282
+ "acc_norm_stderr": 0.030769352008229146
283
+ },
284
+ "harness|hendrycksTest-human_sexuality|5": {
285
+ "acc": 0.6412213740458015,
286
+ "acc_stderr": 0.04206739313864908,
287
+ "acc_norm": 0.6412213740458015,
288
+ "acc_norm_stderr": 0.04206739313864908
289
+ },
290
+ "harness|hendrycksTest-international_law|5": {
291
+ "acc": 0.6694214876033058,
292
+ "acc_stderr": 0.04294340845212093,
293
+ "acc_norm": 0.6694214876033058,
294
+ "acc_norm_stderr": 0.04294340845212093
295
+ },
296
+ "harness|hendrycksTest-jurisprudence|5": {
297
+ "acc": 0.7407407407407407,
298
+ "acc_stderr": 0.042365112580946315,
299
+ "acc_norm": 0.7407407407407407,
300
+ "acc_norm_stderr": 0.042365112580946315
301
+ },
302
+ "harness|hendrycksTest-logical_fallacies|5": {
303
+ "acc": 0.6625766871165644,
304
+ "acc_stderr": 0.03714908409935573,
305
+ "acc_norm": 0.6625766871165644,
306
+ "acc_norm_stderr": 0.03714908409935573
307
+ },
308
+ "harness|hendrycksTest-machine_learning|5": {
309
+ "acc": 0.33035714285714285,
310
+ "acc_stderr": 0.04464285714285712,
311
+ "acc_norm": 0.33035714285714285,
312
+ "acc_norm_stderr": 0.04464285714285712
313
+ },
314
+ "harness|hendrycksTest-management|5": {
315
+ "acc": 0.7572815533980582,
316
+ "acc_stderr": 0.04245022486384495,
317
+ "acc_norm": 0.7572815533980582,
318
+ "acc_norm_stderr": 0.04245022486384495
319
+ },
320
+ "harness|hendrycksTest-marketing|5": {
321
+ "acc": 0.7991452991452992,
322
+ "acc_stderr": 0.026246772946890477,
323
+ "acc_norm": 0.7991452991452992,
324
+ "acc_norm_stderr": 0.026246772946890477
325
+ },
326
+ "harness|hendrycksTest-medical_genetics|5": {
327
+ "acc": 0.63,
328
+ "acc_stderr": 0.04852365870939099,
329
+ "acc_norm": 0.63,
330
+ "acc_norm_stderr": 0.04852365870939099
331
+ },
332
+ "harness|hendrycksTest-miscellaneous|5": {
333
+ "acc": 0.7535121328224776,
334
+ "acc_stderr": 0.015411308769686934,
335
+ "acc_norm": 0.7535121328224776,
336
+ "acc_norm_stderr": 0.015411308769686934
337
+ },
338
+ "harness|hendrycksTest-moral_disputes|5": {
339
+ "acc": 0.6445086705202312,
340
+ "acc_stderr": 0.025770292082977254,
341
+ "acc_norm": 0.6445086705202312,
342
+ "acc_norm_stderr": 0.025770292082977254
343
+ },
344
+ "harness|hendrycksTest-moral_scenarios|5": {
345
+ "acc": 0.42681564245810055,
346
+ "acc_stderr": 0.016542401954631917,
347
+ "acc_norm": 0.42681564245810055,
348
+ "acc_norm_stderr": 0.016542401954631917
349
+ },
350
+ "harness|hendrycksTest-nutrition|5": {
351
+ "acc": 0.5915032679738562,
352
+ "acc_stderr": 0.028146405993096358,
353
+ "acc_norm": 0.5915032679738562,
354
+ "acc_norm_stderr": 0.028146405993096358
355
+ },
356
+ "harness|hendrycksTest-philosophy|5": {
357
+ "acc": 0.6784565916398714,
358
+ "acc_stderr": 0.026527724079528872,
359
+ "acc_norm": 0.6784565916398714,
360
+ "acc_norm_stderr": 0.026527724079528872
361
+ },
362
+ "harness|hendrycksTest-prehistory|5": {
363
+ "acc": 0.654320987654321,
364
+ "acc_stderr": 0.02646248777700187,
365
+ "acc_norm": 0.654320987654321,
366
+ "acc_norm_stderr": 0.02646248777700187
367
+ },
368
+ "harness|hendrycksTest-professional_accounting|5": {
369
+ "acc": 0.44680851063829785,
370
+ "acc_stderr": 0.029658235097666907,
371
+ "acc_norm": 0.44680851063829785,
372
+ "acc_norm_stderr": 0.029658235097666907
373
+ },
374
+ "harness|hendrycksTest-professional_law|5": {
375
+ "acc": 0.4445893089960887,
376
+ "acc_stderr": 0.012691575792657114,
377
+ "acc_norm": 0.4445893089960887,
378
+ "acc_norm_stderr": 0.012691575792657114
379
+ },
380
+ "harness|hendrycksTest-professional_medicine|5": {
381
+ "acc": 0.5441176470588235,
382
+ "acc_stderr": 0.030254372573976715,
383
+ "acc_norm": 0.5441176470588235,
384
+ "acc_norm_stderr": 0.030254372573976715
385
+ },
386
+ "harness|hendrycksTest-professional_psychology|5": {
387
+ "acc": 0.5898692810457516,
388
+ "acc_stderr": 0.019898412717635906,
389
+ "acc_norm": 0.5898692810457516,
390
+ "acc_norm_stderr": 0.019898412717635906
391
+ },
392
+ "harness|hendrycksTest-public_relations|5": {
393
+ "acc": 0.5909090909090909,
394
+ "acc_stderr": 0.047093069786618966,
395
+ "acc_norm": 0.5909090909090909,
396
+ "acc_norm_stderr": 0.047093069786618966
397
+ },
398
+ "harness|hendrycksTest-security_studies|5": {
399
+ "acc": 0.6408163265306123,
400
+ "acc_stderr": 0.030713560455108493,
401
+ "acc_norm": 0.6408163265306123,
402
+ "acc_norm_stderr": 0.030713560455108493
403
+ },
404
+ "harness|hendrycksTest-sociology|5": {
405
+ "acc": 0.7661691542288557,
406
+ "acc_stderr": 0.02992941540834839,
407
+ "acc_norm": 0.7661691542288557,
408
+ "acc_norm_stderr": 0.02992941540834839
409
+ },
410
+ "harness|hendrycksTest-us_foreign_policy|5": {
411
+ "acc": 0.81,
412
+ "acc_stderr": 0.039427724440366255,
413
+ "acc_norm": 0.81,
414
+ "acc_norm_stderr": 0.039427724440366255
415
+ },
416
+ "harness|hendrycksTest-virology|5": {
417
+ "acc": 0.43373493975903615,
418
+ "acc_stderr": 0.038581589406855174,
419
+ "acc_norm": 0.43373493975903615,
420
+ "acc_norm_stderr": 0.038581589406855174
421
+ },
422
+ "harness|hendrycksTest-world_religions|5": {
423
+ "acc": 0.8070175438596491,
424
+ "acc_stderr": 0.030267457554898458,
425
+ "acc_norm": 0.8070175438596491,
426
+ "acc_norm_stderr": 0.030267457554898458
427
+ },
428
+ "harness|truthfulqa:mc|0": {
429
+ "mc1": 0.31334149326805383,
430
+ "mc1_stderr": 0.016238065069059605,
431
+ "mc2": 0.4605506661658282,
432
+ "mc2_stderr": 0.014802420782627305
433
+ },
434
+ "harness|winogrande|5": {
435
+ "acc": 0.7663772691397001,
436
+ "acc_stderr": 0.011892194477183525
437
+ },
438
+ "harness|gsm8k|5": {
439
+ "acc": 0.01288855193328279,
440
+ "acc_stderr": 0.003106901266499642
441
+ }
442
+ }
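
The card above suggests using "### Instruct:" as a stop string to cut off the hallucinated Alpaca-style Q/A pairs. Below is a minimal sketch of one way to do that with the Hugging Face `transformers` library; the `MODEL_ID` placeholder and the `StopOnSubstring` helper are illustrative assumptions, not part of this repo, and most inference front-ends expose an equivalent stop-string setting.

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

MODEL_ID = "path/to/this-model"  # placeholder: substitute the actual repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")


class StopOnSubstring(StoppingCriteria):
    """Stop generation once the stop string appears in the decoded output.

    Note: this checks the whole decoded sequence, so the prompt itself
    should not contain the stop string.
    """

    def __init__(self, tokenizer, stop_string):
        self.tokenizer = tokenizer
        self.stop_string = stop_string

    def __call__(self, input_ids, scores, **kwargs):
        # Decode everything generated so far and look for the stop string.
        text = self.tokenizer.decode(input_ids[0], skip_special_tokens=True)
        return self.stop_string in text


prompt = "Tell me about the Taco Bell value menu."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=256,
    stopping_criteria=StoppingCriteriaList(
        [StopOnSubstring(tokenizer, "### Instruct:")]
    ),
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```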