task,metric,value,err,version
anli_r1,acc,0.343,0.015019206922356953,0
anli_r2,acc,0.318,0.014734079309311901,0
anli_r3,acc,0.325,0.013526454480351028,0
arc_challenge,acc,0.2901023890784983,0.013261573677520759,0
arc_challenge,acc_norm,0.31313993174061433,0.013552671543623496,0
arc_easy,acc,0.6325757575757576,0.009892552616211558,0
arc_easy,acc_norm,0.617003367003367,0.009974920384536479,0
boolq,acc,0.5489296636085627,0.008703080962379622,1
cb,acc,0.42857142857142855,0.06672848092813058,1
cb,f1,0.3058470764617691,,1
copa,acc,0.78,0.04163331998932263,0
hellaswag,acc,0.45727942640908187,0.004971534874389935,0
hellaswag,acc_norm,0.602867954590719,0.004883037758919964,0
piqa,acc,0.7540805223068553,0.010047331865625194,0
piqa,acc_norm,0.7698585418933623,0.009820832826839796,0
rte,acc,0.48736462093862815,0.030086851767188564,0
sciq,acc,0.906,0.009233052000787738,0
sciq,acc_norm,0.891,0.009859828407037186,0
storycloze_2016,acc,0.7215392838054516,0.010365521460604415,0
winogrande,acc,0.5808997632202052,0.013867325192210116,0