Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|-------|
| Avg.                | 49.49 |
| ARC (25-shot)       | 55.38 |
| HellaSwag (10-shot) | 78.57 |
| MMLU (5-shot)       | 49.39 |
| TruthfulQA (0-shot) | 41.83 |
| Winogrande (5-shot) | 74.19 |
| GSM8K (5-shot)      | 9.86  |
| DROP (3-shot)       | 37.21 |
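
The `Avg.` row is consistent with an unweighted arithmetic mean of the seven benchmark scores listed above. The following is a minimal sketch that checks this, assuming equal weighting (the source does not state how the average is computed); the `scores` dictionary and the `mean_score` helper are illustrative names, not part of any leaderboard tooling.

```python
# Benchmark scores copied from the results table.
scores = {
    "ARC (25-shot)": 55.38,
    "HellaSwag (10-shot)": 78.57,
    "MMLU (5-shot)": 49.39,
    "TruthfulQA (0-shot)": 41.83,
    "Winogrande (5-shot)": 74.19,
    "GSM8K (5-shot)": 9.86,
    "DROP (3-shot)": 37.21,
}


def mean_score(values):
    """Return the unweighted arithmetic mean (assumed weighting)."""
    values = list(values)
    return sum(values) / len(values)


# Round to two decimals to match the precision used in the table.
average = round(mean_score(scores.values()), 2)
print(f"Avg. {average:.2f}")  # prints "Avg. 49.49"
```

The seven scores sum to 346.43, and 346.43 / 7 = 49.49, which matches the reported value.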