MMLU Pro
More advanced and challenging multi-task evaluation
Compact LLM Battle Arena: Frugal AI Face-Off!
VLMEvalKit evaluation results on video understanding benchmarks
Track, rank and evaluate open LLMs and chatbots
Vote on the top HF TTS models!