@m-ric on Hugging Face: "Our new Agentic leaderboard is now live!💥 If you ever asked which LLM is…"

m-ric

posted an update Mar 10

Our new Agentic leaderboard is now live!💥

If you've ever asked which LLM is best for powering agents, we've just made a leaderboard that ranks them! Built with @albertvillanova, it ranks LLMs powering a smolagents CodeAgent on subsets of various benchmarks. ✅

🏆 GPT-4.5 comes out on top, even beating reasoning models like DeepSeek-R1 and o1. Claude-3.7-Sonnet is a close second!

The leaderboard also allows you to show the scores of vanilla LLMs (without any agentic setup) on the same benchmarks: this shows the huge improvements brought by agentic setups. 💪
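
To make the agentic vs. vanilla comparison concrete, here is a minimal sketch of how such a ranking and gain could be computed. The post does not say how the leaderboard aggregates scores, so the plain average across benchmarks is an assumption, and the model names, benchmark names, and numbers below are made-up placeholders, not leaderboard results.

```python
from statistics import mean

# Placeholder scores (percent correct). These are NOT real leaderboard results;
# "model-a", "model-b", "bench-1" and "bench-2" are hypothetical names.
scores = {
    "model-a": {
        "agentic": {"bench-1": 60.0, "bench-2": 40.0},
        "vanilla": {"bench-1": 30.0, "bench-2": 20.0},
    },
    "model-b": {
        "agentic": {"bench-1": 70.0, "bench-2": 50.0},
        "vanilla": {"bench-1": 50.0, "bench-2": 30.0},
    },
}


def average_score(per_benchmark):
    """Average a model's scores across benchmarks (assumed aggregation)."""
    return mean(per_benchmark.values())


def rank_models(all_scores, setup="agentic"):
    """Return model names sorted from best to worst for the given setup."""
    return sorted(
        all_scores,
        key=lambda model: average_score(all_scores[model][setup]),
        reverse=True,
    )


def agentic_gain(all_scores, model):
    """Average score gained by the agentic setup over the vanilla LLM."""
    agentic = average_score(all_scores[model]["agentic"])
    vanilla = average_score(all_scores[model]["vanilla"])
    return agentic - vanilla


if __name__ == "__main__":
    for name in rank_models(scores):
        print(name, agentic_gain(scores, name))
```

Swapping `setup="vanilla"` into `rank_models` gives the ranking without any agentic setup, which is the same toggle the leaderboard exposes.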

(Note that results will be added manually, so the leaderboard might not always have the latest LLMs)

DataSoul

Mar 11

For immediate-response tasks like these, models that need longer thinking processes don't seem to have an advantage. Perhaps more general-purpose models from different series should be added?
