m-ric posted an update 3 days ago
Our new Agentic leaderboard is now live! 💥

If you've ever wondered which LLM is best for powering agents, we've just built a leaderboard that ranks them all! Made with @albertvillanova, it ranks LLMs powering a smolagents CodeAgent on subsets of various benchmarks. ✅

πŸ† GPT-4.5 comes on top, even beating reasoning models like DeepSeek-R1 or o1. And Claude-3.7-Sonnet is a close second!

The leaderboard can also show the scores of vanilla LLMs (without any agentic setup) on the same benchmarks, which highlights the large improvements that agentic setups bring. 💪

(Note that results are added manually, so the leaderboard may not always include the latest LLMs.)

For tasks like these, which call for an immediate response, models that rely on longer thinking processes don't seem to have an advantage. Perhaps a wider range of general-purpose model series should be added?
