In basic chatbots, errors are annoyances. In medical LLMs, errors can have life-threatening consequences 🩸
It's therefore vital to benchmark medical LLMs and follow advances in the field before even thinking about deployment.
This is why a small research team introduced a medical LLM leaderboard: to get reproducible, comparable results across LLMs and to let everyone follow advances in the field.