Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29, 2024 • 71
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2, 2024 • 123
Open LLM Leaderboard best models ❤️‍🔥 Collection A daily uploaded list of models with best evaluations on the LLM leaderboard: • 65 items • Updated Mar 20 • 582