Stick To Your Role! Leaderboard 🎭: Benchmarking LLMs on the stability of simulated populations (status: Running; 43 likes)
Open LLM Leaderboard 🏆: Track, rank and evaluate open LLMs and chatbots (status: Running on CPU Upgrade; 12.9k likes)