Add "both bad" option

#3
by ChuckMcSneed - opened

These models are horrible; please add a "both bad" option like on the lmsys arena.

After spending more time on this leaderboard, I completely agree. If both models are saying 4/5 or 5/5 for something I think is a 2/5, there is no mechanism to flag or penalize the models for this, and it happens frequently enough for me to notice.

This is potentially an issue with your new Atla-8b-Preview btw

Hey @pszemraj

Thanks for voting on Judge Arena; excited to see that you're engaging with our platform! I'll give you my quick 30s rant on "both bad" options:

"Vote A", "Vote B", or "Tie" are mutually exclusive, and collectively exhaustive (MECE). Once you add “both bad,” you open the door to “both good,” “both mediocre,” and so on. That extra complexity muddies the scoring system and forces more granular, time-consuming judgments. By sticking to A/B/Tie, we keep things logically coherent, maintain clarity, and encourage decisive feedback without scattering options.

I see where you're coming from, but how else are challenging tasks supposed to be reflected in the results if that option doesn't exist? The only workaround I can think of is to assume at least one of the models can answer effectively, and that enough matches get rated on the same hard prompt for this to show up in the pairwise comparisons: one model, say Model X, beats all the others, which simply tie with each other because they all fail.
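To make that workaround concrete, here's a minimal sketch assuming a standard Elo-style pairwise update (Judge Arena's actual scoring method isn't specified in this thread, so this is purely illustrative): ties between equally-rated failing models leave ratings untouched, so a hard prompt only moves the leaderboard if some Model X actually wins its pairings.

```python
# Hypothetical illustration: a textbook Elo update, NOT Judge Arena's
# actual scoring mechanism.

def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Return new (r_a, r_b); score_a is 1 for a win, 0.5 tie, 0 loss."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta

# On a hard prompt where both models fail, a "Tie" vote between
# equally-rated models moves nothing:
a, b = elo_update(1500, 1500, 0.5)   # -> (1500.0, 1500.0)

# But if Model X alone succeeds, it gains points from each pairing:
x, y = elo_update(1500, 1500, 1.0)   # x rises, y falls symmetrically
```

So the "all fail → all tie" signal is invisible to the ratings themselves, which is exactly why a "both bad" vote would carry information that plain ties cannot.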

On a related note, https://lmarena.ai/ supports "both bad" and has for some time without "scattering options" further.
