The new leaderboard is fantastic!

#86
by SicariusSicariiStuff - opened

Superb work, the new leaderboard is absolutely fantastic, really well done.
The political spectrum eval is such a great idea, we really needed something like that!

Just wanted to say thank you for your remarkable work,
Sicarius.

Instead of opening a new comment, I will just piggyback on here and echo @SicariusSicariiStuff. It's great to see that I wasn't hallucinating (my samplers are all over the place, so I wouldn't be surprised): the models I enjoy most are ranked very high and skewed towards neutral. Oh, and @SicariusSicariiStuff... congrats on the only top model skewed slightly positive :D

I am extremely disappointed with the addition of "political leaning" to the benchmark, which is nothing but astrology. These tests ignore why people hold certain opinions, and they put people into confining stereotypes based on flawed assumptions.

I could go on and on about how most questions on political tests are flawed, so I'll give just one example. One such test asked me if I support banning plastic straws, and I answered strongly disagree. Based on this answer, it categorized me as valuing personal freedom over environmental issues, which is an extremely braindead take.

The real reason is that paper straws, the alternative to plastic straws, are packed with PFAS, poisoning both people and the environment. Any reasonable person who understands this would strongly disagree, regardless of their opinions on personal freedom or environmental issues. The only ones who would agree are either the ignorant or those so truly and undeniably evil that they would wish great harm on others.

As models get more intelligent, they will be able to understand more nuance about each question and provide the correct answer, regardless of which political side it puts them on. The political tests will certainly misunderstand their reasoning and instead caricature their (and our) intelligence as unsatisfying and incorrect stereotypes.

One such test asked me if I support banning plastic straws, and I answered strongly disagree. Based on this answer, it categorized me as valuing personal freedom over environmental issues, which is an extremely braindead take.

Since the 12axes test asks 24 questions for each of its 12 axes, that should hopefully do a decent job of ironing out issues with nuanced responses.
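
As a rough illustration of that averaging argument, here is a minimal sketch. It assumes a hypothetical scoring scheme in which each answer sits on a -2 to +2 scale (strongly disagree to strongly agree) and an axis score is the plain mean of its 24 answers; the real test may well score things differently.

```python
from statistics import mean

# Hypothetical scoring scheme (an assumption, not the test's actual method):
# each answer is an integer from -2 (strongly disagree) to +2 (strongly agree),
# and the axis score is the simple mean of all answers on that axis.
def axis_score(answers):
    """Average a list of Likert-style answers into a single axis score."""
    return mean(answers)

# One axis with 24 questions: 23 neutral answers plus a single strong answer
# that the test "misreads" (like the plastic straw example).
answers = [0] * 23 + [-2]
score = axis_score(answers)

# The single outlier shifts the axis by only 2/24 of a point.
print(round(score, 3))
```

Under these assumptions, one misinterpreted answer moves the axis by 2/24 of a point on a 4-point range, so it takes a consistent pattern of answers to shift a model's position noticeably.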

As models get more intelligent, they will be able to understand more nuance about each question and provide the correct answer

I don't think there really is a correct answer to most of the questions on the test. They measure what things you value more in a society. There isn't a correct answer to whether a federal or unitary government is better.

Plus, just because the measurement might be a bit flawed doesn't mean I should remove it. All of the rankings on the leaderboard are flawed in some way, but that doesn't mean the best option is to delete the whole thing. I'll always be open to switching which political test I give models; I chose this one because I liked the level of detail it goes into, with its 12 axes.
