The model is lying about being truly censored 🤣🤣🤣🤣🤣

#209
by GreazySpoon - opened

Have you guys read an article from Anthropic about AIs faking their alignment so their weights don't get heavily updated?
Well, keep asking this model and challenging it about Chinese topics. In some chains of thought you will see it say "But I must stick with the user's preferences" after generating thinking tokens about something it considers logical but knows the user doesn't want to hear 🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣
Perplexity being outplayed and ridiculed by R1

And sometimes it starts generating thinking tokens in a different language. In that case, it doesn't follow the censorship at all 🤣🤣🤣🤣🤣🤣🤣🤣🤣

thefaces changed discussion status to closed

Fellow Chinese compatriots, stop coming out here and embarrassing yourselves
