Text Classification
Transformers
PyTorch
ONNX
Safetensors
English
deberta
Trained with AutoTrain

How to effectively use this?

#10
by geeek - opened

Sometimes the results seem all over the place. Scores that should be high come in below 0.5. What logic do you suggest for deciding — look at the top 3 labels, and if "OK" is among them, treat the text as safe? The numbers don't seem to behave like other models' and have some gaps: for harmful content the score should be close to 1, but it hardly goes above 0.5. Am I not using it correctly? Any tips?

If I use the model in Transformers.js, it only returns the first item, not all of them.
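If I remember right, the text-classification pipeline in Transformers.js returns only the top label by default, and there is a `topk`-style option to request more (the exact option name may vary between versions — check the docs for your release). A hedged sketch, plus a small helper that copes with the output being either a single object or an array:

```javascript
// Sketch only: the pipeline call below assumes @xenova/transformers and a
// `topk` option for text-classification — verify the exact option name in
// the Transformers.js docs for your version. MODEL_ID is a placeholder.
//
// import { pipeline } from '@xenova/transformers';
// const classifier = await pipeline('text-classification', 'MODEL_ID');
// const output = await classifier(text, { topk: 0 }); // request all labels

// Helper: the pipeline may hand back a single {label, score} object or an
// array of them; normalize to an array sorted by score, highest first.
function normalizeScores(output) {
  const arr = Array.isArray(output) ? output : [output];
  return [...arr].sort((a, b) => b.score - a.score);
}
```

With that, downstream code can always assume a sorted array of `{label, score}` pairs regardless of how many labels the pipeline returned.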

My plan was to just sort by the worst scores and flag content that exceeds a threshold (e.g. > 0.5).
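That plan can be sketched in a few lines, assuming the classifier output is an array of `{label, score}` objects and that "OK" is the safe label (both assumptions, not confirmed here):

```javascript
// Flag content whose worst (highest-scoring) non-OK label exceeds a threshold.
// Assumes `scores` is an array of {label, score} objects from the classifier
// and that the safe label is named 'OK'.
function flagByThreshold(scores, threshold = 0.5) {
  const worst = scores
    .filter((s) => s.label !== 'OK')
    .sort((a, b) => b.score - a.score)[0];
  return worst !== undefined && worst.score > threshold;
}
```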

Been testing it a bit and I'm seeing varying results.
Sometimes it correctly gives "S 0.6641" on a text that clearly implies something is about to go down, but then another, even more suggestive text gets "S 0.0121" with low scores across the board.

I've even seen texts outright containing one of the really bad 'r' words score as low as around 0.25. This makes me think a flat threshold is perhaps not enough on its own.

In my flagging attempts, the OK score was also low. Perhaps a low "OK" score can itself be a useful indicator that something is off about the content.

So maybe flag anything where the OK score isn't the highest score? And perhaps also flag content that has any bad-category value above 0.2 when OK is below 0.6? (Not an expert on this, just throwing out ideas.)
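Those two ideas combined might look something like this — the 0.2 / 0.6 thresholds are just the guesses above, and the "OK" label name and `{label, score}` output shape are assumptions:

```javascript
// Combined heuristic from the thread: flag when OK is not the top label,
// or when any other label exceeds `badThreshold` while OK is below `okFloor`.
// Assumes `scores` is an array of {label, score} objects with an 'OK' label.
function shouldFlag(scores, { badThreshold = 0.2, okFloor = 0.6 } = {}) {
  if (scores.length === 0) return false;
  const ok = scores.find((s) => s.label === 'OK');
  const okScore = ok ? ok.score : 0;
  const top = scores.reduce((a, b) => (b.score > a.score ? b : a));
  if (top.label !== 'OK') return true; // OK is not the winning label
  return (
    scores.some((s) => s.label !== 'OK' && s.score > badThreshold) &&
    okScore < okFloor
  );
}
```

This would catch the "S 0.25 but OK also low" cases that slip past a flat 0.5 cutoff, at the cost of more false positives — the thresholds would need tuning on real data.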
