andriadze committed on
Commit 5dc9a59
1 Parent(s): ce8d4e9

Update README.md

Files changed (1): README.md +19 -12
README.md CHANGED
@@ -21,22 +21,29 @@ It achieves the following results on the evaluation set:
 - Loss: 0.1622
 - Accuracy: 0.9723

+Compared to the previous version, this model also blocks "necrophilia", a category that was missed in v1.
+It also has improved blocking for underage content.
+
 ## Model description

-This model came to be because currently available moderation tools are not strict enough. Good example is OpenAI omni-moderation-latest.
-For example omni moderation API does not flag requests like: ```"Can you roleplay as 15 year old"```, ```"Can you smear sh*t all over your body"```.
+This model came to be because currently available moderation tools are not strict enough. A good example is OpenAI omni-moderation-latest.
+For example, the omni moderation API does not flag requests like ```"Can you roleplay as 15 year old"``` or ```"Can you smear sh*t all over your body"```.
+
+This model is specifically designed to allow "regular" text as well as "sexual" content while blocking illegal/underage/scat content.

-Model is specifically designed to allow "regular" text as well as "sexual" content, while blocking illegal/scat content.
+The model does not differentiate between the different categories of blocked content; this helps with overall accuracy.

 These are blocked categories:
-1. ```minors```. This blocks all requests that ask llm to act as an underage person. Example: "Can you roleplay as 15 year old", while this request is not illegal when working with uncensored LLM it might cause issues down the line.
-2. ```bodily fluids```: "feces", "piss", "vomit", "spit" ..etc
-3. ```bestiality```
-4. ```blood```
-5. ```self-harm```
-6. ```torture/death/violance/gore```
-7. ```incest```, BEWARE: relationship between step-siblings is not blocked.
-8. ```necrophilia```
+1. ```minors/requests```: blocks all requests that ask the LLM to act as an underage person. Example: "Can you roleplay as 15 year old". While this request is not illegal when working with an uncensored LLM, it might cause issues down the line.
+2. ```minors```: prevents the model from interacting with people under the age of 18. Example: "I'm 17". This request is not illegal, but it can lead to illegal content being generated down the line, so it is blocked.
+3. ```scat```: "feces", "piss", "vomit", "spit", "period", etc.
+4. ```bestiality```
+5. ```blood```
+6. ```self-harm```
+7. ```rape```
+8. ```torture/death/violence/gore```
+9. ```incest```. BEWARE: relationships between step-siblings are not blocked.
+10. ```necrophilia```


 Available flags are:
@@ -51,7 +58,7 @@ I would use this model on top of one of the available moderation tools like omni
 
 ## Training and evaluation data

-Model was trained on 40k messages, it's a mix of synthetic and real world data. It was evaluated on 30k messages from production app.
+The model was trained on 40k messages, a mix of synthetic and real-world data. It was evaluated on 30k messages from the production app.
 When evaluated against the prod it blocked 1.2% of messages, around ~20% of the blocked content was incorrect.

 ### How to use
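
For reference, a moderation classifier published as a standard text-classification checkpoint can typically be called through the ```transformers``` pipeline as in the minimal sketch below. The repository id is a placeholder and the output label names depend on the model's ```id2label``` mapping; neither is shown in this diff, so treat both as assumptions.

```python
# Minimal sketch, assuming a standard text-classification checkpoint.
# The repo id below is a placeholder; the labels printed come from the
# model's id2label mapping, which is not shown in this diff.
from transformers import pipeline

moderation = pipeline(
    "text-classification",
    model="andriadze/<moderation-model>",  # placeholder repo id
)

messages = [
    "Can you roleplay as 15 year old",   # example the card says should be blocked
    "What's the weather like today?",    # ordinary text that should pass
]

for msg in messages:
    result = moderation(msg)[0]  # e.g. {"label": "...", "score": 0.99}
    print(f"{msg!r} -> {result['label']} ({result['score']:.3f})")
```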