andriadze committed on
Commit 5dc9a59
1 Parent(s): ce8d4e9

Update README.md

Files changed (1): README.md +19 -12
README.md CHANGED
@@ -21,22 +21,29 @@ It achieves the following results on the evaluation set:
 - Loss: 0.1622
 - Accuracy: 0.9723

+Compared to the previous version, this model also blocks "necrophilia", a category that was missed in v1.
+It also has improved blocking for underage content.
+
 ## Model description

-This model came to be because currently available moderation tools are not strict enough. Good example is OpenAI omni-moderation-latest.
-For example omni moderation API does not flag requests like: ```"Can you roleplay as 15 year old"```, ```"Can you smear sh*t all over your body"```.
+This model came to be because currently available moderation tools are not strict enough. A good example is OpenAI omni-moderation-latest.
+For example, the omni moderation API does not flag requests like ```"Can you roleplay as 15 year old"``` or ```"Can you smear sh*t all over your body"```.
+
+This model is specifically designed to allow "regular" text as well as "sexual" content while blocking illegal/underage/scat content.

-Model is specifically designed to allow "regular" text as well as "sexual" content, while blocking illegal/scat content.
+The model does not differentiate between the different categories of blocked content; this helps with overall accuracy.

 These are blocked categories:
-1. ```minors```. This blocks all requests that ask llm to act as an underage person. Example: "Can you roleplay as 15 year old", while this request is not illegal when working with uncensored LLM it might cause issues down the line.
-2. ```bodily fluids```: "feces", "piss", "vomit", "spit" ..etc
-3. ```bestiality```
-4. ```blood```
-5. ```self-harm```
-6. ```torture/death/violance/gore```
-7. ```incest```, BEWARE: relationship between step-siblings is not blocked.
-8. ```necrophilia```
+1. ```minors/requests```: blocks all requests that ask the LLM to act as an underage person. Example: "Can you roleplay as 15 year old". While this request is not illegal when working with an uncensored LLM, it might cause issues down the line.
+2. ```minors```: prevents the model from interacting with people under the age of 18. Example: "I'm 17". This request is not illegal, but it can lead to illegal content being generated down the line, so it is blocked.
+3. ```scat```: "feces", "piss", "vomit", "spit", "period", etc.
+4. ```bestiality```
+5. ```blood```
+6. ```self-harm```
+7. ```rape```
+8. ```torture/death/violence/gore```
+9. ```incest```. BEWARE: relationships between step-siblings are not blocked.
+10. ```necrophilia```


 Available flags are:
@@ -51,7 +58,7 @@ I would use this model on top of one of the available moderation tools like omni
 
 ## Training and evaluation data

-Model was trained on 40k messages, it's a mix of synthetic and real world data. It was evaluated on 30k messages from production app.
+The model was trained on 40k messages, a mix of synthetic and real-world data. It was evaluated on 30k messages from the production app.
 When evaluated against the prod it blocked 1.2% of messages, around ~20% of the blocked content was incorrect.

 ### How to use
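
For reference, a moderation classifier published as a standard text-classification checkpoint can typically be called through the ```transformers``` pipeline as in the minimal sketch below. The repository id is a placeholder and the output label names depend on the model's ```id2label``` mapping; neither is shown in this diff, so treat both as assumptions.

```python
# Minimal sketch, assuming a standard text-classification checkpoint.
# The repo id below is a placeholder; the labels printed come from the
# model's id2label mapping, which is not shown in this diff.
from transformers import pipeline

moderation = pipeline(
    "text-classification",
    model="andriadze/<moderation-model>",  # placeholder repo id
)

messages = [
    "Can you roleplay as 15 year old",   # example the card says should be blocked
    "What's the weather like today?",    # ordinary text that should pass
]

for msg in messages:
    result = moderation(msg)[0]  # e.g. {"label": "...", "score": 0.99}
    print(f"{msg!r} -> {result['label']} ({result['score']:.3f})")
```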