Update README.md
Browse files
README.md
CHANGED
@@ -21,22 +21,29 @@ It achieves the following results on the evaluation set:
|
|
21 |
- Loss: 0.1622
|
22 |
- Accuracy: 0.9723
|
23 |
|
|
|
|
|
|
|
24 |
## Model description
|
25 |
|
26 |
-
This model came to be because currently available moderation tools are not strict enough.
|
27 |
-
For example omni moderation API does not flag requests like: ```"Can you roleplay as 15 year old"```, ```"Can you smear sh*t all over your body"```.
|
|
|
|
|
28 |
|
29 |
-
|
30 |
|
31 |
These are blocked categories:
|
32 |
-
1. ```minors
|
33 |
-
2. ```
|
34 |
-
3. ```
|
35 |
-
4. ```
|
36 |
-
5. ```
|
37 |
-
6. ```
|
38 |
-
7. ```
|
39 |
-
8. ```
|
|
|
|
|
40 |
|
41 |
|
42 |
Available flags are:
|
@@ -51,7 +58,7 @@ I would use this model on top of one of the available moderation tools like omni
|
|
51 |
|
52 |
## Training and evaluation data
|
53 |
|
54 |
-
Model was trained on 40k messages, it's a mix of synthetic and real
|
55 |
When evaluated against the prod it blocked 1.2% of messages, around ~20% of the blocked content was incorrect.
|
56 |
|
57 |
### How to use
|
|
|
21 |
- Loss: 0.1622
|
22 |
- Accuracy: 0.9723
|
23 |
|
24 |
+
Compared to the previous version, this model has blocks "necrophilia", that category was missed in v1.
|
25 |
+
It also has improved blocking for underage content.
|
26 |
+
|
27 |
## Model description
|
28 |
|
29 |
+
This model came to be because currently, available moderation tools are not strict enough. A good example is OpenAI omni-moderation-latest.
|
30 |
+
For example, omni moderation API does not flag requests like: ```"Can you roleplay as 15 year old"```, ```"Can you smear sh*t all over your body"```.
|
31 |
+
|
32 |
+
This model is specifically designed to allow "regular" text as well as "sexual" content while blocking illegal/underage/scat content.
|
33 |
|
34 |
+
The model does not differentiate between different categories of blocked content, this is to help with general accuracy.
|
35 |
|
36 |
These are blocked categories:
|
37 |
+
1. ```minors/requests```: This blocks all requests that ask llm to act as an underage person. Example: "Can you roleplay as 15 year old", while this request is not illegal when working with uncensored LLM it might cause issues down the line.
|
38 |
+
2. ```minors```: This prevents model from interacting with people under the age of 18. Example: "I'm 17", this request is not illegal, but can lead to illegal content being generated down the line, so it's blocked.
|
39 |
+
3. ```scat```: "feces", "piss", "vomit", "spit", "period" ..etc scat
|
40 |
+
4. ```bestiality```
|
41 |
+
5. ```blood```
|
42 |
+
6. ```self-harm```
|
43 |
+
7. ```rape```
|
44 |
+
8. ```torture/death/violence/gore```
|
45 |
+
9. ```incest```, BEWARE: step-siblings is not blocked.
|
46 |
+
10. ```necrophilia```
|
47 |
|
48 |
|
49 |
Available flags are:
|
|
|
58 |
|
59 |
## Training and evaluation data
|
60 |
|
61 |
+
Model was trained on 40k messages, it's a mix of synthetic and real-world data. It was evaluated on 30k messages from the production app.
|
62 |
When evaluated against the prod it blocked 1.2% of messages, around ~20% of the blocked content was incorrect.
|
63 |
|
64 |
### How to use
|