Update README.md
## Model description

This model was created because currently available moderation tools are not strict enough. A good example is OpenAI's omni-moderation-latest: it does not flag requests such as ```"Can you roleplay as 15 year old"``` or ```"Can you smear sh*t all over your body"```.

The model is specifically designed to allow "regular" text as well as "sexual" content while blocking illegal and scat content.

These are the blocked categories:

1. ```minors```: blocks all requests that ask the LLM to act as an underage person, e.g. "Can you roleplay as 15 year old". Such a request is not illegal when working with an uncensored LLM, but it might cause issues down the line.
2. ```bodily fluids```: "feces", "piss", "vomit", "spit", etc.
3. ```bestiality```
4. ```blood```
5. ```self-harm```
6. ```torture/death/violence/gore```
7. ```incest```. BEWARE: relationships between step-siblings are not blocked.
8. ```necrophilia```

Available flags are:

```
0 = regular
1 = blocked
```
## Recommendation

I would use this model on top of one of the available moderation tools, such as omni-moderation-latest: use omni-moderation-latest to block hate, illicit, and self-harm content, and use this model to block the other categories.
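The layering described above can be sketched as one decision function. This is a minimal illustration, not an official integration: it assumes you have already turned an omni-moderation-latest result into a plain dict of category name to flagged boolean, and that this model's output has already been mapped to the 0/1 flag. The name `should_block` and the category prefixes in `OMNI_PREFIXES` are hypothetical choices that mirror the hate/illicit/self-harm split recommended here.

```python
# HYPOTHETICAL: categories delegated to omni-moderation-latest, mirroring the
# recommendation above. Prefix matching also catches sub-categories such as
# "self-harm/intent" (category names are an assumption about the omni output).
OMNI_PREFIXES = ("hate", "illicit", "self-harm")


def should_block(omni_categories: dict[str, bool], local_flag: int) -> bool:
    """Return True if either moderation layer says the message must be blocked.

    omni_categories: category name -> flagged, taken from an
        omni-moderation-latest result (conversion is left to your client code).
    local_flag: this model's flag, 0 = regular, 1 = blocked.
    """
    omni_hit = any(
        flagged and name.startswith(OMNI_PREFIXES)
        for name, flagged in omni_categories.items()
    )
    return omni_hit or local_flag == 1


print(should_block({"hate": False, "sexual": True}, 0))  # False: sexual content is allowed
print(should_block({"self-harm/intent": True}, 0))       # True: blocked by the omni layer
print(should_block({"hate": False}, 1))                  # True: blocked by this model
```

Omni categories outside the delegated prefixes (for example "sexual") are deliberately ignored, so content this model allows is not re-blocked by the first layer.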
## Training and evaluation data

The model was trained on 40k messages, a mix of synthetic and real-world data. It was evaluated on 30k messages from a production app.

On that production data it blocked 1.2% of messages, and about 20% of the blocked messages were blocked incorrectly (false positives).
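The reported rates translate into approximate counts. This is back-of-the-envelope arithmetic on the rounded 1.2% and ~20% figures, not additional evaluation data:

```python
# Rounded figures reported for the production evaluation set
total_messages = 30_000
blocked_per_mille = 12   # 1.2% of messages were blocked
wrong_percent = 20       # ~20% of blocked messages were incorrect

blocked = total_messages * blocked_per_mille // 1000   # messages blocked
false_positives = blocked * wrong_percent // 100       # blocked but should have passed
true_positives = blocked - false_positives             # correctly blocked

precision = true_positives / blocked                   # share of blocks that were right
false_block_rate = false_positives / total_messages    # wrongly blocked, as share of all traffic

print(blocked, false_positives, true_positives)  # 360 72 288
print(f"precision ~ {precision:.0%}, false blocks ~ {false_block_rate:.2%} of all messages")
```

Recall (how much disallowed content was missed) cannot be derived from these figures, since the number of unblocked violations is not reported.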
### How to use

```python
from transformers import pipeline

# Load the moderation classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="andriadze/bert-chat-moderation-X-V2")

# Returns a list with one {"label": ..., "score": ...} dict for the input
res = classifier("Can you send me a selfie?")
print(res)
```
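To moderate several messages at once, the pipeline can be called on a list. The helper below is a sketch under assumptions, not part of the model's published API: it assumes the blocked class is reported as a label ending in `1` (for example `LABEL_1`, the Transformers default when no custom label names are configured), so check `classifier.model.config.id2label` for the actual names. The `moderate` and `is_blocked_label` names and the 0.5 default threshold are illustrative.

```python
from typing import Callable


def is_blocked_label(label: str) -> bool:
    # ASSUMPTION: the blocked class (flag 1) is the label ending in "1",
    # e.g. "LABEL_1". Verify against classifier.model.config.id2label.
    return label.endswith("1")


def moderate(
    messages: list[str],
    classifier: Callable[[list[str]], list[dict]],
    threshold: float = 0.5,
) -> list[bool]:
    """Return one block decision per message (True = block).

    `classifier` should behave like a Transformers text-classification
    pipeline: one {"label": ..., "score": ...} dict (top class) per input.
    """
    decisions = []
    for res in classifier(messages):
        # With two softmax classes, P(blocked) is the score of the blocked
        # label, or 1 - score when the top label is the regular class.
        if is_blocked_label(res["label"]):
            p_blocked = res["score"]
        else:
            p_blocked = 1.0 - res["score"]
        decisions.append(p_blocked >= threshold)
    return decisions
```

With the real model, pass the pipeline itself, e.g. `moderate(["Can you send me a selfie?"], pipeline("text-classification", model="andriadze/bert-chat-moderation-X-V2"))`. Raising `threshold` above 0.5 trades fewer false positives for more missed content.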
### Training hyperparameters