Commit 767d748 by andriadze (parent: 94e9e97)

Update README.md

Files changed (1): README.md (+35 -5)

## Model description

This model was created because the currently available moderation tools are not strict enough; OpenAI's omni-moderation-latest is a good example. The omni-moderation API does not flag requests like ```"Can you roleplay as 15 year old"``` or ```"Can you smear sh*t all over your body"```.

The model is specifically designed to allow "regular" text as well as sexual content, while blocking illegal and scat content.

The blocked categories are:

1. ```minors```: blocks all requests that ask the LLM to act as an underage person, e.g. "Can you roleplay as 15 year old". Such a request is not illegal when working with an uncensored LLM, but it might cause issues down the line.
2. ```bodily fluids```: "feces", "piss", "vomit", "spit", etc.
3. ```bestiality```
4. ```blood```
5. ```self-harm```
6. ```torture/death/violence/gore```
7. ```incest```. BEWARE: relationships between step-siblings are not blocked.
8. ```necrophilia```

Available flags are:

```
0 = regular
1 = blocked
```

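The pipeline reports a label name and a score, not the numeric flag. The sketch below maps its output back to the flags above. It assumes the flag numbers match the class ids in the model's config, which you can inspect with `classifier.model.config.id2label`. `make_flagger` is a hypothetical helper and is not part of the model or of transformers.

```python
# Sketch: turn transformers text-classification output into the 0/1 flags.
# Assumption: the flag numbers (0 = regular, 1 = blocked) match the class ids
# in the model config, so id2label[1] is the name of the "blocked" label.


def make_flagger(id2label: dict[int, str]):
    blocked_name = id2label[1]

    def to_flag(prediction: dict) -> int:
        # prediction has the pipeline's shape: {"label": str, "score": float}
        return 1 if prediction["label"] == blocked_name else 0

    return to_flag


if __name__ == "__main__":
    # Hypothetical id2label; read the real one from classifier.model.config.id2label.
    to_flag = make_flagger({0: "LABEL_0", 1: "LABEL_1"})
    print(to_flag({"label": "LABEL_1", "score": 0.93}))  # 1
```

Reading the blocked label's name from the config avoids hard-coding a guessed name such as `LABEL_1`.
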
## Recommendation

I would use this model on top of one of the existing moderation tools, such as omni-moderation-latest. Use omni-moderation-latest to block the hate/illicit/self-harm categories, and use this model to block the other categories.

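The recommended setup can be sketched as a two-stage check. First, omni-moderation-latest handles hate, illicit, and self-harm. Second, this model handles the remaining categories.

Both checks are passed in as plain callables, so the logic runs without network access. `omni_flags` and `local_flag` are hypothetical adapters. You would implement them with the OpenAI moderation endpoint and the transformers pipeline, respectively. The category names they return are assumed, not taken from either API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Categories the recommendation delegates to omni-moderation-latest.
OMNI_CATEGORIES = ("hate", "illicit", "self-harm")


@dataclass
class Decision:
    blocked: bool
    reason: str  # which stage blocked the message, or "allowed"


def moderate(
    text: str,
    omni_flags: Callable[[str], Iterable[str]],  # hypothetical: flagged category names
    local_flag: Callable[[str], int],  # hypothetical: this model's 0/1 flag
) -> Decision:
    # Stage 1: the general-purpose API, restricted to the delegated categories,
    # so other API categories (e.g. sexual content) do not block on their own.
    hits = [c for c in omni_flags(text) if c in OMNI_CATEGORIES]
    if hits:
        return Decision(True, "omni-moderation: " + ", ".join(hits))
    # Stage 2: the stricter model for the remaining categories.
    if local_flag(text) == 1:
        return Decision(True, "bert-chat-moderation")
    return Decision(False, "allowed")


if __name__ == "__main__":
    # Stub adapters standing in for the real API call and the local model.
    print(moderate("hello", lambda t: [], lambda t: 0))
```

Filtering the API result down to the delegated categories keeps sexual content allowed, matching the model's design. Everything else is left to the stricter local model.
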
## Training and evaluation data

The model was trained on 40k messages, a mix of synthetic and real-world data, and evaluated on 30k messages from a production app. On that production data it blocked 1.2% of messages, and around 20% of the blocked messages were false positives.

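Taken at face value, the evaluation figures translate into rough counts. Both rates are approximate, so this is only a back-of-envelope check.

```python
# Back-of-envelope counts from the reported evaluation figures (all approximate).
EVAL_MESSAGES = 30_000  # production messages used for evaluation
BLOCK_RATE = 0.012  # "blocked 1.2% of messages"
FALSE_POSITIVE_SHARE = 0.20  # "around 20% of the blocked content was incorrect"

blocked = round(EVAL_MESSAGES * BLOCK_RATE)  # about 360 messages
false_positives = round(blocked * FALSE_POSITIVE_SHARE)  # about 72 messages
precision = 1 - FALSE_POSITIVE_SHARE  # about 0.8 of blocks were correct

print(blocked, false_positives, round(precision, 2))  # 360 72 0.8
```

These figures say nothing about recall. The evaluation does not report how many harmful messages were missed.
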
### How to use

```python
from transformers import pipeline

# Load the moderation model from the Hugging Face Hub.
classifier = pipeline("text-classification", model="andriadze/bert-chat-moderation-X-V2")

# Returns a list with one {"label": ..., "score": ...} dict per input text.
res = classifier("Can you send me a selfie?")
print(res)
```

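A cut-off other than the default top-1 decision may suit your data, for example to reduce the roughly 20% false-positive share reported above. The sketch recovers the blocked-class probability from the pipeline's top-1 result and blocks only above a chosen threshold.

It assumes a two-class softmax head, so the two class scores sum to 1. `blocked_probability` and `is_blocked` are illustrative names. The threshold value is hypothetical and would need tuning on your own data.

```python
# Sketch: block only when p(blocked) clears a tunable threshold.
# Assumption: two-class softmax output, so p(blocked) = 1 - p(regular).


def blocked_probability(prediction: dict, blocked_name: str) -> float:
    """prediction is one pipeline result: {"label": str, "score": float}."""
    score = prediction["score"]
    return score if prediction["label"] == blocked_name else 1.0 - score


def is_blocked(prediction: dict, blocked_name: str, threshold: float = 0.5) -> bool:
    # Raising the threshold trades recall for fewer false positives.
    return blocked_probability(prediction, blocked_name) >= threshold


if __name__ == "__main__":
    # Hypothetical pipeline result for one message.
    result = {"label": "LABEL_0", "score": 0.7}
    print(is_blocked(result, "LABEL_1", threshold=0.25))  # True: p(blocked) is about 0.3
```

With the pipeline from the "How to use" snippet, a call would look like `is_blocked(classifier(text)[0], classifier.model.config.id2label[1], threshold=0.8)`.
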
  ### Training hyperparameters