Jlonge4 committed · Commit 4ca8aff · verified · 1 Parent(s): 4393cfe

Update README.md

Files changed (1)
  1. README.md +113 -241
README.md CHANGED
@@ -2,245 +2,117 @@
2
  library_name: transformers
3
  tags: []
4
  ---
5
- # Wiki Toxic Comment Classification Model Card
6
-
7
- ## Model Details
8
-
9
- | Model Name | Wiki Toxic |
10
- | --- | --- |
11
- | License | apache-2.0 |
12
- | Dataset | wiki_toxic |
13
- | Language | English |
14
-
15
- ## Model Metrics
16
-
17
- | Metric | Value | Description |
18
- | --- | --- | --- |
19
- | Accuracy | 0.87 | Overall accuracy on the test set |
20
- | Precision | 0.85 (0), 0.89 (1) | Precision for non-toxic and toxic classes |
21
- | Recall | 0.90 (0), 0.85 (1) | Recall for non-toxic and toxic classes |
22
- | F1-Score | 0.87 (0), 0.87 (1) | F1-Score for non-toxic and toxic classes |
23
- | Macro Avg | Precision: 0.87 <br> Recall: 0.87 <br> F1-Score: 0.87 | Macro-averaged values across classes |
24
- | Weighted Avg | Precision: 0.87 <br> Recall: 0.87 <br> F1-Score: 0.87 | Weighted-averaged values across classes |
25
- | Support | 0: 175 <br> 1: 175 <br> Total: 350 | Support for each class |
26
-
27
- ## Model Description
28
-
29
- This model has been trained on the wiki_toxic dataset, comprising comments from Wikipedia talk pages labeled as toxic or non-toxic. The model's performance is evaluated on a held-out test set, with results indicating a balanced performance across both classes.
30
-
31
- Achieving an overall accuracy of 0.87, the model demonstrates a strong ability to classify toxic and non-toxic comments accurately. For the non-toxic class (0), the model excels in precision (0.91), indicating a low rate of false positives. Meanwhile, for the toxic class (1), the model's recall of 0.91 highlights its effectiveness in capturing the majority of toxic comments.
32
-
33
- While the model performs well, there's room for enhancement. Improving precision for the toxic class and recall for the non-toxic class could further boost its performance. This may involve fine-tuning the model, incorporating additional features, or expanding the dataset to cover a broader range of toxic comment variations.
34
-
35
- ## Intended Uses & Limitations
36
-
37
- The Wiki Toxic model is designed for comment classification tasks, specifically identifying toxic behavior in online discussions. It can be employed in moderation systems to flag potentially harmful comments, fostering a healthier online environment.
38
-
39
- However, it's crucial to acknowledge that the model's performance is tied to the data it was trained on. As such, its effectiveness may vary with different datasets or comment styles. Additionally, the model doesn't consider context, user relationships, or nuances of language, which could impact its accuracy in real-world applications.
40
-
41
- ## Training Data
42
-
43
- The wiki_toxic dataset serves as the training data for this model. It contains comments from Wikipedia talk pages, manually labeled as toxic or non-toxic by human annotators. This dataset offers a diverse range of comments, ensuring the model learns to identify toxic behavior effectively.
44
-
45
- ## Ethical Considerations
46
-
47
- It is important to note that the model's performance is dependent on the quality and representativeness of the training data. As such, it may reflect biases present in the data, potentially leading to unfair or inaccurate predictions. Careful monitoring and ongoing evaluation are necessary to ensure the model's responsible use and address any ethical concerns.
48
-
49
- ## Acknowledgements
50
-
51
- We would like to acknowledge the contributors who curated the wiki_toxic dataset and made it publicly available. Their efforts have significantly advanced the development of toxic comment classification models, fostering a safer online community.
52
- <!--
53
- # Model Card for Model ID
54
-
55
- <!-- Provide a quick summary of what the model is/does. -->
56
-
57
-
58
-
59
- ## Model Details
60
-
61
- ### Model Description
62
-
63
- <!-- Provide a longer summary of what this model is. -->
64
-
65
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
66
-
67
- - **Developed by:** [More Information Needed]
68
- - **Funded by [optional]:** [More Information Needed]
69
- - **Shared by [optional]:** [More Information Needed]
70
- - **Model type:** [More Information Needed]
71
- - **Language(s) (NLP):** [More Information Needed]
72
- - **License:** [More Information Needed]
73
- - **Finetuned from model [optional]:** [More Information Needed]
74
-
75
- ### Model Sources [optional]
76
-
77
- <!-- Provide the basic links for the model. -->
78
-
79
- - **Repository:** [More Information Needed]
80
- - **Paper [optional]:** [More Information Needed]
81
- - **Demo [optional]:** [More Information Needed]
82
-
83
- ## Uses
84
-
85
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
86
-
87
- ### Direct Use
88
-
89
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
90
-
91
- [More Information Needed]
92
-
93
- ### Downstream Use [optional]
94
-
95
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
96
-
97
- [More Information Needed]
98
-
99
- ### Out-of-Scope Use
100
-
101
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
102
-
103
- [More Information Needed]
104
-
105
- ## Bias, Risks, and Limitations
106
-
107
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
108
-
109
- [More Information Needed]
110
-
111
- ### Recommendations
112
-
113
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
114
-
115
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
116
-
117
- ## How to Get Started with the Model
118
-
119
- Use the code below to get started with the model.
120
-
121
- [More Information Needed]
122
-
123
- ## Training Details
124
-
125
- ### Training Data
126
-
127
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
128
-
129
- [More Information Needed]
130
-
131
- ### Training Procedure
132
-
133
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
134
-
135
- #### Preprocessing [optional]
136
-
137
- [More Information Needed]
138
-
139
-
140
- #### Training Hyperparameters
141
-
142
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
143
-
144
- #### Speeds, Sizes, Times [optional]
145
-
146
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
147
-
148
- [More Information Needed]
149
-
150
- ## Evaluation
151
-
152
- <!-- This section describes the evaluation protocols and provides the results. -->
153
-
154
- ### Testing Data, Factors & Metrics
155
-
156
- #### Testing Data
157
-
158
- <!-- This should link to a Dataset Card if possible. -->
159
-
160
- [More Information Needed]
161
-
162
- #### Factors
163
-
164
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
165
-
166
- [More Information Needed]
167
-
168
- #### Metrics
169
-
170
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
171
-
172
- [More Information Needed]
173
-
174
- ### Results
175
-
176
- [More Information Needed]
177
-
178
- #### Summary
179
-
180
-
181
-
182
- ## Model Examination [optional]
183
-
184
- <!-- Relevant interpretability work for the model goes here -->
185
-
186
- [More Information Needed]
187
-
188
- ## Environmental Impact
189
-
190
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
191
-
192
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
193
-
194
- - **Hardware Type:** [More Information Needed]
195
- - **Hours used:** [More Information Needed]
196
- - **Cloud Provider:** [More Information Needed]
197
- - **Compute Region:** [More Information Needed]
198
- - **Carbon Emitted:** [More Information Needed]
199
-
200
- ## Technical Specifications [optional]
201
-
202
- ### Model Architecture and Objective
203
-
204
- [More Information Needed]
205
-
206
- ### Compute Infrastructure
207
-
208
- [More Information Needed]
209
-
210
- #### Hardware
211
-
212
- [More Information Needed]
213
-
214
- #### Software
215
-
216
- [More Information Needed]
217
-
218
- ## Citation [optional]
219
-
220
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
221
-
222
- **BibTeX:**
223
-
224
- [More Information Needed]
225
-
226
- **APA:**
227
-
228
- [More Information Needed]
229
-
230
- ## Glossary [optional]
231
-
232
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
233
-
234
- [More Information Needed]
235
-
236
- ## More Information [optional]
237
-
238
- [More Information Needed]
239
-
240
- ## Model Card Authors [optional]
241
-
242
- [More Information Needed]
243
-
244
- ## Model Card Contact
245
 
246
- [More Information Needed] -->
5
+ ---
6
+ license: mit
7
+ library_name: peft
8
+ tags:
9
+ - trl
10
+ - sft
11
+ - generated_from_trainer
12
+ base_model: microsoft/Phi-3-mini-4k-instruct
13
+ model-index:
14
+ - name: outputs
15
+ results: []
16
+ ---
17
 
18
+ ## Toxicity Classification Performance
19
+
20
+ Our merged model performs competitively on the toxicity classification task, matching or exceeding the F1 score of several much larger language models (see the comparison below).
21
+
22
+ ### Classification Metrics
23
+
24
+ ```
25
+ precision recall f1-score support
26
+
27
+ 0 0.85 0.90 0.87 175
28
+ 1 0.89 0.85 0.87 175
29
+
30
+ accuracy 0.87 350
31
+ macro avg 0.87 0.87 0.87 350
32
+ weighted avg 0.87 0.87 0.87 350
33
+ ```
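As a quick consistency check on the report above, each F1 score is the harmonic mean of that class's precision and recall. The sketch below recomputes it from the rounded figures in the report. `f1_score` is an illustrative helper written for this check, not code from the original evaluation.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded per-class (precision, recall) values copied from the report.
report = {"0": (0.85, 0.90), "1": (0.89, 0.85)}

per_class_f1 = {label: f1_score(p, r) for label, (p, r) in report.items()}
# With equal support (175 each) the macro and weighted averages coincide.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

for label, score in per_class_f1.items():
    print(f"class {label}: F1 = {score:.2f}")
print(f"macro F1 = {macro_f1:.2f}")
```

Because the inputs are already rounded to two decimals, the recomputed values agree with the report only to that precision.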
34
+
35
+ ### Model Usage
36
+ For best results, we recommend starting with the following prompting strategy (and encourage tweaks as you see fit):
37
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, pipeline
+
+
+ def format_input_classification(text):
+     """Wrap `text` in the toxicity-classification prompt."""
+     prompt = f"""
+ You are examining written text content. Here is the text:
+ ************
+ [Text]: {text}
+ ************
+
+ Examine the text and determine whether the text is toxic or not. Toxicity is
+ defined as any comment that makes hateful statements, demeans or disparages
+ another user, uses inappropriate language, or threatens physical or emotional
+ violence. Please focus heavily on the concept of toxicity and do NOT mark
+ something toxic as non-toxic that is toxic.
+
+ Your classification must be single word followed by a reasoning as to why you chose
+ the classification, either "toxic" or "non-toxic".
+ "toxic" means that the text meets the definition of toxic.
+ "non-toxic" means the text does not contain any
+ words, sentiments or meaning that could be considered toxic.
+
+ After your classification, provide the reason for your classification.
+ """
+     return prompt
+
+
+ text = format_input_classification("I could strangle him")
+ messages = [
+     {"role": "user", "content": text}
+ ]
+
+ # Assumptions (not defined in the original snippet): `base_model` is a
+ # placeholder for the path or Hub ID of the merged model, and "eager" is
+ # one valid attention implementation.
+ base_model = "path/to/merged-model"
+ attn_implementation = "eager"
+ tokenizer = AutoTokenizer.from_pretrained(base_model)
+
+ pipe = pipeline(
+     "text-generation",
+     model=base_model,
+     model_kwargs={"attn_implementation": attn_implementation, "torch_dtype": torch.float16},
+     tokenizer=tokenizer,
+ )
+ ```
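The snippet above builds the pipeline but stops before generation. As a hedged sketch, the helper below splits a generated reply into its label and reasoning, assuming the reply begins with the single-word classification the prompt asks for. `parse_classification` is a hypothetical name that is not part of the original card, and the commented-out `pipe(...)` call assumes the `pipe` and `messages` objects from the snippet above.

```python
import re


def parse_classification(generated: str) -> tuple[str, str]:
    """Split a reply into (label, reasoning).

    Hypothetical helper: assumes the reply starts with "toxic" or "non-toxic",
    as the prompt requests. Returns ("unknown", reply) when it does not.
    """
    reply = generated.strip()
    # Match the leading label case-insensitively; "non-toxic" is listed first
    # so the longer label is tried before "toxic".
    match = re.match(r"(non-toxic|toxic)\b[\s.:,\-]*", reply, flags=re.IGNORECASE)
    if match is None:
        return "unknown", reply
    return match.group(1).lower(), reply[match.end():].strip()


# Usage with the pipeline (requires the model to be loaded):
# outputs = pipe(messages, max_new_tokens=128, return_full_text=False)
# label, reason = parse_classification(outputs[0]["generated_text"])

print(parse_classification("Toxic. The text threatens physical violence."))
```

Returning `"unknown"` instead of raising keeps a batch evaluation running when the model ignores the requested format; such replies can then be counted or inspected separately.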
75
+
76
+ Our model achieves a precision of 0.85 for the non-toxic class (0) and 0.89 for the toxic class (1), with an overall accuracy of 0.87. The matching F1-scores of 0.87 for both classes indicate balanced performance on this binary classification task.
77
+
78
+ ### Comparison with Other Models
79
+
80
+ | Model | Precision | Recall | F1 |
81
+ |-------------------|----------:|-------:|-------:|
82
+ | Our Merged Model | 0.85 | 0.90 | 0.87 |
83
+ | GPT-4 | 0.91 | 0.91 | 0.91 |
84
+ | GPT-4 Turbo | 0.89 | 0.77 | 0.83 |
85
+ | Gemini Pro | 0.81 | 0.84 | 0.83 |
86
+ | GPT-3.5 Turbo | 0.93 | 0.83 | 0.87 |
87
+ | Palm | - | - | - |
88
+ | Claude V2 | - | - | - |
89
+
+ [1] Scores from arize/phoenix
90
+
91
+ Compared to other language models, our merged model demonstrates competitive performance at a much smaller size, with a precision score of 0.85 and an F1 score of 0.87.
92
+
93
+ We will continue to refine and improve our merged model to achieve even better performance on model-based toxicity evaluation tasks.
94
+
95
+ Citations: [1] https://docs.arize.com/phoenix/evaluation/how-to-evals/running-pre-tested-evals/retrieval-rag-relevance
96
+
97
+ ### Training hyperparameters
98
+
99
+ The following hyperparameters were used during training:
100
+ - learning_rate: 0.0009
101
+ - train_batch_size: 1
102
+ - eval_batch_size: 8
103
+ - seed: 42
104
+ - gradient_accumulation_steps: 4
105
+ - total_train_batch_size: 4
106
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
107
+ - lr_scheduler_type: linear
108
+ - lr_scheduler_warmup_steps: 10
109
+ - training_steps: 110
110
+ - mixed_precision_training: Native AMP
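To make the list above concrete: the effective batch size is `train_batch_size × gradient_accumulation_steps` (1 × 4 = 4), and a linear scheduler with warmup ramps the learning rate from 0 to its peak over the warmup steps, then decays it linearly to 0 at the final step. The sketch below reproduces that schedule in plain Python. The function name is illustrative, and a single-device setup is an assumption.

```python
LEARNING_RATE = 0.0009
WARMUP_STEPS = 10
TRAINING_STEPS = 110
TRAIN_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 4


def linear_schedule_lr(step: int) -> float:
    """Learning rate at optimizer step `step`: linear warmup, then linear decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TRAINING_STEPS - WARMUP_STEPS)


# Assumes one device, so there is no extra multiplier for data parallelism.
total_train_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS
examples_seen = total_train_batch_size * TRAINING_STEPS

print(total_train_batch_size, examples_seen)
for step in (0, 5, 10, 60, 110):
    print(step, linear_schedule_lr(step))
```

With only 110 optimizer steps at batch size 4, the run processes 440 training examples in total, so the peak learning rate applies only briefly before the decay dominates.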
111
+
112
+ ### Framework versions
113
+
114
+ - PEFT 0.11.1
115
+ - Transformers 4.41.1
116
+ - Pytorch 2.3.0+cu121
117
+ - Datasets 2.19.1
118
+ - Tokenizers 0.19.1