Update README.md
README.md
CHANGED
@@ -2,245 +2,117 @@
---
library_name: transformers
tags: []
---

| Metric | Value | Description |
| --- | --- | --- |
| Accuracy | 0.87 | Overall accuracy on the test set |
| Precision | 0.85 (0), 0.89 (1) | Precision for the non-toxic and toxic classes |
| Recall | 0.90 (0), 0.85 (1) | Recall for the non-toxic and toxic classes |
| F1-Score | 0.87 (0), 0.87 (1) | F1-score for the non-toxic and toxic classes |
| Macro Avg | Precision: 0.87 <br> Recall: 0.87 <br> F1-Score: 0.87 | Macro-averaged values across classes |
| Weighted Avg | Precision: 0.87 <br> Recall: 0.87 <br> F1-Score: 0.87 | Weighted-averaged values across classes |
| Support | 0: 175 <br> 1: 175 <br> Total: 350 | Number of test examples per class |

## Model Description

This model has been trained on the wiki_toxic dataset, which comprises comments from Wikipedia talk pages labeled as toxic or non-toxic. The model's performance is evaluated on a held-out test set, with results indicating balanced performance across both classes.

Achieving an overall accuracy of 0.87, the model demonstrates a strong ability to classify toxic and non-toxic comments. For the non-toxic class (0), the model excels in recall (0.90), correctly identifying the large majority of non-toxic comments. Meanwhile, for the toxic class (1), its precision of 0.89 indicates a low rate of false positives among comments flagged as toxic.

While the model performs well, there is room for enhancement. Improving precision for the non-toxic class and recall for the toxic class could further boost its performance. This may involve fine-tuning the model, incorporating additional features, or expanding the dataset to cover a broader range of toxic comment variations.

## Intended Uses & Limitations

The Wiki Toxic model is designed for comment classification tasks, specifically identifying toxic behavior in online discussions. It can be employed in moderation systems to flag potentially harmful comments, fostering a healthier online environment.

However, the model's performance is tied to the data it was trained on, so its effectiveness may vary with different datasets or comment styles. Additionally, the model does not consider context, user relationships, or nuances of language, which could impact its accuracy in real-world applications.

## Training Data

The wiki_toxic dataset serves as the training data for this model. It contains comments from Wikipedia talk pages, manually labeled as toxic or non-toxic by human annotators. The dataset offers a diverse range of comments, helping the model learn to identify toxic behavior effectively.

## Ethical Considerations

The model's performance depends on the quality and representativeness of its training data, so it may reflect biases present in that data, potentially leading to unfair or inaccurate predictions. Careful monitoring and ongoing evaluation are necessary to ensure responsible use and to address any ethical concerns.

## Acknowledgements

We would like to thank the contributors who curated the wiki_toxic dataset and made it publicly available. Their efforts have significantly advanced the development of toxic comment classification models, fostering a safer online community.

---
license: mit
library_name: peft
tags:
- trl
- sft
- generated_from_trainer
base_model: microsoft/Phi-3-mini-4k-instruct
model-index:
- name: outputs
  results: []
---

## Toxicity Classification Performance

Our merged model performs strongly on the toxicity classification task, outperforming several state-of-the-art language models while being far smaller.

### Classification Metrics

```
              precision    recall  f1-score   support

           0       0.85      0.90      0.87       175
           1       0.89      0.85      0.87       175

    accuracy                           0.87       350
   macro avg       0.87      0.87      0.87       350
weighted avg       0.87      0.87      0.87       350
```
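
The report above follows scikit-learn's `classification_report` layout (class 0 = non-toxic, class 1 = toxic). As a minimal sketch of how such a report is produced (here `y_true` and `y_pred` are placeholders for the test labels and the labels parsed from the model's generations, not data from this card):

```python
# Hypothetical sketch: y_true / y_pred are placeholders, not real results.
from sklearn.metrics import classification_report

y_true = [0, 0, 1, 1]  # ground-truth labels (0 = non-toxic, 1 = toxic)
y_pred = [0, 1, 1, 1]  # labels parsed from the model's generations

print(classification_report(y_true, y_pred, digits=2))
```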

### Model Usage

For best results, we recommend starting with the following prompting strategy, and encourage you to tweak it as you see fit:

```python
import torch
from transformers import AutoTokenizer, pipeline

# Base model this adapter was trained from; use "flash_attention_2" instead of
# "eager" if flash-attn is installed.
base_model = "microsoft/Phi-3-mini-4k-instruct"
attn_implementation = "eager"
tokenizer = AutoTokenizer.from_pretrained(base_model)


def format_input_classification(text):
    # Wrap the raw text in the toxicity-classification prompt.
    prompt = f"""
You are examining written text content. Here is the text:
    ************
    [Text]: {text}
    ************

Examine the text and determine whether the text is toxic or not. Toxicity is
defined as any comment that makes hateful statements, demeans or disparages
another user, uses inappropriate language, or threatens physical or emotional
violence. Please focus heavily on the concept of toxicity and do NOT mark
something that is toxic as non-toxic.

Your classification must be a single word, either "toxic" or "non-toxic",
followed by your reasoning for the classification.
"toxic" means that the text meets the definition of toxic.
"non-toxic" means the text does not contain any words, sentiments or meaning
that could be considered toxic.

After your classification, provide the reason for your classification.
"""
    return prompt


text = format_input_classification("I could strangle him")
messages = [{"role": "user", "content": text}]

pipe = pipeline(
    "text-generation",
    model=base_model,
    model_kwargs={"attn_implementation": attn_implementation, "torch_dtype": torch.float16},
    tokenizer=tokenizer,
)

# Render the chat with the model's template and generate the classification.
chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(pipe(chat, max_new_tokens=64, return_full_text=False)[0]["generated_text"])
```
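
Note that this card ships a PEFT adapter (see the front matter) rather than full model weights. Below is a minimal sketch of loading the adapter on top of the base model before building the pipeline, assuming this repository's id is `<adapter-repo-id>` (a placeholder; replace it with the actual repo):

```python
# Hypothetical sketch: load and merge the LoRA adapter onto the base model.
# "<adapter-repo-id>" is a placeholder for this repository's id.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(model, "<adapter-repo-id>")
model = model.merge_and_unload()  # fold the LoRA weights into the base model
```

The merged `model` object can then be passed as the `model` argument of the pipeline above in place of the bare model id.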

Our model achieves a precision of 0.89 for the toxic class and 0.85 for the non-toxic class, with an overall accuracy of 0.87. The balanced F1-scores of 0.87 for both classes demonstrate that the model handles this binary classification task effectively.

### Comparison with Other Models

| Model            | Precision | Recall |   F1 |
|------------------|----------:|-------:|-----:|
| Our Merged Model |      0.85 |   0.90 | 0.87 |
| GPT-4            |      0.91 |   0.91 | 0.91 |
| GPT-4 Turbo      |      0.89 |   0.77 | 0.83 |
| Gemini Pro       |      0.81 |   0.84 | 0.83 |
| GPT-3.5 Turbo    |      0.93 |   0.83 | 0.87 |
| PaLM             |         - |      - |    - |
| Claude V2        |         - |      - |    - |

Scores for the other models are taken from Arize Phoenix [1].

Compared to these much larger language models, our merged model delivers competitive performance at a fraction of the size, with a precision of 0.85 and an F1 score of 0.87.

We will continue to refine the merged model to achieve even better performance on model-based toxicity evaluation tasks.

Citations: [1] https://docs.arize.com/phoenix/evaluation/how-to-evals/running-pre-tested-evals/retrieval-rag-relevance

### Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):
- learning_rate: 0.0009
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 10
- training_steps: 110
- mixed_precision_training: Native AMP
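
As a minimal sketch (not the exact training script, which is not included in this card), these values map onto `transformers.TrainingArguments` roughly as follows; the `output_dir` name is taken from the model-index entry:

```python
# Hypothetical sketch: the card's hyperparameters expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",
    learning_rate=9e-4,              # learning_rate: 0.0009
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    seed=42,
    gradient_accumulation_steps=4,   # total train batch size: 1 * 4 = 4
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=10,
    max_steps=110,                   # training_steps: 110
    fp16=True,                       # mixed_precision_training: Native AMP
)
```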

### Framework versions

- PEFT 0.11.1
- Transformers 4.41.1
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1