dougtrajano/toxic-comment-classification

@@ -1,12 +1,6 @@
 ---
-language:
-- pt
-license: apache-2.0
 tags:
-- toxicity
-- portuguese
-- hate speech
-- offensive language
 - generated_from_trainer
 metrics:
 - accuracy
@@ -14,72 +8,40 @@ metrics:
 - precision
 - recall
 model-index:
-- name: dougtrajano/toxic-comment-classification
   results: []
-datasets:
-- dougtrajano/olid-br
-library_name: transformers
 ---
-# dougtrajano/toxic-comment-classification
-Toxic Comment Classification is a model that detects if the text is toxic or not.
-This BERT model is a fine-tuned version of [neuralmind/bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) on the [OLID-BR dataset](https://huggingface.co/datasets/dougtrajano/olid-br).
-## Overview
-**Input:** Text in Brazilian Portuguese
-**Output:** Binary classification (toxic or not toxic)
-## Usage
-```python
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-tokenizer = AutoTokenizer.from_pretrained("dougtrajano/toxic-comment-classification")
-model = AutoModelForSequenceClassification.from_pretrained("dougtrajano/toxic-comment-classification")
-```
-## Limitations and bias
-The following factors may degrade the model’s performance.
-**Text Language**: The model was trained on Brazilian Portuguese texts, so it may not work well with other Portuguese dialects.
-**Text Origin**: The model was trained on texts from social media and a few texts from other sources, so it may not work well on other types of texts.
-## Trade-offs
-Sometimes models exhibit performance issues under particular circumstances. In this section, we'll discuss situations in which you might discover that the model performs less than optimally, and should plan accordingly.
-**Text Length**: The model was fine-tuned on texts with a word count between 1 and 178 words (average of 18 words). It may give poor results on texts with a word count outside this range.
-## Performance
-The model was evaluated on the test set of the [OLID-BR](https://dougtrajano.github.io/olid-br/) dataset.
-**Accuracy:** 0.8578
-**Precision:** 0.8594
-**Recall:** 0.8578
-**F1-Score:** 0.8580
-| Class | Precision | Recall | F1-Score | Support |
-| :---: | :-------: | :----: | :------: | :-----: |
-| `NOT-OFFENSIVE` | 0.8886 | 0.8490 | 0.8683 | 1,775 |
-| `OFFENSIVE` | 0.8233 | 0.8686 | 0.8453 | 1,438 |
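The headline precision, recall, and F1 in the removed card look like support-weighted averages of the two per-class rows. That is an inference from the numbers, not something the card states. A minimal sketch of the check, using only values copied from the table above (the helper name `weighted_average` is made up for illustration):

```python
# Per-class rows copied from the table above: (precision, recall, f1, support).
per_class = {
    "NOT-OFFENSIVE": (0.8886, 0.8490, 0.8683, 1775),
    "OFFENSIVE": (0.8233, 0.8686, 0.8453, 1438),
}


def weighted_average(rows, column):
    """Average one metric column, weighting each class by its support."""
    total_support = sum(row[3] for row in rows.values())
    return sum(row[column] * row[3] for row in rows.values()) / total_support


weighted_precision = weighted_average(per_class, 0)
weighted_recall = weighted_average(per_class, 1)
weighted_f1 = weighted_average(per_class, 2)

print(
    f"precision={weighted_precision:.4f} "
    f"recall={weighted_recall:.4f} "
    f"f1={weighted_f1:.4f}"
)
```

Support-weighted recall is the fraction of all examples classified correctly, which is why Accuracy and Recall carry the same value in both versions of the card.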
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
 - learning_rate: 3.255788747459486e-05
 - train_batch_size: 8
 - eval_batch_size: 8
@@ -89,13 +51,21 @@ The following hyperparameters were used during training:
 - num_epochs: 30
 - label_smoothing_factor: 0.07158711257743958
 ### Framework versions
-- Transformers 4.26.0
 - Pytorch 1.10.2+cu113
 - Datasets 2.9.0
 - Tokenizers 0.13.2
-## Provide Feedback
-If you have any feedback on this model, please [open an issue](https://github.com/DougTrajano/ToChiquinho/issues/new) on GitHub.

 ---
+license: mit
 tags:
 - generated_from_trainer
 metrics:
 - accuracy
 - precision
 - recall
 model-index:
+- name: toxic-comment-classification
   results: []
 ---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# toxic-comment-classification
+This model is a fine-tuned version of [neuralmind/bert-large-portuguese-cased](https://huggingface.co/neuralmind/bert-large-portuguese-cased) on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.4102
+- Accuracy: 0.8547
+- F1: 0.8549
+- Precision: 0.8669
+- Recall: 0.8547
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
 - learning_rate: 3.255788747459486e-05
 - train_batch_size: 8
 - eval_batch_size: 8
 - num_epochs: 30
 - label_smoothing_factor: 0.07158711257743958
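For readers who want to reuse these settings, the listed values can be collected as keyword arguments in the naming used by `transformers.TrainingArguments`. This is a sketch under assumptions: the card does not show the training script, the mapping of `train_batch_size` to `per_device_train_batch_size` assumes a single device, `build_training_arguments` is a hypothetical helper, and any hyperparameters the diff does not show are left out rather than guessed.

```python
# Values copied from the hyperparameter list above. Key names follow
# transformers.TrainingArguments; the mapping itself is an assumption.
training_kwargs = {
    "learning_rate": 3.255788747459486e-05,
    "per_device_train_batch_size": 8,  # card: train_batch_size (assumes one device)
    "per_device_eval_batch_size": 8,   # card: eval_batch_size (assumes one device)
    "num_train_epochs": 30,
    "label_smoothing_factor": 0.07158711257743958,
}


def build_training_arguments(output_dir):
    """Hypothetical helper: build TrainingArguments from the values above."""
    # Imported lazily so the dictionary can be inspected without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **training_kwargs)
```

Keeping the values in a plain dictionary makes it easy to compare them against another run before constructing the arguments object.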
+### Training results
+| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     | Precision | Recall |
+|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
+| 0.4465        | 1.0   | 1408 | 0.4102          | 0.8547   | 0.8549 | 0.8669    | 0.8547 |
+| 0.3839        | 2.0   | 2816 | 0.4814          | 0.8509   | 0.8497 | 0.8532    | 0.8509 |
+| 0.3945        | 3.0   | 4224 | 0.6362          | 0.8002   | 0.7918 | 0.8258    | 0.8002 |
+| 0.3643        | 4.0   | 5632 | 0.4961          | 0.8248   | 0.8211 | 0.8349    | 0.8248 |
+| 0.3345        | 5.0   | 7040 | 0.5267          | 0.8528   | 0.8532 | 0.8570    | 0.8528 |
+| 0.3053        | 6.0   | 8448 | 0.5902          | 0.8002   | 0.7911 | 0.8292    | 0.8002 |
 ### Framework versions
+- Transformers 4.26.1
 - Pytorch 1.10.2+cu113
 - Datasets 2.9.0
 - Tokenizers 0.13.2