|
--- |
|
license: apache-2.0 |
|
pipeline_tag: text-classification |
|
--- |
|
# Model Card: Fine-Tuned DistilBERT for Offensive/Hate Speech Detection |
|
|
|
## Model Description |
|
|
|
**Fine-Tuned DistilBERT** is a variant of the BERT transformer model, distilled for efficient inference while maintaining high accuracy. It has been adapted and fine-tuned for the specific task of offensive/hate speech detection in text data.
|
|
|
The base model, `distilbert-base-uncased`, is pre-trained on a large corpus of English text, which allows it to capture semantic nuances and contextual information in natural language. It was then fine-tuned with careful attention to hyperparameter settings, including batch size and learning rate, to ensure optimal performance on the offensive/hate speech detection task.
|
|
|
During fine-tuning, a batch size of 16 was chosen for efficient computation and learning. A learning rate of 2e-5 was selected to strike a balance between rapid convergence and steady optimization, so that the model learns quickly while steadily refining its capabilities throughout training.
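
For illustration, these hyperparameters would map onto a `transformers` training setup roughly as follows. This is a minimal sketch, not the authors' actual training script: the epoch count and output directory are assumptions, and the proprietary dataset is not shown.

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
)

# Base checkpoint named in this card, with a two-class head
# ("non-offensive" vs. "offensive").
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Hyperparameters reported above: batch size 16, learning rate 2e-5.
# num_train_epochs is an assumption; the card does not state it.
training_args = TrainingArguments(
    output_dir="./offensive-speech-distilbert",  # hypothetical path
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=3,
)
```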
|
|
|
This model has been trained on a proprietary dataset of fewer than 100,000 examples, specifically designed for offensive/hate speech detection. The dataset consists of text samples, each labeled as "non-offensive" or "offensive." The diversity within the dataset allowed the model to learn to identify offensive content accurately.
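
Since the dataset is proprietary, a record can only be illustrated hypothetically; the field names and label strings below are assumptions based on the two-class setup described above:

```python
# Hypothetical training record; the actual dataset is not public,
# so the field names and label strings are illustrative only.
example = {
    "text": "A sample sentence to be labeled.",
    "label": "non-offensive",  # or "offensive"
}
```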
|
|
|
The goal of this meticulous training process is to equip the model with the ability to detect offensive and hate speech in text data effectively. The result is a model ready to contribute significantly to content moderation and safety, while maintaining high standards of accuracy and reliability. |
|
|
|
## Intended Uses & Limitations |
|
|
|
### Intended Uses |
|
- **Offensive/Hate Speech Detection**: The primary intended use of this model is to detect offensive or hate speech in text data. It is well-suited for filtering and identifying inappropriate content in various applications. |
|
|
|
### How to Use |
|
To use this model for offensive/hate speech detection, load it through the `transformers` pipeline:

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="Falconsai/offensive_speech_detection")

text = "Your text to classify here."
result = classifier(text)
print(result)
```
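
The pipeline returns a list with one dictionary per input, each containing a predicted `label` and a confidence `score`. The exact label strings depend on the model's configuration, so inspect the output (for example, via the `print(result)` above) before relying on specific label names in downstream moderation logic.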
|
|
|
|
|
### Limitations |
|
- **Specialized Task Fine-Tuning**: While the model is adept at offensive/hate speech detection, its performance may vary when applied to other natural language processing tasks. Users interested in employing this model for different tasks should explore fine-tuned versions available in the model hub for optimal results.
|
|
|
## Training Data |
|
|
|
The model's training data includes a proprietary dataset designed for offensive/hate speech detection. This dataset comprises a diverse collection of text samples, categorized into "non-offensive" and "offensive" classes. The training process aimed to equip the model with the ability to distinguish between offensive and non-offensive content effectively. |
|
|
|
### Training Stats |
|
- Evaluation Loss: 0.0184
- Evaluation Accuracy: 0.9973 (99.73%)
- Evaluation Runtime: 85.08 seconds
- Evaluation Samples per Second: 127.35
- Evaluation Steps per Second: 7.97
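
Metrics such as the evaluation accuracy above are commonly produced by a `compute_metrics` callback during evaluation. A minimal sketch using the `evaluate` library follows; this is an assumption about how such figures are typically computed, not the card authors' actual code:

```python
import numpy as np
import evaluate

# Standard accuracy metric from the `evaluate` library
accuracy_metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    """Turn raw logits into class predictions and score accuracy."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy_metric.compute(predictions=predictions, references=labels)
```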
|
|
|
|
|
|
## Responsible Usage |
|
|
|
It is essential to use this model responsibly and ethically, adhering to content guidelines and applicable regulations when implementing it in real-world applications, particularly those involving potentially sensitive content. |
|
|
|
## References |
|
|
|
- [Hugging Face Model Hub](https://huggingface.co/models) |
|
- [DistilBERT Paper](https://arxiv.org/abs/1910.01108) |
|
|
|
**Disclaimer:** The model's performance may be influenced by the quality and representativeness of the data it was fine-tuned on. Users are encouraged to assess the model's suitability for their specific applications and datasets. |